I have a dataframe and a list of columns in that dataframe that I'd like to drop. Let's use the iris dataset as an example. I'd like to drop Sepal.Length and Sepal.Width and use only the remaining columns. How do I do this using select or select_ from the dplyr package?
Here's what I've tried so far:
drop.cols <- c('Sepal.Length', 'Sepal.Width')
iris %>% select(-drop.cols)
Error in -drop.cols : invalid argument to unary operator
iris %>% select_(.dots = -drop.cols)
Error in -drop.cols : invalid argument to unary operator
iris %>% select(!drop.cols)
Error in !drop.cols : invalid argument type
iris %>% select_(.dots = !drop.cols)
Error in !drop.cols : invalid argument type
I feel like I'm missing something obvious, because this seems like a pretty useful operation that should already exist. On GitHub, someone posted a similar issue, and Hadley said to use 'negative indexing'. That's what (I think) I've tried, but to no avail. Any suggestions?
Also try:
## Notice the lack of quotes
iris %>% select(-c(Sepal.Length, Sepal.Width))
You can try
iris %>% select(-!!drop.cols)
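For a character vector of names such as drop.cols, recent dplyr versions also provide the all_of() and any_of() helpers, which supersede one_of(). The sketch below assumes dplyr 1.0.0 or later; the extra name 'Not.A.Column' is made up for illustration.

```r
library(dplyr)

drop.cols <- c('Sepal.Length', 'Sepal.Width')

# all_of() is strict: it errors if any name in drop.cols is missing from the data
kept_strict <- iris %>% select(-all_of(drop.cols))

# any_of() is lenient: names that are not present in the data are ignored
# ('Not.A.Column' is a hypothetical name that does not exist in iris)
kept_lenient <- iris %>% select(-any_of(c(drop.cols, 'Not.A.Column')))

names(kept_strict)
# [1] "Petal.Length" "Petal.Width"  "Species"
```

Which helper to pick depends on whether a missing column should be treated as an error or quietly skipped.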
Beyond select(-one_of(drop.cols)), there are a couple of other options for dropping columns using select() that do not involve defining all the specific column names (using the dplyr starwars sample data for some more variety in column names):
starwars %>%
select(-(name:mass)) %>% # the range of columns from 'name' to 'mass'
select(-contains('color')) %>% # any column name that contains 'color'
select(-starts_with('bi')) %>% # any column name that starts with 'bi'
select(-ends_with('er')) %>% # any column name that ends with 'er'
select(-matches('^f.+s$')) %>% # any column name matching the regex pattern
select_if(~!is.list(.)) %>% # not by column name but by data type
head(2)
# A tibble: 2 x 2
homeworld species
<chr> <chr>
1 Tatooine Human
2 Tatooine Droid
Be careful with the select() function, because both the dplyr and MASS packages export a function with that name. If MASS is attached after dplyr, its select() masks dplyr's, and select() may not work properly. To find out which packages are loaded, type sessionInfo() and look for MASS in the "other attached packages:" section. If it is there, type detach("package:MASS", unload = TRUE), and your select() function should work again.
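If detaching MASS is not an option, another way around the name clash is to call the function through its namespace. A minimal sketch, assuming both dplyr and MASS are installed:

```r
library(dplyr)
library(MASS)   # attached after dplyr, so a bare select() now resolves to MASS::select

# Naming the namespace explicitly sidesteps the masking
res <- iris %>% dplyr::select(-Sepal.Length, -Sepal.Width)

names(res)
# [1] "Petal.Length" "Petal.Width"  "Species"
```

This keeps MASS available for the rest of the session while making it unambiguous which select() is meant.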
If you have a special character in the column names, either select or select_ may not work as expected. In that case you can use the "." placeholder, which inside a %>% pipeline refers to the data set being piped in, together with base R subsetting. For the data set in the question, the following line solves the problem:
drop.cols <- c('Sepal.Length', 'Sepal.Width')
iris %>% .[,setdiff(names(.),drop.cols)]
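As an illustration of the special-character case, the sketch below builds a small made-up data frame whose column names contain a space and a hyphen, then drops them by passing the names as strings. The data frame df and the vector bad.cols are hypothetical, and all_of() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data frame with non-syntactic column names;
# check.names = FALSE stops data.frame() from rewriting them
df <- data.frame(`sepal length` = 1:2,
                 `petal-width`  = 3:4,
                 species        = c('a', 'b'),
                 check.names    = FALSE)

bad.cols <- c('sepal length', 'petal-width')

# Passing the names as strings avoids having to backtick-quote each one
res_select <- df %>% select(-all_of(bad.cols))

# The base R form from the answer above gives the same columns
# (drop = FALSE keeps the result a data frame when one column remains)
res_base <- df %>% .[, setdiff(names(.), bad.cols), drop = FALSE]

names(res_select)
# [1] "species"
```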
Another way is to mutate the undesired columns to NULL, which avoids the embedded parentheses:
head(iris,2) %>% mutate_at(drop.cols, ~NULL)
# Petal.Length Petal.Width Species
# 1 1.4 0.2 setosa
# 2 1.4 0.2 setosa
We can also try select_() with setdiff() (note that select_() is deprecated in recent dplyr versions):
iris %>%
select_(.dots = setdiff(names(.), drop.cols))
Source: Stackoverflow.com