Remove an entire column from a data frame in R

Question

Does anyone know how to remove an entire column from a data.frame in R? For example, if I am given this data.frame:

> head(data)
   chr       genome region
1 chr1 hg19_refGene    CDS
2 chr1 hg19_refGene   exon
3 chr1 hg19_refGene    CDS
4 chr1 hg19_refGene   exon
5 chr1 hg19_refGene    CDS
6 chr1 hg19_refGene   exon

and I want to remove the 2nd column.

User · Answer

There are several options for removing one or more columns with dplyr::select() and some helper functions. The helper functions can be useful because some do not require naming all the specific columns to be dropped. Note that to drop columns using select() you need to use a leading - to negate the column names.

Using the dplyr::starwars sample data for some variety in column names:

library(dplyr)

starwars %>% 
  select(-height) %>%                  # a specific column name
  select(-one_of('mass', 'films')) %>% # any columns named in one_of()
  select(-(name:hair_color)) %>%       # the range of columns from 'name' to 'hair_color'
  select(-contains('color')) %>%       # any column name that contains 'color'
  select(-starts_with('bi')) %>%       # any column name that starts with 'bi'
  select(-ends_with('er')) %>%         # any column name that ends with 'er'
  select(-matches('^v.+s$')) %>%       # any column name matching the regex pattern
  select_if(~!is.list(.)) %>%          # not by column name but by data type
  head(2)

# A tibble: 2 x 2
  homeworld species
  <chr>     <chr>  
1 Tatooine  Human  
2 Tatooine  Droid

You can also drop by column number:

starwars %>% 
  select(-2, -(4:10)) # column 2 and columns 4 through 10
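
Applied to the data frame from the question, the same idea might look like the sketch below. The three-row `data` object is a hypothetical stand-in for the asker's data, and `drop_cols` is just an illustrative variable name. `all_of()` is the tidyselect helper for column names stored in a character vector; this assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for the question's data frame (first three rows only)
data <- data.frame(
  chr    = c("chr1", "chr1", "chr1"),
  genome = c("hg19_refGene", "hg19_refGene", "hg19_refGene"),
  region = c("CDS", "exon", "CDS")
)

# Drop the second column by name or by position
by_name     <- data %>% select(-genome)
by_position <- data %>% select(-2)

# Drop columns whose names are held in a variable
drop_cols   <- c("genome")
by_variable <- data %>% select(-all_of(drop_cols))

names(by_variable)  # "chr" "region"
```

All three calls return a new data frame; `data` itself is left unchanged unless you assign the result back to it.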

User · Answer

To remove one or more columns by name, when the column names are known (as opposed to being determined at run-time), I like the subset() syntax. E.g. for the data frame

Data <- data.frame(a = 1:3, d = 2:4, c = 3:5, b = 4:6)

to remove just the a column you could do

Data <- subset(Data, select = -a)

and to remove the b and d columns you could do

Data <- subset(Data, select = -c(d, b))

You can remove all columns between d and b with:

Data <- subset(Data, select = -c(d:b))

As I said above, this syntax works only when the column names are known. It won't work when, say, the column names are determined programmatically (i.e. assigned to a variable). I'll reproduce this Warning from the ?subset documentation:

Warning:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
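
As a quick check, here is a small sketch that runs each of the three calls against a fresh copy of the example data frame instead of overwriting it, so the results can be compared. The result names (`without_a` and so on) are made up for illustration.

```r
Data <- data.frame(a = 1:3, d = 2:4, c = 3:5, b = 4:6)

# Drop a single column
without_a <- subset(Data, select = -a)
names(without_a)      # "d" "c" "b"

# Drop two named columns
without_d_b <- subset(Data, select = -c(d, b))
names(without_d_b)    # "a" "c"

# Drop the positional range from d to b (columns 2 through 4)
without_range <- subset(Data, select = -c(d:b))
names(without_range)  # "a"

# subset() keeps the result a data frame even when one column is left
is.data.frame(without_range)  # TRUE
```

Note that d:b is a range over column positions, not an alphabetical range, which is why column c is dropped as well.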

User · Answer

For completeness: if you want to remove columns by name, you can do this:

cols.dont.want <- "genome"
cols.dont.want <- c("genome", "region") # if you want to remove multiple columns

data <- data[, !names(data) %in% cols.dont.want, drop = FALSE]

Including drop = FALSE ensures that the result will still be a data.frame even if only one column remains.

User · Answer

You can set it to NULL:

> Data$genome <- NULL
> head(Data)
   chr region
1 chr1    CDS
2 chr1   exon
3 chr1    CDS
4 chr1   exon
5 chr1    CDS
6 chr1   exon

As pointed out in the comments, here are some other possibilities:

Data[2] <- NULL     # Wojciech Sobala
Data[[2]] <- NULL   # same as above
Data <- Data[, -2]  # Ian Fellows
Data <- Data[-2]    # same as above

You can remove multiple columns via:

Data[1:2] <- list(NULL)  # Marek
Data[1:2] <- NULL        # did not work at the time of writing

Be careful with matrix-subsetting though, as you can end up with a vector:

Data <- Data[, -(2:3)]                # vector
Data <- Data[, -(2:3), drop = FALSE]  # still a data.frame
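
To see the vector pitfall in action, here is a minimal sketch on a hypothetical three-row version of the question's data (the object names are only for illustration):

```r
# Hypothetical stand-in for the question's data frame
Data <- data.frame(
  chr    = c("chr1", "chr1", "chr1"),
  genome = c("hg19_refGene", "hg19_refGene", "hg19_refGene"),
  region = c("CDS", "exon", "CDS")
)

# Matrix-style indexing that leaves one column collapses to a vector
collapsed <- Data[, -(2:3)]
is.data.frame(collapsed)   # FALSE

# drop = FALSE keeps the data.frame structure
kept <- Data[, -(2:3), drop = FALSE]
is.data.frame(kept)        # TRUE

# List-style indexing (no comma) never drops to a vector
list_style <- Data[-(2:3)]
is.data.frame(list_style)  # TRUE
```

If you are writing a function and cannot know in advance how many columns will survive, the list-style form or an explicit drop = FALSE is the safer choice.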

User · Answer

With this you can remove the column and store the result in another variable:

df <- subset(data, select = -c(genome))

User · Answer

The posted answers are very good when working with data.frames. However, these tasks can be pretty inefficient from a memory perspective. With large data, removing a column can take an unusually long amount of time and/or fail due to out-of-memory errors. Package data.table helps address this problem with the := operator:

library(data.table)
> dt <- data.table(a = 1, b = 1, c = 1)
> dt[, a := NULL]
     b c
[1,] 1 1

I should put together a bigger example to show the differences. I'll update this answer at some point with that.
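
In the meantime, here is a small sketch of removing several columns by reference. It is not a benchmark, just the syntax; the extra column d and the variable drop_cols are made up for illustration.

```r
library(data.table)

dt <- data.table(a = 1, b = 1, c = 1, d = 1)

# Remove one column by reference (no copy of dt, no reassignment needed)
dt[, a := NULL]

# Remove several columns whose names are held in a character vector;
# the parentheses tell data.table to use the contents of drop_cols
drop_cols <- c("b", "c")
dt[, (drop_cols) := NULL]

names(dt)  # "d"
```

Because := modifies dt in place, there is no `dt <- ...` on the left-hand side; any other variable pointing at the same data.table sees the change too.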
