Drop data frame columns by name

Question

I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like:

    df$x <- NULL

But I was hoping to do this with fewer commands.

Also, I know that I could drop columns using integer indexing like this:

    df <- df[ -c(1, 3:6, 12) ]

But I am concerned that the relative position of my variables may change.

Given how powerful R is, I figured there might be a better way than dropping each column one by one.

User · Answer

Provide the data frame and a string of comma-separated names to remove:

    remove_features <- function(df, features) {
      # split "a, b, c" into a character vector of column names
      rem_vec <- unlist(strsplit(features, ', '))
      res <- df[, !(names(df) %in% rem_vec)]
      return(res)
    }

Usage:

    remove_features(iris, "Sepal.Length, Petal.Width")

User · Answer

list(NULL) also works:

    dat <- mtcars
    colnames(dat)
    # [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
    # [11] "carb"
    dat[, c("mpg", "cyl", "wt")] <- list(NULL)
    colnames(dat)
    # [1] "disp" "hp"   "drat" "qsec" "vs"   "am"   "gear" "carb"

User · Answer

Another solution if you don't want to use @hadley's above: if "COLUMN_NAME" is the name of the column you want to drop:

    df[, -which(names(df) == "COLUMN_NAME")]
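One caveat with the negative which() index is worth illustrating: when no column matches, which() returns an empty vector, and negating an empty index selects no columns at all. The sketch below uses a made-up data frame and a made-up column name ("not_a_column") to show that behaviour next to a logical-mask alternative; it is an illustration, not part of the original answer.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# No match: which() returns integer(0), and -integer(0) is still integer(0),
# so the result has zero columns instead of all of them.
idx <- which(names(df) == "not_a_column")
empty <- df[, -idx, drop = FALSE]
ncol(empty)

# A logical mask keeps every column when nothing matches ...
kept <- df[, names(df) != "not_a_column", drop = FALSE]
names(kept)

# ... and drops the column when it does exist.
dropped <- df[, names(df) != "b", drop = FALSE]
names(dropped)
```

The drop = FALSE argument is only there so that the result stays a data frame even when a single column remains.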

User · Answer

Another dplyr answer. If your variables have some common naming structure, you might try starts_with(). For example (the values come from rnorm(), so yours will differ):

    library(dplyr)
    df <- data.frame(var2 = rnorm(3), char1 = rnorm(3), var4 = rnorm(3),
                     var3 = rnorm(3), char2 = rnorm(3), var1 = rnorm(3))
    df
    #         var2      char1        var4       var3       char2       var1
    # 1 -0.4629512 -0.3595079 -0.04763169  0.6398194  0.70996579 0.75879754
    # 2  0.5489027  0.1572841 -1.65313658 -1.3228020 -1.42785427 0.31168919
    # 3 -0.1707694 -0.9036500  0.47583030 -0.6636173  0.02116066 0.03983268
    df1 <- df %>% select(-starts_with("char"))
    df1
    #         var2        var4       var3       var1
    # 1 -0.4629512 -0.04763169  0.6398194 0.75879754
    # 2  0.5489027 -1.65313658 -1.3228020 0.31168919
    # 3 -0.1707694  0.47583030 -0.6636173 0.03983268

If you want to drop a sequence of variables in the data frame, you can use the : operator. For example, if you wanted to drop var2, var3, and all variables in between, you'd just be left with var1:

    df2 <- df1 %>% select(-c(var2:var3))
    df2
    #         var1
    # 1 0.75879754
    # 2 0.31168919
    # 3 0.03983268

User · Answer

Here is a dplyr way to go about it:

    # df[ -c(1, 3:6, 12) ]  # original
    df.cut <- df %>% select(-col.to.drop.1, -col.to.drop.2, ..., -col.to.drop.6)  # with dplyr::select()

I like this because it's intuitive to read and understand without annotation, and robust to columns changing position within the data frame. It also follows the vectorized idiom of using - to remove elements.

User · Answer

Another possibility:

    df <- df[, setdiff(names(df), c("a", "c"))]

or

    df <- df[, grep("^(a|c)$", names(df), invert = TRUE)]

User · Answer

Out of interest, this flags up one of R's weird multiple-syntax inconsistencies. For example, given a two-column data frame:

    df <- data.frame(x = 1, y = 2)

This gives a data frame:

    subset(df, select = -y)

but this gives a vector:

    df[, -2]

This is all explained in ?"[" but it's not exactly expected behaviour. Well, at least not to me.

User · Answer

If you want to remove the columns by reference and avoid the internal copying associated with data.frames, then you can use the data.table package and the function :=

You can pass a character vector of names to the left-hand side of the := operator, and NULL as the RHS.

    library(data.table)

    df <- data.frame(a = 1:10, b = 1:10, c = 1:10, d = 1:10)
    DT <- data.table(df)
    # or more simply: DT <- data.table(a = 1:10, b = 1:10, c = 1:10, d = 1:10)

    DT[, c('a', 'b') := NULL]

If you want to predefine the names as a character vector outside the call to [, wrap the name of the object in () or {} to force the LHS to be evaluated in the calling scope, not as a name within the scope of DT.

    del <- c('a', 'b')
    DT <- data.table(a = 1:10, b = 1:10, c = 1:10, d = 1:10)
    DT[, (del) := NULL]
    DT <- data.table(a = 1:10, b = 1:10, c = 1:10, d = 1:10)
    DT[, {del} := NULL]
    # force() or c() would also work

You can also use set(), which avoids the overhead of [.data.table, and also works for data.frames:

    df <- data.frame(a = 1:10, b = 1:10, c = 1:10, d = 1:10)
    DT <- data.table(df)

    # drop `a` from df (no copying involved)
    set(df, j = 'a', value = NULL)
    # drop `b` from DT (no copying involved)
    set(DT, j = 'b', value = NULL)

User · Answer

Dplyr solution. I doubt this will get much attention down here, but if you have a list of columns that you want to remove, and you want to do it in a dplyr chain, I use one_of() in the select clause.

Here is a simple, reproducible example:

    undesired <- c('mpg', 'cyl', 'hp')

    mtcars <- mtcars %>%
      select(-one_of(undesired))

Documentation can be found by running ?one_of or here: http://genomicsclass.github.io/book/pages/dplyr_tutorial.html

User · Answer

within(df, rm(x)) is probably easiest, or for multiple variables:

    within(df, rm(x, y))

Or if you're dealing with data.tables (per "How do you delete a column by name in data.table?"):

    dt[, x := NULL]   # Deletes column x by reference instantly.

    dt[, !"x"]        # Selects all but x into a new data.table.

or for multiple variables:

    dt[, c("x", "y") := NULL]

    dt[, !c("x", "y")]

User · Answer

I keep thinking there must be a better idiom, but for subtraction of columns by name, I tend to do the following:

    df <- data.frame(a = 1:10, b = 1:10, c = 1:10, d = 1:10)

    # return everything except a and c
    df <- df[, -match(c("a", "c"), names(df))]
    df

User · Answer

You could use %in% like this:

    df[, !(colnames(df) %in% c("x", "bar", "foo"))]
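To make the %in% idiom concrete, here is a minimal sketch wrapped in a small helper. The helper name drop_cols and the example data frame are made up for illustration. It shows two properties of the approach: names that are not present in the data frame are simply ignored, and adding drop = FALSE keeps the result a data frame even if only one column is left.

```r
# Hypothetical helper: drop columns by name, ignoring names that are absent.
drop_cols <- function(df, drops) {
  df[, !(names(df) %in% drops), drop = FALSE]
}

df <- data.frame(x = 1:3, foo = letters[1:3], keep = c(TRUE, FALSE, TRUE))

# "bar" is not a column of df; it is ignored rather than raising an error.
out <- drop_cols(df, c("x", "bar", "foo"))
names(out)
is.data.frame(out)
```

Without drop = FALSE, the same call would return the single remaining column as a plain vector.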

User · Answer

Beyond select(-one_of(drop_col_names)) demonstrated in earlier answers, there are a couple of other dplyr options for dropping columns using select() that do not involve defining all the specific column names (using the dplyr starwars sample data for some variety in column names):

    library(dplyr)
    starwars %>%
      select(-(name:mass)) %>%        # the range of columns from 'name' to 'mass'
      select(-contains('color')) %>%  # any column name that contains 'color'
      select(-starts_with('bi')) %>%  # any column name that starts with 'bi'
      select(-ends_with('er')) %>%    # any column name that ends with 'er'
      select(-matches('^f.+s$')) %>%  # any column name matching the regex pattern
      select_if(~!is.list(.)) %>%     # not by column name but by data type
      head(2)

    # A tibble: 2 x 2
      homeworld species
      <chr>     <chr>
    1 Tatooine  Human
    2 Tatooine  Droid

If you need to drop a column that may or may not exist in the data frame, here's a slight twist using select_if() that, unlike one_of(), will not throw an "Unknown columns" warning if the column name does not exist. In this example 'bad_column' is not a column in the data frame:

    starwars %>%
      select_if(!names(.) %in% c('height', 'mass', 'bad_column'))
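For the "column may or may not exist" case, more recent dplyr versions offer any_of(), which supersedes one_of() and skips missing names without a warning. The sketch below assumes dplyr 1.0.0 or later is installed; the vector name drops is made up for illustration.

```r
library(dplyr)

# "bad_column" does not exist in starwars
drops <- c("height", "mass", "bad_column")

# any_of() selects only the names that are present, so negating it
# drops the existing columns and silently skips the missing one.
res <- starwars %>% select(-any_of(drops))
setdiff(names(starwars), names(res))
```

If a missing name should be treated as an error instead, all_of() is the strict counterpart.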

User · Answer

If you have a large data.frame and are low on memory, use [, or rm() with within(), to remove columns of a data.frame, as subset() currently (R 3.6.2) uses more memory. Besides, the manual itself recommends subset() for interactive use.

    getData <- function() {
      n <- 1e7
      set.seed(7)
      data.frame(a = runif(n), b = runif(n), c = runif(n), d = runif(n))
    }

    DF <- getData()
    tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
    DF <- DF[setdiff(names(DF), c("a", "c"))]
    # DF <- DF[!(names(DF) %in% c("a", "c"))]    # Alternative
    # DF <- DF[-match(c("a", "c"), names(DF))]   # Alternative
    sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
    # 0.1 MB are used

    DF <- getData()
    tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
    DF <- subset(DF, select = -c(a, c))
    sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
    # 357 MB are used

    DF <- getData()
    tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
    DF <- within(DF, rm(a, c))
    sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
    # 0.1 MB are used

    DF <- getData()
    tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
    DF[c("a", "c")] <- NULL
    sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
    # 0.1 MB are used

User · Answer

You can use a simple list of names:

    DF <- data.frame(
      x = 1:10,
      y = 10:1,
      z = rep(5, 10),
      a = 11:20
    )
    drops <- c("x", "z")
    DF[, !(names(DF) %in% drops)]

Or, alternatively, you can make a list of those to keep and refer to them by name:

    keeps <- c("y", "a")
    DF[keeps]

EDIT: For those still not acquainted with the drop argument of the indexing function, if you want to keep one column as a data frame, you do:

    keeps <- "y"
    DF[, keeps, drop = FALSE]

drop = TRUE (or not mentioning it) will drop unnecessary dimensions, and hence return a vector with the values of column y.

User · Answer

There's also the subset command, useful if you know which columns you want:

    df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
    df <- subset(df, select = c(a, c))

UPDATED after comment by @hadley: To drop columns a and c you could do:

    df <- subset(df, select = -c(a, c))

User · Answer

    DF <- data.frame(
      x = 1:10,
      y = 10:1,
      z = rep(5, 10),
      a = 11:20
    )
    DF

Output:

        x  y z  a
    1   1 10 5 11
    2   2  9 5 12
    3   3  8 5 13
    4   4  7 5 14
    5   5  6 5 15
    6   6  5 5 16
    7   7  4 5 17
    8   8  3 5 18
    9   9  2 5 19
    10 10  1 5 20

    DF[c("a", "x")] <- list(NULL)

Output:

        y z
    1  10 5
    2   9 5
    3   8 5
    4   7 5
    5   6 5
    6   5 5
    7   4 5
    8   3 5
    9   2 5
    10  1 5

User · Answer

Find the indexes of the columns you want to drop using which(). Give these indexes a negative sign (multiply by -1). Then subset on those values, which will remove them from the data frame. This is an example:

    DF <- data.frame(one = c('a', 'b'), two = c('c', 'd'), three = c('e', 'f'), four = c('g', 'h'))
    DF
    #   one two three four
    # 1   a   c     e    g
    # 2   b   d     f    h

    DF[which(names(DF) %in% c('two', 'three')) * -1]
    #   one four
    # 1   a    g
    # 2   b    h

User · Answer

There's a function called dropNamed() in Bernd Bischl's BBmisc package that does exactly this.

    BBmisc::dropNamed(df, "x")

The advantage is that it avoids repeating the data frame argument and thus is suitable for piping in magrittr (just like the dplyr approaches):

    df %>% BBmisc::dropNamed("x")

User · Answer

There is a potentially more powerful strategy based on the fact that grep() will return a numeric vector. If you have a long list of variables, as I do in one of my datasets, with some variables that end in ".A" and others that end in ".B", and you only want the ones that end in ".A" (along with all the variables that don't match either pattern), do this:

    dfrm2 <- dfrm[, -grep("\\.B$", names(dfrm))]

For the case at hand, using Joris Meys' example, it might not be as compact, but it would be:

    DF <- DF[, -grep(paste("^", drops, "$", sep = "", collapse = "|"), names(DF))]
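A variation on the same pattern-based idea uses grepl(), which returns a logical vector rather than positions. This is a sketch with a made-up data frame, not the original author's code; its one practical difference is that a pattern matching no column leaves the data frame unchanged, whereas a negated empty grep() result would select zero columns.

```r
dfrm <- data.frame(id = 1:2, w.A = 3:4, w.B = 5:6, h.A = 7:8, h.B = 9:10)

# Logical mask: TRUE for every column whose name ends in ".B"
ends_in_B <- grepl("\\.B$", names(dfrm))
dfrm2 <- dfrm[, !ends_in_B, drop = FALSE]
names(dfrm2)

# A pattern that matches nothing gives an all-FALSE mask, so nothing is dropped.
none <- dfrm[, !grepl("\\.Z$", names(dfrm)), drop = FALSE]
names(none)
```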
