[r] How do you delete a column by name in data.table?

To get rid of a column named "foo" in a data.frame, I can do:

df <- df[-grep('foo', colnames(df))]

However, once df is converted to a data.table object, there is no way to just remove a column.

Example:

df <- data.frame(id = 1:100, foo = rnorm(100))
df2 <- df[-grep('foo', colnames(df))] # works
df3 <- data.table(df)
df3[-grep('foo', colnames(df3))] 

But once it is converted to a data.table object, this no longer works.

This question is related to r data.table

The answer is


For a data.table, assigning the column to NULL removes it:

DT[,c("col1", "col1", "col2", "col2")] <- NULL
^
|---- Notice the extra comma if DT is a data.table

... which is the equivalent of:

DT$col1 <- NULL
DT$col2 <- NULL
DT$col3 <- NULL
DT$col4 <- NULL

The equivalent for a data.frame is:

DF[c("col1", "col1", "col2", "col2")] <- NULL
      ^
      |---- Notice the missing comma if DF is a data.frame

Q. Why is there a comma in the version for data.table, and no comma in the version for data.frame?

A. As data.frames are stored as a list of columns, you can skip the comma. You could also add it in, however then you will need to assign them to a list of NULLs, DF[, c("col1", "col2", "col3")] <- list(NULL).


I simply do it in the data frame kind of way:

DT$col = NULL

Works fast and as far as I could see doesn't cause any problems.

UPDATE: not the best method if your DT is very large, as using the $<- operator will lead to object copying. So better use:

DT[, col:=NULL]

DT[,c:=NULL] # remove column c

Suppose your dt has columns col1, col2, col3, col4, col5, coln.

To delete a subset of them:

vx <- as.character(bquote(c(col1, col2, col3, coln)))[-1]
DT[, paste0(vx):=NULL]

You can also use set for this, which avoids the overhead of [.data.table in loops:

dt <- data.table( a=letters, b=LETTERS, c=seq(26), d=letters, e=letters )
set( dt, j=c(1L,3L,5L), value=NULL )
> dt[1:5]
   b d
1: A a
2: B b
3: C c
4: D d
5: E e

If you want to do it by column name, which(colnames(dt) %in% c("a","c","e")) should work for j.


Here is a way when you want to set a # of columns to NULL given their column names a function for your usage :)

deleteColsFromDataTable <- function (train, toDeleteColNames) {

       for (myNm in toDeleteColNames)

       train <- train [,(myNm):=NULL]

       return (train)
}

Very simple option in case you have many individual columns to delete in a data table and you want to avoid typing in all column names #careadviced

dt <- dt[, -c(1,4,6,17,83,104)]

This will remove columns based on column number instead.

It's obviously not as efficient because it bypasses data.table advantages but if you're working with less than say 500,000 rows it works fine