[r] Subset data to contain only columns whose names match a condition

Is there a way for me to subset data based on column names starting with a particular string? I have some columns which are like ABC_1 ABC_2 ABC_3 and some like XYZ_1, XYZ_2,XYZ_3 let's say.

How can I subset my df based only on columns containing the above portions of text (lets say, ABC or XYZ)? I can use indices, but the columns are too scattered in data and it becomes too much of hard coding.

Also, I want to only include rows from each of these columns where any of their value is >0 so if either of the 6 columns above has a 1 in the row, it makes a cut into my final data frame.

This question is related to r subset

The answer is


Using dplyr you can:

df <- df %>% dplyr:: select(grep("ABC", names(df)), grep("XYZ", names(df)))

You can also use starts_with and dplyr's select() like so:

df <- df %>% dplyr:: select(starts_with("ABC"))

This worked for me:

df[,names(df) %in% colnames(df)[grepl(str,colnames(df))]]

Just in case for data.table users, the following works for me:

df[, grep("ABC", names(df)), with = FALSE]