Is there a way to subset data based on column names starting with a particular string? I have some columns named ABC_1, ABC_2, ABC_3 and some named XYZ_1, XYZ_2, XYZ_3. How can I subset my df based only on columns whose names contain those strings (say, ABC or XYZ)? I could use indices, but the columns are scattered throughout the data, and that would require too much hard-coding.
I also want to keep only the rows where any of these columns has a value > 0. So if any of the 6 columns above has a 1 in a row, that row makes the cut into my final data frame.
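Using dplyr you can select the columns by matching their names with grep(). Note that "ABC" matches anywhere in the name; anchor the pattern as "^ABC" if only names that start with the string should match:
df <- df %>% dplyr::select(grep("ABC", names(df)), grep("XYZ", names(df)))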
You can also use starts_with() with dplyr's select(), like so:
df <- df %>% dplyr::select(starts_with("ABC"))
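The question also asks for only the rows where any of the selected columns is > 0. A minimal sketch of one way to do both steps with dplyr, assuming dplyr 1.0.4 or later (for if_any()) and a made-up example data frame; the values and the extra id/other columns are hypothetical:

```r
library(dplyr)

# Hypothetical example data; column names follow the pattern in the question.
df <- data.frame(
  id    = 1:4,
  ABC_1 = c(0, 1, 0, 0),
  ABC_2 = c(0, 0, 0, 0),
  XYZ_1 = c(0, 0, 0, 2),
  other = c(5, 5, 5, 5)
)

result <- df %>%
  # Keep only the columns whose names start with ABC or XYZ.
  select(starts_with("ABC"), starts_with("XYZ")) %>%
  # Keep rows where at least one of the remaining columns is > 0.
  filter(if_any(everything(), ~ .x > 0))
```

With this example data, only the second and fourth rows survive the filter.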
This worked for me in base R, where str holds the pattern string to match (for example "ABC"):
df[,names(df) %in% colnames(df)[grepl(str,colnames(df))]]
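For the row condition from the question (any value > 0), a base R sketch along the same lines might look like this. The example data frame and the names keep_cols and keep_rows are made up for illustration:

```r
# Hypothetical example data; column names follow the pattern in the question.
df <- data.frame(
  id    = 1:4,
  ABC_1 = c(0, 1, 0, 0),
  ABC_2 = c(0, 0, 0, 0),
  XYZ_1 = c(0, 0, 0, 2),
  other = c(5, 5, 5, 5)
)

# Logical index of columns whose names start with ABC or XYZ.
keep_cols <- grepl("^(ABC|XYZ)", names(df))

# Rows where at least one of the selected columns is > 0 (NAs are ignored).
keep_rows <- rowSums(df[, keep_cols, drop = FALSE] > 0, na.rm = TRUE) > 0

# drop = FALSE keeps the result a data frame even if only one column matches.
result <- df[keep_rows, keep_cols, drop = FALSE]
```

The "^" anchor restricts the match to names that begin with the string, which is what the question asks for; drop it if the string may appear anywhere in the name.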
For data.table users, the following works for me:
df[, grep("ABC", names(df)), with = FALSE]
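To also apply the "any value > 0" row condition from the question in data.table, something like the following sketch should work. The example table and the names dt, cols and keep_rows are hypothetical:

```r
library(data.table)

# Hypothetical example data; column names follow the pattern in the question.
dt <- data.table(
  id    = 1:4,
  ABC_1 = c(0, 1, 0, 0),
  ABC_2 = c(0, 0, 0, 0),
  XYZ_1 = c(0, 0, 0, 2),
  other = c(5, 5, 5, 5)
)

# Names of the columns that start with ABC or XYZ.
cols <- grep("^(ABC|XYZ)", names(dt), value = TRUE)

# One logical per row: TRUE if any of the selected columns is > 0.
keep_rows <- dt[, Reduce(`|`, lapply(.SD, function(x) x > 0)), .SDcols = cols]

# Subset rows and columns in one step.
result <- dt[keep_rows, cols, with = FALSE]
```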
Source: Stackoverflow.com