Subset data to contain only columns whose names match a condition

Question

Is there a way for me to subset data based on column names starting with a particular string? I have some columns which are like ABC_1, ABC_2, ABC_3 and some like XYZ_1, XYZ_2, XYZ_3, let's say.

How can I subset my df based only on columns containing the above portions of text (let's say, ABC or XYZ)? I can use indices, but the columns are too scattered in the data and it becomes too much hard-coding.

Also, I want to include only the rows where any of these columns has a value > 0, so if any of the 6 columns above has a 1 in a row, that row makes the cut into my final data frame.

User · Accepted Answer

Try grepl on the names of your data frame. grepl matches a regular expression to a target and returns TRUE if a match is found and FALSE otherwise. The function is vectorised, so you can pass a vector of strings to match and you will get a vector of boolean values returned.

Example

    # Data
    df <- data.frame( ABC_1 = runif(3),
                      ABC_2 = runif(3),
                      XYZ_1 = runif(3),
                      XYZ_2 = runif(3) )

    #      ABC_1     ABC_2     XYZ_1     XYZ_2
    #1 0.3792645 0.3614199 0.9793573 0.7139381
    #2 0.1313246 0.9746691 0.7276705 0.0126057
    #3 0.7282680 0.6518444 0.9531389 0.9673290

    # Use grepl
    df[ , grepl( "ABC" , names( df ) ) ]
    #      ABC_1     ABC_2
    #1 0.3792645 0.3614199
    #2 0.1313246 0.9746691
    #3 0.7282680 0.6518444

    # grepl returns a logical vector like this, which is what we use to subset columns
    grepl( "ABC" , names( df ) )
    #[1]  TRUE  TRUE FALSE FALSE

To answer the second part, I'd make the subset data frame and then make a vector that indexes the rows to keep (a logical vector), like this:

    set.seed(1)
    df <- data.frame( ABC_1 = sample(0:1, 3, replace = TRUE),
                      ABC_2 = sample(0:1, 3, replace = TRUE),
                      XYZ_1 = sample(0:1, 3, replace = TRUE),
                      XYZ_2 = sample(0:1, 3, replace = TRUE) )

    # We will want to discard the second row because all ABC values are 0:
    #  ABC_1 ABC_2 XYZ_1 XYZ_2
    #1     0     1     1     0
    #2     0     0     1     0
    #3     1     1     1     0

    df1 <- df[ , grepl( "ABC" , names( df ) ) ]

    ind <- apply( df1 , 1 , function(x) any( x > 0 ) )

    df1[ ind , ]
    #  ABC_1 ABC_2
    #1     0     1
    #3     1     1
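
The answer above matches only ABC, while the question asks for ABC or XYZ together with the row filter. A minimal base R sketch of both steps in one pass might look like the following. The data, the extra ID column, and the names keep_cols, sub, keep_rows and result are made up for illustration, and rowSums() is swapped in for the apply() call as one possible alternative.

```r
# Hypothetical example data; column names follow the pattern in the question.
df <- data.frame(
  ABC_1 = c(0, 0, 1),
  ID    = c(10, 11, 12),
  XYZ_1 = c(0, 0, 1),
  ABC_2 = c(1, 0, 1),
  XYZ_2 = c(0, 0, 0)
)

# One regular expression covers both prefixes; ^ anchors the match
# at the start of the column name.
keep_cols <- grepl("^(ABC|XYZ)", names(df))

# drop = FALSE keeps the result a data frame even if only one column matches.
sub <- df[, keep_cols, drop = FALSE]

# Keep rows where at least one of the selected columns is greater than 0.
keep_rows <- rowSums(sub > 0) > 0
result <- sub[keep_rows, , drop = FALSE]

print(result)
```

If the columns can contain NA, rowSums(sub > 0, na.rm = TRUE) is probably what you want, since a single NA would otherwise turn the row's test into NA.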

User · Answer

Using dplyr you can:

    df <- df %>% dplyr::select(grep("ABC", names(df)), grep("XYZ", names(df)))
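
As a possible variation on the dplyr approach, the two grep() calls can be folded into a single regular expression with the matches() selection helper, and the row condition from the question can be added with filter(). This is only a sketch: it assumes dplyr 1.0.4 or later (for if_any()), and the data and the ID column are hypothetical.

```r
library(dplyr)

# Hypothetical example data; column names follow the pattern in the question.
df <- data.frame(
  ABC_1 = c(0, 0, 1),
  ID    = c(10, 11, 12),
  XYZ_1 = c(0, 0, 1),
  ABC_2 = c(1, 0, 1)
)

result <- df %>%
  # Keep columns whose names start with ABC or XYZ.
  select(matches("^(ABC|XYZ)")) %>%
  # Keep rows where any of the remaining columns is greater than 0.
  filter(if_any(everything(), ~ .x > 0))

print(result)
```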

User · Answer

Just in case, for data.table users, the following works for me:

    df[, grep("ABC", names(df)), with = FALSE]
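
For reference, here is a small self-contained sketch of the data.table form. It assumes the data.table package is installed; the table dt and the variable cols are hypothetical names. It also shows the `..` prefix, which should give the same result as with = FALSE when the column names are stored in a variable.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(ABC_1 = c(0, 0, 1), ID = c(10, 11, 12), XYZ_1 = c(0, 1, 1))

# value = TRUE returns the matching column names instead of their positions.
cols <- grep("^ABC", names(dt), value = TRUE)

# with = FALSE tells data.table to treat cols as column names, not as an expression.
with_false <- dt[, cols, with = FALSE]

# The .. prefix looks up cols in the calling scope and selects those columns.
dot_dot <- dt[, ..cols]

print(dot_dot)
```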

User · Answer

You can also use starts_with() and dplyr's select(), like so:

    df <- df %>% dplyr::select(starts_with("ABC"))
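
Matching a prefix is not the same as matching the text anywhere in the name, which matters if some column merely contains "ABC". The base R sketch below illustrates the difference using startsWith() (available in base R since 3.3.0) rather than dplyr; the column total_ABC is made up for the purpose.

```r
# Hypothetical data with one column that contains "ABC" but does not start with it.
df <- data.frame(ABC_1 = 1:3, XYZ_1 = 4:6, total_ABC = 7:9)

# TRUE wherever "ABC" appears anywhere in the name.
contains_abc <- grepl("ABC", names(df), fixed = TRUE)

# TRUE only where the name begins with "ABC".
starts_abc <- startsWith(names(df), "ABC")

names(df)[contains_abc]  # "ABC_1" "total_ABC"
names(df)[starts_abc]    # "ABC_1"

# Subset on the prefix test only.
prefix_only <- df[, starts_abc, drop = FALSE]
```

With grepl(), the same prefix behaviour can be had by anchoring the pattern, as in grepl("^ABC", names(df)).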

User · Answer

This worked for me:

    df[, names(df) %in% colnames(df)[grepl(str, colnames(df))]]
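
In this answer, str is presumably a variable holding the pattern. As a quick check under that assumption, the sketch below compares the %in% form with indexing by the grepl() result directly; the data and the names via_in and direct are hypothetical.

```r
# Hypothetical example data.
df <- data.frame(ABC_1 = 1:3, XYZ_1 = 4:6, ABC_2 = 7:9)

# Assumed: str holds the pattern to match (named as in the answer above).
str <- "ABC"

# Form used in the answer: build the matching names, then test membership.
via_in <- df[, names(df) %in% colnames(df)[grepl(str, colnames(df))]]

# grepl() already returns one logical value per column, so it can index directly.
direct <- df[, grepl(str, names(df))]

# Both forms select the same columns.
identical(via_in, direct)
```

For unique column names the two forms should select the same columns, so the shorter one is usually enough.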
