Finding rows containing a value (or values) in any column

Question

Say we have a table 'data' containing strings in several columns. We want to find the indices of all rows that contain a certain value or, better yet, one of several values. The column, however, is unknown.

What I do at the moment is:

    apply(df, 2, function(x) which(x == "M017"))

where df =

    1   04.10.2009 01:24:51  M017  <NA>  <NA>  NA
    2   04.10.2009 01:24:53  M018  <NA>  <NA>  NA
    3   04.10.2009 01:24:54  M051  <NA>  <NA>  NA
    4   04.10.2009 01:25:06  <NA>  M016  <NA>  NA
    5   04.10.2009 01:25:07  <NA>  M015  <NA>  NA
    6   04.10.2009 01:26:07  <NA>  M017  <NA>  NA
    7   04.10.2009 01:26:27  <NA>  M017  <NA>  NA
    8   04.10.2009 01:27:23  <NA>  M017  <NA>  NA
    9   04.10.2009 01:27:30  <NA>  M017  <NA>  NA
    10  04.10.2009 01:27:32  M017  <NA>  <NA>  NA
    11  04.10.2009 01:27:34  M051  <NA>  <NA>  NA

This also works if we try to find more than one value:

    apply(df, 2, function(x) which(x %in% c("M017", "M018")))

The result being:

    [[1]]
    integer(0)

    [[2]]
    [1]  1  2 20

    [[3]]
    [1] 16 17 18 19

    [[4]]
    integer(0)

    [[5]]
    integer(0)

However, processing the resulting list of lists is rather tedious.

Is there a more efficient way to find rows that contain a value (or more) in ANY column?

User · Answer

If you want to find the rows that have any of the values in a vector, one option is to loop over the vector (lapply(v1, ...)) and create a logical index (TRUE/FALSE) with (==). Use Reduce and OR (|) to reduce the list to a single logical matrix by combining the corresponding elements. Sum the rows (rowSums) and double negate (!!) to get the rows with any matches.

    indx1 <- !!rowSums(Reduce(`|`, lapply(v1, `==`, df)), na.rm = TRUE)

Or vectorise and get the row indices with which() and arr.ind = TRUE:

    indx2 <- unique(which(Vectorize(function(x) x %in% v1)(df),
                          arr.ind = TRUE)[, 1])

Benchmarks

I didn't use kristang's solution as it was giving me errors. Based on a 1000x500 matrix, konvas's solution is the most efficient so far, but this may vary if the number of rows is increased.

    val <- paste0("M0", 1:1000)
    set.seed(24)
    df1 <- as.data.frame(matrix(sample(c(val, NA), 1000 * 500,
                                       replace = TRUE), ncol = 500),
                         stringsAsFactors = FALSE)
    set.seed(356)
    v1 <- sample(val, 200, replace = FALSE)

    konvas <- function() {apply(df1, 1, function(r) any(r %in% v1))}
    akrun1 <- function() {!!rowSums(Reduce(`|`, lapply(v1, `==`, df1)),
                                    na.rm = TRUE)}
    akrun2 <- function() {unique(which(Vectorize(function(x) x %in%
                                       v1)(df1), arr.ind = TRUE)[, 1])}

    library(microbenchmark)
    microbenchmark(konvas(), akrun1(), akrun2(), unit = "relative", times = 20L)
    # Unit: relative
    #      expr       min         lq       mean     median         uq      max neval cld
    #  konvas()   1.00000   1.000000   1.000000   1.000000   1.000000  1.00000    20   a
    #  akrun1() 160.08749 147.642721 125.085200 134.491722 151.454441 52.22737    20   b
    #  akrun2()   5.85611   5.641451   4.676836   5.330067   5.269937  2.22255    20   a

For ncol = 10, the results are slightly different:

    #      expr       min        lq     mean    median        uq       max neval
    #  konvas()  3.116722  3.081584  2.90660  2.983618  2.998343  2.394908    20
    #  akrun1() 27.587827 26.554422 22.91664 23.628950 21.892466 18.305376    20
    #  akrun2()  1.000000  1.000000  1.00000  1.000000  1.000000  1.000000    20

data

    v1 <- c("M017", "M018")
    df <- structure(list(datetime = c("04.10.2009 01:24:51", "04.10.2009 01:24:53",
    "04.10.2009 01:24:54", "04.10.2009 01:25:06", "04.10.2009 01:25:07",
    "04.10.2009 01:26:07", "04.10.2009 01:26:27", "04.10.2009 01:27:23",
    "04.10.2009 01:27:30", "04.10.2009 01:27:32", "04.10.2009 01:27:34"),
    col1 = c("M017", "M018", "M051", "<NA>", "<NA>", "<NA>", "<NA>",
    "<NA>", "<NA>", "M017", "M051"), col2 = c("<NA>", "<NA>", "<NA>",
    "M016", "M015", "M017", "M017", "M017", "M017", "<NA>", "<NA>"),
    col3 = c("<NA>", "<NA>", "<NA>", "<NA>", "<NA>", "<NA>", "<NA>",
    "<NA>", "<NA>", "<NA>", "<NA>"), col4 = c(NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA)), .Names = c("datetime", "col1", "col2",
    "col3", "col4"), class = "data.frame", row.names = c("1", "2",
    "3", "4", "5", "6", "7", "8", "9", "10", "11"))
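To make the Reduce-based steps in this answer easier to follow, here is a minimal sketch that traces them on a small made-up data frame. The names `toy`, `per_value`, `any_value` and `hit_matrix` are illustrative assumptions and do not appear in the original answer.

```r
# Hypothetical toy data, for illustration only (not the asker's data)
toy <- data.frame(
  id   = c("a", "b", "c", "d"),
  col1 = c("M017", NA, "M051", NA),
  col2 = c(NA, "M016", NA, "M018"),
  stringsAsFactors = FALSE
)
v1 <- c("M017", "M018")

# Step 1: one logical matrix per target value (TRUE where a cell equals it)
per_value <- lapply(v1, `==`, toy)

# Step 2: element-wise OR across those matrices
any_value <- Reduce(`|`, per_value)

# Step 3: count TRUEs per row (NA cells ignored), then coerce counts to logical
indx1 <- !!rowSums(any_value, na.rm = TRUE)

# Vectorize() variant: column-wise %in%, then the row part of the matches
hit_matrix <- Vectorize(function(x) x %in% v1)(toy)
indx2 <- unique(which(hit_matrix, arr.ind = TRUE)[, 1])
```

Here `which(indx1)` and `sort(indx2)` should point at the same rows; the first form is a logical index, the second a vector of row numbers.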

User · Answer

Here's a dplyr option:

    library(dplyr)

    # across all columns:
    df %>% filter_all(any_vars(. %in% c("M017", "M018")))

    # or in only select columns:
    df %>% filter_at(vars(col1, col2), any_vars(. %in% c("M017", "M018")))
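In recent dplyr releases the scoped verbs `filter_all()` and `filter_at()` are marked as superseded. A rough equivalent, assuming dplyr 1.0.4 or later (where `if_any()` is available), might look like the sketch below. The data frame `toy` and the helper column `row` are made up for illustration; note that `filter()` returns the matching rows, so a row-number column is added first when indices are wanted.

```r
library(dplyr)

# Hypothetical toy data, for illustration only (not the asker's data)
toy <- data.frame(
  id   = c("a", "b", "c", "d"),
  col1 = c("M017", NA, "M051", NA),
  col2 = c(NA, "M016", NA, "M018"),
  stringsAsFactors = FALSE
)
targets <- c("M017", "M018")

# Rows where any column holds one of the targets
matched_all <- toy %>%
  filter(if_any(everything(), ~ .x %in% targets))

# Same idea, restricted to selected columns
matched_some <- toy %>%
  filter(if_any(c(col1, col2), ~ .x %in% targets))

# Row indices instead of rows: record the position before filtering
idx <- toy %>%
  mutate(row = row_number()) %>%
  filter(if_any(-row, ~ .x %in% targets)) %>%
  pull(row)
```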

User · Answer

How about

    apply(df, 1, function(r) any(r %in% c("M017", "M018")))

The ith element will be TRUE if the ith row contains one of the values, and FALSE otherwise. Or, if you want just the row numbers, enclose the above statement in which(...).
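As a small sketch of the which() variant, assuming a made-up data frame `toy` and a vector `targets` (neither is from the original post):

```r
# Hypothetical toy data, for illustration only (not the asker's data)
toy <- data.frame(
  id   = c("a", "b", "c", "d"),
  col1 = c("M017", NA, "M051", NA),
  col2 = c(NA, "M016", NA, "M018"),
  stringsAsFactors = FALSE
)
targets <- c("M017", "M018")

# Logical vector: TRUE where the row contains any of the target values
hits <- apply(toy, 1, function(r) any(r %in% targets))

# Row numbers of the matching rows; unname() drops any row-name labels
rows <- unname(which(hits))
print(rows)
```

Because apply() converts the data frame to a character matrix first, this form is best suited to data that is already all strings, as in the question.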
