Filter rows which contain a certain string

Question

I have to filter a data frame using as a criterion those rows in which the string RTB is contained. I'm using dplyr:

d.del <- df %>%
  group_by(TrackingPixel) %>%
  summarise(MonthDelivery = as.integer(sum(Revenue))) %>%
  arrange(desc(MonthDelivery))

I know I can use the function filter in dplyr, but I don't know exactly how to tell it to check for the content of a string. In particular, I want to check the content of the column TrackingPixel: if the string contains the label RTB, I want to remove the row from the result.

User · Answer

The answer to the question was already posted by @latemail in the comments above. You can use a regular expression in the condition passed to filter: grepl() returns TRUE for every element that matches the pattern, and negating it with ! drops those rows:

dplyr::filter(df, !grepl("RTB",TrackingPixel))
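Applied to the pipeline from the question, the filter step can sit before group_by() so that RTB rows are removed before revenue is summed. This is a minimal sketch: the data frame below is a hypothetical stand-in because the original data was not posted, and fixed = TRUE (a standard grepl() argument) makes "RTB" match as literal text rather than as a regular expression.

```r
library(dplyr)

# Hypothetical stand-in for the question's data (the real df was not posted)
df <- data.frame(
  TrackingPixel = c("RTB_banner", "email_1", "RTB_video", "email_1", "social"),
  Revenue       = c(10.5, 3.2, 7.0, 4.1, 2.9),
  stringsAsFactors = FALSE
)

d.del <- df %>%
  # drop every row whose TrackingPixel contains the literal text "RTB"
  filter(!grepl("RTB", TrackingPixel, fixed = TRUE)) %>%
  group_by(TrackingPixel) %>%
  summarise(MonthDelivery = as.integer(sum(Revenue))) %>%
  arrange(desc(MonthDelivery))

print(d.del)
```

Filtering first means the RTB pixels never reach summarise(), so there is no need to filter the aggregated result afterwards.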

Since you have not provided the original data, I will add a toy example using the mtcars data set. Imagine you are only interested in cars produced by Mazda or Toyota.

mtcars$type <- rownames(mtcars)
dplyr::filter(mtcars, grepl('Toyota|Mazda', type))

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb           type
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      Mazda RX4
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda RX4 Wag
3 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota Corolla
4 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1  Toyota Corona

If you would like to do it the other way round, namely excluding Toyota and Mazda cars, the filter command looks like this:

dplyr::filter(mtcars, !grepl('Toyota|Mazda', type))
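Because the pattern is a regular expression, | means "or", and anchors can restrict where the match occurs: ^ ties it to the start of the string and $ to the end. A short sketch on the same mtcars example (the variable names are only illustrative):

```r
library(dplyr)

mtcars$type <- rownames(mtcars)

# "RX4" anywhere in the name: Mazda RX4 and Mazda RX4 Wag
rx4_any <- dplyr::filter(mtcars, grepl("RX4", type))

# "RX4" only at the end of the name: just Mazda RX4
rx4_end <- dplyr::filter(mtcars, grepl("RX4$", type))

# Names that start with Toyota or Mazda (the same four cars as above)
starts <- dplyr::filter(mtcars, grepl("^(Toyota|Mazda)", type))

rx4_end$type  # "Mazda RX4"
```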

User · Answer

(edit: included the newer across() syntax)

Here's another tidyverse solution, using filter(across()) (or previously filter_at). The advantage is that you can easily extend it to more than one column.

Below is also a solution with filter_all in order to find the string in any column, using diamonds as an example and looking for the string "V".

library(tidyverse)

String in only one column

# for only one column... extendable to more than one by creating a column list in `across` or `vars`
mtcars %>%
  rownames_to_column("type") %>%
  filter(across(type, ~ !grepl('Toyota|Mazda', .))) %>%
  head()
#>                type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1        Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 2    Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 3 Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> 4           Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 5        Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> 6         Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2

The now superseded syntax for the same would be:

mtcars %>%
  rownames_to_column("type") %>%
  filter_at(.vars = vars(type), all_vars(!grepl('Toyota|Mazda', .)))

String in all columns

# remove all rows where any column contains 'V'
diamonds %>%
  filter(across(everything(), ~ !grepl('V', .))) %>%
  head()
#> # A tibble: 6 x 10
#>   carat cut     color clarity depth table price     x     y     z
#>   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
#> 2  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
#> 3  0.31 Good    J     SI2      63.3    58   335  4.34  4.35  2.75
#> 4  0.3  Good    J     SI1      64      55   339  4.25  4.28  2.73
#> 5  0.22 Premium F     SI1      60.4    61   342  3.88  3.84  2.33
#> 6  0.31 Ideal   J     SI2      62.2    54   344  4.35  4.37  2.71

The now superseded syntax for the same would be:

diamonds %>%
  filter_all(all_vars(!grepl('V', .))) %>%
  head()

I tried to find an across() alternative for the following, but I didn't immediately come up with a good solution:

# get all rows where any column contains 'V'
diamonds %>%
  filter_all(any_vars(grepl('V', .))) %>%
  head()
#> # A tibble: 6 x 10
#>   carat cut       color clarity depth table price     x     y     z
#>   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
#> 2 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
#> 3 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
#> 4 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
#> 5 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
#> 6 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49

Update: Thanks to user Petr Kajzar in this answer, here is also an approach for the above:

diamonds %>%
  filter(rowSums(across(everything(), ~ grepl("V", .x))) > 0)
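Newer dplyr releases (1.0.4 onward) provide if_any() and if_all(), which cover the any_vars() case above: they collapse a per-column condition into one logical value per row, which is what filter() expects. Recent dplyr versions also deprecate passing across() directly as a filter() condition in favour of these two helpers. The sketch below uses a small hypothetical data frame instead of diamonds so that it only needs dplyr.

```r
library(dplyr)

# Hypothetical toy data with two character columns
df <- data.frame(
  a = c("V1", "x",  "y"),
  b = c("z",  "Vw", "q"),
  stringsAsFactors = FALSE
)

# Rows where ANY column contains "V" (the filter_all(any_vars(...)) case)
any_v <- filter(df, if_any(everything(), ~ grepl("V", .x)))

# Rows where NO column contains "V" (the filter_all(all_vars(!grepl(...))) case)
no_v <- filter(df, if_all(everything(), ~ !grepl("V", .x)))

# The rowSums() approach from the update keeps the same rows as if_any()
any_v2 <- filter(df, rowSums(across(everything(), ~ grepl("V", .x))) > 0)
```

if_any() states the intent directly, while the rowSums() version works by counting matches per row and keeping rows with at least one.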

User · Answer

Solution

It is possible to use str_detect() from the stringr package, which is included in the tidyverse. str_detect() returns TRUE or FALSE depending on whether each element of the specified vector contains a given pattern, and filter() can use this logical vector directly. See "Introduction to stringr" for details about the stringr package.

library(tidyverse)
#> - Attaching packages -------------------- tidyverse 1.2.1 -
#> ggplot2 2.2.1     purrr   0.2.4
#> tibble  1.4.2     dplyr   0.7.4
#> tidyr   0.7.2     stringr 1.2.0
#> readr   1.1.1     forcats 0.3.0
#> - Conflicts --------------------- tidyverse_conflicts() -
#> dplyr::filter() masks stats::filter()
#> dplyr::lag()    masks stats::lag()

mtcars$type <- rownames(mtcars)
mtcars %>%
  filter(str_detect(type, 'Toyota|Mazda'))
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb           type
#> 1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      Mazda RX4
#> 2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda RX4 Wag
#> 3 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota Corolla
#> 4 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1  Toyota Corona

The good things about stringr

I would rather use stringr::str_detect() than base::grepl(), for the following reasons. First, the functions provided by the stringr package all start with the prefix str_, which makes the code easier to read. Second, the first argument of every stringr function is always the string (character vector) to work on, followed by the other parameters (thanks to Paolo).

object <- "stringr"

# The functions share the same prefix `str_`.
# The first argument is always the object.
stringr::str_count(object)               # -> 7
stringr::str_sub(object, 1, 3)           # -> "str"
stringr::str_detect(object, "str")       # -> TRUE
stringr::str_replace(object, "str", "")  # -> "ingr"

# The base function names have nothing in common,
# and the position of the object argument does not match either.
base::nchar(object)              # -> 7
base::substr(object, 1, 3)       # -> "str"
base::grepl("str", object)       # -> TRUE
base::sub("str", "", object)     # -> "ingr"

Benchmark

The results of the benchmark test are as follows. On this large data frame (about 7 million rows), str_detect() was faster: grepl() took about 1.5 times as long.

library(rbenchmark)
library(tidyverse)

# The data
# Data expo 09. ASA Statistics Computing and Graphics
# http://stat-computing.org/dataexpo/2009/the-data.html
df <- read_csv("Downloads/2008.csv")
print(dim(df))
#> [1] 7009728      29

benchmark(
  "str_detect" = {df %>% filter(str_detect(Dest, 'MCO|BWI'))},
  "grepl" = {df %>% filter(grepl('MCO|BWI', Dest))},
  replications = 10,
  columns = c("test", "replications", "elapsed", "relative", "user.self", "sys.self"))
#>         test replications elapsed relative user.self sys.self
#> 2      grepl           10  16.480    1.513    16.195    0.248
#> 1 str_detect           10  10.891    1.000     9.594    1.281
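For the original question (dropping rows that contain RTB), str_detect() can express the negation through its negate argument (available since stringr 1.4.0), and wrapping the pattern in stringr's fixed() matches it as literal text rather than as a regular expression. A minimal sketch with a hypothetical TrackingPixel column, since the real data was not posted:

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the question's data
df <- data.frame(
  TrackingPixel = c("RTB_banner", "email_1", "RTB_video", "social"),
  Revenue       = c(10.5, 3.2, 7.0, 2.9),
  stringsAsFactors = FALSE
)

# negate = TRUE keeps the rows that do NOT contain "RTB"
kept <- df %>% filter(str_detect(TrackingPixel, fixed("RTB"), negate = TRUE))

# Equivalent formulation with the ! operator
kept2 <- df %>% filter(!str_detect(TrackingPixel, fixed("RTB")))

kept$TrackingPixel  # "email_1" "social"
```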

User · Answer

This answer is similar to the others, but uses the preferred stringr::str_detect() and tibble::rownames_to_column() (both are attached by library(tidyverse)).

library(tidyverse)

mtcars %>%
  rownames_to_column("type") %>%
  filter(stringr::str_detect(type, 'Toyota|Mazda'))
#>             type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1      Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2  Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3 Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 4  Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1

Created on 2018-06-26 by the reprex package (v0.2.0).
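Like grepl(), str_detect() is case-sensitive by default. If the capitalisation of the text may vary, the pattern can be wrapped in stringr's regex() with ignore_case = TRUE. A sketch reusing the rownames_to_column() pipeline from this answer; the lower-case pattern is only for illustration:

```r
library(dplyr)
library(stringr)
library(tibble)

cars_found <- mtcars %>%
  rownames_to_column("type") %>%
  # lower-case pattern, matched case-insensitively
  filter(str_detect(type, regex("toyota|mazda", ignore_case = TRUE)))

cars_found$type
# "Mazda RX4" "Mazda RX4 Wag" "Toyota Corolla" "Toyota Corona"
```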
