Subset dataframe by multiple logical conditions of rows to remove

Question

I would like to subset (filter) a dataframe by specifying which rows not (!) to keep in the new dataframe. Here is a simplified sample dataframe:

data
v1 v2 v3 v4
a  v  d  c
a  v  d  d
b  n  p  g
b  d  d  h
c  k  d  c
c  r  p  g
d  v  d  x
d  v  d  c
e  v  d  b
e  v  d  c

For example, if a row of column v1 has a "b", "d", or "e", I want to get rid of that row of observations, producing the following dataframe:

v1 v2 v3 v4
a  v  d  c
a  v  d  d
c  k  d  c
c  r  p  g

I have been successful at subsetting based on one condition at a time. For example, here I remove rows where v1 contains a "b":

sub.data <- data[data[, 1] != "b", ]

However, I have many, many such conditions, so doing it one at a time is not desirable. I have not been successful with the following:

sub.data <- data[data[, 1] != c("b", "d", "e"), ]

or

sub.data <- subset(data, data[, 1] != c("b", "d", "e"))

I've tried some other things as well, like !%in%, but that doesn't seem to exist. Any ideas?

User · Answer

This answer is meant to explain why rather than how. The '==' operator in R is vectorized in the same way as the '+' operator. It compares the elements on the left side with the elements on the right side, position by position. For example:

> 1:3 == 1:3
[1] TRUE TRUE TRUE

Here the first test is 1==1, which is TRUE, the second is 2==2 and the third is 3==3. Notice that the following returns FALSE in the first and third elements because the order is different:

> 3:1 == 1:3
[1] FALSE  TRUE FALSE

Now if one object is shorter than the other, the shorter object is repeated (recycled) as often as needed to match the longer one. If the length of the longer object is not a multiple of the length of the shorter object, you get a warning that the shorter object could not be repeated a whole number of times. For example:

>  1:2 == 1:3
[1]  TRUE  TRUE FALSE
Warning message:
In 1:2 == 1:3 :
  longer object length is not a multiple of shorter object length

Here the first match is 1==1, then 2==2, and finally 1==3 (FALSE) because the left side is smaller. If one of the sides is only one element then that is repeated:

> 1:3 == 1
[1]  TRUE FALSE FALSE

The correct operator to test whether an element is in a vector is indeed '%in%', which is vectorized only over its left operand: for each element of the left vector, it tests whether that element equals any element of the right vector.
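As a small sketch of that behaviour (the vectors `x` and `unwanted` are made up here for illustration), '%in%' returns one logical value per element of its left-hand side, and negating the whole test flags the elements to keep:

```r
# Hypothetical vectors, for illustration only
x <- c("a", "b", "c", "d", "e")
unwanted <- c("b", "d", "e")

# One result per element of x, regardless of the length of unwanted
x %in% unwanted
# [1] FALSE  TRUE FALSE  TRUE  TRUE

# Negate the whole test to flag the elements that are NOT in unwanted
keep <- !(x %in% unwanted)
x[keep]
# [1] "a" "c"
```

There is no separate "not in" operator in base R; wrapping the test in `!( )` does the job.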

Alternatively, you can use '&' to combine two logical statements. '&' takes two elements and checks elementwise if both are TRUE:

> 1:3 == 1 & 1:3 != 2
[1]  TRUE FALSE FALSE
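Applied to the question's column, these rules show why comparing against a vector of values does not work. The following is only a sketch: `v1` is typed in by hand from the sample data, and the names `unwanted`, `recycled`, `combined` and `by_in` are chosen here for illustration.

```r
# Column v1 from the question's sample data, entered by hand
v1 <- c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e")
unwanted <- c("b", "d", "e")

# '!=' recycles unwanted as b, d, e, b, d, e, ... and compares position by
# position. Without suppressWarnings() R also warns that 10 is not a
# multiple of 3.
recycled <- suppressWarnings(v1 != unwanted)
v1[recycled]
# [1] "a" "a" "b" "c" "c" "d" "e"

# Combining single-value tests with '&' gives the intended result ...
combined <- v1 != "b" & v1 != "d" & v1 != "e"

# ... and so does negating '%in%'
by_in <- !(v1 %in% unwanted)

identical(combined, by_in)
# [1] TRUE
v1[by_in]
# [1] "a" "a" "c" "c"
```

Some unwanted values survive the recycled comparison simply because they happen to line up with a different value from `unwanted`.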

User · Answer

You can also accomplish this by breaking things up into separate logical statements, using & to join them:

subset(my.df, my.df$v1 != "b" & my.df$v1 != "d" & my.df$v1 != "e")

This is not elegant and takes more code, but it might be more readable to newer R users. As pointed out in a comment above, subset is a "convenience" function that is best used when working interactively.

User · Answer

data <- data[-which(data[, 1] %in% c("b", "d", "e")), ]
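One caveat with negative indexing through which() is worth a quick sketch. The data frame `df` below is hypothetical and deliberately contains none of the unwanted values:

```r
# Hypothetical data frame in which no row matches the unwanted values
df <- data.frame(v1 = c("a", "c"), v2 = c("v", "k"))
unwanted <- c("b", "d", "e")

# which() returns integer(0) here, and indexing with -integer(0)
# selects no rows at all instead of keeping every row
nrow(df[-which(df$v1 %in% unwanted), ])
# [1] 0

# The logical form keeps both rows, as intended
nrow(df[!(df$v1 %in% unwanted), ])
# [1] 2
```

When at least one row matches, as in the question's data, both forms return the same rows.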

User · Answer

And also:

library(dplyr)
data %>% filter(!v1 %in% c("b", "d", "e"))

or

data %>% filter(v1 != "b" & v1 != "d" & v1 != "e")

or

data %>% filter(v1 != "b", v1 != "d", v1 != "e")

since the & operator is implied by the comma.

User · Answer

Try this:

subset(data, !(v1 %in% c("b", "d", "e")))

User · Answer

The ! should be around the outside of the statement:

data[!(data$v1 %in% c("b", "d", "e")), ]

  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g

User · Answer

sub.data <- data[data[, 1] != "b" & data[, 1] != "d" & data[, 1] != "e", ]

Longer, but simple to understand (I guess), and it can be used with multiple columns, even with !is.na(data[, 1]).
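A minimal sketch of the multi-column case with a missing value follows. The data frame `df` and the extra condition on v2 are invented here for illustration and are not part of the question's data:

```r
# Hypothetical data with a missing value in v1
df <- data.frame(
  v1 = c("a", "b", NA, "c", "d"),
  v2 = c("v", "n", "k", "r", "v"),
  stringsAsFactors = FALSE
)

# Comparing NA with a value yields NA, and an NA row index
# produces a row filled with NAs in the result
nrow(df[df$v1 != "b", ])
# [1] 4

# !is.na() turns that NA into FALSE, so the row is dropped;
# conditions on other columns are joined with '&' in the same way
kept <- df[!is.na(df$v1) & df$v1 != "b" & df$v1 != "d" & df$v2 != "r", ]
kept
#   v1 v2
# 1  a  v
```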

User · Answer

my.df <- read.table(textConnection("
v1 v2 v3 v4
a  v  d  c
a  v  d  d
b  n  p  g
b  d  d  h
c  k  d  c
c  r  p  g
d  v  d  x
d  v  d  c
e  v  d  b
e  v  d  c"), header = TRUE)

my.df[which(my.df$v1 != "b" & my.df$v1 != "d" & my.df$v1 != "e"), ]

  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g
