How can I subset rows in a data frame in R based on a vector of values?

Question

I have two data sets that are supposed to be the same size but aren't. I need to trim the values from A that are not in B and vice versa in order to eliminate noise from a graph that's going into a report. (Don't worry, this data isn't being permanently deleted.)

I have read the following:

- Selecting columns in R data frame based on those not in a vector
- http://www.ats.ucla.edu/stat/r/faq/subset_R.htm
- How to combine multiple conditions to subset a data-frame using "OR"?

But I'm still not able to get this to work right. Here's my code:

```r
bg2011missingFromBeg <- setdiff(x = eg2011$ID, y = bg2011$ID)
# attempt 1
eg2011cleaned <- subset(eg2011, ID != bg2011missingFromBeg)
# attempt 2
eg2011cleaned <- eg2011[!eg2011$ID %in% bg2011missingFromBeg]
```

The first try just eliminates the first value in the resulting setdiff vector. The second try yields an unwieldy error:

```
Error in `[.data.frame`(eg2012, !eg2012$ID %in% bg2012missingFromBeg) : 
  undefined columns selected
```

User · Answer

Per the comments on the original post, merges/joins are well suited to this problem. In particular, an inner join returns only the rows whose key values are present in both data frames, which makes the setdiff() step unnecessary.

Using the data from Dinre's example:

In base R:

```r
# data_B[, "index"] is a plain vector, so by = 1 joins on the first column of each table by position
cleanedA <- merge(data_A, data_B[, "index"], by = 1, sort = FALSE)
cleanedB <- merge(data_B, data_A[, "index"], by = 1, sort = FALSE)
```

Using the dplyr package:

```r
library(dplyr)
cleanedA <- inner_join(data_A, data_B %>% select(index), by = "index")
cleanedB <- inner_join(data_B, data_A %>% select(index), by = "index")
```

To keep the data as two separate tables, each containing only its own variables, this subsets the other table to only its index variable before joining. Then no new variables are added to the resulting table.
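A minimal, self-contained sketch of the base-R join, assuming small hypothetical `data_A`/`data_B` tables made up for illustration (not Dinre's data). As a variation on the answer, it uses `drop = FALSE` to keep the other table's key as a one-column data frame, so the join can be done by name instead of by position, and then checks that the result holds the same index values as a plain `%in%` filter.

```r
# Hypothetical toy data, made up for illustration
data_A <- data.frame(index = c(1, 3, 5, 7), value = c(10, 30, 50, 70))
data_B <- data.frame(index = c(3, 4, 5, 6, 7), value = c(0.3, 0.4, 0.5, 0.6, 0.7))

# Join each table against only the other table's key column,
# so no extra value columns are pulled in
cleanedA <- merge(data_A, data_B[, "index", drop = FALSE], by = "index", sort = FALSE)
cleanedB <- merge(data_B, data_A[, "index", drop = FALSE], by = "index", sort = FALSE)

# merge() with sort = FALSE does not guarantee row order, so compare sorted keys
viaIn <- data_A[data_A$index %in% data_B$index, ]
stopifnot(identical(sort(cleanedA$index), sort(viaIn$index)))

print(cleanedA)
print(cleanedB)
```

Because `data_B[, "index", drop = FALSE]` carries only the key, both results keep just their own `index` and `value` columns.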

User · Answer

If you really just want to subset each data frame by an index that exists in both data frames, you can do this with the match() function, like so:

```r
data_A[match(data_B$index, data_A$index, nomatch = 0), ]
data_B[match(data_A$index, data_B$index, nomatch = 0), ]
```

Assuming the index values are unique, this selects the same rows as the following, except that match() returns them in the order of the other data frame's index, while %in% keeps each data frame's original row order:

```r
data_A[data_A$index %in% data_B$index, ]
data_B[data_B$index %in% data_A$index, ]
```

Here is a demo:

```r
# Set seed for reproducibility
set.seed(1)

# Create two sample data sets
data_A <- data.frame(index = sample(1:200, 90, replace = FALSE), value = runif(90))
data_B <- data.frame(index = sample(1:200, 120, replace = FALSE), value = runif(120))

# Subset data of each data frame by the index in the other
t_A <- data_A[match(data_B$index, data_A$index, nomatch = 0), ]
t_B <- data_B[match(data_A$index, data_B$index, nomatch = 0), ]

# Make sure they match
data.frame(t_A[order(t_A$index), ], t_B[order(t_B$index), ])[1:20, ]
```

Output as posted (R 3.6.0 changed the default sample() algorithm, so newer versions of R produce different numbers):

```
   index     value index.1    value.1
27     3 0.7155661       3 0.65887761
10    12 0.6049333      12 0.14362694
88    14 0.7410786      14 0.42021589
56    15 0.4525708      15 0.78101754
38    18 0.2075451      18 0.70277874
24    23 0.4314737      23 0.78218212
34    32 0.1734423      32 0.85508236
22    38 0.7317925      38 0.56426384
84    39 0.3913593      39 0.09485786
5     40 0.7789147      40 0.31248966
74    43 0.7799849      43 0.10910096
71    45 0.2847905      45 0.26787813
57    46 0.1751268      46 0.17719454
25    48 0.1482116      48 0.99607737
81    53 0.6304141      53 0.26721208
60    58 0.8645449      58 0.96920881
30    59 0.6401010      59 0.67371223
75    61 0.8806190      61 0.69882454
63    64 0.3287773      64 0.36918946
19    70 0.9240745      70 0.11350771
```
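A small sketch of where the match() and %in% approaches differ, using hypothetical toy data. It shows two things: match() returns rows in the order of the lookup vector, and because match() only reports the first matching position, duplicated index values in the table being subset are kept once by match() but every time by %in%.

```r
# Hypothetical toy data, made up for illustration
data_A <- data.frame(index = c(5, 2, 9, 4), value = c("a", "b", "c", "d"))
data_B <- data.frame(index = c(4, 7, 5), value = c("x", "y", "z"))

# match() gives positions in data_A for each data_B index (0 for no match),
# and indexing with 0 drops that entry, so rows follow data_B's order
viaMatch <- data_A[match(data_B$index, data_A$index, nomatch = 0), ]
# %in% gives a logical mask over data_A, so rows keep data_A's order
viaIn <- data_A[data_A$index %in% data_B$index, ]

print(viaMatch$index)  # 4 5
print(viaIn$index)     # 5 4

# With a duplicated key, match() only finds the first occurrence
dup_A <- data.frame(index = c(4, 4, 5), value = c("p", "q", "r"))
dup_match <- dup_A[match(data_B$index, dup_A$index, nomatch = 0), ]
dup_in <- dup_A[dup_A$index %in% data_B$index, ]
print(nrow(dup_match))  # 2
print(nrow(dup_in))     # 3
```

If the index column is a unique key and row order does not matter, the two forms are interchangeable; otherwise %in% is usually the safer filter.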

User · Answer

This will give you what you want:

```r
eg2011cleaned <- eg2011[!eg2011$ID %in% bg2011missingFromBeg, ]
```

The error in your second attempt occurs because you left out the comma.

In general, for convenience, the specification `object[index]` subsets columns of a 2D object such as a data frame. If you want to subset rows and keep all columns, you have to use the specification `object[index_rows, index_columns]`; `index_columns` can be left blank, which selects all columns by default.

However, you still need to include the comma to indicate that you want a subset of rows rather than a subset of columns.
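A sketch that reproduces both failures from the question on hypothetical stand-in data (the `eg2011` contents here are invented; only the names come from the question). Attempt 1 goes wrong because `!=` compares element by element and recycles the shorter vector, so most IDs are compared against the wrong value; replacing it with a membership test fixes it, in either `subset()` or bracket form. Attempt 2 fails because, without the comma, the logical vector is treated as a column selector.

```r
# Hypothetical stand-ins for the question's objects
eg2011 <- data.frame(ID = 1:6, score = c(11, 12, 13, 14, 15, 16))
bg2011missingFromBeg <- c(2, 4)

# Attempt 1: `!=` recycles c(2, 4) to c(2, 4, 2, 4, 2, 4),
# so ID 2 is compared against 4 and survives
attempt1 <- subset(eg2011, ID != bg2011missingFromBeg)
print(attempt1$ID)  # 1 2 3 5 6

# Membership test instead of element-wise comparison
fixedSubset <- subset(eg2011, !(ID %in% bg2011missingFromBeg))
# Bracket form: the comma makes the logical vector select rows, keeping all columns
fixedBracket <- eg2011[!(eg2011$ID %in% bg2011missingFromBeg), ]
print(fixedSubset$ID)  # 1 3 5 6

# Attempt 2 without the comma: the mask is applied to the columns
attempt2 <- tryCatch(eg2011[!(eg2011$ID %in% bg2011missingFromBeg)],
                     error = conditionMessage)
print(attempt2)  # "undefined columns selected"
```

Note that `!x %in% y` already parses as `!(x %in% y)`; the extra parentheses are only for readability.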

User · Answer

A really human-comprehensible example (this is the first time I am using %in%) of how to compare two data frames and keep only the rows containing equal values in a specific column:

```r
# Set seed for reproducibility
set.seed(1)

# Create two sample data frames
data_A <- data.frame(id = c(1, 2, 3), value = c(1, 2, 3))
data_B <- data.frame(id = c(1, 2, 3, 4), value = c(5, 6, 7, 8))

# Compare data frames by a specific column and keep only
# the rows with equal values
data_A[data_A$id %in% data_B$id, ]   # keeps the matching rows of data_A
data_B[data_B$id %in% data_A$id, ]   # keeps the matching rows of data_B
```

Results:

```
> data_A[data_A$id %in% data_B$id, ]
  id value
1  1     1
2  2     2
3  3     3
> data_B[data_B$id %in% data_A$id, ]
  id value
1  1     5
2  2     6
3  3     7
```
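The two %in% lines above can be wrapped so that both tables are trimmed in one call, which is what the question asks for (drop rows of A not in B and vice versa). This is only a sketch: `trim_to_common()` is a hypothetical helper, not a function from base R or any package, and it assumes both data frames share a key column with the same name.

```r
# Hypothetical helper: keep only rows whose key value appears in both data frames
trim_to_common <- function(a, b, key) {
  keep_a <- a[[key]] %in% b[[key]]
  keep_b <- b[[key]] %in% a[[key]]
  # drop = FALSE keeps a data frame even if only one column is present
  list(a = a[keep_a, , drop = FALSE], b = b[keep_b, , drop = FALSE])
}

data_A <- data.frame(id = c(1, 2, 3), value = c(1, 2, 3))
data_B <- data.frame(id = c(1, 2, 3, 4), value = c(5, 6, 7, 8))

trimmed <- trim_to_common(data_A, data_B, "id")
print(trimmed$a)
print(trimmed$b)
```

Passing the key as a string and indexing with `[[` keeps the helper usable for any column name, such as `"ID"` in the question's data.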
