Unique on a dataframe with only selected columns

Question

I have a dataframe with more than 100 columns, and I would like to find the unique rows by comparing only two of the columns. I'm hoping this is an easy one, but I can't get it working with unique or duplicated myself. In the example below, I would like to apply unique using only id and id2:

    data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))

    id id2 somevalue
    1   1         x
    1   1         y
    3   4         z

I would like to obtain either:

    id id2 somevalue
    1   1         x
    3   4         z

or:

    id id2 somevalue
    1   1         y
    3   4         z

(I have no preference which of the unique rows is kept.)

User · Answer

A minor update to Joran's code. Using the code below, you can avoid the ambiguity and get only the unique combinations of the two columns (the result contains just id and id2, not somevalue):

    dat <- data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))
    dat[row.names(unique(dat[, c("id", "id2")])), c("id", "id2")]

User · Answer

Using unique():

    dat <- data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))
    dat[row.names(unique(dat[, c("id", "id2")])), ]

User · Answer

Ok, if it doesn't matter which value in the non-duplicated column you select, this should be pretty easy:

    dat <- data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))
    > dat[!duplicated(dat[, c('id','id2')]), ]
      id id2 somevalue
    1  1   1         x
    3  3   4         z

Inside the duplicated call, I'm simply passing only those columns from dat that I don't want duplicates of. This code will always select the first of any ambiguous values (in this case, x).
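Since the question says either of the duplicate rows is acceptable, here is a small sketch (not part of the original answer) of how the same approach could keep the last row instead, using the fromLast argument of duplicated. The name key_cols is just a hypothetical helper for this illustration.

```r
# Same toy data as in the question.
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4),
                  somevalue = c("x", "y", "z"))

# Hypothetical helper: the columns that define a "duplicate".
key_cols <- c("id", "id2")

# Default behaviour: keep the first row of each (id, id2) combination.
first_kept <- dat[!duplicated(dat[, key_cols]), ]

# fromLast = TRUE scans from the bottom, so the last row of each
# combination is the one that is kept.
last_kept <- dat[!duplicated(dat[, key_cols], fromLast = TRUE), ]

print(first_kept)
print(last_kept)
```

With the question's data, first_kept should hold the x and z rows, and last_kept the y and z rows.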

User · Answer

Here are a couple of dplyr options that keep non-duplicate rows based on columns id and id2 (df is the data frame from the question):

    library(dplyr)

    df %>% distinct(id, id2, .keep_all = TRUE)
    df %>% group_by(id, id2) %>% filter(row_number() == 1)
    df %>% group_by(id, id2) %>% slice(1)
