[r] Remove duplicated rows using dplyr

Most of the time, the best solution is using distinct() from dplyr, as has already been suggested.

However, here's another approach that uses the slice() function from dplyr.

# Generate fake data for the example
  library(dplyr)
  set.seed(123)
  df <- data.frame(
    x = sample(0:1, 10, replace = T),
    y = sample(0:1, 10, replace = T),
    z = 1:10
  )

# In each group of rows formed by combinations of x and y
# retain only the first row

    df %>%
      group_by(x, y) %>%
      slice(1)

Difference from using the distinct() function

The advantage of this solution is that it makes it explicit which rows are retained from the original dataframe, and it can pair nicely with the arrange() function.

Let's say you had customer sales data and you wanted to retain one record per customer, and you want that record to be the one from their latest purchase. Then you could write:

customer_purchase_data %>%
   arrange(desc(Purchase_Date)) %>%
   group_by(Customer_ID) %>%
   slice(1)