Most of the time, the best solution is using distinct() from dplyr, as has already been suggested.
However, here's another approach that uses the slice() function from dplyr.
# Generate fake data for the example
library(dplyr)
set.seed(123)
df <- data.frame(
  x = sample(0:1, 10, replace = TRUE),  # TRUE rather than T, which can be reassigned
  y = sample(0:1, 10, replace = TRUE),
  z = 1:10
)
# In each group of rows formed by combinations of x and y
# retain only the first row
df %>%
  group_by(x, y) %>%
  slice(1)
Compared with the distinct() function, the advantage of this solution is that it makes explicit which rows are retained from the original data frame, and it pairs nicely with the arrange() function.
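For comparison, here is a minimal sketch of the two approaches side by side, using the same fake data as above. It assumes distinct() is called with .keep_all = TRUE so that the z column is kept; the names via_slice and via_distinct are only illustrative. The row order differs between the two results (group_by() sorts by the grouping keys, distinct() keeps order of first appearance), so both are sorted by z before comparing.

```r
library(dplyr)

# Same fake data as in the example above
set.seed(123)
df <- data.frame(
  x = sample(0:1, 10, replace = TRUE),
  y = sample(0:1, 10, replace = TRUE),
  z = 1:10
)

# Approach from this answer: first row of each (x, y) group
via_slice <- df %>%
  group_by(x, y) %>%
  slice(1) %>%
  ungroup() %>%
  arrange(z)

# distinct() on x and y, keeping the remaining columns
via_distinct <- df %>%
  distinct(x, y, .keep_all = TRUE) %>%
  arrange(z)

# Both keep the first row of each (x, y) combination
identical(via_slice$z, via_distinct$z)
```

With slice() you can change which row is kept simply by changing the index or by sorting first, which distinct() does not let you state as directly.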
Let's say you have customer sales data and want to retain one record per customer, specifically the record of their latest purchase. Then you could write:
customer_purchase_data %>%
  arrange(desc(Purchase_Date)) %>%
  group_by(Customer_ID) %>%
  slice(1)
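To try this out, here is a small runnable sketch. The contents of customer_purchase_data and the Amount column are made up for illustration; only Customer_ID and Purchase_Date come from the snippet above. It also shows slice_max(), which I believe is available from dplyr 1.0.0 onward, as an alternative that skips the arrange() step.

```r
library(dplyr)

# Hypothetical purchase data (values invented for this example)
customer_purchase_data <- data.frame(
  Customer_ID = c("A", "A", "B", "B", "B", "C"),
  Purchase_Date = as.Date(c(
    "2023-01-05", "2023-03-10",
    "2023-02-01", "2023-02-20", "2023-01-15",
    "2023-04-02"
  )),
  Amount = c(20, 35, 15, 50, 10, 5)
)

# Sort newest first, then keep the first row per customer
latest <- customer_purchase_data %>%
  arrange(desc(Purchase_Date)) %>%
  group_by(Customer_ID) %>%
  slice(1) %>%
  ungroup()

# Alternative: pick the row with the maximum date in each group;
# with_ties = FALSE guarantees a single row per customer
latest_alt <- customer_purchase_data %>%
  group_by(Customer_ID) %>%
  slice_max(Purchase_Date, n = 1, with_ties = FALSE) %>%
  ungroup()

latest
latest_alt
```

Both results contain one row per customer. The trailing ungroup() is optional, but it avoids surprises if you keep working with the result.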