[r] Select the first row by group

I favor the dplyr approach.

group_by(id) followed by either

  • filter(row_number()==1) or
  • slice(1) or
  • slice_head(1) #(dplyr => 1.0)
  • top_n(n = -1)
    • top_n() internally uses the rank function. Negative selects from the bottom of rank.

In some instances arranging the ids after the group_by can be necessary.

library(dplyr)

# using filter(), top_n() or slice()

m1 <-
test %>% 
  group_by(id) %>% 
  filter(row_number()==1)

m2 <-
test %>% 
  group_by(id) %>% 
  slice(1)

m3 <-
test %>% 
  group_by(id) %>% 
  top_n(n = -1)

All three methods return the same result

# A tibble: 5 x 2
# Groups:   id [5]
     id string
  <int> <fct> 
1     1 A     
2     2 B     
3     3 C     
4     4 D     
5     5 E