Numbering rows within groups in a data frame

Question

Working with a data frame similar to this:

set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))
df <- df[order(df$cat, df$val), ]
df

   cat        val
1  aaa 0.05638315
2  aaa 0.25767250
3  aaa 0.30776611
4  aaa 0.46854928
5  aaa 0.55232243
6  bbb 0.17026205
7  bbb 0.37032054
8  bbb 0.48377074
9  bbb 0.54655860
10 bbb 0.81240262
11 ccc 0.28035384
12 ccc 0.39848790
13 ccc 0.62499648
14 ccc 0.76255108
15 ccc 0.88216552

I am trying to add a column with numbering within each group. Doing it this way obviously isn't using the powers of R:

df$num <- 1
for (i in 2:(length(df[, 1]))) {
  if (df[i, "cat"] == df[(i - 1), "cat"]) {
    df[i, "num"] <- df[i - 1, "num"] + 1
  }
}
df

   cat        val num
1  aaa 0.05638315   1
2  aaa 0.25767250   2
3  aaa 0.30776611   3
4  aaa 0.46854928   4
5  aaa 0.55232243   5
6  bbb 0.17026205   1
7  bbb 0.37032054   2
8  bbb 0.48377074   3
9  bbb 0.54655860   4
10 bbb 0.81240262   5
11 ccc 0.28035384   1
12 ccc 0.39848790   2
13 ccc 0.62499648   3
14 ccc 0.76255108   4
15 ccc 0.88216552   5

What would be a good way to do this?

User · Answer

To make this r-faq question more complete, here is a base R alternative using sequence and rle:

df$num <- sequence(rle(df$cat)$lengths)

which gives the intended result:

> df
   cat        val num
4  aaa 0.05638315   1
2  aaa 0.25767250   2
1  aaa 0.30776611   3
5  aaa 0.46854928   4
3  aaa 0.55232243   5
10 bbb 0.17026205   1
8  bbb 0.37032054   2
6  bbb 0.48377074   3
9  bbb 0.54655860   4
7  bbb 0.81240262   5
13 ccc 0.28035384   1
14 ccc 0.39848790   2
11 ccc 0.62499648   3
15 ccc 0.76255108   4
12 ccc 0.88216552   5

If df$cat is a factor variable, you need to wrap it in as.character first:

df$num <- sequence(rle(as.character(df$cat))$lengths)
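As a rough illustration of why this works, here is a minimal sketch on a made-up vector (cat_sorted and cat_unsorted are hypothetical names, not from the question). rle counts runs of consecutive equal values, so this approach assumes the data are already sorted by group, as they are in the question.

```r
# Hypothetical toy vector, already sorted by group
cat_sorted <- c("aaa", "aaa", "bbb", "bbb", "bbb", "ccc")

# rle() reports the length of each run of consecutive equal values
runs <- rle(cat_sorted)
runs$lengths                      # 2 3 1

# sequence() expands each length n into 1:n and concatenates the pieces
num <- sequence(runs$lengths)     # 1 2 1 2 3 1

# On unsorted input every run restarts the count, which is usually not wanted
cat_unsorted <- c("aaa", "bbb", "aaa")
num_unsorted <- sequence(rle(cat_unsorted)$lengths)  # 1 1 1
```

If the groups are not contiguous, sort by the grouping column first or use one of the grouped approaches from the other answers.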

User · Answer

Here is a small improvement that also sorts by val inside the groups:

# 1. Data set
set.seed(100)
df <- data.frame(
  cat = c(rep("aaa", 5), rep("ccc", 5), rep("bbb", 5)),
  val = runif(15))

# 2. 'dplyr' approach
library(dplyr)
df %>%
  arrange(cat, val) %>%
  group_by(cat) %>%
  mutate(id = row_number())

User · Answer

Another base R solution is to split the data frame by cat, then use lapply to add a column numbered 1:nrow(x) to each piece. The last step is to reassemble the final data frame with do.call, that is:

df_split <- split(df, df$cat)
df_lapply <- lapply(df_split, function(x) {
  x$num <- seq_len(nrow(x))
  return(x)
})
df <- do.call(rbind, df_lapply)
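One side effect of the split/lapply/rbind route is worth seeing on a small case. The sketch below uses a made-up data frame (df_toy, pieces and out are hypothetical names) and shows that rbind on a named list prefixes the row names with the group name and returns the rows ordered by group.

```r
# Hypothetical toy data, deliberately not sorted by group
df_toy <- data.frame(cat = c("bbb", "aaa", "bbb", "aaa"),
                     val = c(0.4, 0.9, 0.1, 0.3))

# Number the rows inside each piece, in their order of appearance
pieces <- lapply(split(df_toy, df_toy$cat), function(x) {
  x$num <- seq_len(nrow(x))
  x
})

out <- do.call(rbind, pieces)

# Row names now carry the group prefix: "aaa.2" "aaa.4" "bbb.1" "bbb.3"
out_names <- rownames(out)

# Reset to plain 1..n row names if the prefixes are not wanted
rownames(out) <- NULL
```

Resetting the row names is optional; it only matters if later code relies on them.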

User · Answer

I would like to add a data.table variant using the rank() function, which provides the additional possibility to change the ordering. This makes it a bit more flexible than the seq_len() solution and is pretty similar to row_number functions in RDBMS.

# Variant with ascending ordering
library(data.table)
dt <- data.table(df)
dt[, .(val, num = rank(val)), by = list(cat)][order(cat, num), ]

    cat        val num
 1: aaa 0.05638315   1
 2: aaa 0.25767250   2
 3: aaa 0.30776611   3
 4: aaa 0.46854928   4
 5: aaa 0.55232243   5
 6: bbb 0.17026205   1
 7: bbb 0.37032054   2
 8: bbb 0.48377074   3
 9: bbb 0.54655860   4
10: bbb 0.81240262   5
11: ccc 0.28035384   1
12: ccc 0.39848790   2
13: ccc 0.62499648   3
14: ccc 0.76255108   4
15: ccc 0.88216552   5

# Variant with descending ordering
dt[, .(val, num = rank(-val)), by = list(cat)][order(cat, num), ]
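One caveat with rank() is how it treats ties: the default ties.method is "average", which can produce fractional numbers. A minimal base R sketch on a made-up vector (val_toy is a hypothetical name) shows the difference, and how ties.method = "first" gives consecutive integers closer to a row_number-style result.

```r
# Hypothetical values containing a tie (0.2 appears twice)
val_toy <- c(0.2, 0.5, 0.2, 0.9)

# Default: tied values share the average of their ranks
r_avg <- rank(val_toy)                            # 1.5 3.0 1.5 4.0

# "first": ties are broken by position, giving distinct integers
r_first <- rank(val_toy, ties.method = "first")   # 1 3 2 4

# Descending numbering, again with ties broken by position
r_desc <- rank(-val_toy, ties.method = "first")   # 3 2 4 1
```

The same ties.method argument can be passed to rank() inside the data.table call when the values within a group are not guaranteed to be unique.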

User · Answer

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

DT[, id := seq_len(.N), by = cat]
DT[, id := rowid(cat)]
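A small sketch of the ave variant on made-up data (df_toy and num_from_val are hypothetical names) may help with two details: ave numbers rows per group even when the groups are not contiguous, and the result takes the type of its first argument, so passing an integer vector instead of the double column val yields integer numbering.

```r
# Hypothetical toy data with interleaved groups
df_toy <- data.frame(cat = c("bbb", "aaa", "bbb", "aaa", "aaa"),
                     val = c(0.4, 0.9, 0.1, 0.3, 0.5))

# Passing an integer vector as the first argument keeps the result integer
df_toy$num <- ave(seq_along(df_toy$cat), df_toy$cat, FUN = seq_along)
df_toy$num                        # 1 1 2 2 3

# Passing the double column gives the same numbers, stored as doubles
num_from_val <- ave(df_toy$val, df_toy$cat, FUN = seq_along)
```

The numbering follows the order in which rows appear within each group, so sort the data frame first if a particular order is needed.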

User · Answer

Another dplyr possibility could be:

df %>%
  group_by(cat) %>%
  mutate(num = 1:n())

   cat      val   num
   <fct>  <dbl> <int>
 1 aaa   0.0564     1
 2 aaa   0.258      2
 3 aaa   0.308      3
 4 aaa   0.469      4
 5 aaa   0.552      5
 6 bbb   0.170      1
 7 bbb   0.370      2
 8 bbb   0.484      3
 9 bbb   0.547      4
10 bbb   0.812      5
11 ccc   0.280      1
12 ccc   0.398      2
13 ccc   0.625      3
14 ccc   0.763      4
15 ccc   0.882      5

User · Answer

Here is an option using a for loop over groups rather than over rows (as the OP did):

for (i in unique(df$cat)) df$num[df$cat == i] <- seq_len(sum(df$cat == i))

User · Answer

Using the rowid() function in data.table:

> set.seed(100)
> df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))
> df <- df[order(df$cat, df$val), ]
> df$num <- data.table::rowid(df$cat)
> df
   cat        val num
4  aaa 0.05638315   1
2  aaa 0.25767250   2
1  aaa 0.30776611   3
5  aaa 0.46854928   4
3  aaa 0.55232243   5
10 bbb 0.17026205   1
8  bbb 0.37032054   2
6  bbb 0.48377074   3
9  bbb 0.54655860   4
7  bbb 0.81240262   5
13 ccc 0.28035384   1
14 ccc 0.39848790   2
11 ccc 0.62499648   3
15 ccc 0.76255108   4
12 ccc 0.88216552   5
