Repeat each row of data.frame the number of times specified in a column

Question

    df <- data.frame(var1 = c('a', 'b', 'c'), var2 = c('d', 'e', 'f'),
                     freq = 1:3)

What is the simplest way to expand the first two columns of the data.frame above, so that each row is repeated the number of times specified in the column 'freq'?

In other words, go from this:

    df
      var1 var2 freq
    1    a    d    1
    2    b    e    2
    3    c    f    3

To this:

    df.expanded
      var1 var2
    1    a    d
    2    b    e
    3    b    e
    4    c    f
    5    c    f
    6    c    f

User · Answer

Old question, new verb in the tidyverse:

    library(tidyr) # version >= 0.8.0
    df <- data.frame(var1 = c('a', 'b', 'c'), var2 = c('d', 'e', 'f'), freq = 1:3)
    df %>%
      uncount(freq)

    #>     var1 var2
    #> 1      a    d
    #> 2      b    e
    #> 2.1    b    e
    #> 3      c    f
    #> 3.1    c    f
    #> 3.2    c    f

User · Answer

Another possibility is using tidyr::expand:

    library(dplyr)
    library(tidyr)

    df %>% group_by_at(vars(-freq)) %>% expand(temp = 1:freq) %>% select(-temp)
    #> # A tibble: 6 x 2
    #> # Groups:   var1, var2 [3]
    #>   var1  var2
    #>   <fct> <fct>
    #> 1 a     d
    #> 2 b     e
    #> 3 b     e
    #> 4 c     f
    #> 5 c     f
    #> 6 c     f

One-liner version of vonjd's answer:

    library(data.table)

    setDT(df)[, list(freq = rep(1, freq)), by = c("var1", "var2")][, freq := NULL][]
    #>    var1 var2
    #> 1:    a    d
    #> 2:    b    e
    #> 3:    b    e
    #> 4:    c    f
    #> 5:    c    f
    #> 6:    c    f

Created on 2019-05-21 by the reprex package (v0.2.1)

User · Answer

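In fact, plain vector indexing achieves the same result and is easier to understand. Here the first row is repeated twice:

    library(magrittr)  # provides %>%
    library(tibble)    # provides remove_rownames()

    rawdata <- data.frame('time' = 1:3,
                          'x1' = 4:6,
                          'x2' = 7:9,
                          'x3' = 10:12)

    # repeat row 1 two times, then drop the generated row names
    rawdata[rep(1, times = 2), ] %>% remove_rownames()
    ##   time x1 x2 x3
    ## 1    1  4  7 10
    ## 2    1  4  7 10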

User · Answer

Here's one solution:

    df.expanded <- df[rep(row.names(df), df$freq), 1:2]

Result:

        var1 var2
    1      a    d
    2      b    e
    2.1    b    e
    3      c    f
    3.1    c    f
    3.2    c    f
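To make the indexing step in the one-liner above easier to follow, here is a minimal sketch that splits it into stages. It is an illustration rather than the answerer's exact code: it assumes integer row positions from seq_len(nrow(df)) instead of row.names(df), and the intermediate name idx is made up for this example.

```r
# Same example data as in the question.
df <- data.frame(var1 = c("a", "b", "c"),
                 var2 = c("d", "e", "f"),
                 freq = 1:3)

# Step 1: repeat each row position as many times as its freq value.
# (idx is a hypothetical helper name, not part of the original answer.)
idx <- rep(seq_len(nrow(df)), times = df$freq)
idx
#> [1] 1 2 2 3 3 3

# Step 2: use the repeated positions as a row index and keep
# only the first two columns.
df.expanded <- df[idx, c("var1", "var2")]

# Step 3 (optional): replace the generated row names (2.1, 3.1, ...)
# with the default 1..n sequence.
rownames(df.expanded) <- NULL
df.expanded
```

Resetting the row names is only cosmetic; skip Step 3 if the suffixes such as 2.1 and 3.2 are useful for tracing each copy back to its source row.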

User · Answer

I know this is not the case here, but if you need to keep the original freq column, you can use another tidyverse approach together with rep:

    library(purrr)

    df <- data.frame(var1 = c('a', 'b', 'c'), var2 = c('d', 'e', 'f'), freq = 1:3)

    df %>%
      map_df(., rep, .$freq)
    #> # A tibble: 6 x 3
    #>   var1  var2   freq
    #>   <fct> <fct> <int>
    #> 1 a     d         1
    #> 2 b     e         2
    #> 3 b     e         2
    #> 4 c     f         3
    #> 5 c     f         3
    #> 6 c     f         3

Created on 2019-12-21 by the reprex package (v0.3.0)

User · Answer

Another dplyr alternative uses slice, repeating each row number freq times:

    library(dplyr)

    df %>%
      slice(rep(seq_len(n()), freq)) %>%
      select(-freq)

    #  var1 var2
    #1    a    d
    #2    b    e
    #3    b    e
    #4    c    f
    #5    c    f
    #6    c    f

The seq_len(n()) part can be replaced with any of the following:

    df %>% slice(rep(1:nrow(df), freq)) %>% select(-freq)
    # Or
    df %>% slice(rep(row_number(), freq)) %>% select(-freq)
    # Or
    df %>% slice(rep(seq_len(nrow(.)), freq)) %>% select(-freq)

User · Answer

neilfws's solution works great for data.frames, but not for data.tables, since they lack the row.names property. This approach works for both:

    df.expanded <- df[rep(seq(nrow(df)), df$freq), 1:2]

The code for data.table is a tad cleaner:

    # convert to data.table by reference
    setDT(df)
    df.expanded <- df[rep(seq(.N), freq), !"freq"]

User · Answer

If you have to do this operation on very large data.frames, I would recommend converting them into a data.table and using the following, which should run much faster:

    library(data.table)
    dt <- data.table(df)
    dt.expanded <- dt[, list(freq = rep(1, freq)), by = c("var1", "var2")]
    dt.expanded[, freq := NULL]
    dt.expanded

See how much faster this solution is:

    df <- data.frame(var1 = 1:2e3, var2 = 1:2e3, freq = 1:2e3)
    system.time(df.exp <- df[rep(row.names(df), df$freq), 1:2])
    ##    user  system elapsed
    ##    4.57    0.00    4.56
    dt <- data.table(df)
    system.time(dt.expanded <- dt[, list(freq = rep(1, freq)), by = c("var1", "var2")])
    ##    user  system elapsed
    ##    0.05    0.01    0.06

User · Answer

Use expandRows() from the splitstackshape package:

    library(splitstackshape)
    expandRows(df, "freq")

Simple syntax, very fast, works on data.frame or data.table.

Result:

        var1 var2
    1      a    d
    2      b    e
    2.1    b    e
    3      c    f
    3.1    c    f
    3.2    c    f
