Combine a list of data frames into one data frame by row

Question

I have code that at one place ends up with a list of data frames, which I really want to convert to a single big data frame.

I got some pointers from an earlier question which was trying to do something similar but more complex.

Here's an example of what I am starting with (this is grossly simplified for illustration):

    listOfDataFrames <- vector(mode = "list", length = 100)

    for (i in 1:100) {
        listOfDataFrames[[i]] <- data.frame(a = sample(letters, 500, replace = TRUE),
                                            b = rnorm(500), c = rnorm(500))
    }

I am currently using this:

    df <- do.call("rbind", listOfDataFrames)

User · Answer

Code:

    library(microbenchmark)

    dflist <- vector(length = 100, mode = "list")
    for (i in 1:100) {
      dflist[[i]] <- data.frame(a = runif(n = 260), b = runif(n = 260),
                                c = rep(LETTERS, 10), d = rep(LETTERS, 10))
    }

    mb <- microbenchmark(
      plyr::rbind.fill(dflist),
      dplyr::bind_rows(dflist),
      data.table::rbindlist(dflist),
      plyr::ldply(dflist, data.frame),
      do.call("rbind", dflist),
      times = 1000)

    ggplot2::autoplot(mb)

Session:

    R version 3.3.0 (2016-05-03)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    > packageVersion("plyr")
    [1] '1.8.4'
    > packageVersion("dplyr")
    [1] '0.5.0'
    > packageVersion("data.table")
    [1] '1.9.6'

UPDATE: Rerun 31-Jan-2018. Ran on the same computer. New versions of packages. Added seed for seed lovers.

    set.seed(21)
    library(microbenchmark)

    dflist <- vector(length = 100, mode = "list")
    for (i in 1:100) {
      dflist[[i]] <- data.frame(a = runif(n = 260), b = runif(n = 260),
                                c = rep(LETTERS, 10), d = rep(LETTERS, 10))
    }

    mb <- microbenchmark(
      plyr::rbind.fill(dflist),
      dplyr::bind_rows(dflist),
      data.table::rbindlist(dflist),
      plyr::ldply(dflist, data.frame),
      do.call("rbind", dflist),
      times = 1000)

    ggplot2::autoplot(mb) + ggplot2::theme_bw()

    R version 3.4.0 (2017-04-21)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    > packageVersion("plyr")
    [1] '1.8.4'
    > packageVersion("dplyr")
    [1] '0.7.2'
    > packageVersion("data.table")
    [1] '1.10.4'

UPDATE: Rerun 06-Aug-2019.

    set.seed(21)
    library(microbenchmark)

    dflist <- vector(length = 100, mode = "list")
    for (i in 1:100) {
      dflist[[i]] <- data.frame(a = runif(n = 260), b = runif(n = 260),
                                c = rep(LETTERS, 10), d = rep(LETTERS, 10))
    }

    mb <- microbenchmark(
      plyr::rbind.fill(dflist),
      dplyr::bind_rows(dflist),
      data.table::rbindlist(dflist),
      plyr::ldply(dflist, data.frame),
      do.call("rbind", dflist),
      purrr::map_df(dflist, dplyr::bind_rows),
      times = 1000)

    ggplot2::autoplot(mb) + ggplot2::theme_bw()

    R version 3.6.0 (2019-04-26)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 18.04.2 LTS

    Matrix products: default
    BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
    LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

    > packageVersion("plyr")
    [1] '1.8.4'
    > packageVersion("dplyr")
    [1] '0.8.3'
    > packageVersion("data.table")
    [1] '1.12.2'
    > packageVersion("purrr")
    [1] '0.3.2'

User · Answer

For the purpose of completeness, I thought the answers to this question required an update.

"My guess is that using do.call("rbind", ...) is going to be the fastest approach that you will find..."

It was probably true for May 2010 and some time after, but in about Sep 2011 a new function rbindlist was introduced in the data.table package version 1.8.2, with a remark that "This does the same as do.call("rbind", l), but much faster". How much faster?

    library(rbenchmark)
    benchmark(
      do.call = do.call("rbind", listOfDataFrames),
      plyr_rbind.fill = plyr::rbind.fill(listOfDataFrames),
      plyr_ldply = plyr::ldply(listOfDataFrames, data.frame),
      data.table_rbindlist = as.data.frame(data.table::rbindlist(listOfDataFrames)),
      replications = 100, order = "relative",
      columns = c("test", "replications", "elapsed", "relative")
    )

                      test replications elapsed relative
    4 data.table_rbindlist          100    0.11    1.000
    1              do.call          100    9.39   85.364
    2      plyr_rbind.fill          100   12.08  109.818
    3           plyr_ldply          100   15.14  137.636
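
To make the comparison above concrete on a tiny input, here is a minimal sketch. The list `parts` and the result names are made up for illustration, and the data.table part only runs if that package is installed. Note that rbindlist returns a data.table, which is why the benchmark wraps it in as.data.frame.

```r
# Two small data frames with the same columns (illustrative data only)
parts <- list(
  data.frame(a = c("u", "b"), b = c(0.1, 0.2)),
  data.frame(a = c("y", "q"), b = c(0.3, 0.4))
)

# Base R: hand every list element to a single rbind() call
base_result <- do.call(rbind, parts)

# data.table: only attempted when the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  dt_result <- data.table::rbindlist(parts)  # a data.table
  df_result <- as.data.frame(dt_result)      # back to a plain data frame
  print(class(dt_result))
  print(all.equal(df_result, base_result))
}
```

On inputs this small the timing difference is not measurable; the gap reported above only shows up with many or large data frames.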

User · Answer

How it should be done in the tidyverse:

    df.dplyr.purrr <- listOfDataFrames %>% map_df(bind_rows)
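
A hedged note on newer package versions: dplyr's bind_rows accepts a list of data frames directly, and since purrr 1.0.0 the map_df family is marked as superseded in favour of list_rbind. The sketch below assumes both packages are installed (it skips itself otherwise); `frames`, `direct` and `via_purrr` are illustrative names.

```r
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("purrr", quietly = TRUE)) {
  # Small illustrative input
  frames <- list(
    data.frame(a = 1:2, b = c("x", "y")),
    data.frame(a = 3:4, b = c("z", "w"))
  )

  # bind_rows takes the list as is, no map step needed
  direct <- dplyr::bind_rows(frames)

  # list_rbind only exists in purrr >= 1.0.0, so check the version first
  if (utils::packageVersion("purrr") >= "1.0.0") {
    via_purrr <- purrr::list_rbind(frames)
  }
}
```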

User · Answer

Here's another way this can be done (just adding it to the answers because reduce is a very effective functional tool that is often overlooked as a replacement for loops; in this particular case, neither of these is significantly faster than do.call).

Using base R:

    df <- Reduce(rbind, listOfDataFrames)

Or, using the tidyverse:

    library(tidyverse) # or, library(dplyr); library(purrr)
    df <- listOfDataFrames %>% reduce(bind_rows)
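
To show what the fold is doing, here is a small base R sketch (the list `frames` is made up for illustration). Reduce combines the frames pairwise from the left, while do.call passes them all to one rbind call; both give the same rows. Because each Reduce step copies the frame accumulated so far, there is no reason to expect it to beat do.call.

```r
# Three small data frames with matching columns (illustrative data only)
frames <- list(
  data.frame(id = 1:2, value = c(0.5, 1.5)),
  data.frame(id = 3:4, value = c(2.5, 3.5)),
  data.frame(id = 5L,  value = 4.5)
)

# Pairwise fold: rbind(rbind(frames[[1]], frames[[2]]), frames[[3]])
via_reduce <- Reduce(rbind, frames)

# Single call: rbind(frames[[1]], frames[[2]], frames[[3]])
via_do_call <- do.call(rbind, frames)

# accumulate = TRUE keeps every intermediate result of the fold,
# which makes the growing row count visible
row_counts <- vapply(Reduce(rbind, frames, accumulate = TRUE), nrow, integer(1))
print(row_counts)
```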

User · Answer

The only thing that the solutions with data.table are missing is the identifier column to know from which data frame in the list the data is coming from.

Something like this:

    df_id <- data.table::rbindlist(listOfDataFrames, idcol = TRUE)

The idcol parameter adds a column (.id) identifying the origin of the data frame contained in the list. The result would look something like this:

    .id a           b           c
      1 u -0.05315128 -1.31975849
      1 b -1.00404849  1.15257952
      1 y  1.17478229 -0.91043925
      1 q -1.65488899  0.05846295
      1 c -1.43730524  0.95245909
      1 b  0.56434313  0.93813197
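
If data.table is not an option, the same identifier column can be approximated in base R. This is only a sketch: `add_source` is a hypothetical helper written for this example, not part of any package, and the list `frames` is illustrative.

```r
# A named list, so the names can serve as identifiers (illustrative data only)
frames <- list(
  first  = data.frame(a = c("u", "b"), b = c(0.1, 0.2)),
  second = data.frame(a = "y", b = 0.3)
)

# Hypothetical helper: prepend a .id column holding the given label
add_source <- function(frame, label) cbind(.id = label, frame)

# Tag each frame with its list name, then stack them
labelled <- Map(add_source, frames, names(frames))
combined <- do.call(rbind, labelled)

# rbind builds row names from the list names; reset them to 1..n
rownames(combined) <- NULL
print(combined)
```

For an unnamed list, `seq_along(frames)` can be passed instead of `names(frames)` to get numeric identifiers.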

User · Answer

There is also bind_rows(x, ...) in dplyr.

    > system.time({ df.Base <- do.call("rbind", listOfDataFrames) })
       user  system elapsed
       0.08    0.00    0.07
    >
    > system.time({ df.dplyr <- as.data.frame(bind_rows(listOfDataFrames)) })
       user  system elapsed
       0.01    0.00    0.02
    >
    > identical(df.Base, df.dplyr)
    [1] TRUE

User · Answer

One other option is to use a plyr function:

    df <- ldply(listOfDataFrames, data.frame)

This is a little slower than the original:

    > system.time({ df <- do.call("rbind", listOfDataFrames) })
       user  system elapsed
       0.25    0.00    0.25
    > system.time({ df2 <- ldply(listOfDataFrames, data.frame) })
       user  system elapsed
       0.30    0.00    0.29
    > identical(df, df2)
    [1] TRUE

My guess is that using do.call("rbind", ...) is going to be the fastest approach that you will find unless you can do something like (a) use matrices instead of data frames and (b) preallocate the final matrix and assign to it rather than growing it.

Edit 1:

Based on Hadley's comment, here's the latest version of rbind.fill from CRAN:

    > system.time({ df3 <- rbind.fill(listOfDataFrames) })
       user  system elapsed
       0.24    0.00    0.23
    > identical(df, df3)
    [1] TRUE

This is easier than rbind, and marginally faster (these timings hold up over multiple runs). And as far as I understand it, the version of plyr on github is even faster than this.
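
The preallocation idea from this answer can be sketched as follows. It assumes every piece has the same known number of rows and holds only numeric columns, since a matrix stores a single type (the letter column from the question would have to be dropped or encoded). The names `chunks` and `result` are illustrative.

```r
set.seed(21)
n_chunks <- 100
rows_per_chunk <- 500

# Numeric pieces stored as matrices rather than data frames
chunks <- lapply(seq_len(n_chunks), function(i) {
  cbind(b = rnorm(rows_per_chunk), c = rnorm(rows_per_chunk))
})

# Allocate the full result once instead of growing it piece by piece
result <- matrix(NA_real_, nrow = n_chunks * rows_per_chunk, ncol = 2,
                 dimnames = list(NULL, c("b", "c")))

# Write each piece into its own block of rows
for (i in seq_along(chunks)) {
  rows <- ((i - 1) * rows_per_chunk + 1):(i * rows_per_chunk)
  result[rows, ] <- chunks[[i]]
}

dim(result)
```

Whether this actually beats do.call(rbind, ...) on a given workload is something to measure rather than assume.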

User · Answer

Use bind_rows() from the dplyr package:

    bind_rows(list_of_dataframes, .id = "column_label")
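
A short sketch of what the .id argument does with a named list, assuming dplyr is installed (the block skips itself otherwise). The list `frames` and the column name "source" are illustrative.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Named list: the names become the values of the identifier column
  frames <- list(
    first  = data.frame(x = 1:2),
    second = data.frame(x = 3:4)
  )

  labelled <- dplyr::bind_rows(frames, .id = "source")
  print(labelled)
}
```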

User · Answer

An updated visual for those wanting to compare some of the recent answers (I wanted to compare the purrr to dplyr solution). Basically I combined answers from @TheVTM and @rmf.

Code:

    library(microbenchmark)
    library(data.table)
    library(tidyverse)

    dflist <- vector(length = 100, mode = "list")
    for (i in 1:100) {
      dflist[[i]] <- data.frame(a = runif(n = 260), b = runif(n = 260),
                                c = rep(LETTERS, 10), d = rep(LETTERS, 10))
    }

    mb <- microbenchmark(
      dplyr::bind_rows(dflist),
      data.table::rbindlist(dflist),
      purrr::map_df(dflist, bind_rows),
      do.call("rbind", dflist),
      times = 500)

    ggplot2::autoplot(mb)

Session Info:

    sessionInfo()
    R version 3.4.1 (2017-06-30)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

Package Versions:

    > packageVersion("tidyverse")
    [1] '1.1.1'
    > packageVersion("data.table")
    [1] '1.10.0'
