Simultaneously merge multiple data frames in a list

Question

I have a list of many data.frames that I want to merge. The issue here is that each data.frame differs in terms of the number of rows and columns, but they all share the key variables (which I've called "var1" and "var2" in the code below). If the data.frames were identical in terms of columns, I could merely rbind, for which plyr's rbind.fill would do the job, but that's not the case with these data.

Because the merge command only works on 2 data.frames, I turned to the Internet for ideas. I got this one from here, which worked perfectly in R 2.7.2, which is what I had at the time:

merge.rec <- function(.list, ...) {
    if (length(.list) == 1) return(.list[[1]])
    Recall(c(list(merge(.list[[1]], .list[[2]], ...)), .list[-(1:2)]), ...)
}

And I would call the function like so:

df <- merge.rec(my.list, by.x = c("var1", "var2"),
                by.y = c("var1", "var2"), all = TRUE, suffixes = c("", ""))

But in any R version after 2.7.2, including 2.11 and 2.12, this code fails with the following error:

Error in match.names(clabs, names(xi)) :
  names do not match previous names

(Incidentally, I see other references to this error elsewhere with no resolution.)

Is there any way to solve this?

User · Answer

Here is a generic wrapper that converts a binary function into a function of multiple parameters. The benefit of this solution is that it is very generic and can be applied to any binary function. You define it once and can then apply it anywhere.

To demonstrate the idea, I implement it with simple recursion. It could of course be implemented more elegantly by taking advantage of R's good support for the functional paradigm.

fold_left <- function(f) {
    return(function(...) {
        args <- list(...)
        return(function(...) {
            iter <- function(result, rest) {
                if (length(rest) == 0) {
                    return(result)
                } else {
                    return(iter(f(result, rest[[1]], ...), rest[-1]))
                }
            }
            return(iter(args[[1]], args[-1]))
        })
    })
}

Then you can wrap any binary function with it and call the result with positional parameters (usually data.frames) in the first parentheses and named parameters in the second parentheses (such as by = or suffix =). If there are no named parameters, leave the second parentheses empty.

merge_all <- fold_left(merge)
merge_all(df1, df2, df3, df4, df5)(by.x = c("var1", "var2"), by.y = c("var1", "var2"))

library(dplyr)  # left_join() comes from dplyr
left_join_all <- fold_left(left_join)
left_join_all(df1, df2, df3, df4, df5)(c("var1", "var2"))
left_join_all(df1, df2, df3, df4, df5)()
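
The recursion above can also be expressed with base R's Reduce, which is one way to get the more functional formulation the answer alludes to. The following is a minimal sketch that keeps the same two-step calling convention; the name fold_left_reduce and the sample data frames are made up for illustration.

```r
# Hypothetical alternative to fold_left(), built on base R's Reduce().
fold_left_reduce <- function(f) {
  function(...) {
    args <- list(...)  # positional arguments, e.g. data frames
    function(...) {
      # Fold f over args from the left, forwarding the extra arguments
      # (by =, all =, ...) to every call of f.
      Reduce(function(acc, nxt) f(acc, nxt, ...), args)
    }
  }
}

# Small made-up data frames sharing the key columns var1 and var2.
df1 <- data.frame(var1 = 1:3, var2 = c("a", "b", "c"), p = c(10, 20, 30))
df2 <- data.frame(var1 = 1:3, var2 = c("a", "b", "c"), q = c(1.5, 2.5, 3.5))
df3 <- data.frame(var1 = 2:3, var2 = c("b", "c"), r = c(TRUE, FALSE))

merge_all2 <- fold_left_reduce(merge)
merged <- merge_all2(df1, df2, df3)(by = c("var1", "var2"))
print(merged)
```

Because merge() defaults to an inner join, only the key combinations present in all three frames survive; pass all = TRUE in the second parentheses to keep every row.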

User · Answer

I had a list of dataframes with no common id column. I had missing data on many dfs. There were NULL values. The dataframes were produced using the table function. Reduce, merging, rbind, rbind.fill, and their like could not help me reach my aim. My aim was to produce an understandable merged dataframe, irrespective of the missing data and the lack of a common id column.

Therefore, I made the following function. Maybe this function can help someone.

#### Dependencies ####

# Depends on base R only

#### Example DF ####

# Example df
ex_df <- cbind(c(seq(1, 10, 1), rep("NA", 0), seq(1, 10, 1)),
               c(seq(1, 7, 1),  rep("NA", 3), seq(1, 12, 1)),
               c(seq(1, 3, 1),  rep("NA", 7), seq(1, 5, 1), rep("NA", 5)))

# Making colnames and rownames
colnames(ex_df) <- 1:dim(ex_df)[2]
rownames(ex_df) <- 1:dim(ex_df)[1]

# Making an unequal list of dfs,
# without a common id column
list_of_df <- apply(ex_df == "NA", 2, table)

The function follows.

#### The function ####

# The function to rbind it
rbind_null_df_lists <- function(list_of_dfs) {
  # Number of entries in each table, as a one-column matrix
  length_df  <- do.call(rbind, lapply(list_of_dfs, function(x) length(x)))
  max_no     <- max(length_df[, 1])
  # Name of the first table that has the maximum number of entries
  name_df    <- names(length_df[length_df == max_no, ][1])
  names_list <- names(list_of_dfs[name_df][[1]])

  df_dfs <- list()
  for (i in 1:max_no) {
    # i-th entry of every table; missing entries become NA
    df_dfs[[i]] <- do.call(rbind, lapply(1:length(list_of_dfs),
                                         function(x) list_of_dfs[[x]][i]))
  }

  df_cbind           <- do.call(cbind, df_dfs)
  rownames(df_cbind) <- rownames(length_df)
  colnames(df_cbind) <- names_list

  df_cbind
}

#### Running the example ####

rbind_null_df_lists(list_of_df)

User · Answer

Reduce makes this fairly easy:

merged.data.frame = Reduce(function(...) merge(..., all = TRUE), list.of.data.frames)

Here's a full example using some mock data:

set.seed(1)
list.of.data.frames = list(data.frame(x = 1:10, a = 1:10),
                           data.frame(x = 5:14, b = 11:20),
                           data.frame(x = sample(20, 10), y = runif(10)))
merged.data.frame = Reduce(function(...) merge(..., all = TRUE), list.of.data.frames)
tail(merged.data.frame)
#     x  a  b         y
# 12 12 NA 18        NA
# 13 13 NA 19        NA
# 14 14 NA 20 0.4976992
# 15 15 NA NA 0.7176185
# 16 16 NA NA 0.3841037
# 17 19 NA NA 0.3800352

And here's an example using these data to replicate my.list:

merged.data.frame = Reduce(function(...) merge(..., by = match.by, all = TRUE), my.list)
merged.data.frame[, 1:12]
#   matchname party st district chamber senate1993 name.x v2.x v3.x v4.x senate1994 name.y
# 1   ALGIERE   200 RI      026       S         NA   <NA>   NA   NA   NA         NA   <NA>
# 2     ALVES   100 RI      019       S         NA   <NA>   NA   NA   NA         NA   <NA>
# 3    BADEAU   100 RI      032       S         NA   <NA>   NA   NA   NA         NA   <NA>

Note: It looks like this is arguably a bug in merge. The problem is that there is no check that adding the suffixes (to handle overlapping non-matching names) actually makes them unique. At a certain point it uses [.data.frame, which does make.unique the names, causing the rbind to fail.

# first merge will end up with 'name.x' & 'name.y'
merge(my.list[[1]], my.list[[2]], by = match.by, all = TRUE)
# [1] matchname    party        st           district     chamber      senate1993   name.x
# [8] votes.year.x senate1994   name.y       votes.year.y
# <0 rows> (or 0-length row.names)

# as there is no clash, we retain 'name.x' & 'name.y' and get 'name' again
merge(merge(my.list[[1]], my.list[[2]], by = match.by, all = TRUE), my.list[[3]], by = match.by, all = TRUE)
# [1] matchname    party        st           district     chamber      senate1993   name.x
# [8] votes.year.x senate1994   name.y       votes.year.y senate1995   name         votes.year
# <0 rows> (or 0-length row.names)

# the next merge will fail as 'name' will get renamed to a pre-existing field.

The easiest fix is to not leave the renaming of duplicate fields (of which there are many here) up to merge. E.g.:

my.list2 = Map(function(x, i) setNames(x, ifelse(names(x) %in% match.by,
      names(x), sprintf('%s.%d', names(x), i))), my.list, seq_along(my.list))

The merge/Reduce will then work fine.
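
To make the renaming fix concrete, here is a small self-contained sketch. The data are made up (the original my.list is not shown in this thread), and the key column name "id" is an assumption chosen for illustration; only the Map/setNames step and the Reduce call mirror the answer.

```r
# Made-up stand-ins for my.list: each frame has the key column plus
# non-key columns (name, votes) that repeat across frames.
match.by <- "id"
my.list <- list(
  data.frame(id = 1:3, name = c("a", "b", "c"), votes = c(5, 6, 7)),
  data.frame(id = 2:4, name = c("b", "c", "d"), votes = c(8, 9, 10)),
  data.frame(id = 3:5, name = c("c", "d", "e"), votes = c(1, 2, 3)),
  data.frame(id = 1:2, name = c("a", "b"), votes = c(4, 4))
)

# Suffix every non-key column with the position of its frame in the list,
# so merge() never has to invent .x/.y suffixes itself.
my.list2 <- Map(
  function(x, i) {
    setNames(x, ifelse(names(x) %in% match.by,
                       names(x), sprintf("%s.%d", names(x), i)))
  },
  my.list, seq_along(my.list)
)

merged <- Reduce(function(...) merge(..., by = match.by, all = TRUE), my.list2)
print(names(merged))
```

After the renaming, every non-key column carries the index of the frame it came from, so the merged column names are unique by construction.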

User · Answer

You can do it using merge_all in the reshape package. You can pass parameters to merge using the ... argument:

reshape::merge_all(list_of_dataframes, ...)

Here is an excellent resource on different methods to merge data frames.

User · Answer

The function eat of my package safejoin has such a feature: if you give it a list of data.frames as a second input, it will join them recursively to the first input.

Borrowing and extending the accepted answer's data:

x <- data.frame(i = c("a", "b", "c"), j = 1:3)
y <- data.frame(i = c("b", "c", "d"), k = 4:6)
z <- data.frame(i = c("c", "d", "a"), l = 7:9)
z2 <- data.frame(i = c("a", "b", "c"), l = rep(100L, 3), l2 = rep(100L, 3)) # for later

# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
eat(x, list(y, z), .by = "i")
# # A tibble: 3 x 4
#   i         j     k     l
#   <chr> <int> <int> <int>
# 1 a         1    NA     9
# 2 b         2     4    NA
# 3 c         3     5     7

We don't have to take all columns; we can use select helpers from tidyselect and choose (as we start from .x, all .x columns are kept):

eat(x, list(y, z), starts_with("l"), .by = "i")
# # A tibble: 3 x 3
#   i         j     l
#   <chr> <int> <int>
# 1 a         1     9
# 2 b         2    NA
# 3 c         3     7

or remove specific ones:

eat(x, list(y, z), -starts_with("l"), .by = "i")
# # A tibble: 3 x 3
#   i         j     k
#   <chr> <int> <int>
# 1 a         1    NA
# 2 b         2     4
# 3 c         3     5

If the list is named, the names will be used as prefixes:

eat(x, dplyr::lst(y, z), .by = "i")
# # A tibble: 3 x 4
#   i         j   y_k   z_l
#   <chr> <int> <int> <int>
# 1 a         1    NA     9
# 2 b         2     4    NA
# 3 c         3     5     7

If there are column conflicts, the .conflict argument allows you to resolve them, for example by taking the first or second one, adding them, coalescing them, or nesting them.

Keep first:

eat(x, list(y, z, z2), .by = "i", .conflict = ~.x)
# # A tibble: 3 x 4
#   i         j     k     l
#   <chr> <int> <int> <int>
# 1 a         1    NA     9
# 2 b         2     4    NA
# 3 c         3     5     7

Keep last:

eat(x, list(y, z, z2), .by = "i", .conflict = ~.y)
# # A tibble: 3 x 4
#   i         j     k     l
#   <chr> <int> <int> <dbl>
# 1 a         1    NA   100
# 2 b         2     4   100
# 3 c         3     5   100

Add:

eat(x, list(y, z, z2), .by = "i", .conflict = `+`)
# # A tibble: 3 x 4
#   i         j     k     l
#   <chr> <int> <int> <dbl>
# 1 a         1    NA   109
# 2 b         2     4    NA
# 3 c         3     5   107

Coalesce:

eat(x, list(y, z, z2), .by = "i", .conflict = dplyr::coalesce)
# # A tibble: 3 x 4
#   i         j     k     l
#   <chr> <int> <int> <dbl>
# 1 a         1    NA     9
# 2 b         2     4   100
# 3 c         3     5     7

Nest:

eat(x, list(y, z, z2), .by = "i", .conflict = ~tibble(first = .x, second = .y))
# # A tibble: 3 x 4
#   i         j     k l$first $second
#   <chr> <int> <int>   <int>   <int>
# 1 a         1    NA       9     100
# 2 b         2     4      NA     100
# 3 c         3     5       7     100

NA values can be replaced by using the .fill argument:

eat(x, list(y, z), .by = "i", .fill = 0)
# # A tibble: 3 x 4
#   i         j     k     l
#   <chr> <int> <dbl> <dbl>
# 1 a         1     0     9
# 2 b         2     4     0
# 3 c         3     5     7

By default it's an enhanced left_join, but all dplyr joins are supported through the .mode argument. Fuzzy joins are also supported, either through the match_fun argument (it's wrapped around the package fuzzyjoin) or by giving a formula such as ~ X("var1") > Y("var2") & X("var3") < Y("var4") to the by argument.

User · Answer

When you have a list of dfs in which one column contains the "ID", but some IDs are missing from some of the dfs, you can use this version of Reduce / merge to join multiple dfs with missing row IDs or labels:

Reduce(function(x, y) merge(x = x, y = y, by = "V1", all.x = TRUE, all.y = TRUE), list_of_dfs)
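
As an illustration of the Reduce / merge call above, here is a minimal sketch on made-up data in which every frame is missing some IDs. The column names n1, n2 and n3 are invented for the example; all = TRUE is base R's shorthand for setting both all.x and all.y.

```r
# Made-up frames keyed by V1; each one is missing some IDs.
list_of_dfs <- list(
  data.frame(V1 = c("a", "b", "c"), n1 = c(1, 2, 3)),
  data.frame(V1 = c("b", "c", "d"), n2 = c(4, 5, 6)),
  data.frame(V1 = c("a", "d"), n3 = c(7, 8))
)

# all = TRUE is shorthand for all.x = TRUE, all.y = TRUE (a full outer join).
merged <- Reduce(function(x, y) merge(x, y, by = "V1", all = TRUE), list_of_dfs)
print(merged)
```

Every ID that appears in at least one frame gets a row, and the cells for frames that lack that ID are filled with NA.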

User · Answer

I will reuse the data example from @PaulRougieux:

x <- data.frame(i = c("a", "b", "c"), j = 1:3)
y <- data.frame(i = c("b", "c", "d"), k = 4:6)
z <- data.frame(i = c("c", "d", "a"), l = 7:9)

Here's a short and sweet solution using purrr and tidyr:

library(tidyverse)

list(x, y, z) %>%
  map_df(gather, key = key, value = value, -i) %>%
  spread(key, value)

User · Answer

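You can use recursion to do this. I haven't verified the following, but it should give you the right idea:

MergeListOfDf = function(data, ...)
{
    # Base case added so that a one-element list does not recurse forever
    if (length(data) == 1)
    {
        return(data[[1]])
    }
    if (length(data) == 2)
    {
        return(merge(data[[1]], data[[2]], ...))
    }
    return(merge(MergeListOfDf(data[-1], ...), data[[1]], ...))
}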
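
For long lists, the same idea can be written as a loop, which avoids deep recursion. This is a hedged sketch rather than the answer's own code: the function name merge_list_iter and the sample data are hypothetical, and unlike the recursive version it merges strictly left to right.

```r
# Hypothetical iterative counterpart to the recursive merge above.
merge_list_iter <- function(data, ...) {
  stopifnot(is.list(data), length(data) >= 1)
  result <- data[[1]]
  # seq_along(data)[-1] is empty for a one-element list, so the loop is skipped.
  for (i in seq_along(data)[-1]) {
    result <- merge(result, data[[i]], ...)
  }
  result
}

# Made-up data frames sharing the key column id.
dfs <- list(
  data.frame(id = 1:4, a = c(10, 20, 30, 40)),
  data.frame(id = 2:5, b = c("w", "x", "y", "z")),
  data.frame(id = 3:6, c = c(TRUE, FALSE, TRUE, FALSE))
)
merged <- merge_list_iter(dfs, by = "id")
print(merged)
```

Extra arguments such as by = or all = are forwarded unchanged to every merge() call, so the same function covers inner, left and full joins.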

User · Answer

Another question asked specifically how to perform multiple left joins using dplyr in R. The question was marked as a duplicate of this one, so I answer here, using the 3 sample data frames below:

x <- data.frame(i = c("a", "b", "c"), j = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b", "c", "d"), k = 4:6, stringsAsFactors = FALSE)
z <- data.frame(i = c("c", "d", "a"), l = 7:9, stringsAsFactors = FALSE)

Update June 2018: I divided the answer into three sections representing three different ways to perform the merge. You probably want to use the purrr way if you are already using the tidyverse packages. For comparison purposes, you'll find a base R version below using the same sample dataset.

1. Join them with reduce from the purrr package

The purrr package provides a reduce function which has a concise syntax:

library(tidyverse)
list(x, y, z) %>% reduce(left_join, by = "i")
# # A tibble: 3 x 4
#   i         j     k     l
#   <chr> <int> <int> <int>
# 1 a         1    NA     9
# 2 b         2     4    NA
# 3 c         3     5     7

You can also perform other joins, such as a full_join or inner_join:

list(x, y, z) %>% reduce(full_join, by = "i")
# # A tibble: 4 x 4
#   i         j     k     l
#   <chr> <int> <int> <int>
# 1 a         1    NA     9
# 2 b         2     4    NA
# 3 c         3     5     7
# 4 d        NA     6     8

list(x, y, z) %>% reduce(inner_join, by = "i")
# # A tibble: 1 x 4
#   i         j     k     l
#   <chr> <int> <int> <int>
# 1 c         3     5     7

2. dplyr::left_join() with base R Reduce()

list(x, y, z) %>%
    Reduce(function(dtf1, dtf2) left_join(dtf1, dtf2, by = "i"), .)
#   i j  k  l
# 1 a 1 NA  9
# 2 b 2  4 NA
# 3 c 3  5  7

3. Base R merge() with base R Reduce()

And for comparison purposes, here is a base R version of the left join based on Charles's answer:

Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),
       list(x, y, z))
#   i j  k  l
# 1 a 1 NA  9
# 2 b 2  4 NA
# 3 c 3  5  7
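
The full and inner joins shown with purrr have base R counterparts as well. The sketch below reuses the three sample data frames from this answer and only changes merge()'s all argument; the variable names full and inner are chosen for the example.

```r
x <- data.frame(i = c("a", "b", "c"), j = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b", "c", "d"), k = 4:6, stringsAsFactors = FALSE)
z <- data.frame(i = c("c", "d", "a"), l = 7:9, stringsAsFactors = FALSE)

# Full join: keep every key that appears in any of the frames.
full <- Reduce(function(d1, d2) merge(d1, d2, by = "i", all = TRUE),
               list(x, y, z))

# Inner join: keep only keys present in every frame (merge()'s default).
inner <- Reduce(function(d1, d2) merge(d1, d2, by = "i"),
                list(x, y, z))

print(full)
print(inner)
```

The row counts match the full_join and inner_join results above: four rows for the full join and a single row, for key "c", for the inner join.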
