Combine two data frames by rows (rbind) when they have different sets of columns

Question

Is it possible to row bind two data frames that don't have the same set of columns? I am hoping to retain the columns that do not match after the bind.

User · Answer

gtools::smartbind didn't like working with Dates, probably because it was as.vector-ing them. So here's my solution:

```r
sbind = function(x, y, fill = NA) {
  # add every name in `cols` to `d` as a new column filled with `fill`
  sbind.fill = function(d, cols) {
    for (c in cols)
      d[[c]] = fill
    d
  }

  x = sbind.fill(x, setdiff(names(y), names(x)))
  y = sbind.fill(y, setdiff(names(x), names(y)))

  rbind(x, y)
}
```
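One caveat worth checking with `sbind`: the fill value is a plain logical `NA`, and base `rbind` builds each result column starting from the first data frame's column, so if `x` is the one missing the Date column, the combined column may come back as plain numbers rather than `Date`. Below is a minimal sketch of a variant (the name `sbind_typed` is hypothetical, not from any package) that instead fills each missing column with `NA`s taken from the matching column of the other data frame, so the class travels with them.

```r
# Hypothetical variant of sbind: missing columns are filled with NAs of the
# same class as the corresponding column in the other data frame.
sbind_typed <- function(x, y) {
  fill_from <- function(d, template) {
    for (col in setdiff(names(template), names(d))) {
      # indexing with NA_integer_ returns NAs that keep the column's class
      # (e.g. Date, or a factor's levels)
      d[[col]] <- template[[col]][rep(NA_integer_, nrow(d))]
    }
    d
  }
  # both calls see the original x and y, since R copies on modify
  rbind(fill_from(x, y), fill_from(y, x))
}

# usage: here the first data frame is the one lacking the Date column
df_a <- data.frame(id = 1:2)
df_b <- data.frame(id = 3L, when = as.Date("2020-01-01"))
res <- sbind_typed(df_a, df_b)
class(res$when)
```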

User · Answer

You could also use sjmisc::add_rows(), which uses dplyr::bind_rows(), but unlike bind_rows(), add_rows() preserves attributes and hence is useful for labelled data.

See the following example with a labelled dataset. The frq()-function prints frequency tables with value labels, if the data is labelled.

```r
library(sjmisc)
library(dplyr)

data(efc)
# select two subsets, with some identical and else different columns
x1 <- efc %>% select(1:5) %>% slice(1:10)
x2 <- efc %>% select(3:7) %>% slice(11:20)

str(x1)
#> 'data.frame':  10 obs. of  5 variables:
#>  $ c12hour : num  16 148 70 168 168 16 161 110 28 40
#>   ..- attr(*, "label")= chr "average number of hours of care per week"
#>  $ e15relat: num  2 2 1 1 2 2 1 4 2 2
#>   ..- attr(*, "label")= chr "relationship to elder"
#>   ..- attr(*, "labels")= Named num  1 2 3 4 5 6 7 8
#>   .. ..- attr(*, "names")= chr  "spouse/partner" "child" "sibling" "daughter or son -in-law" ...
#>  $ e16sex  : num  2 2 2 2 2 2 1 2 2 2
#>   ..- attr(*, "label")= chr "elder's gender"
#>   ..- attr(*, "labels")= Named num  1 2
#>   .. ..- attr(*, "names")= chr  "male" "female"
#>  $ e17age  : num  83 88 82 67 84 85 74 87 79 83
#>   ..- attr(*, "label")= chr "elder' age"
#>  $ e42dep  : num  3 3 3 4 4 4 4 4 4 4
#>   ..- attr(*, "label")= chr "elder's dependency"
#>   ..- attr(*, "labels")= Named num  1 2 3 4
#>   .. ..- attr(*, "names")= chr  "independent" "slightly dependent" "moderately dependent" "severely dependent"

bind_rows(x1, x1) %>% frq(e42dep)
#>
#> # e42dep <numeric>
#> # total N=20  valid N=20  mean=3.70  sd=0.47
#>
#>   val frq raw.prc valid.prc cum.prc
#>     3   6      30        30      30
#>     4  14      70        70     100
#>  <NA>   0       0        NA      NA

add_rows(x1, x1) %>% frq(e42dep)
#>
#> # elder's dependency (e42dep) <numeric>
#> # total N=20  valid N=20  mean=3.70  sd=0.47
#>
#>  val                label frq raw.prc valid.prc cum.prc
#>    1          independent   0       0         0       0
#>    2   slightly dependent   0       0         0       0
#>    3 moderately dependent   6      30        30      30
#>    4   severely dependent  14      70        70     100
#>   NA                   NA   0       0        NA      NA
```

User · Answer

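```r
rbind.ordered = function(x, y) {
  # columns in x that y lacks: append them to y as NA columns
  diffCol = setdiff(colnames(x), colnames(y))
  if (length(diffCol) > 0) {
    cols = colnames(y)
    for (i in 1:length(diffCol)) y = cbind(y, NA)
    colnames(y) = c(cols, diffCol)
  }

  # columns in y that x lacks: append them to x as NA columns
  diffCol = setdiff(colnames(y), colnames(x))
  if (length(diffCol) > 0) {
    cols = colnames(x)
    for (i in 1:length(diffCol)) x = cbind(x, NA)
    colnames(x) = c(cols, diffCol)
  }

  # put y's columns in x's order before binding
  return(rbind(x, y[, colnames(x)]))
}
```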

User · Answer

rbind.fill from the package plyr might be what you are looking for.
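For example, with the same kind of sample data used in the other answers (a minimal sketch; it assumes plyr is installed and calls it as `plyr::rbind.fill` so nothing is masked if dplyr is also loaded):

```r
df1 <- data.frame(a = 1:5, b = 6:10)
df2 <- data.frame(a = 11:15, b = 16:20, c = LETTERS[1:5])

# columns missing from a data frame are filled with NA in its rows
res <- plyr::rbind.fill(df1, df2)
res
```

rbind.fill also accepts a single list of data frames as its first argument, which is handy when there are many of them.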

User · Answer

Most of the base R answers address the situation where only one data.frame has additional columns, or where the resulting data.frame would have the intersection of the columns. Since the OP writes "I am hoping to retain the columns that do not match after the bind", an answer using base R methods to address this issue is probably worth posting.

Below, I present two base R methods: one that alters the original data.frames, and one that doesn't. Additionally, I offer a method that generalizes the non-destructive method to more than two data.frames.

First, let's get some sample data.

```r
# sample data: variable d is in df1, variable c is in df2
df1 = data.frame(a = 1:5, b = 6:10, d = month.name[1:5])
df2 = data.frame(a = 6:10, b = 16:20, c = letters[8:12])
```

Two data.frames, alter originals

In order to retain all columns from both data.frames in an rbind (and allow the function to work without resulting in an error), you add NA columns to each data.frame with the appropriate missing names filled in using setdiff.

```r
# fill in non-overlapping columns with NAs
df1[setdiff(names(df2), names(df1))] <- NA
df2[setdiff(names(df1), names(df2))] <- NA
```

Now, rbind 'em:

```r
rbind(df1, df2)
#>     a  b        d    c
#> 1   1  6  January <NA>
#> 2   2  7 February <NA>
#> 3   3  8    March <NA>
#> 4   4  9    April <NA>
#> 5   5 10      May <NA>
#> 6   6 16     <NA>    h
#> 7   7 17     <NA>    i
#> 8   8 18     <NA>    j
#> 9   9 19     <NA>    k
#> 10 10 20     <NA>    l
```

Note that the first two lines alter the original data.frames, df1 and df2, adding the full set of columns to both.

Two data.frames, do not alter originals

To leave the original data.frames intact, first loop through the names that differ and return a named vector of NAs, which is concatenated into a list with the data.frame using c. Then, data.frame converts the result into an appropriate data.frame for the rbind.

```r
rbind(
  data.frame(c(df1, sapply(setdiff(names(df2), names(df1)), function(x) NA))),
  data.frame(c(df2, sapply(setdiff(names(df1), names(df2)), function(x) NA)))
)
```

Many data.frames, do not alter originals

In the instance that you have more than two data.frames, you could do the following.

```r
# put data.frames into list (dfs named df1, df2, df3, etc)
mydflist <- mget(ls(pattern = "df\\d+"))
# get all variable names
allNms <- unique(unlist(lapply(mydflist, names)))

# put 'em all together
do.call(rbind,
        lapply(mydflist,
               function(x) data.frame(c(x, sapply(setdiff(allNms, names(x)),
                                                  function(y) NA)))))
```

Maybe it's a bit nicer not to see the row names of the original data.frames? Then do this:

```r
do.call(rbind,
        c(lapply(mydflist,
                 function(x) data.frame(c(x, sapply(setdiff(allNms, names(x)),
                                                    function(y) NA)))),
          make.row.names = FALSE))
```
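If you do this often, the non-destructive approach above can be wrapped in a small function. This is only a sketch: the name `rbind_fill_base` is made up, and it uses a plain loop over the missing names instead of the `sapply`/`c` trick, then puts every data.frame in the same column order before binding.

```r
# Hypothetical helper (not from any package): row-binds any number of data
# frames, adding NA columns for the names each one is missing.
rbind_fill_base <- function(...) {
  dfs <- list(...)
  # union of all column names, in order of first appearance
  all_names <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    # d is a local copy, so the caller's data frames are not modified
    for (col in setdiff(all_names, names(d))) {
      d[[col]] <- rep(NA, nrow(d))
    }
    d[all_names]  # same column order in every data frame
  })
  # make.row.names = FALSE gives plain 1..n row names
  do.call(rbind, c(padded, make.row.names = FALSE))
}

df1 <- data.frame(a = 1:2, d = c("x", "y"))
df2 <- data.frame(a = 3L, c = "h")
df3 <- data.frame(b = 9)
res <- rbind_fill_base(df1, df2, df3)
res
```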

User · Answer

A more recent solution is to use dplyr's bind_rows function, which I assume is more efficient than smartbind.

```r
df1 <- data.frame(a = c(1:5), b = c(6:10))
df2 <- data.frame(a = c(11:15), b = c(16:20), c = LETTERS[1:5])

dplyr::bind_rows(df1, df2)
#>     a  b    c
#> 1   1  6 <NA>
#> 2   2  7 <NA>
#> 3   3  8 <NA>
#> 4   4  9 <NA>
#> 5   5 10 <NA>
#> 6  11 16    A
#> 7  12 17    B
#> 8  13 18    C
#> 9  14 19    D
#> 10 15 20    E
```
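bind_rows can also record which data frame each row came from through its `.id` argument. A minimal sketch, assuming dplyr is installed (the column name `source` and the list names `first`/`second` are arbitrary choices):

```r
df1 <- data.frame(a = c(1:5), b = c(6:10))
df2 <- data.frame(a = c(11:15), b = c(16:20), c = LETTERS[1:5])

# the names of the list become the values of the new `source` column
res <- dplyr::bind_rows(list(first = df1, second = df2), .id = "source")
res
```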

User · Answer

You could also just pull out the common column names:

```r
cols <- intersect(colnames(df1), colnames(df2))
# df[cols] (rather than df[, cols]) stays a data frame even if only one column is shared
rbind(df1[cols], df2[cols])
```
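Note that this keeps only the shared columns, so the columns that don't match are dropped rather than retained. A small self-contained check with made-up data:

```r
df1 <- data.frame(a = 1:3, b = 4:6)
df2 <- data.frame(a = 7:8, b = 9:10, c = c("x", "y"))

cols <- intersect(colnames(df1), colnames(df2))
res <- rbind(df1[cols], df2[cols])
names(res)   # only "a" and "b"; column c is gone
```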

User · Answer

Maybe I completely misread your question, but "I am hoping to retain the columns that do not match after the bind" makes me think you are looking for a left join or right join similar to an SQL query. R has the merge function that lets you specify left, right, or inner joins similar to joining tables in SQL.

There is already a great question and answer on this topic here: How to join (merge) data frames (inner, outer, left, right)?
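For this particular question, a full outer join with `merge(..., all = TRUE)` on the shared columns gives a result similar to a filling rbind, as long as no row of one data frame has exactly the same values in all shared columns as a row of the other (such rows would be joined into one row instead of stacked). merge also sorts the result by the shared columns by default. A sketch with made-up data:

```r
df1 <- data.frame(a = 1:3, b = 4:6, d = c("x", "y", "z"))
df2 <- data.frame(a = 7:8, b = 9:10, c = c("p", "q"))

# `by` defaults to the shared columns (a, b); all = TRUE keeps unmatched rows
# from both sides, with NA in the columns the other data frame lacks
res <- merge(df1, df2, all = TRUE)
res
```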

User · Answer

I wrote a function to do this because I like my code to tell me if something is wrong. This function will explicitly tell you which column names don't match and whether you have a type mismatch. Then it will do its best to combine the data frames anyway. The limitation is that you can only combine two data frames at a time.

```r
# combines data frames (like rbind) but by matching column names
# columns without matches in the other data frame are still combined
# but with NA in the rows corresponding to the data frame without
# the variable
# A warning is issued if there is a type mismatch between columns of
# the same name and an attempt is made to combine the columns
combineByName <- function(A, B) {
  a.names <- names(A)
  b.names <- names(B)
  all.names <- union(a.names, b.names)
  print(paste("Number of columns:", length(all.names)))
  # storage type of every column (e.g. "double", "character")
  a.type <- vapply(A, typeof, character(1))
  b.type <- vapply(B, typeof, character(1))
  a_b.names <- names(A)[!names(A) %in% names(B)]
  b_a.names <- names(B)[!names(B) %in% names(A)]
  if (length(a_b.names) > 0 || length(b_a.names) > 0) {
    print("Columns in data frame A but not in data frame B:")
    print(a_b.names)
    print("Columns in data frame B but not in data frame A:")
    print(b_a.names)
  } else if (identical(a.names, b.names) && identical(a.type, b.type)) {
    # same names in the same order with the same types: plain rbind works
    C <- rbind(A, B)
    return(C)
  }
  C <- list()
  for (i in seq_along(all.names)) {
    l.a <- all.names[i] %in% a.names
    pos.a <- match(all.names[i], a.names)
    typ.a <- a.type[pos.a]
    l.b <- all.names[i] %in% b.names
    pos.b <- match(all.names[i], b.names)
    typ.b <- b.type[pos.b]
    if (l.a && l.b) {
      if (typ.a == typ.b) {
        vec <- c(A[[pos.a]], B[[pos.b]])
      } else {
        warning("Type mismatch in variable named: ", all.names[i])
        vec <- try(c(A[[pos.a]], B[[pos.b]]))
      }
    } else if (l.a) {
      vec <- c(A[[pos.a]], rep(NA, nrow(B)))
    } else {
      vec <- c(rep(NA, nrow(A)), B[[pos.b]])
    }
    C[[i]] <- vec
  }
  names(C) <- all.names
  C <- as.data.frame(C)
  return(C)
}
```

User · Answer

Just for the documentation: you can try the Stack package and its function Stack in the following form:

```r
Stack(df_1, df_2)
```

I also have the impression that it is faster than other methods for large data sets.

User · Answer

If the columns in df1 are a subset of those in df2 (by column names):

```r
# drop = FALSE keeps a data frame even if df1 has a single column
df3 <- rbind(df1, df2[, names(df1), drop = FALSE])
```
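Note that this drops the columns of df2 that are not in df1, and that selecting `names(df1)` also puts df2's columns in df1's order. A quick check with made-up data:

```r
df1 <- data.frame(a = 1:2, b = 3:4)
df2 <- data.frame(b = 5:6, a = 7:8, c = c("x", "y"))

# keep only df1's columns, in df1's order; drop = FALSE keeps a data frame
df3 <- rbind(df1, df2[, names(df1), drop = FALSE])
df3
```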

User · Answer

An alternative with data.table:

```r
library(data.table)
df1 = data.frame(a = c(1:5), b = c(6:10))
df2 = data.frame(a = c(11:15), b = c(16:20), c = LETTERS[1:5])
rbindlist(list(df1, df2), fill = TRUE)
```

rbind will also work in data.table as long as the objects are converted to data.table objects, so

```r
rbind(setDT(df1), setDT(df2), fill = TRUE)
```

will also work in this situation. This can be preferable when you have a couple of data.tables and don't want to construct a list.
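rbindlist can also tag each row with the list element it came from through its `idcol` argument, and with `fill = TRUE` the columns are matched by name. A minimal sketch, assuming data.table is installed (the column name `source` is an arbitrary choice):

```r
library(data.table)

dt1 <- data.table(a = 1:2, b = 3:4)
dt2 <- data.table(b = 5:6, c = c("x", "y"))

# unnamed list, so the id column holds the list positions 1 and 2
res <- rbindlist(list(dt1, dt2), fill = TRUE, idcol = "source")
res
```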

User · Answer

You can use smartbind from the gtools package.

Example:

```r
library(gtools)

df1 <- data.frame(a = c(1:5), b = c(6:10))
df2 <- data.frame(a = c(11:15), b = c(16:20), c = LETTERS[1:5])
smartbind(df1, df2)
# result
#>      a  b    c
#> 1.1  1  6 <NA>
#> 1.2  2  7 <NA>
#> 1.3  3  8 <NA>
#> 1.4  4  9 <NA>
#> 1.5  5 10 <NA>
#> 2.1 11 16    A
#> 2.2 12 17    B
#> 2.3 13 18    C
#> 2.4 14 19    D
#> 2.5 15 20    E
```
