[r] combining two data frames of different lengths

I have two data frames.
The first is of only one column and 10 rows.
The second is of 3 columns and 50 rows.

When I try to combine this by using cbind, it gives this error:

Error in data.frame(..., check.names = FALSE) :

Can anyone suggest another function to do this?
P.S I have tried this using lists too, but it gives the same error.

The data frame consisting of 3 columns should be the first 3 columns in a CSV file, whereas the data frame with one column should be the fourth column in that file, when I write with the write.table function. The first 3 columns have 50 rows and the fourth column should occupy the first 10 rows.

This question is related to r dataframe

The answer is


Refering to Andrie's answer, suggesting to use plyr::rbind.fill(): Combined with t() you have something like cbind.fill() (which is not part of plyr) that will construct your data frame with consideration of identical case numbers.


i had similar problem, i matched the entries in a particular column of two data sets and cbind only if it matched. For two data sets, data1 & data2, i am adding a column in data1 from data2 after comparing first column of both.

for(i in 1:nrow(data1){
  for( j in 1:nrow(data2){
    if (data1[i,1]==data2[j,1]) data1[i,3]<- data2[j,2]
  }
}

Hope this will work for you!

You can use library(qpcR) for combining two matrix with unequal size.

resultant_matrix <- qpcR:::cbind.na(matrix1, matrix2)

NOTE:- The resultant matrix will be of size of matrix2.


My idea is to get max of rows count of all data.frames and next append empty matrix to every data.frame if need. This method doesn't require additional packages, only base is used. Code looks following:

list.df <- list(data.frame(a = 1:10), data.frame(a = 1:5), data.frame(a = 1:3))

max.rows <- max(unlist(lapply(list.df, nrow), use.names = F))

list.df <- lapply(list.df, function(x) {
    na.count <- max.rows - nrow(x)
    if (na.count > 0L) {
        na.dm <- matrix(NA, na.count, ncol(x))
        colnames(na.dm) <- colnames(x)
        rbind(x, na.dm)
    } else {
        x
    }
})

do.call(cbind, list.df)

#     a  a  a
# 1   1  1  1
# 2   2  2  2
# 3   3  3  3
# 4   4  4 NA
# 5   5  5 NA
# 6   6 NA NA
# 7   7 NA NA
# 8   8 NA NA
# 9   9 NA NA
# 10 10 NA NA

I think I have come up with a quite shorter solution.. Hope it helps someone.

cbind.na<-function(df1, df2){

  #Collect all unique rownames
  total.rownames<-union(x = rownames(x = df1),y = rownames(x=df2))

  #Create a new dataframe with rownames
  df<-data.frame(row.names = total.rownames)

  #Get absent rownames for both of the dataframe
  absent.names.1<-setdiff(x = rownames(df1),y = rownames(df))
  absent.names.2<-setdiff(x = rownames(df2),y = rownames(df))

  #Fill absents with NAs
  df1.fixed<-data.frame(row.names = absent.names.1,matrix(data = NA,nrow = length(absent.names.1),ncol=ncol(df1)))
  colnames(df1.fixed)<-colnames(df1)
  df1<-rbind(df1,df1.fixed)

  df2.fixed<-data.frame(row.names = absent.names.2,matrix(data = NA,nrow = length(absent.names.2),ncol=ncol(df2)))
  colnames(df2.fixed)<-colnames(df2)
  df2<-rbind(df2,df2.fixed)

  #Finally cbind into new dataframe
  df<-cbind(df,df1[rownames(df),],df2[rownames(df),])
  return(df)

}

I don't actually get an error with this.

a <- as.data.frame(matrix(c(sample(letters,50, replace=T),runif(100)), nrow=50))
b <- sample(letters,10, replace=T)
c <- cbind(a,b)

I used letters incase joining all numerics had different functionality (which it didn't). Your 'first data frame', which is actually just a vector', is just repeated 5 times in that 4th column...

But all the comments from the gurus to the question are still relevant :)


It's not clear to me at all what the OP is actually after, given the follow-up comments. It's possible they are actually looking for a way to write the data to file.

But let's assume that we're really after a way to cbind multiple data frames of differing lengths.

cbind will eventually call data.frame, whose help files says:

Objects passed to data.frame should have the same number of rows, but atomic vectors, factors and character vectors protected by I will be recycled a whole number of times if necessary (including as from R 2.9.0, elements of list arguments).

so in the OP's actual example, there shouldn't be an error, as R ought to recycle the shorter vectors to be of length 50. Indeed, when I run the following:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(10),e = runif(10))
cbind(dat1,dat2)

I get no errors and the shorter data frame is recycled as expected. However, when I run this:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(9), e = runif(9))
cbind(dat1,dat2)

I get the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 50, 9

But the wonderful thing about R is that you can make it do almost anything you want, even if you shouldn't. For example, here's a simple function that will cbind data frames of uneven length and automatically pad the shorter ones with NAs:

cbindPad <- function(...){
args <- list(...)
n <- sapply(args,nrow)
mx <- max(n)
pad <- function(x, mx){
    if (nrow(x) < mx){
        nms <- colnames(x)
        padTemp <- matrix(NA, mx - nrow(x), ncol(x))
        colnames(padTemp) <- nms
        if (ncol(x)==0) {
          return(padTemp)
        } else {
        return(rbind(x,padTemp))
          }
    }
    else{
        return(x)
    }
}
rs <- lapply(args,pad,mx)
return(do.call(cbind,rs))
}

which can be used like this:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(10),e = runif(10))
dat3 <- data.frame(d = runif(9), e = runif(9))
cbindPad(dat1,dat2,dat3)

I make no guarantees that this function works in all cases; it is meant as an example only.

EDIT

If the primary goal is to create a csv or text file, all you need to do it alter the function to pad using "" rather than NA and then do something like this:

dat <- cbindPad(dat1,dat2,dat3)
rs <- as.data.frame(apply(dat,1,function(x){paste(as.character(x),collapse=",")}))

and then use write.table on rs.


Just my 2 cents. This code combines two matrices or data.frames into one. If one data structure have lower number of rows then missing rows will be added with NA values.

combine.df <- function(x, y) {
    rows.x <- nrow(x)
    rows.y <- nrow(y)
    if (rows.x > rows.y) {
        diff <- rows.x - rows.y
        df.na <- matrix(NA, diff, ncol(y))
        colnames(df.na) <- colnames(y)
        cbind(x, rbind(y, df.na))
    } else {
        diff <- rows.y - rows.x
        df.na <- matrix(NA, diff, ncol(x))
        colnames(df.na) <- colnames(x)
        cbind(rbind(x, df.na), y)
    }
}

df1 <- data.frame(1:10, row.names = 1:10)
df2 <- data.frame(1:5, row.names = 10:14)
combine.df(df1, df2)

In the plyr package there is a function rbind.fill that will merge data.frames and introduce NA for empty cells:

library(plyr)
combined <- rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])
combined[25:40, ]

    mpg    wt cyl
25 19.2 3.845  NA
26 27.3 1.935  NA
27 26.0 2.140  NA
28 30.4 1.513  NA
29 15.8 3.170  NA
30 19.7 2.770  NA
31 15.0 3.570  NA
32 21.4 2.780  NA
33   NA 2.620   6
34   NA 2.875   6
35   NA 2.320   4