I have two data frames.
The first is of only one column and 10 rows.
The second is of 3 columns and 50 rows.
When I try to combine them using cbind, it gives this error:
Error in data.frame(..., check.names = FALSE) :
Can anyone suggest another function to do this?
P.S I have tried this using lists too, but it gives the same error.
The data frame with 3 columns should form the first 3 columns of a CSV file, and the data frame with one column should be the fourth column in that file, when I write it with the write.table function. The first 3 columns have 50 rows and the fourth column should occupy only the first 10 rows.
Referring to Andrie's answer, which suggests using plyr::rbind.fill(): combined with t(), you have something like a cbind.fill() (which is not part of plyr) that will construct your data frame while taking identical case numbers into account.
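For readers without plyr, a rough base-R sketch of the same idea (aligning rows by case number, taken here to mean row name) could use merge() in place of t() and rbind.fill(). This is a swapped-in technique, not the cbind.fill() mentioned above, and df1, df2 and their row names are made-up examples.

```r
# Hypothetical example data; row names play the role of case numbers
df1 <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
df2 <- data.frame(b = c(10, 20), row.names = c("r2", "r4"))

# Outer join on row names: cases missing from one side are filled with NA
m <- merge(df1, df2, by = "row.names", all = TRUE)

# merge() returns the keys in a column called Row.names; move them back
rownames(m) <- as.character(m$Row.names)
m$Row.names <- NULL
m
```

Unlike padding at the bottom, this lines rows up by name, so the result has one row per distinct case across both inputs.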
I had a similar problem. I matched the entries in a particular column of the two data sets and combined them only where they matched. For two data sets, data1 and data2, I add a column to data1 from data2 after comparing the first column of both.
for (i in seq_len(nrow(data1))) {
  for (j in seq_len(nrow(data2))) {
    # copy the value from data2 when the keys in column 1 match
    if (data1[i, 1] == data2[j, 1]) data1[i, 3] <- data2[j, 2]
  }
}
Hope this will work for you!
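The same key-matching idea can probably be written without explicit loops using match(). The sketch below assumes the key is in column 1 of both data sets and the value to copy is in column 2 of data2. The example data are invented. One difference from the loop: where a key occurs several times in data2, match() takes the first occurrence, whereas the loop keeps the last.

```r
# Hypothetical example data; column 1 is the key in both
data1 <- data.frame(id = c("a", "b", "c", "d"), x = 1:4)
data2 <- data.frame(id = c("c", "a", "z"), y = c(30, 10, 99))

# For each key in data1, the row of its first match in data2 (NA if none)
idx <- match(data1[[1]], data2[[1]])

# Add the matched values as a new column; unmatched keys get NA
data1[[names(data2)[2]]] <- data2[[2]][idx]
data1
```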
You can use library(qpcR) to combine two matrices of unequal size.
resultant_matrix <- qpcR:::cbind.na(matrix1, matrix2)
NOTE: The resultant matrix will be the size of matrix2.
My idea is to get the maximum row count across all the data.frames and then append an NA-filled matrix to every data.frame that needs it. This method doesn't require additional packages; only base R is used. The code looks like this:
list.df <- list(data.frame(a = 1:10), data.frame(a = 1:5), data.frame(a = 1:3))
max.rows <- max(unlist(lapply(list.df, nrow), use.names = FALSE))
list.df <- lapply(list.df, function(x) {
  na.count <- max.rows - nrow(x)
  if (na.count > 0L) {
    na.dm <- matrix(NA, na.count, ncol(x))
    colnames(na.dm) <- colnames(x)
    rbind(x, na.dm)
  } else {
    x
  }
})
do.call(cbind, list.df)
#     a  a  a
# 1   1  1  1
# 2   2  2  2
# 3   3  3  3
# 4   4  4 NA
# 5   5  5 NA
# 6   6 NA NA
# 7   7 NA NA
# 8   8 NA NA
# 9   9 NA NA
# 10 10 NA NA
I think I have come up with a rather shorter solution. Hope it helps someone.
cbind.na <- function(df1, df2) {
  # Collect all unique rownames
  total.rownames <- union(rownames(df1), rownames(df2))
  # Create a new, empty dataframe with those rownames
  df <- data.frame(row.names = total.rownames)
  # Get the rownames that are absent from each dataframe
  absent.names.1 <- setdiff(rownames(df), rownames(df1))
  absent.names.2 <- setdiff(rownames(df), rownames(df2))
  # Fill absent rows with NAs
  df1.fixed <- data.frame(row.names = absent.names.1, matrix(NA, nrow = length(absent.names.1), ncol = ncol(df1)))
  colnames(df1.fixed) <- colnames(df1)
  df1 <- rbind(df1, df1.fixed)
  df2.fixed <- data.frame(row.names = absent.names.2, matrix(NA, nrow = length(absent.names.2), ncol = ncol(df2)))
  colnames(df2.fixed) <- colnames(df2)
  df2 <- rbind(df2, df2.fixed)
  # Finally cbind into the new dataframe; drop = FALSE keeps one-column inputs as dataframes
  df <- cbind(df, df1[rownames(df), , drop = FALSE], df2[rownames(df), , drop = FALSE])
  return(df)
}
I don't actually get an error with this.
a <- as.data.frame(matrix(c(sample(letters,50, replace=T),runif(100)), nrow=50))
b <- sample(letters,10, replace=T)
c <- cbind(a,b)
I used letters in case joining all numerics behaved differently (which it didn't). Your 'first data frame', which is actually just a vector, is simply repeated 5 times in that 4th column...
But all the comments from the gurus to the question are still relevant :)
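To make the recycling behaviour concrete, here is a minimal sketch with invented data: a vector whose length divides the number of rows is repeated to fill the column, while one whose length does not divide it triggers the "differing number of rows" error.

```r
df <- data.frame(x = 1:6)

# Length 2 divides 6, so the vector is recycled 3 times
v <- c("a", "b")
out <- cbind(df, v)

# Length 4 does not divide 6, so data.frame() refuses to recycle;
# capture the error message instead of stopping
res <- tryCatch(cbind(df, w = 1:4), error = function(e) conditionMessage(e))
```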
It's not clear to me at all what the OP is actually after, given the follow-up comments. It's possible they are actually looking for a way to write the data to file.
But let's assume that we're really after a way to cbind multiple data frames of differing lengths.
cbind will eventually call data.frame, whose help file says:
Objects passed to data.frame should have the same number of rows, but atomic vectors, factors and character vectors protected by I will be recycled a whole number of times if necessary (including as from R 2.9.0, elements of list arguments).
So in the OP's actual example there shouldn't be an error, as R ought to recycle the shorter vectors to length 50. Indeed, when I run the following:
set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(10),e = runif(10))
cbind(dat1,dat2)
I get no errors and the shorter data frame is recycled as expected. However, when I run this:
set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(9), e = runif(9))
cbind(dat1,dat2)
I get the following error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 50, 9
But the wonderful thing about R is that you can make it do almost anything you want, even if you shouldn't. For example, here's a simple function that will cbind data frames of uneven length and automatically pad the shorter ones with NAs:
cbindPad <- function(...){
  args <- list(...)
  n <- sapply(args,nrow)
  mx <- max(n)
  pad <- function(x, mx){
    if (nrow(x) < mx){
      nms <- colnames(x)
      padTemp <- matrix(NA, mx - nrow(x), ncol(x))
      colnames(padTemp) <- nms
      if (ncol(x)==0) {
        return(padTemp)
      } else {
        return(rbind(x,padTemp))
      }
    }
    else{
      return(x)
    }
  }
  rs <- lapply(args,pad,mx)
  return(do.call(cbind,rs))
}
which can be used like this:
set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(10),e = runif(10))
dat3 <- data.frame(d = runif(9), e = runif(9))
cbindPad(dat1,dat2,dat3)
I make no guarantees that this function works in all cases; it is meant as an example only.
EDIT
If the primary goal is to create a CSV or text file, all you need to do is alter the function to pad using "" rather than NA and then do something like this:
dat <- cbindPad(dat1,dat2,dat3)
rs <- as.data.frame(apply(dat,1,function(x){paste(as.character(x),collapse=",")}))
and then use write.table on rs.
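An alternative that may be simpler, sketched here under assumptions: keep the NA padding and let write.table() turn NA into empty cells through its na argument. The data frames, column names, and temporary file below are stand-ins for the question's 50-row and 10-row inputs, and the padding is done by row indexing rather than with cbindPad.

```r
# Hypothetical stand-ins for the two data frames in the question
wide  <- data.frame(x = 1:50, y = 51:100, z = 101:150)  # 3 columns, 50 rows
short <- data.frame(w = 1:10)                           # 1 column, 10 rows

# Indexing past the last row yields NA rows, padding short to 50 rows
padded <- short[seq_len(nrow(wide)), , drop = FALSE]
rownames(padded) <- NULL
out <- cbind(wide, padded)

# na = "" writes missing values as empty fields
out_file <- tempfile(fileext = ".csv")
write.table(out, out_file, sep = ",", na = "", row.names = FALSE, quote = FALSE)
```

The fourth column then holds values in the first 10 data rows of the file and is empty in the remaining 40.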
Just my 2 cents. This code combines two matrices or data.frames into one. If one data structure has fewer rows than the other, the missing rows are added and filled with NA values.
combine.df <- function(x, y) {
  rows.x <- nrow(x)
  rows.y <- nrow(y)
  if (rows.x > rows.y) {
    diff <- rows.x - rows.y
    df.na <- matrix(NA, diff, ncol(y))
    colnames(df.na) <- colnames(y)
    cbind(x, rbind(y, df.na))
  } else {
    diff <- rows.y - rows.x
    df.na <- matrix(NA, diff, ncol(x))
    colnames(df.na) <- colnames(x)
    cbind(rbind(x, df.na), y)
  }
}
df1 <- data.frame(1:10, row.names = 1:10)
df2 <- data.frame(1:5, row.names = 10:14)
combine.df(df1, df2)
In the plyr package there is a function rbind.fill that will merge data.frames and introduce NA for empty cells:
library(plyr)
combined <- rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])
combined[25:40, ]
    mpg    wt cyl
25 19.2 3.845  NA
26 27.3 1.935  NA
27 26.0 2.140  NA
28 30.4 1.513  NA
29 15.8 3.170  NA
30 19.7 2.770  NA
31 15.0 3.570  NA
32 21.4 2.780  NA
33   NA 2.620   6
34   NA 2.875   6
35   NA 2.320   4
Source: Stackoverflow.com