[r] R define dimensions of empty data frame

I am trying to collect some data from multiple subsets of a data set and need to create a data frame to collect the results. My problem is that I don't know how to create an empty data frame with a defined number of columns without actually having data to put into it.

collect1 <- c()  ## i'd like to create empty df w/ 3 columns: `id`, `max1` and `min1`

for(i in 1:10){
collect1$id <- i
ss1 <- subset(df1, df1$id == i)
collect1$max1 <- max(ss1$value)
collect1$min1 <- min(ss1$value)
}

I feel very dumb asking this question (I almost feel like I've asked it on SO before but can't find it) but would greatly appreciate any help.


The answer is


A more general method to create a data frame of arbitrary size is to create a 1-by-n data frame from a matrix of the same dimensions. Then you can immediately drop the first row, leaving zero rows and n columns:

> v <- data.frame(matrix(NA, nrow=1, ncol=10))
> v <- v[-1, , drop=FALSE]
> v
 [1] X1  X2  X3  X4  X5  X6  X7  X8  X9  X10
<0 rows> (or 0-length row.names)
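For the three columns named in the question, a zero-row data frame with explicit column types can also be built directly from zero-length vectors. This is a minimal sketch; the appended values are made up for illustration only.

```r
# Zero-length vectors give zero rows but fix each column's name and type
collect1 <- data.frame(id   = integer(0),
                       max1 = numeric(0),
                       min1 = numeric(0))
str(collect1)

# Rows can then be appended with rbind() (illustrative values)
collect1 <- rbind(collect1, data.frame(id = 1L, max1 = 2.5, min1 = -0.3))
```

Growing a data frame with rbind() inside a loop is fine for a handful of rows but gets slow for large results.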

If only the column names are available, like:

cnms <- c("Nam1","Nam2","Nam3")

To create an empty data frame with the above variable names, first create a data.frame object:

emptydf <- data.frame()

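Now assign a zero-length vector to every name, thus creating an empty data frame with the given variable names (merely indexing emptydf[0, name] fails with "undefined columns selected", because the columns do not exist yet):

for (nm in cnms) {
     emptydf[[nm]] <- logical(0)  # adds a zero-length logical column called nm
 }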
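The same result can probably be had without a loop by combining a zero-row matrix with setNames(). A short sketch, reusing the cnms vector from above (redefined here so the snippet runs on its own):

```r
cnms <- c("Nam1", "Nam2", "Nam3")

# A 0-by-3 matrix becomes a zero-row data frame; setNames() applies the names
emptydf <- setNames(data.frame(matrix(ncol = length(cnms), nrow = 0)), cnms)
str(emptydf)
```

Note that the columns start out as logical; they take on another type once real data is assigned or bound in.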

I have come across the same problem and have a cleaner solution. Instead of creating an empty data.frame, you can save your data in a named list. Once you have added all results to this list, you convert it to a data.frame.

For the case of adding features one at a time this works best.

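mylist = list()
# build a name per column; mylist$column would keep overwriting one element called "column"
for(column in 1:10) mylist[[paste0("V", column)]] = rnorm(10)
mydf = data.frame(mylist)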

For the case of adding rows one at a time this becomes tricky due to mixed types. If all types are the same it is easy.

mylist = list()
# index by position; mylist$row would keep overwriting one element called "row"
for(row in 1:10) mylist[[row]] = rnorm(10)
mydf = data.frame(do.call(rbind, mylist))

I haven't found a simple way to add rows of mixed types. In this case, if you must do it this way, the empty data.frame is probably the best solution.
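One possible workaround for mixed types, offered as a sketch rather than a definitive answer, is to store each row as a one-row data.frame in the list and bind them at the end. The column names id, label and score are hypothetical.

```r
rows <- list()
for (i in 1:3) {
  # each element is a one-row data frame, so every column keeps its own type
  rows[[i]] <- data.frame(id = i, label = letters[i], score = i / 2,
                          stringsAsFactors = FALSE)
}
mydf <- do.call(rbind, rows)
str(mydf)
```

Because rbind() is called once on the whole list, this avoids growing a data frame inside the loop.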


The solution given in another forum might help. Basically:

Cols <- paste("A", 1:5, sep="")
DF <- read.table(textConnection(""), col.names = Cols,colClasses = "character")

> str(DF)
'data.frame':   0 obs. of  5 variables:
$ A1: chr
$ A2: chr
$ A3: chr
$ A4: chr
$ A5: chr

You can change the colClasses to fit your needs.
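As an illustration of changing colClasses, here is a sketch that gives each column its own class, using the column names from the question. It passes the empty input through the text argument of read.table, which should be equivalent to the textConnection("") call above.

```r
DF <- read.table(text = "",
                 col.names  = c("id", "max1", "min1"),
                 colClasses = c("integer", "numeric", "numeric"))
str(DF)
```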

Original link is https://stat.ethz.ch/pipermail/r-help/2008-August/169966.html


You may use NULL instead of NA. This creates a truly empty data frame.


df = data.frame(matrix("", ncol = 3, nrow = 10))

seq_len(nrow(df)) may help: it counts the rows in your data (seq_along(df) on a data frame would run over its columns instead) and lets you create a data.frame with the desired number of rows

    listdf <- data.frame(ID=seq_len(nrow(df)),
                              var1=seq_len(nrow(df)), var2=seq_len(nrow(df)))

Would a dataframe of NAs work? something like:

data.frame(matrix(NA, nrow = 2, ncol = 3))

If you need to be more specific about the data type, you may prefer NA_integer_, NA_real_, NA_complex_, or NA_character_ instead of plain NA, which is logical.

Something else that may be more specific than the NAs is:

data.frame(matrix(vector(mode = 'numeric',length = 6), nrow = 2, ncol = 3))

where the mode can be of any type. See ?vector
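To make the typed-NA suggestion concrete, here is a small sketch showing how the choice of NA constant determines the column class:

```r
# NA_real_ gives numeric columns, NA_character_ gives character columns
num_df <- data.frame(matrix(NA_real_, nrow = 2, ncol = 3))
chr_df <- data.frame(matrix(NA_character_, nrow = 2, ncol = 3),
                     stringsAsFactors = FALSE)

sapply(num_df, class)
sapply(chr_df, class)
```

stringsAsFactors = FALSE is passed explicitly so the character columns are not turned into factors on R versions before 4.0.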


You can do something like:

N <- 10
collect1 <- data.frame(id   = integer(N),
                       max1 = numeric(N),
                       min1 = numeric(N))

Note that the rest of your code does not use the row index when filling the data.frame row by row. It should be:

for(i in seq_len(N)){
   collect1$id[i] <- i
   ss1 <- subset(df1, df1$id == i)
   collect1$max1[i] <- max(ss1$value)
   collect1$min1[i] <- min(ss1$value)
}

Finally, I would say that there are many alternatives for doing what you are trying to accomplish, some would be much more efficient and use a lot less typing. You could for example look at the aggregate function, or ddply from the plyr package.
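As a rough sketch of the aggregate route, assuming df1 has the id and value columns used in the question (the data below is simulated purely for illustration):

```r
set.seed(1)
df1 <- data.frame(id = rep(1:10, each = 5), value = rnorm(50))  # made-up data

# one aggregate() call per summary, then join the two results on id
max_by_id <- aggregate(value ~ id, data = df1, FUN = max)
min_by_id <- aggregate(value ~ id, data = df1, FUN = min)
names(max_by_id)[2] <- "max1"
names(min_by_id)[2] <- "min1"

collect1 <- merge(max_by_id, min_by_id, by = "id")
```

This needs no pre-allocated data frame at all, since aggregate() and merge() build the result for you.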


Here is a solution if you want an empty data frame with a defined number of rows and NO columns:

df = data.frame(matrix(NA, ncol=1, nrow=10))[-1]