[r] Create an empty data.frame

I'm trying to initialize a data.frame without any rows. Basically, I want to specify the data types for each column and name them, but not have any rows created as a result.

The best I've been able to do so far is something like:

df <- data.frame(Date=as.Date("01/01/2000", format="%m/%d/%Y"), 
                 File="", User="", stringsAsFactors=FALSE)
df <- df[-1,]

Which creates a data.frame with a single row containing all of the data types and column names I wanted, but also creates a useless row which then needs to be removed.

Is there a better way to do this?

This question is related to r dataframe r-faq

The answer is


I keep this function handy for whenever I need it, and change the column names and classes to suit the use case:

make_df <- function() { data.frame(name=character(),
                     profile=character(),
                     sector=character(),
                     type=character(),
                     year_range=character(),
                     link=character(),
                     stringsAsFactors = F)
}

make_df()
[1] name       profile    sector     type       year_range link      
<0 rows> (or 0-length row.names)

Say your column names are dynamic, you can create an empty row-named matrix and transform it to a data frame.

nms <- sample(LETTERS,sample(1:10))
as.data.frame(t(matrix(nrow=length(nms),ncol=0,dimnames=list(nms))))

You can do it without specifying column types

df = data.frame(matrix(vector(), 0, 3,
                dimnames=list(c(), c("Date", "File", "User"))),
                stringsAsFactors=F)

If you are looking for shortness :

read.csv(text="col1,col2")

so you don't need to specify the column names separately. You get the default column type logical until you fill the data frame.


If you don't mind not specifying data types explicitly, you can do it this way:

headers<-c("Date","File","User")
df <- as.data.frame(matrix(,ncol=3,nrow=0))
names(df)<-headers

#then bind incoming data frame with col types to set data types
df<-rbind(df, new_df)

I created empty data frame using following code

df = data.frame(id = numeric(0), jobs = numeric(0));

and tried to bind some rows to populate the same as follows.

newrow = c(3, 4)
df <- rbind(df, newrow)

but it started giving incorrect column names as follows

  X3 X4
1  3  4

Solution to this is to convert newrow to type df as follows

newrow = data.frame(id=3, jobs=4)
df <- rbind(df, newrow)

now gives correct data frame when displayed with column names as follows

  id nobs
1  3   4 

If you want to declare such a data.frame with many columns, it'll probably be a pain to type all the column classes out by hand. Especially if you can make use of rep, this approach is easy and fast (about 15% faster than the other solution that can be generalized like this):

If your desired column classes are in a vector colClasses, you can do the following:

library(data.table)
setnames(setDF(lapply(colClasses, function(x) eval(call(x)))), col.names)

lapply will result in a list of desired length, each element of which is simply an empty typed vector like numeric() or integer().

setDF converts this list by reference to a data.frame.

setnames adds the desired names by reference.

Speed comparison:

classes <- c("character", "numeric", "factor",
             "integer", "logical","raw", "complex")

NN <- 300
colClasses <- sample(classes, NN, replace = TRUE)
col.names <- paste0("V", 1:NN)

setDF(lapply(colClasses, function(x) eval(call(x))))

library(microbenchmark)
microbenchmark(times = 1000,
               read = read.table(text = "", colClasses = colClasses,
                                 col.names = col.names),
               DT = setnames(setDF(lapply(colClasses, function(x)
                 eval(call(x)))), col.names))
# Unit: milliseconds
#  expr      min       lq     mean   median       uq      max neval cld
#  read 2.598226 2.707445 3.247340 2.747835 2.800134 22.46545  1000   b
#    DT 2.257448 2.357754 2.895453 2.401408 2.453778 17.20883  1000  a 

It's also faster than using structure in a similar way:

microbenchmark(times = 1000,
               DT = setnames(setDF(lapply(colClasses, function(x)
                 eval(call(x)))), col.names),
               struct = eval(parse(text=paste0(
                 "structure(list(", 
                 paste(paste0(col.names, "=", 
                              colClasses, "()"), collapse = ","),
                 "), class = \"data.frame\")"))))
#Unit: milliseconds
#   expr      min       lq     mean   median       uq       max neval cld
#     DT 2.068121 2.167180 2.821868 2.211214 2.268569 143.70901  1000  a 
# struct 2.613944 2.723053 3.177748 2.767746 2.831422  21.44862  1000   b

If you already have an existent data frame, let's say df that has the columns you want, then you can just create an empty data frame by removing all the rows:

empty_df = df[FALSE,]

Notice that df still contains the data, but empty_df doesn't.

I found this question looking for how to create a new instance with empty rows, so I think it might be helpful for some people.


Just declare

table = data.frame()

when you try to rbind the first line it will create the columns


This question didn't specifically address my concerns (outlined here) but in case anyone wants to do this with a parameterized number of columns and no coercion:

> require(dplyr)
> dbNames <- c('a','b','c','d')
> emptyTableOut <- 
    data.frame(
        character(), 
        matrix(integer(), ncol = 3, nrow = 0), stringsAsFactors = FALSE
    ) %>% 
    setNames(nm = c(dbNames))
> glimpse(emptyTableOut)
Observations: 0
Variables: 4
$ a <chr> 
$ b <int> 
$ c <int> 
$ d <int>

As divibisan states on the linked question,

...the reason [coercion] occurs [when cbinding matrices and their constituent types] is that a matrix can only have a single data type. When you cbind 2 matrices, the result is still a matrix and so the variables are all coerced into a single type before converting to a data.frame


To create an empty data frame, pass in the number of rows and columns needed into the following function:

create_empty_table <- function(num_rows, num_cols) {
    frame <- data.frame(matrix(NA, nrow = num_rows, ncol = num_cols))
    return(frame)
}

To create an empty frame while specifying the class of each column, simply pass a vector of the desired data types into the following function:

create_empty_table <- function(num_rows, num_cols, type_vec) {
  frame <- data.frame(matrix(NA, nrow = num_rows, ncol = num_cols))
  for(i in 1:ncol(frame)) {
    print(type_vec[i])
    if(type_vec[i] == 'numeric') {frame[,i] <- as.numeric(frame[,i])}
    if(type_vec[i] == 'character') {frame[,i] <- as.character(frame[,i])}
    if(type_vec[i] == 'logical') {frame[,i] <- as.logical(frame[,i])}
    if(type_vec[i] == 'factor') {frame[,i] <- as.factor(frame[,i])}
  }
  return(frame)
}

Use as follows:

df <- create_empty_table(3, 3, c('character','logical','numeric'))

Which gives:

   X1  X2 X3
1 <NA> NA NA
2 <NA> NA NA
3 <NA> NA NA

To confirm your choices, run the following:

lapply(df, class)

#output
$X1
[1] "character"

$X2
[1] "logical"

$X3
[1] "numeric"

If you already have a dataframe, you can extract the metadata (column names and types) from a dataframe (e.g. if you are controlling a BUG which is only triggered with certain inputs and need a empty dummy Dataframe):

colums_and_types <- sapply(df, class)

# prints: "c('col1', 'col2')"
print(dput(as.character(names(colums_and_types))))

# prints: "c('integer', 'factor')"
dput(as.character(as.vector(colums_and_types)))

And then use the read.table to create the empty dataframe

read.table(text = "",
   colClasses = c('integer', 'factor'),
   col.names = c('col1', 'col2'))

The most efficient way to do this is to use structure to create a list that has the class "data.frame":

structure(list(Date = as.Date(character()), File = character(), User = character()), 
          class = "data.frame")
# [1] Date File User
# <0 rows> (or 0-length row.names)

To put this into perspective compared to the presently accepted answer, here's a simple benchmark:

s <- function() structure(list(Date = as.Date(character()), 
                               File = character(), 
                               User = character()), 
                          class = "data.frame")
d <- function() data.frame(Date = as.Date(character()),
                           File = character(), 
                           User = character(), 
                           stringsAsFactors = FALSE) 
library("microbenchmark")
microbenchmark(s(), d())
# Unit: microseconds
#  expr     min       lq     mean   median      uq      max neval
#   s()  58.503  66.5860  90.7682  82.1735 101.803  469.560   100
#   d() 370.644 382.5755 523.3397 420.1025 604.654 1565.711   100

You could use read.table with an empty string for the input text as follows:

colClasses = c("Date", "character", "character")
col.names = c("Date", "File", "User")

df <- read.table(text = "",
                 colClasses = colClasses,
                 col.names = col.names)

Alternatively specifying the col.names as a string:

df <- read.csv(text="Date,File,User", colClasses = colClasses)

Thanks to Richard Scriven for the improvement


If you want to create an empty data.frame with dynamic names (colnames in a variable), this can help:

names <- c("v","u","w")
df <- data.frame()
for (k in names) df[[k]]<-as.numeric()

You can change the types as well if you need so. like:

names <- c("u", "v")
df <- data.frame()
df[[names[1]]] <- as.numeric()
df[[names[2]]] <- as.character()

By Using data.table we can specify data types for each column.

library(data.table)    
data=data.table(a=numeric(), b=numeric(), c=numeric())

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to r-faq

What does "The following object is masked from 'package:xxx'" mean? What does "Error: object '<myvariable>' not found" mean? How do I deal with special characters like \^$.?*|+()[{ in my regex? What does %>% function mean in R? How to plot a function curve in R Use dynamic variable names in `dplyr` Error: unexpected symbol/input/string constant/numeric constant/SPECIAL in my code How should I deal with "package 'xxx' is not available (for R version x.y.z)" warning? How to select the row with the maximum value in each group R data formats: RData, Rda, Rds etc