[list] Creating an R dataframe row-by-row

I would like to construct a dataframe row-by-row in R. I've done some searching, and all I came up with is the suggestion to create an empty list, keep a list index scalar, then each time add to the list a single-row dataframe and advance the list index by one. Finally, do.call(rbind,) on the list.

While this works, it seems very cumbersome. Isn't there an easier way for achieving the same goal?

Obviously I refer to cases where I can't use some apply function and explicitly need to create the dataframe row by row. At least, is there a way to push into the end of a list instead of explicitly keeping track of the last index used?

This question is related to list r dataframe

The answer is


One can add rows to NULL:

df<-NULL;
while(...){
  #Some code that generates new row
  rbind(df,row)->df
}

for instance

df<-NULL
for(e in 1:10) rbind(df,data.frame(x=e,square=e^2,even=factor(e%%2==0)))->df
print(df)

The reason I like Rcpp so much is that I don't always get how R Core thinks, and with Rcpp, more often than not, I don't have to.

Speaking philosophically, you're in a state of sin with regards to the functional paradigm, which tries to ensure that every value appears independent of every other value; changing one value should never cause a visible change in another value, the way you get with pointers sharing representation in C.

The problems arise when functional programming signals the small craft to move out of the way, and the small craft replies "I'm a lighthouse". Making a long series of small changes to a large object which you want to process on in the meantime puts you square into lighthouse territory.

In the C++ STL, push_back() is a way of life. It doesn't try to be functional, but it does try to accommodate common programming idioms efficiently.

With some cleverness behind the scenes, you can sometimes arrange to have one foot in each world. Snapshot based file systems are a good example (which evolved from concepts such as union mounts, which also ply both sides).

If R Core wanted to do this, underlying vector storage could function like a union mount. One reference to the vector storage might be valid for subscripts 1:N, while another reference to the same storage is valid for subscripts 1:(N+1). There could be reserved storage not yet validly referenced by anything but convenient for a quick push_back(). You don't violate the functional concept when appending outside the range that any existing reference considers valid.

Eventually appending rows incrementally, you run out of reserved storage. You'll need to create new copies of everything, with the storage multiplied by some increment. The STL implementations I've use tend to multiply storage by 2 when extending allocation. I thought I read in R Internals that there is a memory structure where the storage increments by 20%. Either way, growth operations occur with logarithmic frequency relative to the total number of elements appended. On an amortized basis, this is usually acceptable.

As tricks behind the scenes go, I've seen worse. Every time you push_back() a new row onto the dataframe, a top level index structure would need to be copied. The new row could append onto shared representation without impacting any old functional values. I don't even think it would complicate the garbage collector much; since I'm not proposing push_front() all references are prefix references to the front of the allocated vector storage.


Dirk Eddelbuettel's answer is the best; here I just note that you can get away with not pre-specifying the dataframe dimensions or data types, which is sometimes useful if you have multiple data types and lots of columns:

row1<-list("a",1,FALSE) #use 'list', not 'c' or 'cbind'!
row2<-list("b",2,TRUE)  

df<-data.frame(row1,stringsAsFactors = F) #first row
df<-rbind(df,row2) #now this works as you'd expect.

This is a silly example of how to use do.call(rbind,) on the output of Map() [which is similar to lapply()]

> DF <- do.call(rbind,Map(function(x) data.frame(a=x,b=x+1),x=1:3))
> DF
  x y
1 1 2
2 2 3
3 3 4
> class(DF)
[1] "data.frame"

I use this construct quite often.


If you have vectors destined to become rows, concatenate them using c(), pass them to a matrix row-by-row, and convert that matrix to a dataframe.

For example, rows

dummydata1=c(2002,10,1,12.00,101,426340.0,4411238.0,3598.0,0.92,57.77,4.80,238.29,-9.9)
dummydata2=c(2002,10,2,12.00,101,426340.0,4411238.0,3598.0,-3.02,78.77,-9999.00,-99.0,-9.9)
dummydata3=c(2002,10,8,12.00,101,426340.0,4411238.0,3598.0,-5.02,88.77,-9999.00,-99.0,-9.9)

can be converted to a data frame thus:

dummyset=c(dummydata1,dummydata2,dummydata3)
col.len=length(dummydata1)
dummytable=data.frame(matrix(data=dummyset,ncol=col.len,byrow=TRUE))

Admittedly, I see 2 major limitations: (1) this only works with single-mode data, and (2) you must know your final # columns for this to work (i.e., I'm assuming that you're not working with a ragged array whose greatest row length is unknown a priori).

This solution seems simple, but from my experience with type conversions in R, I'm sure it creates new challenges down-the-line. Can anyone comment on this?


I've found this way to create dataframe by raw without matrix.

With automatic column name

df<-data.frame(
        t(data.frame(c(1,"a",100),c(2,"b",200),c(3,"c",300)))
        ,row.names = NULL,stringsAsFactors = FALSE
    )

With column name

df<-setNames(
        data.frame(
            t(data.frame(c(1,"a",100),c(2,"b",200),c(3,"c",300)))
            ,row.names = NULL,stringsAsFactors = FALSE
        ), 
        c("col1","col2","col3")
    )

Depending on the format of your new row, you might use tibble::add_row if your new row is simple and can specified in "value-pairs". Or you could use dplyr::bind_rows, "an efficient implementation of the common pattern of do.call(rbind, dfs)".


Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe