Creating an R dataframe row-by-row

Question

I would like to construct a dataframe row-by-row in R. I've done some searching, and all I came up with is the suggestion to create an empty list, keep a list index scalar, then each time add a single-row dataframe to the list and advance the list index by one. Finally, call do.call(rbind, ) on the list.

While this works, it seems very cumbersome. Isn't there an easier way of achieving the same goal?

Obviously I refer to cases where I can't use some apply function and explicitly need to create the dataframe row by row. At the very least, is there a way to push onto the end of a list instead of explicitly keeping track of the last index used?
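
The approach described in the question can be sketched roughly as follows. Appending with `rows[[length(rows) + 1]]` avoids the separate index variable. The column names `id` and `label` are made up for illustration.

```r
# Accumulate single-row data frames in a list, then bind them once at the end.
rows <- list()

for (i in 1:3) {
  new_row <- data.frame(id = i, label = letters[i], stringsAsFactors = FALSE)
  # Append at the end of the list; no separate index variable is needed.
  rows[[length(rows) + 1]] <- new_row
}

# A single rbind over the whole list builds the final data frame.
df <- do.call(rbind, rows)
print(df)
```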

User · Answer

This is a silly example of how to use do.call(rbind, ) on the output of Map() (which is similar to lapply()).

    > DF <- do.call(rbind, Map(function(x) data.frame(a = x, b = x + 1), x = 1:3))
    > DF
      a b
    1 1 2
    2 2 3
    3 3 4
    > class(DF)
    [1] "data.frame"

I use this construct quite often.

User · Answer

One can add rows to NULL:

    df <- NULL
    while (...) {
      # some code that generates a new row
      df <- rbind(df, row)
    }

For instance:

    df <- NULL
    for (e in 1:10) {
      df <- rbind(df, data.frame(x = e, square = e^2, even = factor(e %% 2 == 0)))
    }
    print(df)

User · Answer

I've found this way to create a dataframe by row without a matrix.

With automatic column names:

    df <- data.frame(
      t(data.frame(c(1, "a", 100), c(2, "b", 200), c(3, "c", 300))),
      row.names = NULL, stringsAsFactors = FALSE
    )

With explicit column names:

    df <- setNames(
      data.frame(
        t(data.frame(c(1, "a", 100), c(2, "b", 200), c(3, "c", 300))),
        row.names = NULL, stringsAsFactors = FALSE
      ),
      c("col1", "col2", "col3")
    )

User · Answer

If you have vectors destined to become rows, concatenate them using c(), pass them to a matrix row-by-row, and convert that matrix to a dataframe.

For example, the rows

    dummydata1 <- c(2002, 10, 1, 12.00, 101, 426340.0, 4411238.0, 3598.0, 0.92, 57.77, 4.80, 238.29, -9.9)
    dummydata2 <- c(2002, 10, 2, 12.00, 101, 426340.0, 4411238.0, 3598.0, -3.02, 78.77, -9999.00, -99.0, -9.9)
    dummydata3 <- c(2002, 10, 8, 12.00, 101, 426340.0, 4411238.0, 3598.0, -5.02, 88.77, -9999.00, -99.0, -9.9)

can be converted to a data frame thus:

    dummyset <- c(dummydata1, dummydata2, dummydata3)
    col.len <- length(dummydata1)
    dummytable <- data.frame(matrix(data = dummyset, ncol = col.len, byrow = TRUE))

Admittedly, I see 2 major limitations: (1) this only works with single-mode data, and (2) you must know your final number of columns for this to work (i.e., I'm assuming that you're not working with a ragged array whose greatest row length is unknown a priori).

This solution seems simple, but from my experience with type conversions in R, I'm sure it creates new challenges down the line. Can anyone comment on this?
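
To illustrate the first limitation mentioned above: a matrix holds a single mode, so mixing numbers and strings coerces everything to character before the data frame is built. The values below are made up, and the `type.convert()` step is only one possible way to recover column types afterwards, not part of the original answer.

```r
# Hypothetical rows mixing a number and a string.
row_a <- c(1, "x")   # c() already coerces 1 to the string "1"
row_b <- c(2, "y")

mixed <- data.frame(matrix(c(row_a, row_b), ncol = 2, byrow = TRUE),
                    stringsAsFactors = FALSE)

# Both columns come out as character, including the numeric-looking one.
print(sapply(mixed, class))

# Base R's type.convert() re-guesses the column types from the strings.
fixed <- type.convert(mixed, as.is = TRUE)
print(sapply(fixed, class))
```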

User · Answer

Dirk Eddelbuettel's answer is the best; here I just note that you can get away with not pre-specifying the dataframe dimensions or data types, which is sometimes useful if you have multiple data types and lots of columns:

    row1 <- list("a", 1, FALSE)  # use 'list', not 'c' or 'cbind'!
    row2 <- list("b", 2, TRUE)

    df <- data.frame(row1, stringsAsFactors = FALSE)  # first row
    df <- rbind(df, row2)  # now this works as you'd expect

User · Answer

You can grow them row by row by appending or using rbind().

That does not mean you should. Dynamically growing structures is one of the least efficient ways to code in R.

If you can, allocate your entire data.frame up front:

    N <- 1e4  # total number of rows to preallocate--possibly an overestimate

    DF <- data.frame(num = rep(NA, N), txt = rep("", N),  # as many cols as you need
                     stringsAsFactors = FALSE)            # you don't know levels yet

and then during your operations insert one row at a time:

    DF[i, ] <- list(1.4, "foo")

That should work for an arbitrary data.frame and be much more efficient. If you overshot N, you can always shrink the empty rows out at the end.
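
A minimal end-to-end sketch of this pattern, including the final trimming step, might look like the following. The fill counter `n_used` and the sample values are illustrative assumptions, and `NA_real_` is used instead of plain `NA` so that the numeric column has its final type from the start.

```r
N <- 10  # preallocated capacity, deliberately an overestimate here

DF <- data.frame(num = rep(NA_real_, N), txt = rep("", N),
                 stringsAsFactors = FALSE)

n_used <- 0  # number of rows actually filled so far
for (value in c(1.4, 2.5, 3.6)) {
  n_used <- n_used + 1
  # Overwrite one preallocated row in place.
  DF[n_used, ] <- list(value, paste0("row", n_used))
}

# Drop the unused, still-empty rows at the end.
DF <- DF[seq_len(n_used), , drop = FALSE]
print(DF)
```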

User · Answer

The reason I like Rcpp so much is that I don't always get how R Core thinks, and with Rcpp, more often than not, I don't have to.

Speaking philosophically, you're in a state of sin with regards to the functional paradigm, which tries to ensure that every value appears independent of every other value; changing one value should never cause a visible change in another value, the way you get with pointers sharing representation in C.

The problems arise when functional programming signals the small craft to move out of the way, and the small craft replies "I'm a lighthouse". Making a long series of small changes to a large object which you want to process in the meantime puts you square into lighthouse territory.

In the C++ STL, push_back() is a way of life. It doesn't try to be functional, but it does try to accommodate common programming idioms efficiently.

With some cleverness behind the scenes, you can sometimes arrange to have one foot in each world. Snapshot-based file systems are a good example (they evolved from concepts such as union mounts, which also ply both sides).

If R Core wanted to do this, underlying vector storage could function like a union mount. One reference to the vector storage might be valid for subscripts 1:N, while another reference to the same storage is valid for subscripts 1:(N+1). There could be reserved storage not yet validly referenced by anything but convenient for a quick push_back(). You don't violate the functional concept when appending outside the range that any existing reference considers valid.

Eventually, appending rows incrementally, you run out of reserved storage. You'll need to create new copies of everything, with the storage multiplied by some increment. The STL implementations I've used tend to multiply storage by 2 when extending allocation. I thought I read in R Internals that there is a memory structure where the storage increments by 20%. Either way, growth operations occur with logarithmic frequency relative to the total number of elements appended. On an amortized basis, this is usually acceptable.

As tricks behind the scenes go, I've seen worse. Every time you push_back() a new row onto the dataframe, a top-level index structure would need to be copied. The new row could append onto shared representation without impacting any old functional values. I don't even think it would complicate the garbage collector much; since I'm not proposing push_front(), all references are prefix references to the front of the allocated vector storage.
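
The amortized-growth idea can be approximated by hand in user-level R: keep a preallocated data frame plus a fill count, and double the capacity only when it runs out. This is a sketch of the general technique, not a description of anything R does internally. The names `new_buffer` and `push_back_row`, the columns `x` and `y`, and the doubling factor are all illustrative assumptions.

```r
# A buffer is a list holding a preallocated data frame, a fill count,
# and a counter of how many times the storage had to be enlarged.
new_buffer <- function(capacity = 4) {
  list(data = data.frame(x = rep(NA_real_, capacity),
                         y = rep(NA_character_, capacity),
                         stringsAsFactors = FALSE),
       used = 0,
       grows = 0)
}

push_back_row <- function(buf, x, y) {
  if (buf$used == nrow(buf$data)) {
    # Out of reserved storage: double the capacity in one step.
    spare <- new_buffer(nrow(buf$data))$data
    buf$data <- rbind(buf$data, spare)
    buf$grows <- buf$grows + 1
  }
  buf$used <- buf$used + 1
  buf$data[buf$used, ] <- list(x, y)
  buf
}

buf <- new_buffer(capacity = 4)
for (i in 1:100) {
  buf <- push_back_row(buf, x = i^2, y = paste0("r", i))
}

# Trim the unused tail before using the result.
result <- buf$data[seq_len(buf$used), , drop = FALSE]
print(nrow(result))  # rows stored
print(buf$grows)     # number of capacity doublings
```

With a starting capacity of 4, the 100 appends trigger only 5 doublings (4 to 128 rows). Because R uses copy-on-modify semantics, the individual row assignments may still copy data, so this illustrates the growth schedule rather than guaranteeing a speedup.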

User · Answer

Depending on the format of your new row, you might use tibble::add_row if your new row is simple and can be specified as name-value pairs. Or you could use dplyr::bind_rows, an efficient implementation of the common pattern of do.call(rbind, dfs).
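
A short sketch of both functions, assuming the tibble and dplyr packages are installed. The column names and values are made up for illustration.

```r
# Start from a small tibble.
df <- tibble::tibble(x = 1:2, y = c("a", "b"))

# add_row(): supply the new row as name-value pairs.
df <- tibble::add_row(df, x = 3L, y = "c")

# bind_rows(): combine a list of data frames in one call,
# then append the result to the existing rows.
pieces <- list(
  tibble::tibble(x = 4L, y = "d"),
  tibble::tibble(x = 5L, y = "e")
)
extra <- dplyr::bind_rows(pieces)
df <- dplyr::bind_rows(df, extra)
print(df)
```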
