data.frame rows to a list

Question

I have a data.frame which I would like to convert to a list by rows, meaning each row would correspond to its own list element. In other words, I would like a list that is as long as the data.frame has rows.

So far, I've tackled this problem in the following manner, but I was wondering if there's a better way to approach this.

    xy.df <- data.frame(x = runif(10), y = runif(10))

    # pre-allocate a list and fill it with a loop
    xy.list <- vector("list", nrow(xy.df))
    for (i in 1:nrow(xy.df)) {
        xy.list[[i]] <- xy.df[i, ]
    }

User · Answer

A couple more options:

With asplit:

    asplit(xy.df, 1)
    #[[1]]
    #     x      y 
    #0.1137 0.6936 

    #[[2]]
    #     x      y 
    #0.6223 0.5450 

    #[[3]]
    #     x      y 
    #0.6093 0.2827 

    # ...

With split and row:

    split(xy.df, row(xy.df)[, 1])
    #$`1`
    #       x      y
    #1 0.1137 0.6936

    #$`2`
    #       x     y
    #2 0.6223 0.545

    #$`3`
    #       x      y
    #3 0.6093 0.2827

    # ...

data:

    set.seed(1234)
    xy.df <- data.frame(x = runif(10), y = runif(10))
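The two options above return different kinds of elements, which matters for whatever code consumes the list. The sketch below uses a small made-up data frame (not the seeded data from the answer) and assumes R 3.6.0 or later, where asplit is available.

```r
# Illustrative data; the values are arbitrary.
xy.df <- data.frame(x = c(0.1, 0.2), y = c(0.9, 0.8))

a <- asplit(xy.df, 1)                 # one named vector per row
s <- split(xy.df, row(xy.df)[, 1])    # one single-row data frame per row

is.data.frame(a[[1]])   # asplit elements are not data frames
is.data.frame(s[[1]])   # split elements are
names(a[[1]])           # column names survive as element names
```

If downstream code relies on `$` access or on mixed column types, the split variant is probably the safer choice, since asplit goes through a matrix-like view of the data.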

User · Answer

Like this:

    xy.list <- split(xy.df, seq(nrow(xy.df)))

And if you want the rownames of xy.df to be the names of the output list, you can do:

    xy.list <- setNames(split(xy.df, seq(nrow(xy.df))), rownames(xy.df))
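As a quick sanity check, the split approach can be compared against the loop from the question. This is a minimal sketch on a made-up three-row data frame; the names loop.list, split.list and named.list are illustrative only.

```r
# Illustrative data with explicit row names.
xy.df <- data.frame(x = c(0.1, 0.2, 0.3), y = c(0.9, 0.8, 0.7),
                    row.names = c("a", "b", "c"))

# Loop-based version, as in the question.
loop.list <- vector("list", nrow(xy.df))
for (i in seq_len(nrow(xy.df))) {
  loop.list[[i]] <- xy.df[i, ]
}

# split()-based version; list names default to "1", "2", "3".
split.list <- split(xy.df, seq(nrow(xy.df)))

# Compare the contents while ignoring the list names.
all.equal(unname(split.list), loop.list)

# Carry the row names over to the list.
named.list <- setNames(split.list, rownames(xy.df))
names(named.list)
```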

User · Answer

A more modern solution uses only purrr::transpose:

    library(purrr)
    iris[1:2, ] %>% purrr::transpose()
    #> [[1]]
    #> [[1]]$Sepal.Length
    #> [1] 5.1
    #> 
    #> [[1]]$Sepal.Width
    #> [1] 3.5
    #> 
    #> [[1]]$Petal.Length
    #> [1] 1.4
    #> 
    #> [[1]]$Petal.Width
    #> [1] 0.2
    #> 
    #> [[1]]$Species
    #> [1] 1
    #> 
    #> 
    #> [[2]]
    #> [[2]]$Sepal.Length
    #> [1] 4.9
    #> 
    #> [[2]]$Sepal.Width
    #> [1] 3
    #> 
    #> [[2]]$Petal.Length
    #> [1] 1.4
    #> 
    #> [[2]]$Petal.Width
    #> [1] 0.2
    #> 
    #> [[2]]$Species
    #> [1] 1

User · Answer

An alternative way is to convert the df to a matrix and then apply lapply over it:

    ldf <- lapply(as.matrix(myDF), function(x) x)
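One caveat with the matrix route: lapply over a matrix visits every cell, so the call as written yields one list element per cell rather than per row. The sketch below shows this on a made-up myDF and then indexes the rows explicitly, which is an adjustment to the answer rather than its original code.

```r
# Illustrative data frame; all columns share one type so as.matrix() stays integer.
myDF <- data.frame(x = 1:3, y = 4:6)
m <- as.matrix(myDF)

# One element per cell: nrow * ncol entries.
cells <- lapply(m, function(x) x)
length(cells)

# One element per row: iterate over row indices instead.
rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])
length(rows)
rows[[1]]   # a named vector holding the first row
```

Because as.matrix coerces every column to a common type, this variant fits best when all columns are numeric or all are character.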

User · Answer

If you want to completely abuse the data.frame (as I do) and like to keep the $ functionality, one way is to split your data.frame into one-line data.frames gathered in a list:

    > df = data.frame(x=c('a','b','c'), y=3:1)
    > df
      x y
    1 a 3
    2 b 2
    3 c 1

    # 'convert' into a list of data.frames
    ldf = lapply(as.list(1:dim(df)[1]), function(x) df[x[1],])

    > ldf
    [[1]]
      x y
    1 a 3

    [[2]]
      x y
    2 b 2

    [[3]]
      x y
    3 c 1

    # and the 'coolest'
    > ldf[[2]]$y
    [1] 2

It is not only intellectual masturbation: it lets you 'transform' the data.frame into a list of its lines while keeping the $ indexation, which can be useful for further use with lapply (assuming the function you pass to lapply uses this $ indexation).

User · Answer

The by_row function from the purrrlyr package will do this for you.

This example demonstrates:

    myfn <- function(row) {
      # row is a tibble with one row, and the same number of columns as the original df
      l <- as.list(row)
      return(l)
    }

    list_of_lists <- purrrlyr::by_row(df, myfn, .labels = FALSE)$.out

By default, the returned value from myfn is put into a new list column in the df called .out. The $.out at the end of the above statement immediately selects this column, returning a list of lists.

User · Answer

The best way for me was:

Example data:

    Var1 <- c("X1", "X2", "X3")
    Var2 <- c("X1", "X2", "X3")
    Var3 <- c("X1", "X2", "X3")

    Data <- cbind(Var1, Var2, Var3)

    ID    Var1   Var2  Var3
    1      X1     X2    X3
    2      X4     X5    X6
    3      X7     X8    X9

We call the BBmisc library:

    library(BBmisc)

    data$lists <- convertRowsToList(data[, 2:4])

And the result will be:

    ID    Var1   Var2  Var3  lists
    1      X1     X2    X3   list("X1", "X2", "X3")
    2      X4     X5    X6   list("X4", "X5", "X6")
    3      X7     X8    X9   list("X7", "X8", "X9")

User · Answer

Eureka!

    xy.list <- as.list(as.data.frame(t(xy.df)))
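It may help to see what the transpose approach actually returns. In this sketch (made-up data; mixed and mixed.list are illustrative names) each list element is a plain vector of one row's values, and because t() goes through a matrix, a non-numeric column forces every value out of numeric form.

```r
# All-numeric case: each element is a numeric vector for one row.
xy.df <- data.frame(x = c(0.1, 0.2), y = c(0.9, 0.8))
xy.list <- as.list(as.data.frame(t(xy.df)))
xy.list[[1]]

# Mixed-type case: t() builds a character matrix, so numbers are no longer numeric.
mixed <- data.frame(id = c("a", "b"), value = c(1.5, 2.5))
mixed.list <- as.list(as.data.frame(t(mixed)))
is.numeric(mixed.list[[1]])
```

So this one-liner seems best suited to data frames whose columns all share a type.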

User · Answer

Another alternative using library(purrr) (that seems to be a bit quicker on large data.frames):

    flatten(by_row(xy.df, ..f = function(x) flatten_chr(x), .labels = FALSE))

User · Answer

Like @flodel wrote: this converts your data frame into a list that has as many elements as the data frame has rows:

    NewList <- split(df, f = seq(nrow(df)))

You can additionally add a function to select only those columns that are not NA in each element of the list:

    NewList2 <- lapply(NewList, function(x) x[, !is.na(x)])
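Here is a small sketch of the NA-filtering idea on made-up data. It differs slightly from the snippet above: the column mask is built as an explicit logical vector with unlist, and drop = FALSE is passed so each element stays a data frame even when only one column survives. Both adjustments are assumptions made for this illustration.

```r
# Illustrative data: each row has a different NA pattern.
df <- data.frame(a = c(1, NA), b = c(NA, 2), c = c(3, 4))

NewList <- split(df, f = seq(nrow(df)))

# Keep only the columns that are not NA in each single-row data frame.
NewList2 <- lapply(NewList, function(x) {
  keep <- unname(!is.na(unlist(x)))   # one TRUE/FALSE per column
  x[, keep, drop = FALSE]
})

names(NewList2[[1]])
names(NewList2[[2]])
```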

User · Answer

I was working on this today for a data.frame (really a data.table) with millions of observations and 35 columns. My goal was to return a list of data.frames (data.tables), each with a single row. That is, I wanted to split each row into a separate data.frame and store these in a list.

Here are two methods I came up with that were roughly 3 times faster than split(dat, seq_len(nrow(dat))) for that data set. Below, I benchmark the three methods on a 7500 row, 5 column data set (iris repeated 50 times).

    library(data.table)
    library(microbenchmark)

    microbenchmark(
    split={dat1 <- split(dat, seq_len(nrow(dat)))},
    setDF={dat2 <- lapply(seq_len(nrow(dat)),
                      function(i) setDF(lapply(dat, "[", i)))},
    attrDT={dat3 <- lapply(seq_len(nrow(dat)),
               function(i) {
                 tmp <- lapply(dat, "[", i)
                 attr(tmp, "class") <- c("data.table", "data.frame")
                 setDF(tmp)
               })},
    datList = {datL <- lapply(seq_len(nrow(dat)),
                              function(i) lapply(dat, "[", i))},
    times=20
    )

This returns

    Unit: milliseconds
        expr      min       lq     mean   median        uq       max neval
       split 861.8126 889.1849 973.5294 943.2288 1041.7206 1250.6150    20
       setDF 459.0577 466.3432 511.2656 482.1943  500.6958  750.6635    20
      attrDT 399.1999 409.6316 461.6454 422.5436  490.5620  717.6355    20
     datList 192.1175 201.9896 241.4726 208.4535  246.4299  411.2097    20

While the differences are not as large as in my previous test, the straight setDF method is significantly faster at all levels of the distribution of runs, with max(setDF) < min(split), and the attr method is typically more than twice as fast.

A fourth method is the extreme champion, which is a simple nested lapply, returning a nested list. This method exemplifies the cost of constructing a data.frame from a list. Moreover, all methods I tried with the data.frame function were roughly an order of magnitude slower than the data.table techniques.

data

    dat <- vector("list", 50)
    for(i in 1:50) dat[[i]] <- iris
    dat <- setDF(rbindlist(dat))
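To make the shape of the nested-list result concrete, here is a minimal base-R sketch on the first three rows of iris (no data.table needed; row1 is an illustrative name). It shows that each element is a plain named list, and that a single row can still be rebuilt into a data frame on demand if one is needed later.

```r
# Small input so the structure is easy to inspect.
dat <- head(iris, 3)

# Nested lapply: outer loop over rows, inner loop picks element i of each column.
datL <- lapply(seq_len(nrow(dat)), function(i) lapply(dat, "[", i))

# Each element is a named list with one value per column.
str(datL[[1]])

# Rebuild a one-row data frame only when it is actually required.
row1 <- as.data.frame(datL[[1]])
```

Deferring the data frame construction this way is one reading of why the nested list benchmarks so well: the per-row cost of building a data frame is simply not paid up front.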

User · Answer

It seems the current version of the purrr (0.2.2) package is the fastest solution:

    by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out

Let's compare the most interesting solutions:

    data("Batting", package = "Lahman")
    x <- Batting[1:10000, 1:10]
    library(benchr)
    library(purrr)
    benchmark(
        split = split(x, seq_len(.row_names_info(x, 2L))),
        mapply = .mapply(function(...) structure(list(...), class = "data.frame", row.names = 1L), x, NULL),
        purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out
    )

Results:

    Benchmark summary:
    Time units : milliseconds
      expr n.eval   min  lw.qu median   mean  up.qu  max  total relative
     split    100 983.0 1060.0 1130.0 1130.0 1180.0 1450 113000     34.3
    mapply    100 826.0  894.0  963.0  972.0 1030.0 1320  97200     29.3
     purrr    100  24.1   28.6   32.9   44.9   40.5  183   4490      1.0

We can also get the same result with Rcpp:

    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    List df2list(const DataFrame& x) {
        std::size_t nrows = x.rows();
        std::size_t ncols = x.cols();
        CharacterVector nms = x.names();
        List res(no_init(nrows));
        for (std::size_t i = 0; i < nrows; ++i) {
            List tmp(no_init(ncols));
            for (std::size_t j = 0; j < ncols; ++j) {
                switch(TYPEOF(x[j])) {
                    case INTSXP: {
                        if (Rf_isFactor(x[j])) {
                            IntegerVector t = as<IntegerVector>(x[j]);
                            RObject t2 = wrap(t[i]);
                            t2.attr("class") = "factor";
                            t2.attr("levels") = t.attr("levels");
                            tmp[j] = t2;
                        } else {
                            tmp[j] = as<IntegerVector>(x[j])[i];
                        }
                        break;
                    }
                    case LGLSXP: {
                        tmp[j] = as<LogicalVector>(x[j])[i];
                        break;
                    }
                    case CPLXSXP: {
                        tmp[j] = as<ComplexVector>(x[j])[i];
                        break;
                    }
                    case REALSXP: {
                        tmp[j] = as<NumericVector>(x[j])[i];
                        break;
                    }
                    case STRSXP: {
                        tmp[j] = as<std::string>(as<CharacterVector>(x[j])[i]);
                        break;
                    }
                    default: stop("Unsupported type '%s'.", type2name(x));
                }
            }
            tmp.attr("class") = "data.frame";
            tmp.attr("row.names") = 1;
            tmp.attr("names") = nms;
            res[i] = tmp;
        }
        res.attr("names") = x.attr("row.names");
        return res;
    }

Now compare with purrr:

    benchmark(
        purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out,
        rcpp = df2list(x)
    )

Results:

    Benchmark summary:
    Time units : milliseconds
     expr n.eval  min lw.qu median mean up.qu   max total relative
    purrr    100 25.2  29.8   37.5 43.4  44.2 159.0  4340      1.1
     rcpp    100 19.0  27.9   34.3 35.8  37.2  93.8  3580      1.0
