How do I make a list of data frames?

Question

How do I make a list of data frames, and how do I access each of those data frames from the list?

For example, how can I put these data frames in a list?

    d1 <- data.frame(y1 = c(1, 2, 3),
                     y2 = c(4, 5, 6))
    d2 <- data.frame(y1 = c(3, 2, 1),
                     y2 = c(6, 5, 4))

User · Answer

Taking as a given that you have a "large" number of data frames with similar names (here d#, where # is some positive integer), the following is a slight improvement on @mark-miller's method. It is more terse and returns a named list of data frames, where each name in the list is the name of the corresponding original data frame.

The key is using mget together with ls. If the data frames d1 and d2 provided in the question were the only objects with names d# in the environment, then

    my.list <- mget(ls(pattern = "^d[0-9]+"))

which would return

    my.list
    $d1
      y1 y2
    1  1  4
    2  2  5
    3  3  6

    $d2
      y1 y2
    1  3  6
    2  2  5
    3  1  4

This method takes advantage of the pattern argument in ls, which allows us to use regular expressions to do a finer parsing of the names of objects in the environment. An alternative to the regex "^d[0-9]+$" is "^d\\d+$".

As @gregor points out, it is better overall to set up your data construction process so that the data frames are put into a named list at the start:

    data <- list()
    data$d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
    data$d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
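As a minimal sketch of the mget/ls approach, the snippet below does the lookup in a dedicated environment instead of the global workspace, so unrelated objects cannot match the pattern by accident. The environment `e`, the variable `df_names`, and the extra object `dummy` are assumptions made for this illustration only.

```r
# Hypothetical sandbox environment holding the example data frames
e <- new.env()
assign("d1", data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6)), envir = e)
assign("d2", data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4)), envir = e)
assign("dummy", "not a data frame", envir = e)  # does not match "^d[0-9]+$"

# ls() finds the matching names; mget() fetches them as a named list
df_names <- ls(envir = e, pattern = "^d[0-9]+$")
my.list <- mget(df_names, envir = e)

names(my.list)
my.list$d2
```

Anchoring the pattern with `^` and `$` keeps names such as `dummy` or `d1_backup` out of the result.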

User · Answer

You can also access specific columns and values in each list element with [ and [[. Here are a couple of examples.

First, we can access only the first column of each data frame in the list with lapply(ldf, "[", 1), where 1 signifies the column number.

    ldf <- list(d1 = d1, d2 = d2)  ## create a named list of your data frames
    lapply(ldf, "[", 1)
    # $d1
    #   y1
    # 1  1
    # 2  2
    # 3  3
    #
    # $d2
    #   y1
    # 1  3
    # 2  2
    # 3  1

Similarly, we can access the first value in the second column with

    lapply(ldf, "[", 1, 2)
    # $d1
    # [1] 4
    #
    # $d2
    # [1] 6

Then we can also access the column values directly, as a vector, with [[

    lapply(ldf, "[[", 1)
    # $d1
    # [1] 1 2 3
    #
    # $d2
    # [1] 3 2 1

User · Answer

If you have a large number of sequentially named data frames, you can create a list of the desired subset of data frames like this:

    d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
    d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
    d3 <- data.frame(y1 = c(6, 5, 4), y2 = c(3, 2, 1))
    d4 <- data.frame(y1 = c(9, 9, 9), y2 = c(8, 8, 8))

    my.list <- list(d1, d2, d3, d4)
    my.list

    my.list2 <- lapply(paste('d', seq(2, 4, 1), sep = ''), get)
    my.list2

where my.list2 returns a list containing the 2nd, 3rd and 4th data frames.

    [[1]]
      y1 y2
    1  3  6
    2  2  5
    3  1  4

    [[2]]
      y1 y2
    1  6  3
    2  5  2
    3  4  1

    [[3]]
      y1 y2
    1  9  8
    2  9  8
    3  9  8

Note, however, that the data frames in the above list are no longer named. If you want to create a list containing a subset of data frames and want to preserve their names, you can try this:

    list.function <- function() {

        d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
        d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
        d3 <- data.frame(y1 = c(6, 5, 4), y2 = c(3, 2, 1))
        d4 <- data.frame(y1 = c(9, 9, 9), y2 = c(8, 8, 8))

        sapply(paste('d', seq(2, 4, 1), sep = ''), get, environment(), simplify = FALSE)
    }

    my.list3 <- list.function()
    my.list3

which returns:

    > my.list3
    $d2
      y1 y2
    1  3  6
    2  2  5
    3  1  4

    $d3
      y1 y2
    1  6  3
    2  5  2
    3  4  1

    $d4
      y1 y2
    1  9  8
    2  9  8
    3  9  8

    > str(my.list3)
    List of 3
     $ d2:'data.frame':     3 obs. of  2 variables:
      ..$ y1: num [1:3] 3 2 1
      ..$ y2: num [1:3] 6 5 4
     $ d3:'data.frame':     3 obs. of  2 variables:
      ..$ y1: num [1:3] 6 5 4
      ..$ y2: num [1:3] 3 2 1
     $ d4:'data.frame':     3 obs. of  2 variables:
      ..$ y1: num [1:3] 9 9 9
      ..$ y2: num [1:3] 8 8 8

    > my.list3[[1]]
      y1 y2
    1  3  6
    2  2  5
    3  1  4

    > my.list3$d4
      y1 y2
    1  9  8
    2  9  8
    3  9  8
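A shorter route to the same named subset, sketched here on the assumption that the data frames live in the current environment, is to build the names and hand them to mget, which returns a named list directly. The names `wanted` and `my.list4` are hypothetical and not part of the answer above.

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(6, 5, 4), y2 = c(3, 2, 1))
d4 <- data.frame(y1 = c(9, 9, 9), y2 = c(8, 8, 8))

# Build the names "d2", "d3", "d4" and fetch them in one call
wanted <- paste0("d", 2:4)
my.list4 <- mget(wanted, envir = environment())

names(my.list4)   # the original object names are kept
my.list4$d4
```

Unlike the lapply/get version, no extra step is needed to keep the names.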

User · Answer

The other answers show you how to make a list of data frames when you already have a bunch of data frames, e.g., d1, d2, .... Having sequentially named data frames is a problem, and putting them in a list is a good fix, but best practice is to avoid having a bunch of data frames not in a list in the first place.

The other answers give plenty of detail on how to assign data frames to list elements, access them, etc. We'll cover that a little here too, but the main point is: don't wait until you have a bunch of data frames to add them to a list. Start with the list.

The rest of this answer covers some common cases where you might be tempted to create sequential variables, and shows you how to go straight to lists. If you're new to lists in R, you might also want to read "What's the difference between [[ and [ in accessing elements of a list?".

Lists from the start

Don't ever create d1, d2, d3, ..., dn in the first place. Create a list d with n elements.

Reading multiple files into a list of data frames

This is done pretty easily when reading in files. Maybe you've got files data1.csv, data2.csv, ... in a directory. Your goal is a list of data frames called my_data. The first thing you need is a vector with all the file names. You can construct this with paste (e.g., my_files = paste0("data", 1:5, ".csv")), but it's probably easier to use list.files to grab all the appropriate files:

    my_files <- list.files(pattern = "\\.csv$")

You can use regular expressions to match the files; read more about regular expressions in other questions if you need help there. This way you can grab all CSV files even if they don't follow a nice naming scheme. Or you can use a fancier regex pattern if you need to pick certain CSV files out from a bunch of them.

At this point, most R beginners will use a for loop, and there's nothing wrong with that; it works just fine.

    my_data <- list()
    for (i in seq_along(my_files)) {
        my_data[[i]] <- read.csv(file = my_files[i])
    }

A more R-like way to do it is with lapply, which is a shortcut for the above:

    my_data <- lapply(my_files, read.csv)

Of course, substitute another data import function for read.csv as appropriate. readr::read_csv or data.table::fread will be faster, or you may need a different function for a different file type.

Either way, it's handy to name the list elements to match the files:

    names(my_data) <- gsub("\\.csv$", "", my_files)
    # or, if you prefer the consistent syntax of stringr
    names(my_data) <- stringr::str_replace(my_files, pattern = ".csv", replacement = "")

Splitting a data frame into a list of data frames

This is super easy; the base function split() does it for you. You can split by a column (or columns) of the data, or by anything else you want:

    mt_list = split(mtcars, f = mtcars$cyl)
    # This gives a list of three data frames, one for each value of cyl

This is also a nice way to break a data frame into pieces for cross-validation. Maybe you want to split mtcars into training, test, and validation pieces.

    groups = sample(c("train", "test", "validate"),
                    size = nrow(mtcars), replace = TRUE)
    mt_split = split(mtcars, f = groups)
    # and mt_split has appropriate names already!

Simulating a list of data frames

Maybe you're simulating data, something like this:

    my_sim_data = data.frame(x = rnorm(50), y = rnorm(50))

But who does only one simulation? You want to do this 100 times, 1000 times, more! But you don't want 10,000 data frames in your workspace. Use replicate and put them in a list:

    sim_list = replicate(n = 10,
                         expr = {data.frame(x = rnorm(50), y = rnorm(50))},
                         simplify = FALSE)

In this case especially, you should also consider whether you really need separate data frames, or whether a single data frame with a "group" column would work just as well. Using data.table or dplyr it's quite easy to do things "by group" to a data frame.

I didn't put my data in a list :( I will next time, but what can I do now?

If they're an odd assortment (which is unusual), you can simply assign them:

    mylist <- list()
    mylist[[1]] <- mtcars
    mylist[[2]] <- data.frame(a = rnorm(50), b = runif(50))
    ...

If you have data frames named in a pattern, e.g., df1, df2, df3, and you want them in a list, you can get them if you can write a regular expression to match the names. Something like:

    df_list = mget(ls(pattern = "df[0-9]"))
    # this would match any object with "df" followed by a digit in its name
    # you can test what objects will be got by just running the
    ls(pattern = "df[0-9]")
    # part and adjusting the pattern until it gets the right objects.

Generally, mget is used to get multiple objects and return them in a named list. Its counterpart get is used to get a single object and return it (not in a list).

Combining a list of data frames into a single data frame

A common task is combining a list of data frames into one big data frame. If you want to stack them on top of each other, you would use rbind for a pair of them, but for a list of data frames here are three good choices:

    # base option - slower but no extra dependencies
    big_data = do.call(what = rbind, args = df_list)

    # data.table and dplyr have nice functions for this that
    #  - are much faster
    #  - add id columns to identify the source
    #  - fill in missing values if some data frames have more columns than others
    # see their help pages for details
    big_data = data.table::rbindlist(df_list)
    big_data = dplyr::bind_rows(df_list)

(Similarly, use cbind or dplyr::bind_cols for columns.)

To merge (join) a list of data frames, see the existing answers on that topic. Often, the idea is to use Reduce with merge (or some other joining function) to get them together.

Why put the data in a list?

Put similar data in lists because you want to do similar things to each data frame, and functions like lapply, sapply, do.call, the purrr package, and the old plyr l*ply functions make it easy to do that. Examples of people easily doing things with lists are all over SO.

Even if you use a lowly for loop, it's much easier to loop over the elements of a list than it is to construct variable names with paste and access the objects with get. Easier to debug, too.

Think of scalability. If you really only need three variables, it's fine to use d1, d2, d3. But then if it turns out you really need 6, that's a lot more typing. And next time, when you need 10 or 20, you find yourself copying and pasting lines of code, maybe using find/replace to change d14 to d15, and you're thinking this isn't how programming should be. If you use a list, the difference between 3 cases, 30 cases, and 300 cases is at most one line of code, and no change at all if your number of cases is automatically detected by, e.g., how many .csv files are in your directory.

You can name the elements of a list, in case you want to use something other than numeric indices to access your data frames (and you can use both; this isn't an XOR choice).

Overall, using lists will lead you to write cleaner, easier-to-read code, which will result in fewer bugs and less confusion.
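The Reduce-with-merge idea mentioned above for joining a list of data frames can be sketched as follows. This is only an illustration: the data frames, the key column `id`, and the name `merged` are made up, and the join shown is merge's default inner join.

```r
# Three hypothetical data frames sharing a key column "id"
df_list <- list(
  a = data.frame(id = 1:3, x = c(10, 20, 30)),
  b = data.frame(id = 1:3, y = c("u", "v", "w")),
  c = data.frame(id = c(1, 3), z = c(TRUE, FALSE))
)

# Reduce() applies merge() pairwise from the left:
# merge(merge(a, b, by = "id"), c, by = "id")
merged <- Reduce(function(left, right) merge(left, right, by = "id"), df_list)
merged
```

Because each step is an inner join, only ids present in every data frame survive; passing `all = TRUE` to merge inside the function would keep unmatched rows instead.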

User · Answer

This isn't related to your question, but you want to use = and not <- within the function call. If you use <-, you'll end up creating variables y1 and y2 in whatever environment you're working in:

    d1 <- data.frame(y1 <- c(1, 2, 3), y2 <- c(4, 5, 6))
    y1
    # [1] 1 2 3
    y2
    # [1] 4 5 6

This won't have the seemingly desired effect of creating column names in the data frame:

    d1
    #   y1....c.1..2..3. y2....c.4..5..6.
    # 1                1                4
    # 2                2                5
    # 3                3                6

The = operator, on the other hand, will associate your vectors with arguments to data.frame.

As for your question, making a list of data frames is easy:

    d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
    d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
    my.list <- list(d1, d2)

You access the data frames just like you would access any other list element:

    my.list[[1]]
    #   y1 y2
    # 1  1  4
    # 2  2  5
    # 3  3  6
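To see the side effect of <- in isolation, one option is to run each variant inside its own local() scope and list what was created there. This is a rough sketch; `leak_arrow` and `leak_equals` are hypothetical names used only for this comparison.

```r
# Variant with <- inside the call: assigns y1 and y2 as a side effect
leak_arrow <- local({
  df <- data.frame(y1 <- c(1, 2, 3), y2 <- c(4, 5, 6))
  list(columns = names(df), objects = ls(envir = environment()))
})

# Variant with = inside the call: only names the arguments
leak_equals <- local({
  df <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
  list(columns = names(df), objects = ls(envir = environment()))
})

leak_arrow$objects    # includes y1 and y2 next to df
leak_equals$objects   # only df
leak_equals$columns   # the intended column names
```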

User · Answer

I consider myself a complete newbie, but I think I have an extremely simple answer to one of the original subquestions that has not been stated here: accessing the data frames, or parts of them.

Let's start by creating the list with data frames as was stated above:

    d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))

    d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))

    my.list <- list(d1, d2)

Then, if you want to access a specific value in one of the data frames, you can do so by using the double brackets sequentially. The first set gets you into the data frame, and the second set gets you to the specific coordinates (row, column):

    my.list[[1]][[3, 2]]

    [1] 6

User · Answer

This may be a little late, but going back to your example I thought I would extend the answer just a tad.

    D1 <- data.frame(Y1 = c(1, 2, 3), Y2 = c(4, 5, 6))
    D2 <- data.frame(Y1 = c(3, 2, 1), Y2 = c(6, 5, 4))
    D3 <- data.frame(Y1 = c(6, 5, 4), Y2 = c(3, 2, 1))
    D4 <- data.frame(Y1 = c(9, 9, 9), Y2 = c(8, 8, 8))

Then you make your list easily:

    mylist <- list(D1, D2, D3, D4)

Now you have a list, but instead of accessing the list the old way, such as

    mylist[[1]]  # to access 'D1'

you can use this function to obtain and assign the data frame of your choice:

    GETDF_FROMLIST <- function(DF_LIST, ITEM_LOC) {
        DF_SELECTED <- DF_LIST[[ITEM_LOC]]
        return(DF_SELECTED)
    }

Now get the one you want:

    D1 <- GETDF_FROMLIST(mylist, 1)
    D2 <- GETDF_FROMLIST(mylist, 2)
    D3 <- GETDF_FROMLIST(mylist, 3)
    D4 <- GETDF_FROMLIST(mylist, 4)

Hope that extra bit helps.

Cheers
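Since the helper only wraps [[, it should also accept an element name when the list is named. The sketch below assumes a named version of the list (the names D1 and D2 on the list elements are added here for illustration) and checks that lookup by position and by name agree.

```r
# Same helper as above: a thin wrapper around [[
GETDF_FROMLIST <- function(DF_LIST, ITEM_LOC) {
  DF_LIST[[ITEM_LOC]]
}

# Hypothetical named list, so elements can be fetched by name as well
mylist <- list(
  D1 = data.frame(Y1 = c(1, 2, 3), Y2 = c(4, 5, 6)),
  D2 = data.frame(Y1 = c(3, 2, 1), Y2 = c(6, 5, 4))
)

by_position <- GETDF_FROMLIST(mylist, 2)
by_name     <- GETDF_FROMLIST(mylist, "D2")

identical(by_position, by_name)          # both refer to the same element
identical(by_name, mylist[["D2"]])       # same result as plain [[
```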

User · Answer

Very simple! Here is my suggestion.

If you want to select the data frames in your workspace, try this:

    Filter(function(x) is.data.frame(get(x)), ls())

or

    ls()[sapply(ls(), function(x) is.data.frame(get(x)))]

Both give the same result: the names of the objects in your workspace that are data frames.

You can change is.data.frame to check for other types of variables, like is.function.
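If the goal is the data frames themselves and not just their names, one possible variation is to fetch every object with mget and then filter the resulting list. The sketch below uses a throwaway environment `e` with made-up objects so that it does not depend on what is in your workspace.

```r
# Hypothetical environment with a mix of object types
e <- new.env()
assign("df_a", data.frame(x = 1:2), envir = e)
assign("df_b", data.frame(y = 3:4), envir = e)
assign("f", function(x) x, envir = e)
assign("n", 42, envir = e)

# Fetch everything as a named list, then keep only the data frames
all_objects <- mget(ls(envir = e), envir = e)
df_list <- Filter(is.data.frame, all_objects)

names(df_list)
```

For the global workspace, the same idea would read `Filter(is.data.frame, mget(ls()))`.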
