How to split a data frame

Question

I want to split a data frame into several smaller ones. This looks like a very trivial question; however, I cannot find a solution from a web search.

User · Answer

The answer you want depends very much on how and why you want to break up the data frame.

For example, if you want to leave out some variables, you can create a new data frame from specific columns of the original one. The subscripts in brackets after the data frame refer to row and column numbers, in that order. Check out Spoetry for a complete description.

newdf <- mydf[,1:3]

Or, you can choose specific rows.

newdf <- mydf[1:3,]

And these subscripts can also be logical tests, such as choosing rows that contain a particular value, or factors with a desired value.
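As a minimal sketch of logical subscripts, assuming the built-in mtcars and iris data sets stand in for mydf (the object names four_cyl, heavy_auto and setosa are illustrative only):

```r
# Rows where a column equals a particular value
four_cyl <- mtcars[mtcars$cyl == 4, ]

# Combine tests with & (and) or | (or), and keep only some columns
heavy_auto <- mtcars[mtcars$wt > 3 & mtcars$am == 0, c("mpg", "wt", "am")]

# Factor columns work the same way: compare against the desired level
setosa <- iris[iris$Species == "setosa", ]

nrow(four_cyl)
nrow(setosa)
```

The logical vector in the row position must have one element per row of the data frame; rows where it is TRUE are kept.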

What do you want to do with the chunks left over? Do you need to perform the same operation on each chunk of the data frame? Then you'll want to ensure that the subsets end up in a convenient object, such as a list, that will help you run the same command on each chunk.
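One possible way to get the chunks into a list is base R's split(), sketched here on the built-in mtcars data set (the names chunks and mean_mpg are illustrative only):

```r
# split() returns a named list with one data frame per value of the grouping column
chunks <- split(mtcars, mtcars$cyl)
names(chunks)

# Run the same command on every chunk
lapply(chunks, nrow)

# sapply() simplifies the result to a named vector where it can
mean_mpg <- sapply(chunks, function(d) mean(d$mpg))
mean_mpg
```

Each element of chunks is an ordinary data frame, so chunks[["4"]] can be used anywhere mtcars could.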

User · Answer

subset() is also useful:

subset(DATAFRAME, COLUMNNAME == "")

For a survey package, maybe the survey package is pertinent?

http://faculty.washington.edu/tlumley/survey/
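To make the subset() call in this answer concrete, here is a small sketch on the built-in mtcars data set; the conditions and the names four_cyl and light are made up for illustration:

```r
# Keep rows that satisfy a condition; columns are named without the mtcars$ prefix
four_cyl <- subset(mtcars, cyl == 4)

# The select argument also restricts the columns that are returned
light <- subset(mtcars, wt < 2.5, select = c(mpg, cyl, wt))

# Note: subset() drops rows where the condition evaluates to NA
head(light)
```

subset() is convenient for interactive work; inside functions, indexing with [ ] is usually the safer choice because it does not rely on non-standard evaluation.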

User · Answer

You may also want to cut the data frame into an arbitrary number of smaller dataframes. Here, we cut into two dataframes.

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
set.seed(10)
split(x, sample(rep(1:2, 13)))

gives

$`1`
   num let LET
3    3   c   C
6    6   f   F
10  10   j   J
12  12   l   L
14  14   n   N
15  15   o   O
17  17   q   Q
18  18   r   R
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
26  26   z   Z

$`2`
   num let LET
1    1   a   A
2    2   b   B
4    4   d   D
5    5   e   E
7    7   g   G
8    8   h   H
9    9   i   I
11  11   k   K
13  13   m   M
16  16   p   P
19  19   s   S
24  24   x   X
25  25   y   Y

You can also split a data frame based upon an existing column. For example, to create three data frames based on the cyl column in mtcars:

split(mtcars, mtcars$cyl)

User · Answer

I just posted a kind of RFC that might help you: Split a vector into chunks in R

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
## number of chunks
n <- 2
dfchunk <- split(x, factor(sort(rank(row.names(x)) %% n)))
dfchunk

$`0`
   num let LET
1    1   a   A
2    2   b   B
3    3   c   C
4    4   d   D
5    5   e   E
6    6   f   F
7    7   g   G
8    8   h   H
9    9   i   I
10  10   j   J
11  11   k   K
12  12   l   L
13  13   m   M

$`1`
   num let LET
14  14   n   N
15  15   o   O
16  16   p   P
17  17   q   Q
18  18   r   R
19  19   s   S
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
24  24   x   X
25  25   y   Y
26  26   z   Z

Cheers, Sebastian
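A different way to get n contiguous chunks, shown only as a sketch and not as the method used in this answer, is to compute a chunk index directly from the row position. The helper name split_into_chunks is hypothetical:

```r
# Hypothetical helper: split the rows of df into n contiguous chunks of near-equal size
split_into_chunks <- function(df, n) {
  # Chunk index (1..n) for each row, in row order
  idx <- ceiling(seq_len(nrow(df)) * n / nrow(df))
  split(df, idx)
}

x <- data.frame(num = 1:26, let = letters, LET = LETTERS)
chunks <- split_into_chunks(x, 3)

# Number of rows in each chunk
sapply(chunks, nrow)
```

Because the index is built from row positions rather than row names, this keeps the original row order inside every chunk even when nrow(df) is not a multiple of n.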

User · Answer

If you want to split a dataframe according to values of some variable, I'd suggest using dlply() from the plyr package.

library(plyr)
x <- dlply(df, .(splitting_variable), function(x) return(x))

Now, x is a list of dataframes. To access one of the dataframes, you can index it with the name of the level of the splitting variable.

x$Level1
# or
x[["Level1"]]

I'd be sure that there aren't other more clever ways to deal with your data before splitting it up into many dataframes though.

User · Answer

Splitting the data frame seems counter-productive. Instead, use the split-apply-combine paradigm, e.g., generate some data

df = data.frame(grp=sample(letters, 100, TRUE), x=rnorm(100))

then split only the relevant columns and apply the scale() function to x in each group, and combine the results (using split<- or ave)

df$z = 0
split(df$z, df$grp) = lapply(split(df$x, df$grp), scale)
## alternative: df$z = ave(df$x, df$grp, FUN=scale)

This will be very fast compared to splitting data frames, and the result remains usable in downstream analysis without iteration. I think the dplyr syntax is

library(dplyr)
df %>% group_by(grp) %>% mutate(z=scale(x))

In general this dplyr solution is faster than splitting data frames but not as fast as split-apply-combine.
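As a quick sanity check of the two base R variants mentioned in this answer, the sketch below runs both on the same data and compares the results. It assumes five groups instead of 26 so that every group has several rows, and the seed and the column name z2 are arbitrary choices for illustration:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(grp = sample(letters[1:5], 100, TRUE), x = rnorm(100))

# Group-wise scaling via ave(): the result is aligned with the original rows
df$z <- ave(df$x, df$grp, FUN = scale)

# The same computation via split<-
df$z2 <- 0
split(df$z2, df$grp) <- lapply(split(df$x, df$grp), scale)

# Both approaches should agree
all.equal(df$z, df$z2)

# Per-group means of the scaled values (numerically close to zero)
tapply(df$z, df$grp, mean)
```

In both cases df stays a single data frame with one extra column, which is what keeps it usable downstream without iterating over pieces.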

User · Answer

You could also use

data2 <- data[data$sum_points == 2500, ]

This will make a dataframe with the values where sum_points = 2500.

It gives:

    airfoils sum_points field_points   init_t contour_t  field_t
491        5       2500         5625 0.000086  0.004272 6.321774
498        5       2500         5625 0.000087  0.004507 6.325083
504        5       2500         5625 0.000088  0.004370 6.336034
603        5        250        10000 0.000072  0.000525 1.111278
577        5        250        10000 0.000104  0.000559 1.111431
587        5        250        10000 0.000072  0.000528 1.111524
606        5        250        10000 0.000079  0.000538 1.111685
...
> data2 <- data[data$sum_points == 2500, ]
> data2
    airfoils sum_points field_points   init_t contour_t  field_t
108        5       2500          625 0.000082  0.004329 0.733109
106        5       2500          625 0.000102  0.004564 0.733243
117        5       2500          625 0.000087  0.004321 0.733274
112        5       2500          625 0.000081  0.004428 0.733587

User · Answer

If you want to split by values in one of the columns, you can use lapply. For instance, to split ChickWeight into a separate dataset for each chick:

data(ChickWeight)
lapply(unique(ChickWeight$Chick), function(x) ChickWeight[ChickWeight$Chick == x, ])
