Split a vector into chunks

Question

I have to split a vector into n chunks of equal size in R. I couldn't find any base function to do that, and Google didn't get me anywhere either. Here is what I came up with so far:

    x <- 1:10
    n <- 3
    chunk <- function(x, n) split(x, factor(sort(rank(x) %% n)))
    chunk(x, n)
    $`0`
    [1] 1 2 3

    $`1`
    [1] 4 5 6 7

    $`2`
    [1]  8  9 10

User · Answer

If you don't like split() and you don't mind NAs padding out your short tail:

    chunk <- function(x, n) {
      if ((length(x) %% n) == 0) {
        return(matrix(x, nrow = n))
      } else {
        return(matrix(append(x, rep(NA, n - (length(x) %% n))), nrow = n))
      }
    }

The columns of the returned matrix ([, 1:ncol]) are the droids you are looking for.
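A short usage sketch for the matrix approach above, with the function repeated so the block runs on its own. Note that `n` ends up as the number of rows, so it is the chunk length rather than the number of chunks. The `lapply` step that strips the padding is an addition for illustration, not part of the answer, and it would also drop any genuine NAs in the input.

```r
chunk <- function(x, n) {
  if ((length(x) %% n) == 0) {
    return(matrix(x, nrow = n))
  } else {
    return(matrix(append(x, rep(NA, n - (length(x) %% n))), nrow = n))
  }
}

m <- chunk(1:10, 3)   # 3 rows, 4 columns; the last column is 10, NA, NA

# Illustrative addition: turn the columns into a list and drop the NA padding
chunks <- lapply(seq_len(ncol(m)), function(j) {
  col <- m[, j]
  col[!is.na(col)]
})
```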

User · Answer

Yet another possibility is the splitIndices function from the parallel package:

    library(parallel)
    splitIndices(20, 3)

Gives:

    [[1]]
    [1] 1 2 3 4 5 6 7

    [[2]]
    [1]  8  9 10 11 12 13

    [[3]]
    [1] 14 15 16 17 18 19 20
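Since splitIndices returns index positions rather than values, splitting an arbitrary vector needs one more step. A minimal sketch, assuming you want a list of the values themselves (the variable names are only illustrative):

```r
library(parallel)

x <- letters[1:10]

# Split the positions 1..length(x) into 3 groups, then index x with each group
idx <- splitIndices(length(x), 3)
chunks <- lapply(idx, function(i) x[i])
```

The chunks are contiguous and keep the original order, so `unlist(chunks)` gives back `x`.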

User · Answer

I have come up with this solution:

    require(magrittr)
    create.chunks <- function(x, elements.per.chunk) {
      # plain R version
      # split(x, rep(seq_along(x), each = elements.per.chunk)[seq_along(x)])
      # magrittr version - because that's what people use now
      x %>% seq_along %>% rep(., each = elements.per.chunk) %>% extract(seq_along(x)) %>% split(x, .)
    }
    create.chunks(letters[1:10], 3)
    $`1`
    [1] "a" "b" "c"

    $`2`
    [1] "d" "e" "f"

    $`3`
    [1] "g" "h" "i"

    $`4`
    [1] "j"

The key is to use the each = elements.per.chunk argument of rep() to make it work. Using seq_along acts like rank(x) in my previous solution, but is actually able to produce the correct result with duplicated entries.
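To illustrate the point about duplicated entries: with ties, rank() returns fractional ranks, so the modulo trick from the question no longer produces the intended groups. A minimal sketch comparing the two ideas; `chunk_rank` and `chunk_pos` are names made up for this example.

```r
# Rank-based grouping, as in the question (second argument: number of chunks)
chunk_rank <- function(x, n) split(x, factor(sort(rank(x) %% n)))

# Position-based grouping, base R only (second argument: chunk size)
chunk_pos <- function(x, size) {
  split(x, rep(seq_along(x), each = size)[seq_along(x)])
}

x <- c(5, 5, 7, 7)

# rank(x) is 1.5 1.5 3.5 3.5, so every element gets remainder 1.5 modulo 2
by_rank <- chunk_rank(x, 2)   # a single group with all four values

# Positions 1..4 are unaffected by the values
by_pos <- chunk_pos(x, 2)     # two groups: 5 5 and 7 7
```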

User · Answer

Simple function for splitting a vector by simply using indexes, no need to overcomplicate this:

    vsplit <- function(v, n) {
      l = length(v)
      r = l / n
      return(lapply(1:n, function(i) {
        s = max(1, round(r * (i - 1)) + 1)
        e = min(l, round(r * i))
        return(v[s:e])
      }))
    }
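A quick usage sketch of the index-based function above (repeated here so the block is self-contained). The chunk boundaries come from rounding multiples of length(v) / n, so the sizes can differ by one when the length is not divisible by n.

```r
vsplit <- function(v, n) {
  l <- length(v)
  r <- l / n
  lapply(1:n, function(i) {
    s <- max(1, round(r * (i - 1)) + 1)   # first index of chunk i
    e <- min(l, round(r * i))             # last index of chunk i
    v[s:e]
  })
}

res <- vsplit(1:10, 3)
sizes <- lengths(res)   # 3 4 3
```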

User · Answer

A one-liner splitting d into chunks of size 20:

    split(d, ceiling(seq_along(d) / 20))

More details: I think all you need is seq_along(), split() and ceiling():

    > d <- rpois(73, 5)
    > d
     [1]  3  1 11  4  1  2  3  2  4 10 10  2  7  4  6  6  2  1  1  2  3  8  3 10  7  4
    [27]  3  4  4  1  1  7  2  4  6  0  5  7  4  6  8  4  7 12  4  6  8  4  2  7  6  5
    [53]  4  5  4  5  5  8  7  7  7  6  2  4  3  3  8 11  6  6  1  8  4
    > max <- 20
    > x <- seq_along(d)
    > d1 <- split(d, ceiling(x / max))
    > d1
    $`1`
     [1]  3  1 11  4  1  2  3  2  4 10 10  2  7  4  6  6  2  1  1  2

    $`2`
     [1]  3  8  3 10  7  4  3  4  4  1  1  7  2  4  6  0  5  7  4  6

    $`3`
     [1]  8  4  7 12  4  6  8  4  2  7  6  5  4  5  4  5  5  8  7  7

    $`4`
     [1]  7  6  2  4  3  3  8 11  6  6  1  8  4
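The one-liner above takes a chunk size, whereas the question asks for a number of chunks. One way to bridge the two, sketched here with made-up helper names, is to derive the size from the desired count. This is an assumption about what is wanted: the last chunk may be shorter, and for short vectors you can end up with fewer than n chunks.

```r
# Chunks of a fixed size, as in the one-liner above
chunk_by_size <- function(x, size) split(x, ceiling(seq_along(x) / size))

# Hypothetical wrapper: pick the smallest size that needs at most n chunks
chunk_into_n <- function(x, n) chunk_by_size(x, ceiling(length(x) / n))

sizes <- lengths(chunk_into_n(1:10, 3))    # 4 4 2
fewer <- length(chunk_into_n(1:5, 4))      # 3 chunks, not 4
```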

User · Answer

    split(x, matrix(1:n, n, length(x))[1:length(x)])

Perhaps this is more clear, but it is the same idea:

    split(x, rep(1:n, ceiling(length(x) / n), length.out = length(x)))

If you want it ordered, throw a sort around it.

User · Answer

This will split it differently to what you have, but is still quite a nice list structure I think:

    chunk.2 <- function(x, n, force.number.of.groups = TRUE, len = length(x), groups = trunc(len/n), overflow = len %% n) {
      if (force.number.of.groups) {
        f1 <- as.character(sort(rep(1:n, groups)))
        f <- as.character(c(f1, rep(n, overflow)))
      } else {
        f1 <- as.character(sort(rep(1:groups, n)))
        f <- as.character(c(f1, rep("overflow", overflow)))
      }

      g <- split(x, f)

      if (force.number.of.groups) {
        g.names <- names(g)
        g.names.ordered <- as.character(sort(as.numeric(g.names)))
      } else {
        g.names <- names(g)[-length(g)]
        g.names.ordered <- as.character(sort(as.numeric(g.names)))
        g.names.ordered <- c(g.names.ordered, "overflow")
      }

      return(g[g.names.ordered])
    }

Which will give you the following, depending on how you want it formatted:

    > x <- 1:10; n <- 3
    > chunk.2(x, n, force.number.of.groups = FALSE)
    $`1`
    [1] 1 2 3

    $`2`
    [1] 4 5 6

    $`3`
    [1] 7 8 9

    $overflow
    [1] 10

    > chunk.2(x, n, force.number.of.groups = TRUE)
    $`1`
    [1] 1 2 3

    $`2`
    [1] 4 5 6

    $`3`
    [1]  7  8  9 10

Running a couple of timings using these settings:

    set.seed(42)
    x <- rnorm(1:1e7)
    n <- 3

Then we have the following results:

    > system.time(chunk(x, n)) # your function
       user  system elapsed
     29.500   0.620  30.125

    > system.time(chunk.2(x, n, force.number.of.groups = TRUE))
       user  system elapsed
      5.360   0.300   5.663

Note: Changing as.factor() to as.character() made my function twice as fast.

User · Answer

I need a function that takes the name of a data.table (in quotes) and another argument that is the upper limit on the number of rows in the subsets of that original data.table. This function produces whatever number of data.tables that upper limit allows for:

    library(data.table)
    split_dt <- function(x, y) {
      for (i in seq(from = 1, to = nrow(get(x)), by = y)) {
        # rows i to i + y - 1, so each piece has at most y rows
        df_ <<- get(x)[i:(i + y - 1)]
        assign(paste0("df_", i), df_, inherits = TRUE)
      }
      rm(df_, inherits = TRUE)
    }

This function gives me a series of data.tables named df_[number], with the starting row from the original data.table in the name. The last data.table can be short and filled with NAs, so you have to subset that back to whatever data is left. This type of function is useful because certain GIS software has limits on how many address pins you can import, for example. So slicing up data.tables into smaller chunks may not be recommended, but it may not be avoidable.

User · Answer

You could combine the split/cut, as suggested by @mdsummer, with quantile to create even groups:

    split(x, cut(x, quantile(x, (0:n)/n), include.lowest = TRUE, labels = FALSE))

This gives the same result for your example, but not for skewed variables.
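To make the caveat concrete: the quantile breaks group by value, not by position, so tied values always land in the same group. A small sketch with a made-up vector, compared against a purely positional split:

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 3, 4)
n <- 3

# Value-based grouping via quantile breaks, as in the answer above
by_value <- split(x, cut(x, quantile(x, (0:n) / n),
                         include.lowest = TRUE, labels = FALSE))

# Position-based grouping for comparison: three chunks of three
by_position <- split(x, ceiling(seq_along(x) / (length(x) / n)))

lengths(by_value)      # uneven, because all four 2s must share a group
lengths(by_position)   # 3 3 3
```

With heavily tied data the quantiles themselves can coincide, in which case cut() stops with an error about non-unique breaks.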

User · Answer

If you don't like split() and you don't like matrix() (with its dangling NAs), there's this:

    chunk <- function(x, n) {
      mapply(function(a, b) (x[a:b]),
             seq.int(from = 1, to = length(x), by = n),
             pmin(seq.int(from = 1, to = length(x), by = n) + (n - 1), length(x)),
             SIMPLIFY = FALSE)
    }

Like split(), it returns a list, but it doesn't waste time or space with labels, so it may be more performant.
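A brief usage sketch of the mapply version above, repeated so it runs standalone. Here the second argument is the chunk size, and the result is an unnamed list whose last element holds whatever is left over.

```r
chunk <- function(x, n) {
  starts <- seq.int(from = 1, to = length(x), by = n)
  ends <- pmin(starts + (n - 1), length(x))   # clip the last chunk
  mapply(function(a, b) x[a:b], starts, ends, SIMPLIFY = FALSE)
}

res <- chunk(1:10, 3)
lengths(res)   # 3 3 3 1
names(res)     # NULL
```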

User · Answer

Simplified version:

    n = 3
    split(x, sort(x) %% n)
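One thing to watch with this shortcut: it groups by the values of x modulo n, which only yields contiguous chunks when x happens to be 1:length(x), as in the question. A sketch of the difference, using positions instead of values as one possible fix (an assumption about the intent, not part of the answer):

```r
x <- c(10, 20, 30, 40)
n <- 3

# Grouping on the values: 10 and 40 share remainder 1, so they are paired
by_value <- split(x, sort(x) %% n)

# Grouping on positions keeps neighbouring elements together
by_pos <- split(x, sort(seq_along(x) %% n))

by_value[["1"]]   # 10 40
by_pos[["1"]]     # 20 30
```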

User · Answer

This splits into chunks of size ⌊n/k⌋+1 or ⌊n/k⌋ and does not use the O(n log n) sort.

    get_chunk_id <- function(n, k) {
      r <- n %% k
      s <- n %/% k
      i <- seq_len(n)
      1 + ifelse(i <= r * (s + 1), (i - 1) %/% (s + 1), r + ((i - r * (s + 1) - 1) %/% s))
    }

    split(1:10, get_chunk_id(10, 3))
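As a check on the size claim, the chunk ids can be tabulated: the first n %% k chunks get one extra element and the rest get n %/% k. A small sketch with the function repeated so it runs on its own:

```r
get_chunk_id <- function(n, k) {
  r <- n %% k    # number of chunks that get one extra element
  s <- n %/% k   # base chunk size
  i <- seq_len(n)
  1 + ifelse(i <= r * (s + 1),
             (i - 1) %/% (s + 1),
             r + ((i - r * (s + 1) - 1) %/% s))
}

ids <- get_chunk_id(11, 4)
sizes <- tabulate(ids)   # 3 3 3 2
```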

User · Answer

Credit to @Sebastian for this function:

    chunk <- function(x, y) {
      split(x, factor(sort(rank(row.names(x)) %% y)))
    }
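Because this variant ranks row.names(x), it is presumably meant for data frames, where split() divides the rows. A usage sketch under that assumption, with a made-up data frame:

```r
chunk <- function(x, y) {
  split(x, factor(sort(rank(row.names(x)) %% y)))
}

df <- data.frame(id = 1:10, value = letters[1:10])

parts <- chunk(df, 3)          # list of three data frames
rows <- sapply(parts, nrow)    # 3 4 3
```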

User · Answer

A few more variants to the pile...

    > x <- 1:10
    > n <- 3

Note that you don't need to use the factor function here, but you still want to sort, otherwise your first vector would be 1 2 3 10:

    > chunk <- function(x, n) split(x, sort(rank(x) %% n))
    > chunk(x, n)
    $`0`
    [1] 1 2 3
    $`1`
    [1] 4 5 6 7
    $`2`
    [1]  8  9 10

Or you can assign character indices instead of the numbers in backticks above:

    > my.chunk <- function(x, n) split(x, sort(rep(letters[1:n], each = n, len = length(x))))
    > my.chunk(x, n)
    $a
    [1] 1 2 3 4
    $b
    [1] 5 6 7
    $c
    [1]  8  9 10

Or you can use plain-word names stored in a vector. Note that using sort to get consecutive values in x alphabetizes the labels:

    > my.other.chunk <- function(x, n) split(x, sort(rep(c("tom", "dick", "harry"), each = n, len = length(x))))
    > my.other.chunk(x, n)
    $dick
    [1] 1 2 3
    $harry
    [1] 4 5 6
    $tom
    [1]  7  8  9 10

User · Answer

I needed the same function and have read the previous solutions; however, I also needed the unbalanced chunk to be at the end, i.e. if I have 10 elements to split into vectors of 3 each, then my result should have vectors with 3, 3, 4 elements respectively. So I used the following (I left the code unoptimised for readability; otherwise there is no need for so many variables):

    chunk <- function(x, n) {
      numOfVectors <- floor(length(x) / n)
      elementsPerVector <- c(rep(n, numOfVectors - 1), n + length(x) %% n)
      elemDistPerVector <- rep(1:numOfVectors, elementsPerVector)
      split(x, factor(elemDistPerVector))
    }
    set.seed(1)
    x <- rnorm(10)
    n <- 3
    chunk(x, n)
    $`1`
    [1] -0.6264538  0.1836433 -0.8356286

    $`2`
    [1]  1.5952808  0.3295078 -0.8204684

    $`3`
    [1]  0.4874291  0.7383247  0.5757814 -0.3053884

User · Answer

Try the ggplot2 function, cut_number:

    library(ggplot2)
    x <- 1:10
    n <- 3
    cut_number(x, n) # labels = FALSE if you just want an integer result
    #>  [1] [1,4]  [1,4]  [1,4]  [1,4]  (4,7]  (4,7]  (4,7]  (7,10] (7,10] (7,10]
    #> Levels: [1,4] (4,7] (7,10]

    # if you want it split into a list:
    split(x, cut_number(x, n))
    #> $`[1,4]`
    #> [1] 1 2 3 4
    #>
    #> $`(4,7]`
    #> [1] 5 6 7
    #>
    #> $`(7,10]`
    #> [1]  8  9 10

User · Answer

Using base R's rep_len:

    x <- 1:10
    n <- 3

    split(x, rep_len(1:n, length(x)))
    # $`1`
    # [1]  1  4  7 10
    #
    # $`2`
    # [1] 2 5 8
    #
    # $`3`
    # [1] 3 6 9

And as already mentioned, if you want sorted indices, simply:

    split(x, sort(rep_len(1:n, length(x))))
    # $`1`
    # [1] 1 2 3 4
    #
    # $`2`
    # [1] 5 6 7
    #
    # $`3`
    # [1]  8  9 10

User · Answer

Here's another variant.

NOTE: with this sample you're specifying the CHUNK SIZE in the second parameter.

    # all chunks are uniform, except for the last.
    # the last will at worst be smaller, never bigger than the chunk size.
    chunk <- function(x, n) {
      f <- sort(rep(1:(trunc(length(x) / n) + 1), n))[1:length(x)]
      return(split(x, f))
    }

    # Test
    n <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)

    c <- chunk(n, 5)

    q <- lapply(c, function(r) cat(r, sep = ",", collapse = "|"))
    # output
    1,2,3,4,5,|6,7,8,9,10,|11,|

User · Answer

Sorry if this answer comes so late, but maybe it can be useful for someone else. Actually there is a very useful solution to this problem, explained at the end of ?split.

    > testVector <- c(1:10) # I want to divide it into 5 parts
    > VectorList <- split(testVector, 1:5)
    > VectorList
    $`1`
    [1] 1 6

    $`2`
    [1] 2 7

    $`3`
    [1] 3 8

    $`4`
    [1] 4 9

    $`5`
    [1]  5 10

User · Answer

    chunk2 <- function(x, n) split(x, cut(seq_along(x), n, labels = FALSE))
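A short usage sketch for chunk2. Because cut() is applied to the positions rather than the values, the chunks stay contiguous and in order whatever x contains; the example vector is made up.

```r
chunk2 <- function(x, n) split(x, cut(seq_along(x), n, labels = FALSE))

x <- c(42, 7, 7, 19, 3, 88, 5, 61, 7, 30)

res <- chunk2(x, 3)
sizes <- lengths(res)   # three chunks whose sizes differ by at most one
```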
