Sample random rows in dataframe

Question

I am struggling to find a function that returns a specified number of rows picked at random, without replacement, from a data frame in R. Can anyone help me out?

User · Answer

You could do this:

    sample_data = data[sample(nrow(data), sample_size, replace = FALSE), ]
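As a minimal runnable sketch of the approach above, the following uses the built-in mtcars data set and an arbitrary sample_size of 5. Both are illustrative choices and are not part of the answer.

```r
# Illustrative data: the built-in mtcars data frame (32 rows).
data <- mtcars
sample_size <- 5  # arbitrary size chosen for this sketch

set.seed(1)  # optional: makes the draw reproducible

# sample() picks sample_size distinct row indices from 1..nrow(data);
# the trailing comma in [rows, ] keeps all columns.
rows <- sample(nrow(data), sample_size, replace = FALSE)
sample_data <- data[rows, ]

print(sample_data)
```

Because replace = FALSE, asking for more rows than the data frame contains makes sample() stop with an error.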

User · Answer

Select a random sample from a tibble in R:

    library("tibble")
    a <- your_tibble[sample(1:nrow(your_tibble), 150), ]

nrow takes a tibble and returns the number of rows. The first parameter passed to sample is a range from 1 to the end of your tibble. The second parameter passed to sample, 150, is how many random samplings you want. The square bracket slicing selects the rows at the indices returned. Variable 'a' gets the value of the random sampling.

User · Answer

First make some data:

    > df = data.frame(matrix(rnorm(20), nrow=10))
    > df
               X1         X2
    1   0.7091409 -1.4061361
    2  -1.1334614 -0.1973846
    3   2.3343391 -0.4385071
    4  -0.9040278 -0.6593677
    5   0.4180331 -1.2592415
    6   0.7572246 -0.5463655
    7  -0.8996483  0.4231117
    8  -1.0356774 -0.1640883
    9  -0.3983045  0.7157506
    10 -0.9060305  2.3234110

Then select some rows at random:

    > df[sample(nrow(df), 3), ]
               X1         X2
    9  -0.3983045  0.7157506
    2  -1.1334614 -0.1973846
    10 -0.9060305  2.3234110

User · Answer

EDIT: This answer is now outdated, see the updated version.

In my R package I have enhanced sample so that it now behaves as expected also for data frames:

    library(devtools); install_github('kimisc', 'krlmlr')

    library(kimisc)
    example(sample.data.frame)

    smpl..> set.seed(42)

    smpl..> sample(data.frame(a=c(1,2,3), b=c(4,5,6),
                               row.names=c('a', 'b', 'c')), 10, replace=TRUE)
        a b
    c   3 6
    c.1 3 6
    a   1 4
    c.2 3 6
    b   2 5
    b.1 2 5
    c.3 3 6
    a.1 1 4
    b.2 2 5
    c.4 3 6

This is achieved by making sample an S3 generic method and providing the necessary (trivial) functionality in a function. A call to setMethod fixes everything. The original implementation can still be accessed through base::sample.

User · Answer

Outdated answer. Please use dplyr::sample_frac() or dplyr::sample_n() instead.

In my R package there is a function sample.rows just for this purpose:

    install.packages('kimisc')

    library(kimisc)
    example(sample.rows)

    smpl..> set.seed(42)

    smpl..> sample.rows(data.frame(a=c(1,2,3), b=c(4,5,6),
                                   row.names=c('a', 'b', 'c')), 10, replace=TRUE)
        a b
    c   3 6
    c.1 3 6
    a   1 4
    c.2 3 6
    b   2 5
    b.1 2 5
    c.3 3 6
    a.1 1 4
    b.2 2 5
    c.4 3 6

Enhancing sample by making it a generic S3 function was a bad idea, according to comments by Joris Meys to a previous answer.

User · Answer

Write one. Wrapping JC's answer gives me:

    randomRows = function(df, n) {
      return(df[sample(nrow(df), n), ])
    }

Now make it better by checking first if n <= nrow(df) and stopping with an error.
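One possible way to add the suggested check is sketched below. The error message wording, the drop = FALSE argument, and the small example data frame are assumptions made for illustration and are not part of the answer.

```r
# Sketch of randomRows with an explicit size check (illustrative only).
randomRows <- function(df, n) {
  if (n > nrow(df)) {
    stop("n (", n, ") exceeds the number of rows (", nrow(df), ")")
  }
  # drop = FALSE keeps the result a data frame even for a single column.
  df[sample(nrow(df), n), , drop = FALSE]
}

# Hypothetical example data.
df <- data.frame(x = 1:10, y = letters[1:10])

picked <- randomRows(df, 3)
print(picked)

# Requesting too many rows stops with the custom message.
result <- tryCatch(randomRows(df, 11), error = function(e) conditionMessage(e))
print(result)
```

Without the check, sample() would still fail for n > nrow(df), but with a message about the population size rather than about the data frame.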

User · Answer

Just for completeness' sake:

dplyr also offers to draw a proportion or fraction of the sample by

    df %>% sample_frac(0.33)

This is very convenient, e.g. in machine learning, when you have to do a certain split ratio like 80%:20%.
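A rough sketch of such a split is shown below. The mtcars data, the helper id column, and the use of anti_join() to collect the held-out rows are assumptions made for illustration; the answer itself only shows the sample_frac() call.

```r
library(dplyr)

set.seed(42)  # optional, for a reproducible split

# Illustrative data: mtcars with an explicit id column so that
# the held-out rows can be identified afterwards.
df <- mtcars %>% mutate(id = row_number())

# Roughly 80% of the rows, drawn without replacement.
train <- df %>% sample_frac(0.8)

# Every row whose id was not drawn into train.
test <- df %>% anti_join(train, by = "id")

print(c(train = nrow(train), test = nrow(test)))
```

The two parts are disjoint and together cover every row of df.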

User · Answer

The answer John Colby gives is the right answer. However, if you are a dplyr user there is also sample_n:

    sample_n(df, 10)

randomly samples 10 rows from the dataframe. It calls sample.int, so it really is the same answer with less typing (and it simplifies use in the context of magrittr, since the dataframe is the first argument).
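In dplyr 1.0.0 and later, sample_n() and sample_frac() are superseded by slice_sample(). A brief sketch of the corresponding calls follows, using mtcars as illustrative data that is not part of the answer.

```r
library(dplyr)

df <- mtcars  # illustrative data

# Counterpart of sample_n(df, 10): ten rows drawn without replacement.
ten_rows <- slice_sample(df, n = 10)

# Counterpart of sample_frac(df, 0.33): about a third of the rows.
a_third <- slice_sample(df, prop = 0.33)

print(nrow(ten_rows))
print(nrow(a_third))
```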

User · Answer

I'm new to R, but I was using this easy method that works for me:

    sample_of_diamonds <- diamonds[sample(nrow(diamonds), 100), ]

PS: Feel free to note if it has some drawback I'm not thinking about.
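One drawback worth knowing about applies to plain data frames with a single column: `[rows, ]` then returns a bare vector unless drop = FALSE is given. Tibbles, such as ggplot2's diamonds, do not drop in this way. The sketch below uses a hypothetical one-column data frame to show the difference.

```r
# Hypothetical single-column data frame.
one_col <- data.frame(x = 1:10)

# Without drop = FALSE the result collapses to an integer vector.
as_vector <- one_col[sample(nrow(one_col), 3), ]

# With drop = FALSE the result stays a data frame.
as_frame <- one_col[sample(nrow(one_col), 3), , drop = FALSE]

print(is.data.frame(as_vector))
print(is.data.frame(as_frame))
```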

User · Answer

The data.table package provides the idiom DT[sample(.N, M)], sampling M random rows from the data table DT.

    library(data.table)
    set.seed(10)

    mtcars <- data.table(mtcars)
    mtcars[sample(.N, 6)]

        mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    1: 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    2: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    3: 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    4: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    5: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    6: 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2

User · Answer

You could do this:

    library(dplyr)
    library(purrr)  # provides set_names()

    cols <- paste0("a", 1:10)

    tab <- matrix(1:1000, nrow = 100) %>% as_tibble() %>% set_names(cols)
    tab

    # A tibble: 100 x 10
          a1    a2    a3    a4    a5    a6    a7    a8    a9   a10
       <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
     1     1   101   201   301   401   501   601   701   801   901
     2     2   102   202   302   402   502   602   702   802   902
     3     3   103   203   303   403   503   603   703   803   903
     4     4   104   204   304   404   504   604   704   804   904
     5     5   105   205   305   405   505   605   705   805   905
     6     6   106   206   306   406   506   606   706   806   906
     7     7   107   207   307   407   507   607   707   807   907
     8     8   108   208   308   408   508   608   708   808   908
     9     9   109   209   309   409   509   609   709   809   909
    10    10   110   210   310   410   510   610   710   810   910
    # ... with 90 more rows

Above I just made a dataframe with 10 columns and 100 rows. Now you can sample it with sample_n:

    sample_n(tab, size = 800, replace = TRUE)

    # A tibble: 800 x 10
          a1    a2    a3    a4    a5    a6    a7    a8    a9   a10
       <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
     1    53   153   253   353   453   553   653   753   853   953
     2    14   114   214   314   414   514   614   714   814   914
     3    10   110   210   310   410   510   610   710   810   910
     4    70   170   270   370   470   570   670   770   870   970
     5    36   136   236   336   436   536   636   736   836   936
     6    77   177   277   377   477   577   677   777   877   977
     7    13   113   213   313   413   513   613   713   813   913
     8    58   158   258   358   458   558   658   758   858   958
     9    29   129   229   329   429   529   629   729   829   929
    10     3   103   203   303   403   503   603   703   803   903
    # ... with 790 more rows

Because size = 800 exceeds the 100 rows available, replace = TRUE is required here.
