How to sort a dataframe by multiple column s

Question

I want to sort a data frame by multiple columns  For example  with the data frame below I would like to sort by column z  descending  then by column b  ascending     dd  lt - data frame b   factor c  Hi    Med    Hi    Low           levels   c  Low    Med    Hi    ordered   TRUE         x   c  A    D    A    C    y   c 8  3  9  9         z   c 1  1  1  2   dd     b x y z 1  Hi A 8 1 2 Med D 3 1 3  Hi A 9 1 4 Low C 9 2

User · Answer

Another alternative  using the rgr package    gt  library rgr   gt  gx sort df dd    -z b      b x y z 4 Low C 9 2 2 Med D 3 1 1  Hi A 8 1 3  Hi A 9 1

User · Answer

Dirk s answer is great  It also highlights a key difference in the syntax used for indexing data frames and data tables      The data frame way dd with dd  order -z  b          The data table way   7 fewer characters  but that s not the important bit  dd order -z  b     The difference between the two calls is small  but it can have important consequences  Especially if you write production code and or are concerned with correctness in your research  it s best to avoid unnecessary repetition of variable names  data table  helps you do this   Here s an example of how repetition of variable names might get you into trouble   Let s change the context from Dirk s answer  and say this is part of a bigger project where there are a lot of object names and they are long and meaningful  instead of dd it s called quarterlyreport  It becomes    quarterlyreport with quarterlyreport order -z b       Ok  fine  Nothing wrong with that  Next your boss asks you to include last quarter s report in the report  You go through your code  adding an object lastquarterlyreport in various places and somehow  how on earth   you end up with this    quarterlyreport with lastquarterlyreport order -z b       That isn t what you meant but you didn t spot it because you did it fast and it s nestled on a page of similar code  The code doesn t fall over  no warning and no error  because R thinks it is what you meant  You d hope whoever reads your report spots it  but maybe they don t  If you work with programming languages a lot then this situation may be all to familiar  It was a  typo  you ll say  I ll fix the  typo  you ll say to your boss   In data table we re concerned about tiny details like this  So we ve done something simple to avoid typing variable names twice  Something very simple  i is evaluated within the frame of dd already  automatically  You don t need with   at all   Instead of  dd with dd  order -z  b        it s just  dd order -z  b     And instead of  quarterlyreport with lastquarterlyreport order -z b       it s just  quarterlyreport order -z b     It s a very small difference  but it might just save your neck one day  When weighing up the different answers to this question  consider counting the repetitions of variable names as one of your criteria in deciding   Some answers have quite a few repeats  others have none

User · Answer

The R package data table provides both fast and memory efficient ordering of data tables with a straightforward syntax  a part of which Matt has highlighted quite nicely in his answer   There has been quite a lot of improvements and also a new function setorder   since then  From v1 9 5   setorder   also works with data frames   First  we ll create a dataset big enough and benchmark the different methods mentioned from other answers and then list the features of data table   Data   require plyr  require doBy  require data table  require dplyr  require taRifx   set seed 45L  dat   data frame b   as factor sample c  Hi    Med    Low    1e8  TRUE                     x   sample c  A    D    C    1e8  TRUE                    y   sample 100  1e8  TRUE                    z   sample 5  1e8  TRUE                     stringsAsFactors   FALSE    Benchmarks   The timings reported are from running system time      on these functions shown below  The timings are tabulated below  in the order of slowest to fastest    orderBy    -z   b  data   dat         doBy plyr  arrange dat  desc z   b         plyr arrange dat  desc z   b               dplyr sort dat  f     -z   b                taRifx dat with dat  order -z  b             base R    convert to data table  by reference setDT dat   dat order -z  b                       data table  base R like syntax setorder dat  -z  b                   data table  using setorder                                         setorder   now also works with data frames     R-session memory usage  BEFORE     2GB  size of  dat     ------------------------------------------------------------   Package      function    Time  s   Peak memory   Memory used   ------------------------------------------------------------   doBy          orderBy      409 7        6 7 GB        4 7 GB   taRifx           sort      400 8        6 7 GB        4 7 GB   plyr          arrange      318 8        5 6 GB        3 6 GB    base R          order      299 0        5 6 GB        3 6 GB   dplyr         arrange       62 7        4 2 GB        2 2 GB   ------------------------------------------------------------   data table      order        6 2        4 2 GB        2 2 GB   data table   setorder        4 5        2 4 GB        0 4 GB   ------------------------------------------------------------    data table s DT order       syntax was  10x faster than the fastest of other methods  dplyr   while consuming the same amount of memory as dplyr  data table s setorder   was  14x faster than the fastest of other methods  dplyr   while taking just 0 4GB extra memory  dat is now in the order we require  as it is updated by reference     data table features   Speed    data table s ordering is extremely fast because it implements radix ordering   The syntax DT order       is optimised internally to use data table s fast ordering as well  You can keep using the familiar base R syntax but speed up the process  and use less memory     Memory    Most of the times  we don t require the original data frame or data table after reordering  That is  we usually assign the result back to the same object  for example   DF  lt - DF order         The issue is that this requires at least twice  2x  the memory of the original object  To be memory efficient  data table therefore also provides a function setorder     setorder   reorders data tables by reference  in-place   without making any additional copies  It only uses extra memory equal to the size of one column    Other features    It supports integer  logical  numeric  character and even bit64  integer64 types      Note that factor  Date  POSIXct etc   classes are all integer numeric types underneath with additional attributes and are therefore supported as well    In base R  we can not use - on a character vector to sort by that column in decreasing order  Instead we have to use -xtfrm       However  in data table  we can just do  for example  dat order -x   or setorder dat  -x

User · Answer

Alternatively  using the package Deducer  library Deducer  dd lt - sortData dd c  z   b   increasing  c FALSE TRUE

User · Answer

Your choices  order from base arrange from dplyr setorder and setorderv from data table arrange from plyr sort from taRifx orderBy from doBy sortData from Deducer  Most of the time you should use the dplyr or data table solutions  unless having no-dependencies is important  in which case use base  order   I recently added sort data frame to a CRAN package  making it class compatible as discussed here  Best way to create generic method consistency for sort data frame  Therefore  given the data frame dd  you can sort as follows  dd  lt - data frame b   factor c  quot Hi quot    quot Med quot    quot Hi quot    quot Low quot           levels   c  quot Low quot    quot Med quot    quot Hi quot    ordered   TRUE         x   c  quot A quot    quot D quot    quot A quot    quot C quot    y   c 8  3  9  9         z   c 1  1  1  2   library taRifx  sort dd  f    -z   b    If you are one of the original authors of this function  please contact me   Discussion as to public domaininess is here  https   chat stackoverflow com transcript message 1094290 1094290  You can also use the arrange   function from plyr as Hadley pointed out in the above thread  library plyr  arrange dd desc z  b    Benchmarks  Note that I loaded each package in a new R session since there were a lot of conflicts   In particular loading the doBy package causes sort to return  quot The following object s  are masked from  x  position 17    b  x  y  z quot   and loading the Deducer package overwrites sort data frame from Kevin Wright or the taRifx package   Load each time dd  lt - data frame b   factor c  quot Hi quot    quot Med quot    quot Hi quot    quot Low quot           levels   c  quot Low quot    quot Med quot    quot Hi quot    ordered   TRUE         x   c  quot A quot    quot D quot    quot A quot    quot C quot    y   c 8  3  9  9         z   c 1  1  1  2   library microbenchmark     Reload R between benchmarks microbenchmark dd with dd  order -z  b            dd order -dd z  dd b         times 1000    Median times  dd with dd  order -z  b      778 dd order -dd z  dd b    788 library taRifx  microbenchmark sort dd  f   -z b   times 1000   Median time  1 567 library plyr  microbenchmark arrange dd desc z  b  times 1000   Median time  862 library doBy  microbenchmark orderBy  -z b  data dd  times 1000   Median time  1 694 Note that doBy takes a good bit of time to load the package  library Deducer  microbenchmark sortData dd c  quot z quot   quot b quot   increasing  c FALSE TRUE   times 1000   Couldn t make Deducer load   Needs JGR console  esort  lt - function x  sortvar         attach x  x  lt - x with x order sortvar         return x  detach x     microbenchmark esort dd  -z  b  times 1000   Doesn t appear to be compatible with microbenchmark due to the attach detach   m  lt - microbenchmark    arrange dd desc z  b     sort dd  f   -z b      dd with dd  order -z  b          dd order -dd z  dd b       times 1000      uq  lt - function x    fivenum x  4     lq  lt - function x    fivenum x  2    y min  lt - 0   min by m time m expr lq   y max  lt - max by m time m expr uq     1 05    p  lt - ggplot m aes x expr y time     coord cartesian ylim   c  y min   y max     p   stat summary fun y median fun ymin   lq  fun ymax   uq  aes fill expr      lines extend from lower quartile to upper quartile  dot is the median   Given these results and weighing simplicity vs  speed  I d have to give the nod to arrange in the plyr package   It has a simple syntax and yet is almost as speedy as the base R commands with their convoluted machinations   Typically brilliant Hadley Wickham work   My only gripe with it is that it breaks the standard R nomenclature where sorting objects get called by sort object   but I understand why Hadley did it that way due to issues discussed in the question linked above

User · Answer

or you can use package doBy  library doBy  dd  lt - orderBy  -z b  data dd

User · Answer

Dirk s answer is good but if you need the sort to persist you ll want to apply the sort back onto the name of that data frame   Using the example code   dd  lt - dd with dd  order -z  b

User · Answer

I learned about order with the following example which then confused me for a long time   set seed 1234   ID          1 10 Age         round rnorm 10  50  1   diag        c  Depression    Bipolar   Diagnosis   sample diag  10  replace TRUE   data   data frame ID  Age  Diagnosis   databyAge   data order Age    databyAge   The only reason this example works is because order is sorting by the vector Age  not by the column named Age in the data frame data   To see this create an identical data frame using read table with slightly different column names and without making use of any of the above vectors   my data  lt - read table text        id age  diagnosis    1  49 Depression    2  50 Depression    3  51 Depression    4  48 Depression    5  50 Depression    6  51    Bipolar    7  49    Bipolar    8  49    Bipolar    9  49    Bipolar   10  49 Depression     header   TRUE    The above line structure for order no longer works because there is no vector named age   databyage   my data order age      The following line works because order sorts on the column age in my data   databyage   my data order my data age      I thought this was worth posting given how confused I was by this example for so long   If this post is not deemed appropriate for the thread I can remove it   EDIT  May 13  2014  Below is a generalized way of sorting a data frame by every column without specifying column names   The code below shows how to sort from left to right or by right to left   This works if every column is numeric   I have not tried with a character column added     I found the do call code a month or two ago in an old post on a different site  but only after extensive and difficult searching   I am not sure I could relocate that post now   The present thread is the first hit for ordering a data frame in R   So  I thought my expanded version of that original do call code might be useful   set seed 1234   v1   lt - c 0 0 0 0  0 0 0 0  1 1 1 1  1 1 1 1  v2   lt - c 0 0 0 0  1 1 1 1  0 0 0 0  1 1 1 1  v3   lt - c 0 0 1 1  0 0 1 1  0 0 1 1  0 0 1 1  v4   lt - c 0 1 0 1  0 1 0 1  0 1 0 1  0 1 0 1   df 1  lt - data frame v1  v2  v3  v4   df 1  rdf 1  lt - df 1 sample nrow df 1   nrow df 1   replace   FALSE    rdf 1  order rdf 1  lt - rdf 1 do call order  as list rdf 1     order rdf 1  order rdf 2  lt - rdf 1 do call order  rev as list rdf 1      order rdf 2  rdf 3  lt - data frame rdf 1 v2  rdf 1 v4  rdf 1 v3  rdf 1 v1   rdf 3  order rdf 3  lt - rdf 1 do call order  as list rdf 3     order rdf 3

User · Answer

Just like the mechanical card sorters of long ago  first sort by the least significant key  then the next most significant  etc   No library required  works with any number of keys and any combination of ascending and descending keys    dd  lt - dd order dd b  decreasing   FALSE      Now we re ready to do the most significant key   The sort is stable  and any ties in the most significant key have already been resolved   dd  lt - dd order dd z  decreasing   TRUE      This may not be the fastest  but it is certainly simple and reliable

User · Answer

Suppose you have a data frame A and you want to sort it using column called x descending order  Call the sorted data frame newdata  newdata  lt - A order -A x      If you want ascending order then replace  -  with nothing  You can have something like   newdata  lt - A order -A x  A y  -A z      where x and z are some columns in data frame A  This means sort data frame A by x descending  y ascending and z descending

User · Answer

The arrange   in dplyr is my favorite option  Use the pipe operator and go from least important to most important aspect dd1  lt - dd   gt       arrange z    gt       arrange desc x

User · Answer

if SQL comes naturally to you  sqldf package handles ORDER BY as Codd intended

User · Answer

With this  very helpful  function by Kevin Wright  posted in the tips section of the R wiki  this is easily achieved   sort dd by     -z   b        b x y z   4 Low C 9 2   2 Med D 3 1   1  Hi A 8 1   3  Hi A 9 1

User · Answer

You can use the order   function directly without resorting to add-on tools -- see this simpler answer which uses a trick right from the top of the example order  code   R gt  dd with dd  order -z  b          b x y z 4 Low C 9 2 2 Med D 3 1 1  Hi A 8 1 3  Hi A 9 1   Edit some 2  years later   It was just asked how to do this by column index  The answer is to simply pass the desired sorting column s  to the order   function   R gt  dd order -dd  4   dd  1          b x y z 4 Low C 9 2 2 Med D 3 1 1  Hi A 8 1 3  Hi A 9 1 R gt     rather than using the name of the column  and with   for easier more direct access

User · Answer

For the sake of completeness  you can also use the sortByCol   function from the BBmisc package   library BBmisc  sortByCol dd  c  z    b    asc   c FALSE  TRUE       b x y z 4 Low C 9 2 2 Med D 3 1 1  Hi A 8 1 3  Hi A 9 1   Performance comparison   library microbenchmark  microbenchmark sortByCol dd  c  z    b    asc   c FALSE  TRUE    times   100000  median 202 878  library plyr  microbenchmark arrange dd desc z  b  times 100000  median 148 758  microbenchmark dd with dd  order -z  b       times   100000  median 115 872

User · Answer

I was struggling with the above solutions when I wanted to automate my ordering process for n columns  whose column names could be different each time  I found a super helpful function from the psych package to do this in a straightforward manner   dfOrder myDf  columnIndices    where columnIndices are indices of one or more columns  in the order in which you want to sort them  More information here   dfOrder function from  psych  package

User · Answer

Just for the sake of completeness  since not much has been said about sorting by column numbers    It can surely be argued that it is often not desirable  because the order of the columns could change  paving the way to errors   but in some specific situations  when for instance you need a quick job done and there is no such risk of columns changing orders   it might be the most sensible thing to do  especially when dealing with large numbers of columns   In that case  do call   comes to the rescue   ind  lt - do call what    order   args   iris  c 5 1 2 3    iris ind               Sepal Length Sepal Width Petal Length Petal Width    Species       14           4 3         3 0          1 1         0 1     setosa       9            4 4         2 9          1 4         0 2     setosa       39           4 4         3 0          1 3         0 2     setosa       43           4 4         3 2          1 3         0 2     setosa       42           4 5         2 3          1 3         0 3     setosa       4            4 6         3 1          1 5         0 2     setosa       48           4 6         3 2          1 4         0 2     setosa       7            4 6         3 4          1 4         0 3     setosa

User · Answer

In response to a comment added in the OP for how to sort programmatically   Using dplyr and data table  library dplyr  library data table    dplyr  Just use arrange   which is the Standard Evaluation version for arrange   df1  lt - tbl df iris   using strings or formula arrange  df1  c  Petal Length    Petal Width    arrange  df1   Petal Length   Petal Width      Source  local data frame  150 x 5      Sepal Length Sepal Width Petal Length Petal Width Species            dbl         dbl          dbl         dbl    fctr  1           4 6         3 6          1 0         0 2  setosa 2           4 3         3 0          1 1         0 1  setosa 3           5 8         4 0          1 2         0 2  setosa 4           5 0         3 2          1 2         0 2  setosa 5           4 7         3 2          1 3         0 2  setosa 6           5 4         3 9          1 3         0 4  setosa 7           5 5         3 5          1 3         0 2  setosa 8           4 4         3 0          1 3         0 2  setosa 9           5 0         3 5          1 3         0 3  setosa 10          4 5         2 3          1 3         0 3  setosa                                                                 Or using a variable sortBy  lt - c  Petal Length    Petal Width   arrange  df1   dots   sortBy      Source  local data frame  150 x 5      Sepal Length Sepal Width Petal Length Petal Width Species            dbl         dbl          dbl         dbl    fctr  1           4 6         3 6          1 0         0 2  setosa 2           4 3         3 0          1 1         0 1  setosa 3           5 8         4 0          1 2         0 2  setosa 4           5 0         3 2          1 2         0 2  setosa 5           4 7         3 2          1 3         0 2  setosa 6           5 5         3 5          1 3         0 2  setosa 7           4 4         3 0          1 3         0 2  setosa 8           4 4         3 2          1 3         0 2  setosa 9           5 0         3 5          1 3         0 3  setosa 10          4 5         2 3          1 3         0 3  setosa                                                                Doing the same operation except sorting Petal Length in descending order sortByDesc  lt - c  desc Petal Length     Petal Width   arrange  df1   dots   sortByDesc    more info here  https   cran r-project org web packages dplyr vignettes nse html  It is better to use formula as it also captures the environment to evaluate an expression in  data table  dt1  lt - data table iris   not really required  as you can work directly on your data frame sortBy  lt - c  Petal Length    Petal Width   sortType  lt - c -1  1  setorderv dt1  sortBy  sortType  dt1      Sepal Length Sepal Width Petal Length Petal Width   Species   1           7 7         2 6          6 9         2 3 virginica   2           7 7         2 8          6 7         2 0 virginica   3           7 7         3 8          6 7         2 2 virginica   4           7 6         3 0          6 6         2 1 virginica   5           7 9         3 8          6 4         2 0 virginica  ---                                                             146           5 4         3 9          1 3         0 4    setosa 147           5 8         4 0          1 2         0 2    setosa 148           5 0         3 2          1 2         0 2    setosa 149           4 3         3 0          1 1         0 1    setosa 150           4 6         3 6          1 0         0 2    setosa

User · Answer

There are a lot of excellent answers here  but dplyr gives the only syntax that I can quickly and easily remember  and so now use very often    library dplyr    sort mtcars by mpg  ascending    use desc mpg  for descending arrange mtcars  mpg    sort mtcars first by mpg  then by cyl  then by wt  arrange mtcars   mpg  cyl  wt    For the OP s problem   arrange dd  desc z    b       b x y z 1 Low C 9 2 2 Med D 3 1 3  Hi A 8 1 4  Hi A 9 1

[r] How to sort a dataframe by multiple column(s)

Examples related to r

Examples related to sorting

Examples related to dataframe

Examples related to r-faq