Understanding the order() function

Question

I'm trying to understand how the order() function works. I was under the impression that it returned a permutation of indices which, when sorted, would sort the original vector. For instance,

    > a <- c(45, 50, 10, 96)
    > order(a)
    [1] 3 1 2 4

I would have expected this to return c(2, 3, 1, 4), since the list sorted would be 10 45 50 96. Can someone help me understand the return value of this function?

User · Accepted Answer

This seems to explain it:

The definition of order is that a[order(a)] is in increasing order. This works with your example, where the correct order is the fourth, second, first, then third element.

You may have been looking for rank, which returns the rank of the elements:

    R> a <- c(4.1, 3.2, 6.1, 3.1)
    R> order(a)
    [1] 4 2 1 3
    R> rank(a)
    [1] 3 2 4 1

So rank tells you what order the numbers are in; order tells you how to get them in ascending order.

plot(a, rank(a)/length(a)) will give a graph of the CDF. To see why order is useful, though, try

    plot(a, rank(a)/length(a), type = "S")

which gives a mess, because the data are not in increasing order. If you did

    oo <- order(a)
    plot(a[oo], rank(a[oo])/length(a), type = "S")

or simply

    oo <- order(a)
    plot(a[oo], (1:length(a))/length(a), type = "S")

you get a line graph of the CDF.

I'll bet you're thinking of rank.
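The CDF point above can be checked without drawing anything. The sketch below, assuming the same vector a as in the quoted example, compares the step heights (1:length(a))/length(a) at the ordered data with the empirical CDF from stats::ecdf(); the plot() call is left commented out so the block runs non-interactively.

```r
a  <- c(4.1, 3.2, 6.1, 3.1)
oo <- order(a)                          # positions of a's values, smallest first

# Heights of the CDF steps at the sorted data points, as in
# plot(a[oo], (1:length(a))/length(a), type = "S")
step_heights <- seq_along(a) / length(a)

# stats::ecdf() returns the empirical CDF as a function; evaluated at the
# ordered data it gives the same heights when a has no ties
Fn <- ecdf(a)
print(Fn(a[oo]))                        # 0.25 0.50 0.75 1.00

# plot(a[oo], step_heights, type = "S")   # draws the step plot
```

Evaluating the CDF at a[oo] only lines up with 1/n, 2/n, ... because a[oo] is in increasing order; at a itself the same heights would come out in a's original, unsorted order.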

User · Answer

They are similar but not the same:

    set.seed(0)
    x <- matrix(rnorm(10), 1)

    # one can compute from the other
    rank(x)  == col(x) %*% diag(length(x))[order(x), ]
    order(x) == col(x) %*% diag(length(x))[rank(x), ]

    # rank can be used to sort
    sort(x) == x %*% diag(length(x))[rank(x), ]
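The matrix expressions above are easier to read once the permutation matrix has a name. Here is a sketch on a plain vector (the name P is only an illustrative choice). Row i of P selects element order(x)[i], so P %*% x is the sorted vector, and multiplying the transpose of P by 1:n recovers the ranks. This assumes x has no ties, which holds for rnorm() draws with probability one.

```r
set.seed(0)
x <- rnorm(10)
n <- length(x)

# Permutation matrix: row i is the unit vector picking out element order(x)[i]
P <- diag(n)[order(x), ]

sorted_x <- as.vector(P %*% x)               # should match sort(x)
ranks_x  <- as.vector(t(P) %*% seq_len(n))   # should match rank(x) (no ties)

print(all.equal(sorted_x, sort(x)))
print(all(ranks_x == rank(x)))
```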

User · Answer

Running this little piece of code allowed me to understand the order function:

    x <- c(3, 22, 5, 1, 77)

    cbind(
      index = 1:length(x),
      rank  = rank(x),
      x,
      order = order(x),
      sort  = sort(x)
    )

         index rank  x order sort
    [1,]     1    2  3     4    1
    [2,]     2    4 22     1    3
    [3,]     3    3  5     3    5
    [4,]     4    1  1     2   22
    [5,]     5    5 77     5   77

Reference: http://r.789695.n4.nabble.com/I-don-t-understand-the-order-function-td4664384.html
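The table above has no repeated values. With ties, the two functions treat equal values differently: by default rank() averages tied ranks (ties.method = "average"), while order() leaves tied elements in their original relative order. Here is a sketch of the same kind of table for a made-up vector containing one tie:

```r
x <- c(3, 22, 3, 1)   # illustrative vector; the value 3 appears twice

tbl <- cbind(
  index      = seq_along(x),
  rank       = rank(x),                          # ties averaged: 2.5 and 2.5
  rank_first = rank(x, ties.method = "first"),   # ties broken by position
  x,
  order      = order(x),                         # tied 3s keep original order
  sort       = sort(x)
)
print(tbl)

# Applying order() twice gives the "first"-method ranks
print(order(order(x)))
```

The last line shows why order(order(x)) is sometimes used as a rank: it agrees with rank(x, ties.method = "first"), not with the default averaged ranks.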

User · Answer

To sort a 1D vector or a single column of data, just call the sort function and pass in your sequence.

On the other hand, the order function is necessary to sort two-dimensional data, i.e., multiple columns of data collected in a matrix or dataframe.

        Stadium Home Week Qtr Away Off Def Result       Kicker Dist
    751     Out  PHI   14   4  NYG PHI NYG   Good      D Akers   50
    491     Out   KC    9   1  OAK OAK  KC   Good S Janikowski   32
    702     Out  OAK   15   4  CLE CLE OAK   Good     P Dawson   37
    571     Out   NE    1   2  OAK OAK  NE Missed S Janikowski   43
    654     Out  NYG   11   2  PHI NYG PHI   Good      J Feely   26
    307     Out  DEN   14   2  BAL DEN BAL   Good       J Elam   48
    492     Out   KC   13   3  DEN  KC DEN   Good      L Tynes   34
    691     Out  NYJ   17   3  BUF NYJ BUF   Good     M Nugent   25
    164     Out  CHI   13   2   GB CHI  GB   Good      R Gould   25
    80      Out  BAL    1   2  IND IND BAL   Good M Vanderjagt   20

Here is an excerpt of data for field goal attempts in the 2008 NFL season, a dataframe I've called fg. Suppose that these 10 data points represent all of the field goals attempted in 2008; further suppose you want to know the distance of the longest field goal attempted that year, who kicked it, and whether it was good or not. You also want to know the second-longest, as well as the third-longest, etc., and finally you want the shortest field goal attempt.

Well, you could just do this:

    sort(fg$Dist, decreasing = TRUE)

which returns:

    50 48 43 37 34 32 26 25 25 20

That is correct, but not very useful. It does tell us the distance of the longest field goal attempt, the second-longest, ..., as well as the shortest; however, that's all we know. For example, we don't know who the kicker was, whether the attempt was successful, etc. Of course, we need the entire dataframe sorted on the Dist column; put another way, we want to sort all of the data rows on the single attribute Dist. That would look like this:

        Stadium Home Week Qtr Away Off Def Result       Kicker Dist
    751     Out  PHI   14   4  NYG PHI NYG   Good      D Akers   50
    307     Out  DEN   14   2  BAL DEN BAL   Good       J Elam   48
    571     Out   NE    1   2  OAK OAK  NE Missed S Janikowski   43
    702     Out  OAK   15   4  CLE CLE OAK   Good     P Dawson   37
    492     Out   KC   13   3  DEN  KC DEN   Good      L Tynes   34
    491     Out   KC    9   1  OAK OAK  KC   Good S Janikowski   32
    654     Out  NYG   11   2  PHI NYG PHI   Good      J Feely   26
    691     Out  NYJ   17   3  BUF NYJ BUF   Good     M Nugent   25
    164     Out  CHI   13   2   GB CHI  GB   Good      R Gould   25
    80      Out  BAL    1   2  IND IND BAL   Good M Vanderjagt   20

This is what order does. It is "sort" for two-dimensional data; put another way, it returns a 1D integer index made up of row numbers, such that indexing the rows with that vector gives you a correct row-oriented sort on the column Dist.

Here's how it works. Above, sort was used to sort the Dist column; to sort the entire dataframe on the Dist column, we use order exactly the same way as sort is used above:

    ndx = order(fg$Dist, decreasing = TRUE)

(I usually bind the array returned from order to the variable ndx, which stands for "index", because I am going to use it as an index array to sort.)

That was step 1; here's step 2. ndx, what is returned by order, is then used as an index array to re-order the dataframe fg:

    fg_sorted = fg[ndx, ]

fg_sorted is the re-ordered dataframe immediately above.

In sum, order is used to create an index array (which specifies the sort order of the column you want sorted), which is then used as an index array to re-order the dataframe (or matrix).
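The two steps can be run end to end on a cut-down copy of the excerpt. This sketch keeps only the Result, Kicker and Dist columns and uses the row labels shown above as row names; the values come from the table, but the construction code is an assumption. It also shows that order() accepts several sort keys: here the 25-yard tie is broken alphabetically by kicker, and negating Dist sorts that numeric key in decreasing order while Kicker stays increasing.

```r
fg <- data.frame(
  Result = c("Good", "Good", "Good", "Missed", "Good",
             "Good", "Good", "Good", "Good", "Good"),
  Kicker = c("D Akers", "S Janikowski", "P Dawson", "S Janikowski", "J Feely",
             "J Elam", "L Tynes", "M Nugent", "R Gould", "M Vanderjagt"),
  Dist   = c(50, 32, 37, 43, 26, 48, 34, 25, 25, 20),
  row.names = c("751", "491", "702", "571", "654",
                "307", "492", "691", "164", "80")
)

# Step 1: index of rows, longest attempt first
ndx <- order(fg$Dist, decreasing = TRUE)

# Step 2: use that index to re-order every column of the data frame at once
fg_sorted <- fg[ndx, ]
print(fg_sorted)

# Several keys: Dist decreasing, ties broken by Kicker in increasing order
fg_by_two <- fg[order(-fg$Dist, fg$Kicker), ]
print(fg_by_two[c("Kicker", "Dist")])
```

The negation trick only works for numeric keys; for a mix of increasing and decreasing non-numeric keys, order() also accepts a vector for decreasing when method = "radix".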

User · Answer

This could help you at some point:

    a <- c(45, 50, 10, 96)
    a[order(a)]

What you get is

    [1] 10 45 50 96

The code indexes a with order(a): it takes all of the elements of a and returns them ordered from the lowest to the highest value.
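A quick check of that claim, plus two variations on the same idea: order() with decreasing = TRUE gives the descending version, and the same index can reorder a second vector that runs parallel to a. The labels vector is a made-up example.

```r
a <- c(45, 50, 10, 96)

print(a[order(a)])                      # 10 45 50 96, same as sort(a)
print(a[order(a, decreasing = TRUE)])   # 96 50 45 10

labels <- c("w", "x", "y", "z")         # hypothetical names, one per element of a
print(labels[order(a)])                 # "y" "w" "x" "z": labels in a's sorted order
```

The last line is where order() earns its keep over sort(): sort() only returns the sorted values, while the index from order() can be applied to anything aligned with a.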

User · Answer

I thought it might be helpful to lay out the ideas very simply here, to summarize the good material posted by @doug and linked by @duffymo (+1 to each, by the way).

order tells you which element of the original vector needs to be put first, second, etc., so as to sort the original vector, whereas rank tells you, for each element, whether it has the lowest, second lowest, etc., value. For example:

    > a <- c(45, 50, 10, 96)
    > order(a)
    [1] 3 1 2 4
    > rank(a)
    [1] 2 3 1 4

So order(a) is saying "put the third element first when you sort...", whereas rank(a) is saying "the first element is the second lowest...". (Note that they both agree on which element is lowest, etc.; they just present the information differently.) Thus we see that we can use order() to sort, but we can't use rank() that way:

    > a[order(a)]
    [1] 10 45 50 96
    > sort(a)
    [1] 10 45 50 96
    > a[rank(a)]
    [1] 50 10 45 96

In general, order() will not equal rank() unless the vector has been sorted already:

    > b <- sort(a)
    > order(b) == rank(b)
    [1] TRUE TRUE TRUE TRUE

Also, since order() is (essentially) operating over ranks of the data, you could compose them without affecting the information, but the other way around produces gibberish:

    > order(rank(a)) == order(a)
    [1] TRUE TRUE TRUE TRUE
    > rank(order(a)) == rank(a)
    [1] FALSE FALSE FALSE  TRUE
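The remark that order and rank "agree on which element is lowest" can be made precise: for a vector without ties they are inverse permutations of each other, so indexing one by the other gives 1, 2, ..., n. The sketch below checks this, together with the composition identities above, on the example vector and on random runif() draws (which have no ties with probability one). check_order_rank is a hypothetical helper written for this illustration.

```r
check_order_rank <- function(v) {
  o <- order(v)
  r <- rank(v)
  n <- length(v)
  c(
    inverse_1      = all(o[r] == seq_len(n)),  # order indexed by rank
    inverse_2      = all(r[o] == seq_len(n)),  # rank indexed by order
    rank_via_order = all(order(o) == r),       # order() of a permutation inverts it
    order_of_rank  = all(order(r) == o)        # order(rank(a)) == order(a)
  )
}

a <- c(45, 50, 10, 96)
print(check_order_rank(a))

set.seed(1)
print(check_order_rank(runif(20)))
```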

User · Answer

In simple words, order() gives the locations of elements of increasing magnitude.

For example, order(c(10, 20, 30)) will give 1, 2, 3 and order(c(30, 20, 10)) will give 3, 2, 1.
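That description, "the locations of elements of increasing magnitude", can be turned into a naive implementation for illustration: repeatedly take the position of the smallest remaining value. naive_order below is a hypothetical helper using a quadratic selection procedure; it is not how R implements order() (which uses radix or shell sort internally). Like order(), it keeps tied values in their original order, because which.min() returns the first minimum. It assumes x contains no NA values.

```r
naive_order <- function(x) {
  remaining <- seq_along(x)   # positions not yet placed
  out <- integer(0)
  while (length(remaining) > 0) {
    # position (in x) of the smallest remaining value; first one wins on ties
    k <- remaining[which.min(x[remaining])]
    out <- c(out, k)
    remaining <- remaining[remaining != k]
  }
  out
}

print(naive_order(c(10, 20, 30)))       # 1 2 3
print(naive_order(c(30, 20, 10)))       # 3 2 1
print(naive_order(c(45, 50, 10, 96)))   # 3 1 2 4
```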
