Calculating percentile of dataset column

Question

A quick one for you, dearest R gurus: I'm doing an assignment and I've been asked, in this exercise, to get basic statistics out of the infert dataset (it's built in), and specifically one of its columns, infert$age.

For anyone not familiar with the dataset:

    > table_ages    # Which is just subset(infert, select = c("age"))
        age
    1    26
    2    42
    3    39
    4    34
    5    35
    6    36
    7    23
    8    32
    9    21
    10   28
    11   29
    ...
    246  35
    247  29
    248  23

I've had to find the median of the column, the variance, skewness and standard deviation, which were all okay, until I was asked to find the column "percentiles".

I haven't been able to find anything so far, and maybe I've translated it incorrectly from Greek, the language of the assignment. It was "ποσοστημόρια"; Google Translate gave the English term as "percentiles".

Any tutorials or ideas on finding those "percentiles" of infert$age?

User · Answer

The quantile() function will do much of what you probably want, but since the question was ambiguous, I will provide an alternate answer that does something slightly different from quantile().

    ecdf(infert$age)(infert$age)

will generate a vector of the same length as infert$age giving the proportion of infert$age that is at or below each observation. You can read the ecdf documentation, but the basic idea is that ecdf() gives you a function that returns the empirical cumulative distribution. Thus ecdf(X)(Y) is the value of the cumulative distribution of X at the points in Y. If you wanted to know just the probability of being at or below 30 (that is, what percentile 30 is in the sample), you could say

    ecdf(infert$age)(30)

The main difference between this approach and using the quantile() function is that quantile() requires that you put in the probabilities to get out the levels, whereas this requires that you put in the levels to get out the probabilities.
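To make the "levels in, probabilities out" versus "probabilities in, levels out" distinction concrete, here is a minimal sketch on the built-in infert data. The variable names (age, F_age, p30, p30_manual, q) are only illustrative, and the manual check assumes the usual "at or below" definition of the empirical CDF.

```r
# infert ships with base R (package datasets), so no extra packages are needed.
age <- infert$age

# Levels in, probabilities out: proportion of ages at or below 30.
F_age <- ecdf(age)
p30 <- F_age(30)

# The same proportion computed by hand, to show what ecdf() is doing.
p30_manual <- mean(age <= 30)

# Probabilities in, levels out: ages at the 25th, 50th and 75th percentiles.
q <- quantile(age, probs = c(0.25, 0.5, 0.75))

print(p30)
print(q)
```

Multiplying p30 by 100 gives the percentile rank of age 30 in the sample.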

User · Answer

If you order a vector x and find the value that is halfway through the vector, you have just found the median, or 50th percentile. The same logic applies for any percentage. Here are two examples.

    x <- rnorm(100)
    quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1)) # quartile
    quantile(x, probs = seq(0, 1, by = 0.1)) # decile
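The "sort, then walk a given fraction of the way through" idea can be sketched by hand. The helper nearest_rank below is hypothetical (it is not a base R function) and implements the simple nearest-rank rule. That rule should agree with quantile(..., type = 1); the default, type = 7, interpolates between neighbouring observations, so its results can differ slightly.

```r
# Hypothetical helper (not part of base R): nearest-rank percentile.
nearest_rank <- function(x, p) {
  sorted <- sort(x)                       # order the vector
  idx <- pmax(1, ceiling(p * length(x)))  # position a fraction p of the way through
  sorted[idx]                             # the observation at that position
}

age <- infert$age
probs <- c(0.25, 0.5, 0.75)

by_hand  <- nearest_rank(age, probs)
built_in <- unname(quantile(age, probs = probs, type = 1))

# Compare the hand-rolled result with quantile()'s type 1 definition.
print(all.equal(by_hand, built_in))
```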

User · Answer

    table_ages <- subset(infert, select = c("age"))
    summary(table_ages)
    #       age
    #  Min.   :21.00
    #  1st Qu.:28.00
    #  Median :31.00
    #  Mean   :31.50
    #  3rd Qu.:35.25
    #  Max.   :44.00

This is probably what they're looking for. summary() applied to a numeric returns the min, max, mean, median, and the 25th and 75th percentiles of the data.

Note that

    summary(infert$age)
    #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #   21.00   28.00   31.00   31.50   35.25   44.00

The numbers are the same but the format is different. This is because table_ages is a data frame with one column (age), whereas infert$age is a numeric vector. Try typing summary(infert).
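As a rough illustration of how summary() relates to quantile(): the five non-mean entries of summary() are the default output of quantile(), and any other percentile needs quantile() with explicit probs. The variable names here are only illustrative.

```r
age <- infert$age

# quantile() defaults to probs = 0, 0.25, 0.5, 0.75, 1.
five <- quantile(age)

# summary() reports the same five numbers plus the mean in 4th position;
# dropping that element leaves min, quartiles and max.
s <- summary(age)
from_summary <- as.numeric(s)[-4]

# Percentiles that summary() does not report, e.g. the 5th and 95th.
tails <- quantile(age, probs = c(0.05, 0.95))

print(five)
print(tails)
```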

User · Answer

You can also use the Hmisc package, which will give you the following percentiles: 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95. Just use describe(table_ages).

User · Answer

Using {dplyr}:

    library(dplyr)

    # percentiles
    infert %>%
      mutate(PCT = ntile(age, 100))

    # quartiles
    infert %>%
      mutate(PCT = ntile(age, 4))

    # deciles
    infert %>%
      mutate(PCT = ntile(age, 10))
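Note that ntile() labels each row with a bucket number rather than returning the percentile values themselves. A minimal sketch of both views, assuming dplyr is installed; the column names quartile, n, min_age, max_age, p25, p50 and p75 are made up for illustration.

```r
library(dplyr)

# ntile() assigns each row to one of 4 roughly equal-sized buckets.
bucketed <- infert %>%
  mutate(quartile = ntile(age, 4))

# Size and age range of each bucket.
bucket_ranges <- bucketed %>%
  group_by(quartile) %>%
  summarise(n = n(), min_age = min(age), max_age = max(age))

# The percentile values themselves, via quantile() inside summarise().
cut_points <- infert %>%
  summarise(p25 = quantile(age, 0.25),
            p50 = quantile(age, 0.50),
            p75 = quantile(age, 0.75))

print(bucket_ranges)
print(cut_points)
```

Because ages are tied, rows with the same age can land in adjacent buckets, so the bucket ranges may overlap at their edges.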
