How to calculate 1st and 3rd quartiles

Question

I have a DataFrame:

```
   time_diff  avg_trips
0   0.450000        1.0
1   0.483333        1.0
2   0.500000        1.0
3   0.516667        1.0
4   0.533333        2.0
```

I want to get the 1st quartile, the 3rd quartile and the median for the column `time_diff`. To obtain the median, I use `np.median(df["time_diff"].values)`.

How can I calculate the quartiles?

User · Answer

You can use `np.percentile` to calculate quartiles (including the median):

```python
>>> np.percentile(df.time_diff, 25)  # Q1
0.48333300000000001
>>> np.percentile(df.time_diff, 50)  # median
0.5
>>> np.percentile(df.time_diff, 75)  # Q3
0.51666699999999999
```

Or all at once:

```python
>>> np.percentile(df.time_diff, [25, 50, 75])
array([ 0.483333,  0.5     ,  0.516667])
```
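A self-contained sketch of the same idea, assuming the `time_diff` values from the question and NumPy 1.15 or later (where `np.quantile` was added). `np.quantile` takes the cut points as fractions in [0, 1] instead of percentages; the variable names are illustrative only.

```python
import numpy as np

# time_diff values copied from the question's DataFrame
time_diff = np.array([0.450000, 0.483333, 0.500000, 0.516667, 0.533333])

# np.percentile expects percentages in [0, 100] ...
q1, median, q3 = np.percentile(time_diff, [25, 50, 75])

# ... while np.quantile expects the same cut points as fractions in [0, 1]
q1_frac, median_frac, q3_frac = np.quantile(time_diff, [0.25, 0.5, 0.75])

print(q1, median, q3)
```

Both calls use NumPy's default linear interpolation, so they return the same three values.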

User · Answer

I also faced a similar problem when trying to find a package that finds quartiles. That's not to say the others are wrong, but this is how I personally would have defined quartiles. It is similar to Shikar's results with using mid-point, but also works on lists that have an odd length. If the quartile position falls between two positions, it uses the average of the neighbouring values (i.e. a fractional position is always treated as lying halfway between its two neighbours).

```python
import math

def find_quartile_positions(size):
    if size == 1:
        # All quartiles are the first (only) element
        return 0, 0, 0
    elif size == 2:
        # Lower quartile is first element, upper quartile is second element, median is average
        # Set to 0.5, 0.5, 0.5 if you prefer all quartiles to be the mean value
        return 0, 0.5, 1
    else:
        # Lower quartile is element at 1/4th position, median at 1/2th, upper at 3/4
        # Quartiles can be between positions if size + 1 is not divisible by 4
        return (size + 1) / 4 - 1, (size + 1) / 2 - 1, 3 * (size + 1) / 4 - 1

def find_quartiles(num_array):
    size = len(num_array)

    if size == 0:
        quartiles = [0, 0, 0]
    else:
        sorted_array = sorted(num_array)

        lower_pos, median_pos, upper_pos = find_quartile_positions(size)

        # Floor so the positions can be used as list indices
        floored_lower_pos = math.floor(lower_pos)
        floored_median_pos = math.floor(median_pos)
        floored_upper_pos = math.floor(upper_pos)

        # If position is an integer, the quartile is the elem at position,
        # else the quartile is the mean of the elem & the elem one position above
        lower_quartile = (sorted_array[floored_lower_pos]
                          if lower_pos % 1 == 0
                          else (sorted_array[floored_lower_pos] + sorted_array[floored_lower_pos + 1]) / 2)

        median = (sorted_array[floored_median_pos]
                  if median_pos % 1 == 0
                  else (sorted_array[floored_median_pos] + sorted_array[floored_median_pos + 1]) / 2)

        upper_quartile = (sorted_array[floored_upper_pos]
                          if upper_pos % 1 == 0
                          else (sorted_array[floored_upper_pos] + sorted_array[floored_upper_pos + 1]) / 2)

        quartiles = [lower_quartile, median, upper_quartile]

    return quartiles
```

User · Answer

If you want to use raw Python rather than NumPy or pandas, you can use the Python `statistics` module to find the median of the upper and lower half of the list (for an odd-length list, the middle value is left out of both halves):

```python
import statistics as stat
def quartile(data):
    data = sorted(data)
    half_list = len(data) // 2
    lower_quartile = stat.median(data[:half_list])
    upper_quartile = stat.median(data[-half_list:])
    print("Lower Quartile: " + str(lower_quartile))
    print("Upper Quartile: " + str(upper_quartile))
    print("Interquartile Range: " + str(upper_quartile - lower_quartile))
quartile(df.time_diff)
```

Line 1: import the statistics module under the alias "stat"
Line 2: define the quartile function
Line 3: sort the data into ascending order (as a new list, so this also works on a pandas Series)
Line 4: get the length of half of the list
Line 5: get the median of the lower half of the list
Line 6: get the median of the upper half of the list
Line 7: print the lower quartile
Line 8: print the upper quartile
Line 9: print the interquartile range
Line 10: run the quartile function for the time_diff column of the DataFrame
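On Python 3.8 or later, the standard library's `statistics.quantiles` can return all three cut points in one call. A sketch, assuming the question's `time_diff` values as a plain list. For this particular data, `method="exclusive"` (the default) gives the same values as the median-of-halves function above, while `method="inclusive"` matches NumPy's default linear interpolation; for other inputs the conventions can disagree.

```python
import statistics

# time_diff values copied from the question's DataFrame
time_diff = [0.450000, 0.483333, 0.500000, 0.516667, 0.533333]

# n=4 asks for the three cut points that split the data into quarters: [Q1, median, Q3]
exclusive = statistics.quantiles(time_diff, n=4, method="exclusive")
inclusive = statistics.quantiles(time_diff, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```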

User · Answer

Using `np.percentile`:

```python
q75, q25 = np.percentile(df["time_diff"], [75, 25])
iqr = q75 - q25
```

(Answer from "How do you find the IQR in Numpy?")
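A runnable version of the same calculation, assuming the question's `time_diff` values. Passing the cut points as `[75, 25]` lets the result unpack straight into `q75, q25`; `np.subtract(*...)` is just a one-line way of writing `q75 - q25`.

```python
import numpy as np

# time_diff values copied from the question's DataFrame
time_diff = np.array([0.450000, 0.483333, 0.500000, 0.516667, 0.533333])

# Percentiles come back in the order requested, so [75, 25] unpacks as (Q3, Q1)
q75, q25 = np.percentile(time_diff, [75, 25])
iqr = q75 - q25

# Equivalent one-liner: subtract Q1 from Q3 directly
iqr_one_liner = np.subtract(*np.percentile(time_diff, [75, 25]))

print("IQR:", iqr)
```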

User · Answer

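Try it this way:

```python
dfo = sorted(df.time_diff)
n = len(dfo)
# 1-based positions of the 1st and 3rd quartiles (int() truncates fractional positions)
Q1 = int((n + 3) / 4)
Q3 = int((3 * n + 1) / 4)

print("Q1 position: ", Q1, "Q3 position: ", Q3)
# subtract 1 because Python list indices start at 0
print("Q1 value: ", dfo[Q1 - 1], "Q3 value: ", dfo[Q3 - 1])
```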
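For odd `n`, `(n + 3) / 4` and `(3n + 1) / 4` are exactly the 1-based positions of Tukey's lower and upper hinges. A sketch that generalises this to even `n`, assuming Tukey's rule that the hinge depth is `(floor(median depth) + 1) / 2`, and averaging the two neighbours when a depth ends in .5 instead of truncating it with `int()`. The function name `tukey_hinges` is hypothetical.

```python
import math

def tukey_hinges(values):
    """Return (lower hinge, upper hinge) using Tukey's depth rule."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")

    median_depth = (n + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2  # equals (n + 3) / 4 for odd n

    def value_at_depth(depth):
        # depth is 1-based; a depth ending in .5 means "average the two neighbours"
        whole = math.floor(depth)
        if depth == whole:
            return data[whole - 1]
        return (data[whole - 1] + data[whole]) / 2

    lower = value_at_depth(hinge_depth)
    upper = value_at_depth(n + 1 - hinge_depth)  # equals (3n + 1) / 4 for odd n
    return lower, upper

time_diff = [0.450000, 0.483333, 0.500000, 0.516667, 0.533333]
print(tukey_hinges(time_diff))
```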

User · Answer

In my efforts to learn object-oriented programming alongside learning statistics, I made this; maybe you'll find it useful:

```python
samplesCourse = [9, 10, 10, 11, 13, 15, 16, 19, 19, 21, 23, 28, 30, 33, 34, 36, 44, 45, 47, 60]

class sampleSet:
    def __init__(self, sampleList):
        self.sampleList = sampleList
        # interList is a sorted copy of sampleList (not an alias), so the original list stays intact
        self.interList = sorted(sampleList)

    def find_median(self):
        # call this before the quartile methods, which may remove the median from interList
        self.median = 0

        if len(self.interList) % 2 == 0:
            # find median for even-numbered sample list length
            self.medL = self.interList[len(self.interList) // 2 - 1]
            self.medU = self.interList[len(self.interList) // 2]
            self.median = (self.medL + self.medU) / 2
        else:
            # find median for odd-numbered sample list length
            self.median = self.interList[(len(self.interList) - 1) // 2]
        return self.median

    def find_1stQuartile(self, median):
        self.lower50List = []
        self.Q1 = 0

        # break out lower 50 percentile; for odd length, drop the median first
        # so the list divides evenly into two halves
        if len(self.interList) % 2 != 0:
            self.interList.pop(self.interList.index(median))
        self.lower50List = self.interList[:len(self.interList) // 2]

        # find 1st quartile (median of lower 50 percentile)
        if len(self.lower50List) % 2 == 0:
            self.Q1L = self.lower50List[len(self.lower50List) // 2 - 1]
            self.Q1U = self.lower50List[len(self.lower50List) // 2]
            self.Q1 = (self.Q1L + self.Q1U) / 2
        else:
            self.Q1 = self.lower50List[(len(self.lower50List) - 1) // 2]
        return self.Q1

    def find_3rdQuartile(self, median):
        self.upper50List = []
        self.Q3 = 0

        # break out upper 50 percentile; checking interList (not sampleList) means the
        # median is not removed a second time if find_1stQuartile already dropped it
        if len(self.interList) % 2 != 0:
            self.interList.pop(self.interList.index(median))
        self.upper50List = self.interList[len(self.interList) // 2:]

        # find 3rd quartile (median of upper 50 percentile)
        if len(self.upper50List) % 2 == 0:
            self.Q3L = self.upper50List[len(self.upper50List) // 2 - 1]
            self.Q3U = self.upper50List[len(self.upper50List) // 2]
            self.Q3 = (self.Q3L + self.Q3U) / 2
        else:
            self.Q3 = self.upper50List[(len(self.upper50List) - 1) // 2]
        return self.Q3

    def find_InterQuartileRange(self, Q1, Q3):
        self.IQR = Q3 - Q1
        return self.IQR

    def find_UpperFence(self, Q3, IQR):
        self.fence = Q3 + 1.5 * IQR
        return self.fence

samples = sampleSet(samplesCourse)
median = samples.find_median()
firstQ = samples.find_1stQuartile(median)
thirdQ = samples.find_3rdQuartile(median)
iqr = samples.find_InterQuartileRange(firstQ, thirdQ)
fence = samples.find_UpperFence(thirdQ, iqr)

print("Median is ", median)
print("1st quartile is ", firstQ)
print("3rd quartile is ", thirdQ)
print("IQR is ", iqr)
print("Upper fence is ", fence)
```
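The upper fence above is one half of Tukey's fences; the lower fence is Q1 - 1.5 × IQR, and values outside either fence are commonly flagged as potential outliers. A compact sketch of both, assuming the same median-of-halves quartiles as the class (median excluded from the halves for odd-length data); `tukey_fences` is a hypothetical name, not part of the original code.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using median-of-halves quartiles."""
    data = sorted(values)
    half = len(data) // 2
    if half == 0:
        raise ValueError("need at least two values")
    q1 = statistics.median(data[:half])    # median of the lower half
    q3 = statistics.median(data[-half:])   # median of the upper half
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

samplesCourse = [9, 10, 10, 11, 13, 15, 16, 19, 19, 21, 23, 28, 30, 33, 34, 36, 44, 45, 47, 60]
print(tukey_fences(samplesCourse))         # (-17.5, 66.5, [])
print(tukey_fences(samplesCourse + [100]))
```

Adding a value of 100 shifts the upper quartile as well, so the fences move before the new value is tested against them.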

User · Answer

Coincidentally, this information is captured with the `describe` method:

```python
df.time_diff.describe()

count    5.000000
mean     0.496667
std      0.032059
min      0.450000
25%      0.483333
50%      0.500000
75%      0.516667
max      0.533333
Name: time_diff, dtype: float64
```
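`describe` also accepts a `percentiles` argument (fractions between 0 and 1) if you want cut points beyond the quartiles. A sketch, assuming pandas and the question's data rebuilt by hand; the 10% and 90% values are only there to show the argument.

```python
import pandas as pd

# Rebuild the question's DataFrame
df = pd.DataFrame({
    "time_diff": [0.450000, 0.483333, 0.500000, 0.516667, 0.533333],
    "avg_trips": [1.0, 1.0, 1.0, 1.0, 2.0],
})

# Ask describe() for extra percentiles; the rows are labelled "10%", "25%", ...
summary = df["time_diff"].describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.9])

q1, median, q3 = summary["25%"], summary["50%"], summary["75%"]
print(summary)
```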

User · Answer

Building upon, or rather correcting a bit on, what Babak said...

`np.percentile` DOES VERY MUCH calculate the values of Q1, median, and Q3. Consider the sorted list below:

`s1 = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]`

Running `np.percentile(s1, [25, 50, 75])` returns:

`[ 69.   85.5   91.25]`

However, the quartiles are Q1 = 68.0, Median = 85.5, Q3 = 92.5, which is the correct thing to say.

What we are missing here is the `interpolation` parameter of `np.percentile` and related functions. By default the value of this argument is `linear`. This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i < j:

- linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
- lower: i.
- higher: j.
- nearest: i or j, whichever is nearest.
- midpoint: (i + j) / 2.

Thus running `np.percentile(s1, [25, 50, 75], interpolation='midpoint')` returns the actual results for the list:

`[ 68.   85.5   92.5]`
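A runnable comparison of the two settings on `s1`. NumPy 1.22 renamed the `interpolation` argument to `method` (the old name is deprecated), so this sketch assumes NumPy 1.22 or later; on older versions, pass `interpolation="midpoint"` instead.

```python
import numpy as np

s1 = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]

# Default: linear interpolation between the two neighbouring data points
linear = np.percentile(s1, [25, 50, 75])

# Midpoint: plain average of the two neighbouring data points
midpoint = np.percentile(s1, [25, 50, 75], method="midpoint")

print("linear:  ", linear)    # values 69.0, 85.5, 91.25
print("midpoint:", midpoint)  # values 68.0, 85.5, 92.5
```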

User · Answer

You can use `df.describe()`, which shows this information: the 25%, 50% and 75% rows are Q1, the median and Q3.

User · Answer

By using pandas:

```python
df.time_diff.quantile([0.25, 0.5, 0.75])

Out[793]:
0.25    0.483333
0.50    0.500000
0.75    0.516667
Name: time_diff, dtype: float64
```
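`Series.quantile` (and `DataFrame.quantile`) take an `interpolation` argument with the same options as NumPy: `linear` by default, plus `lower`, `higher`, `nearest` and `midpoint`. A sketch, assuming pandas and reusing the `s1` list from the answer about `np.percentile`, showing that `midpoint` reproduces the 68.0 / 85.5 / 92.5 values quoted there.

```python
import pandas as pd

s1 = pd.Series([18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98])

# Default linear interpolation between neighbouring values
linear = s1.quantile([0.25, 0.5, 0.75])

# Average of the two neighbouring values instead
midpoint = s1.quantile([0.25, 0.5, 0.75], interpolation="midpoint")

print(midpoint.tolist())  # [68.0, 85.5, 92.5]
```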

User · Answer

`np.percentile` DOES NOT calculate the values of Q1, median, and Q3. Consider the sorted list below:

`samples = [1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31]`

Running `np.percentile(samples, [25, 50, 75])` returns the actual values from the list:

`Out[1]: array([12., 14., 22.])`

However, the quartiles are Q1 = 10.0, Median = 14, Q3 = 24.5 (you can also use this link to find the quartiles and median online).

One can use the code below to calculate the quartiles and median of a sorted list. Because of the sorting, this approach requires O(n log n) computations, where n is the number of items. Moreover, finding quartiles and median can be done in O(n) computations using the median-of-medians selection algorithm (order statistics).

```python
samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])

def find_median(sorted_list):
    indices = []

    list_size = len(sorted_list)

    if list_size % 2 == 0:
        indices.append(list_size // 2 - 1)  # -1 because index starts from 0
        indices.append(list_size // 2)

        median = (sorted_list[indices[0]] + sorted_list[indices[1]]) / 2
    else:
        indices.append(list_size // 2)

        median = sorted_list[indices[0]]

    return median, indices

median, median_indices = find_median(samples)

# Lower half: values below the median; upper half: values above it.
# For an even-length list, each half keeps one of the two middle values.
half = len(samples) // 2
Q1, Q1_indices = find_median(samples[:half])
Q3, Q3_indices = find_median(samples[len(samples) - half:])

quartiles = [Q1, median, Q3]

print("(Q1, median, Q3): {}".format(quartiles))
```
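A sketch of the linear-time idea, with one deliberate substitution: it uses randomized quickselect, which runs in expected O(n) time, rather than the median-of-medians algorithm mentioned above, which guarantees O(n) in the worst case but is longer to write. It reproduces the median-of-halves quartiles of the code above (halves exclude the median for odd n) without sorting the whole list; all function names here are hypothetical.

```python
import random

def quickselect(values, k):
    """Return the k-th smallest value (0-based) in expected O(n) time."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        if k < len(lows):
            items = lows                      # answer is among the smaller values
        elif k < len(lows) + len(pivots):
            return pivot                      # answer equals the pivot
        else:
            k -= len(lows) + len(pivots)      # answer is among the larger values
            items = highs

def median_of_ranks(values, start, size):
    """Median of the values whose sorted ranks are start .. start + size - 1."""
    mid = start + size // 2
    if size % 2 == 1:
        return quickselect(values, mid)
    return (quickselect(values, mid - 1) + quickselect(values, mid)) / 2

def quartiles_by_selection(values):
    """(Q1, median, Q3) by median of halves, without sorting the whole list."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2                             # size of each half (median excluded if n is odd)
    q1 = median_of_ranks(values, 0, half)
    median = median_of_ranks(values, 0, n)
    q3 = median_of_ranks(values, n - half, half)
    return q1, median, q3

samples = [28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13]
print(quartiles_by_selection(samples))
```

Each `quickselect` call copies and partitions the list, so the sketch trades some extra memory for avoiding the full sort.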
