How do I calculate percentiles with Python/NumPy?

Question

Is there a convenient way to calculate percentiles for a sequence or single-dimensional NumPy array? I am looking for something similar to Excel's percentile function. I looked in NumPy's statistics reference and couldn't find this. All I could find is the median (50th percentile), but not something more specific.

User · Answer

The definition of percentile I usually see expects as a result the value from the supplied list below which P percent of values are found... which means the result must be from the set, not an interpolation between set elements. To get that, you can use a simpler function.

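import math

def percentile(N, P):
    """
    Find the percentile of a list of values

    @parameter N - A list of values.  N must be sorted.
    @parameter P - A float value from 0.0 to 1.0

    @return - The percentile of the values.
    """
    # floor(P * len(N) + 1) is what int(round(P * len(N) + 0.5)) gave under
    # Python 2's round-half-up; Python 3's round() rounds halves to even.
    n = int(math.floor(P * len(N) + 1))
    # Clamp so that P = 1.0 returns the last element instead of raising IndexError.
    n = min(n, len(N))
    return N[n-1]

# A = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# B = (15, 20, 35, 40, 50)
#
# print(percentile(A, P=0.3))
# 4
# print(percentile(A, P=0.8))
# 9
# print(percentile(B, P=0.3))
# 20
# print(percentile(B, P=0.8))
# 50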
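The difference that answer draws (a member of the list versus an interpolation between members) is easiest to see side by side. The sketch below is stdlib-only; `rank_based` and `interpolated` are hypothetical helper names, and `interpolated` assumes the (n - 1) * p position rule that NumPy's default linear method is usually described with.

```python
import math


def rank_based(sorted_values, p):
    """Return an element of sorted_values; p is a fraction in [0.0, 1.0]."""
    # Floor-based form of the index rule in the answer above, clamped to n.
    n = min(int(math.floor(p * len(sorted_values) + 1)), len(sorted_values))
    return sorted_values[n - 1]


def interpolated(sorted_values, p):
    """Linear interpolation between neighbouring elements; p in [0.0, 1.0]."""
    h = (len(sorted_values) - 1) * p          # fractional 0-based position
    lo = int(math.floor(h))
    hi = min(lo + 1, len(sorted_values) - 1)  # stay in range when p == 1.0
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


B = (15, 20, 35, 40, 50)
print(rank_based(B, 0.3))    # 20, an element of B
print(interpolated(B, 0.3))  # roughly 23, between 20 and 35
```

The rank-based value always comes from the list, which is what the answer wants; the interpolated value lands between two members.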

If you would rather get the value from the supplied list at or below which P percent of values are found, then use this simple modification:

import math

def percentile(N, P):
    # floor(P * len(N) + 1) reproduces the Python 2 round-half-up result.
    n = int(math.floor(P * len(N) + 1))
    if n > 1:
        return N[n-2]
    else:
        return N[0]

Or with the simplification suggested by @ijustlovemath:

import math

def percentile(N, P):
    n = max(int(math.floor(P * len(N) + 1)), 2)
    return N[n-2]

User · Answer

You might be interested in the SciPy Stats package. It has the percentile function you're after and many other statistical goodies.

percentile() is available in numpy too.

import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50) # return 50th percentile, i.e. the median
print(p)
3.0

This ticket leads me to believe they won't be integrating percentile() into numpy anytime soon.

User · Answer

Check the scipy.stats module:

scipy.stats.scoreatpercentile

User · Answer

To calculate the percentile (rank) of each value in a series, run:

from scipy.stats import rankdata
import numpy as np

def calc_percentile(a, method='min'):
    if isinstance(a, list):
        a = np.asarray(a)
    return rankdata(a, method=method) / float(len(a))

For example:

a = range(20)
print({val: round(float(percentile), 3) for val, percentile in zip(a, calc_percentile(a))})
>>> {0: 0.05, 1: 0.1, 2: 0.15, 3: 0.2, 4: 0.25, 5: 0.3, 6: 0.35, 7: 0.4, 8: 0.45, 9: 0.5, 10: 0.55, 11: 0.6, 12: 0.65, 13: 0.7, 14: 0.75, 15: 0.8, 16: 0.85, 17: 0.9, 18: 0.95, 19: 1.0}
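If SciPy isn't available, the same "min" percentile rank can be sketched with only the standard library. This is an alternative under assumptions, not part of that answer: `percentile_ranks` is a hypothetical name, and ties are handled like `rankdata(..., method='min')` by counting strictly smaller values with `bisect_left`.

```python
from bisect import bisect_left


def percentile_ranks(values):
    """Return each value's percentile rank as a fraction in (0, 1].

    Intended to mirror rankdata(a, method='min') / len(a): tied values all
    get the smallest rank in their group.
    """
    ordered = sorted(values)
    n = len(ordered)
    # bisect_left counts how many values are strictly smaller; +1 is the min rank.
    return [(bisect_left(ordered, v) + 1) / n for v in values]


print(percentile_ranks([10, 20, 20, 30]))  # [0.25, 0.5, 0.5, 1.0]
```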

User · Answer

For a series: use the describe function. Suppose you have a pandas df with the columns sales and id, and you want to calculate percentiles for sales. Then it works like this:

df['sales'].describe(percentiles = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1])

0.0: minimum
1: maximum
0.1: 10th percentile, and so on

User · Answer

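Here's how to do it without numpy, using only Python to calculate the percentile.

import math

def percentile(data, percentile):
    size = len(data)
    # Clamp at 0 so that percentile=0 returns the minimum instead of
    # wrapping around to the last (largest) element via index -1.
    index = max(int(math.ceil((size * percentile) / 100)) - 1, 0)
    return sorted(data)[index]

# mylist is your list of numbers (it does not need to be sorted)
p5 = percentile(mylist, 5)
p25 = percentile(mylist, 25)
p50 = percentile(mylist, 50)
p75 = percentile(mylist, 75)
p95 = percentile(mylist, 95)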

User · Answer

import numpy as np
a = [154, 400, 1124, 82, 94, 108]
print(np.percentile(a, 95)) # gives the 95th percentile
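As a worked check of that call, assuming NumPy's default linear method (0-based position h = (n - 1) p on the sorted data), the 95th percentile lands three quarters of the way from 400 to 1124, so the printed value should be 943 up to floating-point rounding:

```latex
\begin{aligned}
x &= (82,\ 94,\ 108,\ 154,\ 400,\ 1124), \qquad n = 6, \quad p = 0.95\\
h &= (n-1)\,p = 5 \times 0.95 = 4.75\\
Q_{0.95} &= x_4 + (h - \lfloor h \rfloor)\,(x_5 - x_4) = 400 + 0.75 \times (1124 - 400) = 400 + 543 = 943
\end{aligned}
```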

User · Answer

In case you need the answer to be a member of the input numpy array:

Just to add that the percentile function in numpy by default calculates the output as a linear weighted average of the two neighboring entries in the input vector. In some cases people may want the returned percentile to be an actual element of the vector; in this case, from v1.9.0 onwards you can use the "interpolation" option, with either "lower", "higher" or "nearest". (In NumPy 1.22 and later the option is called "method"; "interpolation" still works but is deprecated.)

import numpy as np
x = np.random.uniform(10, size=(1000)) - 5.0

np.percentile(x, 70) # 70th percentile

2.075966046220879

np.percentile(x, 70, interpolation="nearest")

2.0729677997904314

The latter is an actual entry in the vector, while the former is a linear interpolation of two vector entries that border the percentile.

User · Answer

A convenient way to calculate percentiles for a one-dimensional numpy sequence or matrix is by using numpy.percentile (https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html). Example:

import numpy as np

a = np.array([0,1,2,3,4,5,6,7,8,9,10])
p50 = np.percentile(a, 50) # return 50th percentile, i.e. the median
p90 = np.percentile(a, 90) # return 90th percentile
print('median = ', p50, ' and p90 = ', p90) # median =  5.0  and p90 =  9.0

However, if there is any NaN value in your data, the function above returns nan. The recommended function to use in that case is numpy.nanpercentile (https://docs.scipy.org/doc/numpy/reference/generated/numpy.nanpercentile.html), which ignores the NaN values:

import numpy as np

a_NaN = np.array([0.,1.,2.,3.,4.,5.,6.,7.,8.,9.,10.]) # float array, so it can hold NaN
a_NaN[0] = np.nan
print('a_NaN', a_NaN)
p50 = np.nanpercentile(a_NaN, 50) # return 50th percentile, i.e. the median
p90 = np.nanpercentile(a_NaN, 90) # return 90th percentile
print('median = ', p50, ' and p90 = ', p90) # median =  5.5  and p90 =  9.1

In both options presented above, you can still choose the interpolation mode. Follow the examples below for easier understanding:

import numpy as np

b = np.array([1,2,3,4,5,6,7,8,9,10])

print('percentiles using default interpolation')
p10 = np.percentile(b, 10) # return 10th percentile
p50 = np.percentile(b, 50) # return 50th percentile, i.e. the median
p90 = np.percentile(b, 90) # return 90th percentile
print('p10 = ', p10, ', median = ', p50, ' and p90 = ', p90)
# p10 =  1.9 , median =  5.5  and p90 =  9.1

for mode in ['linear', 'lower', 'higher', 'midpoint', 'nearest']:
    print('percentiles using interpolation = ', mode)
    p10 = np.percentile(b, 10, interpolation=mode) # return 10th percentile
    p50 = np.percentile(b, 50, interpolation=mode) # return 50th percentile, i.e. the median
    p90 = np.percentile(b, 90, interpolation=mode) # return 90th percentile
    print('p10 = ', p10, ', median = ', p50, ' and p90 = ', p90)

# linear:   p10 =  1.9 , median =  5.5  and p90 =  9.1
# lower:    p10 =  1 , median =  5  and p90 =  9
# higher:   p10 =  2 , median =  6  and p90 =  10
# midpoint: p10 =  1.5 , median =  5.5  and p90 =  9.5
# nearest:  p10 =  2 , median =  5  and p90 =  9

If your input array only consists of integer values, you might be interested in the percentile answer as an integer. If so, choose an interpolation mode such as 'lower', 'higher', or 'nearest'.
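To see where each of those numbers comes from, here is a stdlib-only sketch of the five modes. It is an illustration under assumptions, not NumPy's implementation: `percentile_modes` is a hypothetical name, it uses the position h = (n - 1) * q / 100 on the sorted data, and it drops NaNs first in the spirit of `numpy.nanpercentile`.

```python
import math


def percentile_modes(values, q):
    """Sketch of the linear/lower/higher/midpoint/nearest modes, q in [0, 100]."""
    # Drop NaNs first, similar in spirit to numpy.nanpercentile.
    xs = sorted(v for v in values if not (isinstance(v, float) and math.isnan(v)))
    h = (len(xs) - 1) * q / 100   # fractional 0-based position in the sorted data
    lo = math.floor(h)
    hi = math.ceil(h)
    frac = h - lo
    return {
        "linear": xs[lo] + frac * (xs[hi] - xs[lo]),
        "lower": xs[lo],
        "higher": xs[hi],
        "midpoint": (xs[lo] + xs[hi]) / 2,
        # Python's round() sends exact halves to the even index (4.5 -> 4),
        # which matches the "nearest" median of 5 listed above.
        "nearest": xs[round(h)],
    }


b = range(1, 11)
for q in (10, 50, 90):
    print(q, percentile_modes(b, q))
```

Only "linear" and "midpoint" can produce a value that is not in the data; "lower", "higher" and "nearest" always index into it.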

User · Answer

By the way, there is a pure-Python implementation of a percentile function, in case one doesn't want to depend on scipy. The function is copied below:

## http://code.activestate.com/recipes/511478/ (r1)
import math
import functools

def percentile(N, percent, key=lambda x:x):
    """
    Find the percentile of a list of values.

    @parameter N - is a list of values. Note N MUST BE already sorted.
    @parameter percent - a float value from 0.0 to 1.0.
    @parameter key - optional key function to compute value from each element of N.

    @return - the percentile of the values
    """
    if not N:
        return None
    k = (len(N)-1) * percent
    f = math.floor(k)
    c = math.ceil(k)
    if f == c:
        return key(N[int(k)])
    d0 = key(N[int(f)]) * (c-k)
    d1 = key(N[int(c)]) * (k-f)
    return d0+d1

# median is 50th percentile.
median = functools.partial(percentile, percent=0.5)
## end of http://code.activestate.com/recipes/511478/

User · Answer

Starting with Python 3.8, the standard library comes with the quantiles function as part of the statistics module:

from statistics import quantiles

quantiles([1, 2, 3, 4, 5], n=100)
[0.06, 0.12, 0.18, 0.24, 0.3, 0.36, 0.42, 0.48, 0.54, 0.6, 0.66, 0.72, 0.78, 0.84, 0.9, 0.96, 1.02, 1.08, 1.14, 1.2, 1.26, 1.32, 1.38, 1.44, 1.5, 1.56, 1.62, 1.68, 1.74, 1.8, 1.86, 1.92, 1.98, 2.04, 2.1, 2.16, 2.22, 2.28, 2.34, 2.4, 2.46, 2.52, 2.58, 2.64, 2.7, 2.76, 2.82, 2.88, 2.94, 3.0, 3.06, 3.12, 3.18, 3.24, 3.3, 3.36, 3.42, 3.48, 3.54, 3.6, 3.66, 3.72, 3.78, 3.84, 3.9, 3.96, 4.02, 4.08, 4.14, 4.2, 4.26, 4.32, 4.38, 4.44, 4.5, 4.56, 4.62, 4.68, 4.74, 4.8, 4.86, 4.92, 4.98, 5.04, 5.1, 5.16, 5.22, 5.28, 5.34, 5.4, 5.46, 5.52, 5.58, 5.64, 5.7, 5.76, 5.82, 5.88, 5.94]
quantiles([1, 2, 3, 4, 5], n=100)[49] # 50th percentile, i.e. the median
3.0

quantiles returns, for a given distribution dist, a list of n - 1 cut points separating the n quantile intervals (division of dist into n continuous intervals with equal probability):

statistics.quantiles(dist, *, n=4, method='exclusive')

where n, in our case (percentiles), is 100.
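The method argument matters if you want the same numbers as NumPy or Excel. A short sketch using only the documented statistics.quantiles parameters: the default 'exclusive' method gives different cut points from np.percentile's default, while 'inclusive' uses the same (len(data) - 1) position rule as NumPy's linear method and Excel's PERCENTILE.INC. To pull out a single integer percentile k (1 to 99), index the n=100 result at k - 1.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5]

# Default method='exclusive' treats data as a sample; cut points may fall
# outside the observed range (like the 0.06 and 5.94 in the list above).
print(quantiles(data, n=4))                      # [1.5, 3.0, 4.5]

# method='inclusive' treats data as the whole population; its positions are
# (len(data) - 1) * i / n, the same rule as NumPy's default linear percentile.
print(quantiles(data, n=4, method='inclusive'))  # [2.0, 3.0, 4.0]

# 10th percentile with the inclusive method (index 9 of the 99 cut points).
p10 = quantiles(data, n=100, method='inclusive')[9]
print(p10)                                       # 1.4
```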
