How to calculate rolling / moving average using NumPy / SciPy?

Question

There seems to be no function that simply calculates the moving average on numpy/scipy, leading to convoluted solutions.

My question is two-fold:

What's the easiest way to (correctly) implement a moving average with numpy?

Since this seems non-trivial and error prone, is there a good reason not to have the batteries included in this case?

User · Answer

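moving average

iterator method

Slice the array up to and including index i, reverse it, and take the mean of the first n elements, i.e. the last n values ending at i.

A list comprehension generates these mini arrays on the fly. The first n-1 results average fewer than n values.

    import numpy as np

    x = np.random.randint(10, size=20)

    def moving_average(arr, n):
        return [arr[:i+1][::-1][:n].mean() for i in range(len(arr))]

    d = 5

    moving_average(x, d)

convolution

This keeps only full windows, so the result has len(x) - d + 1 values. The result is stored under a new name so it does not overwrite the function above.

    ma = np.convolve(x, np.ones(d)/d, mode='valid')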
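The two approaches in this answer do not return the same number of values, which is easy to miss with random input. The sketch below uses a small fixed array (example values of my own choosing) and writes the iterator method with a plain slice instead of the reversal; it should pick out the same elements.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])  # example values chosen for illustration
n = 3

# Iterator method: one value per input element; the first n-1 averages
# are taken over fewer than n elements.
ma_iter = np.array([x[max(0, i - n + 1):i + 1].mean() for i in range(len(x))])

# Convolution with mode='valid': only full windows, so n-1 fewer values.
ma_conv = np.convolve(x, np.ones(n) / n, mode='valid')

print(len(ma_iter), ma_iter)
print(len(ma_conv), ma_conv)
```

From index n-1 onward the two results agree; they differ only in how the start of the series is handled.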

User · Answer

talib contains a simple moving average tool, as well as other similar averaging tools (i.e. exponential moving average). Below compares the method to some of the other solutions.

    %timeit pd.Series(np.arange(100000)).rolling(3).mean()
    2.53 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %timeit talib.SMA(real = np.arange(100000.), timeperiod = 3)
    348 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit moving_average(np.arange(100000))
    638 µs ± 45.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

One caveat is that the real must have elements of dtype = float. Otherwise the following error is raised:

    Exception: real is not double

User · Answer

If you just want a straightforward non-weighted moving average, you can easily implement it with np.cumsum, which may be faster than FFT based methods:

EDIT: Corrected an off-by-one wrong indexing spotted by Bean in the code.

    def moving_average(a, n=3):
        ret = np.cumsum(a, dtype=float)
        ret[n:] = ret[n:] - ret[:-n]
        return ret[n - 1:] / n

    >>> a = np.arange(20)
    >>> moving_average(a)
    array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
            12.,  13.,  14.,  15.,  16.,  17.,  18.])
    >>> moving_average(a, n=4)
    array([  1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,   8.5,   9.5,
            10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,  17.5])

So I guess the answer is: it is really easy to implement, and maybe numpy is already a little bloated with specialized functionality.
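Since off-by-one mistakes are easy to make with the cumsum trick, a cross-check against np.convolve is a cheap safeguard. The following is a minimal sketch on random data with a fixed seed (the seed and array size are arbitrary choices of mine).

```python
import numpy as np

def moving_average(a, n=3):
    # Running total; subtracting the total from n steps earlier
    # leaves the sum of the last n elements.
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

rng = np.random.default_rng(0)   # fixed seed, illustrative data only
a = rng.random(1000)

via_cumsum = moving_average(a, n=5)
via_convolve = np.convolve(a, np.ones(5) / 5, mode='valid')

print(via_cumsum.shape, via_convolve.shape)
print(np.allclose(via_cumsum, via_convolve))
```

Both results have len(a) - n + 1 entries. The comparison uses np.allclose rather than exact equality because the two methods accumulate floating-point rounding differently.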

User · Answer

Starting in Numpy 1.20, sliding_window_view provides a way to slide/roll through windows of elements, which you can then individually average.

For instance, for a 4-element window:

    from numpy.lib.stride_tricks import sliding_window_view

    # values = np.array([5, 3, 8, 10, 2, 1, 5, 1, 0, 2])
    np.average(sliding_window_view(values, window_shape = 4), axis=1)
    # array([6.5, 5.75, 5.25, 4.5, 2.25, 1.75, 2])

Note the intermediary result of sliding_window_view:

    # values = np.array([5, 3, 8, 10, 2, 1, 5, 1, 0, 2])
    sliding_window_view(values, window_shape = 4)
    # array([[ 5,  3,  8, 10],
    #        [ 3,  8, 10,  2],
    #        [ 8, 10,  2,  1],
    #        [10,  2,  1,  5],
    #        [ 2,  1,  5,  1],
    #        [ 1,  5,  1,  0],
    #        [ 5,  1,  0,  2]])
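As a small extension of the example in this answer, the sketch below also slides the window along the rows of a 2-D array using the axis argument. The second row is simply the reversed series, made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

values = np.array([5, 3, 8, 10, 2, 1, 5, 1, 0, 2])

windows = sliding_window_view(values, window_shape=4)
rolling_mean = windows.mean(axis=-1)

# The windows are a read-only view on the original buffer, not a copy.
print(windows.shape, windows.flags.writeable)
print(rolling_mean)

# Two series stacked as rows; axis=1 slides the window along each row.
stacked = np.vstack([values, values[::-1]])
rolling_rows = sliding_window_view(stacked, window_shape=4, axis=1).mean(axis=-1)
print(rolling_rows.shape)
```

Because the windows are a view, building them is cheap, but reducing over them still touches every element of every window, so the work grows with the window length.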

User · Answer

NumPy's lack of a particular domain-specific function is perhaps due to the Core Team's discipline and fidelity to NumPy's prime directive: provide an N-dimensional array type, as well as functions for creating and indexing those arrays. Like many foundational objectives, this one is not small, and NumPy does it brilliantly.

The (much) larger SciPy contains a much larger collection of domain-specific libraries (called subpackages by SciPy devs), for instance numerical optimization (optimize), signal processing (signal), and integral calculus (integrate).

My guess is that the function you are after is in at least one of the SciPy subpackages (scipy.signal perhaps); however, I would look first in the collection of SciPy scikits, identify the relevant scikit(s), and look for the function of interest there.

Scikits are independently developed packages based on NumPy/SciPy and directed to a particular technical discipline (e.g., scikits-image, scikits-learn, etc.). Several of these (in particular, the awesome OpenOpt for numerical optimization) were highly regarded, mature projects long before choosing to reside under the relatively new scikits rubric. The Scikits homepage lists about 30 such scikits, though at least several of those are no longer under active development.

Following this advice would lead you to scikits-timeseries; however, that package is no longer under active development. In effect, Pandas has become, AFAIK, the de facto NumPy-based time series library.

Pandas has several functions that can be used to calculate a moving average; the simplest of these is probably rolling_mean, which you use like so:

    >>> # the recommended syntax to import pandas
    >>> import pandas as PD
    >>> import numpy as NP

    >>> # prepare some fake data:
    >>> # the date-time indices:
    >>> t = PD.date_range('1/1/2010', '12/31/2012', freq='D')

    >>> # the data:
    >>> x = NP.arange(0, t.shape[0])

    >>> # combine the data & index into a Pandas 'Series' object
    >>> D = PD.Series(x, t)

Now, just call the function rolling_mean, passing in the Series object and a window size, which in my example below is 10 days.

    >>> d_mva = PD.rolling_mean(D, 10)

    >>> # d_mva is the same size as the original Series
    >>> d_mva.shape
        (1096,)

    >>> # though obviously the first w-1 values are NaN where w is the window size
    >>> d_mva[:3]
        2010-01-01         NaN
        2010-01-02         NaN
        2010-01-03         NaN

Verify that it worked, e.g., compare values 10 - 15 in the original series versus the new Series smoothed with rolling mean:

    >>> D[10:15]
         2010-01-11    2.041076
         2010-01-12    2.041076
         2010-01-13    2.720585
         2010-01-14    2.720585
         2010-01-15    3.656987
         Freq: D

    >>> d_mva[10:20]
          2010-01-11    3.131125
          2010-01-12    3.035232
          2010-01-13    2.923144
          2010-01-14    2.811055
          2010-01-15    2.785824
          Freq: D

The function rolling_mean, along with about a dozen or so other functions, is informally grouped in the Pandas documentation under the rubric moving window functions; a second, related group of functions in Pandas is referred to as exponentially-weighted functions (e.g., ewma, which calculates an exponentially weighted moving average). The fact that this second group is not included in the first (moving window functions) is perhaps because the exponentially-weighted transforms don't rely on a fixed-length window.
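To make the last point concrete, here is a minimal NumPy sketch of an exponentially weighted average. The helper name ewma and the recursion below are my own illustration of one common definition, not the pandas implementation; I believe it corresponds to pandas' adjust=False variant, but treat that as an assumption.

```python
import numpy as np

def ewma(x, alpha):
    """Exponentially weighted moving average via the recursion
    y[0] = x[0], y[t] = alpha * x[t] + (1 - alpha) * y[t-1].
    Hypothetical helper for illustration; assumes a non-empty 1-D input."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
    return y

print(ewma([1, 2, 3], alpha=0.5))
```

There is no window length anywhere in the function: every earlier value contributes, with a weight that shrinks geometrically by a factor of (1 - alpha) per step.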

User · Answer

This answer using Pandas is adapted from above, as rolling_mean is not part of Pandas anymore.

    # the recommended syntax to import pandas
    import pandas as pd
    import numpy as np

    # prepare some fake data:
    # the date-time indices:
    t = pd.date_range('1/1/2010', '12/31/2012', freq='D')

    # the data:
    x = np.arange(0, t.shape[0])

    # combine the data & index into a Pandas 'Series' object
    D = pd.Series(x, t)

Now, just call the function rolling on the Series with a window size, which in my example below is 10 days.

    d_mva10 = D.rolling(10).mean()

    # d_mva10 is the same size as the original Series
    # though obviously the first w-1 values are NaN where w is the window size
    d_mva10[:11]

    2010-01-01    NaN
    2010-01-02    NaN
    2010-01-03    NaN
    2010-01-04    NaN
    2010-01-05    NaN
    2010-01-06    NaN
    2010-01-07    NaN
    2010-01-08    NaN
    2010-01-09    NaN
    2010-01-10    4.5
    2010-01-11    5.5
    Freq: D, dtype: float64
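For readers who want the same NaN-padded shape without pandas, the following is a rough NumPy-only sketch. The helper name rolling_mean_nan is hypothetical, and it ignores the date index entirely.

```python
import numpy as np

def rolling_mean_nan(a, w):
    """Trailing moving average padded with NaN so the output has the same
    length as the input. Hypothetical helper, NumPy only."""
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape, np.nan)
    if w <= len(a):
        # Full windows start at index w-1; earlier positions stay NaN.
        out[w - 1:] = np.convolve(a, np.ones(w) / w, mode='valid')
    return out

x = np.arange(0, 1096)   # one value per day from 2010-01-01 to 2012-12-31
print(rolling_mean_nan(x, 10)[:11])
```

The first nine entries are NaN and the tenth is the mean of 0 through 9, which lines up with the Series output shown in this answer.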

User · Answer

Here are a variety of ways to do this, along with some benchmarks. The best methods are versions using optimized code from other libraries. The bottleneck.move_mean method is probably best all around. The scipy.convolve approach is also very fast, extensible, and syntactically and conceptually simple, but doesn't scale well for very large window values. The numpy.cumsum method is good if you need a pure numpy approach.

Note: Some of these (e.g. bottleneck.move_mean) are not centered, and will shift your data.

    import numpy as np
    import scipy as sci
    import scipy.signal as sig
    import pandas as pd
    import bottleneck as bn
    import time as time

    def rollavg_direct(a,n):
        'Direct "for" loop'
        assert n%2==1
        b = a*0.0
        for i in range(len(a)) :
            b[i]=a[max(i-n//2,0):min(i+n//2+1,len(a))].mean()
        return b

    def rollavg_comprehension(a,n):
        'List comprehension'
        assert n%2==1
        r,N = int(n/2),len(a)
        return np.array([a[max(i-r,0):min(i+r+1,N)].mean() for i in range(N)])

    def rollavg_convolve(a,n):
        'scipy.convolve'
        assert n%2==1
        return sci.convolve(a,np.ones(n,dtype='float')/n, 'same')[n//2:-n//2+1]

    def rollavg_convolve_edges(a,n):
        'scipy.convolve, edge handling'
        assert n%2==1
        return sci.convolve(a,np.ones(n,dtype='float'), 'same')/sci.convolve(np.ones(len(a)),np.ones(n), 'same')

    def rollavg_cumsum(a,n):
        'numpy.cumsum'
        assert n%2==1
        cumsum_vec = np.cumsum(np.insert(a, 0, 0))
        return (cumsum_vec[n:] - cumsum_vec[:-n]) / n

    def rollavg_cumsum_edges(a,n):
        'numpy.cumsum, edge handling'
        assert n%2==1
        N = len(a)
        cumsum_vec = np.cumsum(np.insert(np.pad(a,(n-1,n-1),'constant'), 0, 0))
        d = np.hstack((np.arange(n//2+1,n),np.ones(N-n)*n,np.arange(n,n//2,-1)))
        return (cumsum_vec[n+n//2:-n//2+1] - cumsum_vec[n//2:-n-n//2]) / d

    def rollavg_roll(a,n):
        'Numpy array rolling'
        assert n%2==1
        N = len(a)
        rolling_idx = np.mod((N-1)*np.arange(n)[:,None] + np.arange(N), N)
        return a[rolling_idx].mean(axis=0)[n-1:]

    def rollavg_roll_edges(a,n):
        # see https://stackoverflow.com/questions/42101082/fast-numpy-roll
        'Numpy array rolling, edge handling'
        assert n%2==1
        a = np.pad(a,(0,n-1-n//2), 'constant')*np.ones(n)[:,None]
        m = a.shape[1]
        idx = np.mod((m-1)*np.arange(n)[:,None] + np.arange(m), m) # Rolling index
        out = a[np.arange(-n//2,n//2)[:,None], idx]
        d = np.hstack((np.arange(1,n),np.ones(m-2*n+1+n//2)*n,np.arange(n,n//2,-1)))
        return (out.sum(axis=0)/d)[n//2:]

    def rollavg_pandas(a,n):
        'Pandas rolling average'
        return pd.DataFrame(a).rolling(n, center=True, min_periods=1).mean().to_numpy()

    def rollavg_bottlneck(a,n):
        'bottleneck.move_mean'
        return bn.move_mean(a, window=n, min_count=1)

    N = 10**6
    a = np.random.rand(N)
    functions = [rollavg_direct, rollavg_comprehension, rollavg_convolve,
            rollavg_convolve_edges, rollavg_cumsum, rollavg_cumsum_edges,
            rollavg_pandas, rollavg_bottlneck, rollavg_roll, rollavg_roll_edges]

    print('Small window (n=3)')
    %load_ext memory_profiler
    for f in functions :
        print('\n'+f.__doc__+ ' : ')
        %timeit b=f(a,3)

    print('\nLarge window (n=1001)')
    for f in functions[0:-2] :
        print('\n'+f.__doc__+ ' : ')
        %timeit b=f(a,1001)

    print('\nMemory\n')
    print('Small window (n=3)')
    N = 10**7
    a = np.random.rand(N)
    %load_ext memory_profiler
    for f in functions[2:] :
        print('\n'+f.__doc__+ ' : ')
        %memit b=f(a,3)

    print('\nLarge window (n=1001)')
    for f in functions[2:-2] :
        print('\n'+f.__doc__+ ' : ')
        %memit b=f(a,1001)

Timing, Small window (n=3)

    Direct "for" loop :                 4.14 s ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    List comprehension :                3.96 s ± 27.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    scipy.convolve :                    1.07 ms ± 26.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    scipy.convolve, edge handling :     4.68 ms ± 9.69 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    numpy.cumsum :                      5.31 ms ± 5.11 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    numpy.cumsum, edge handling :       8.52 ms ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    Pandas rolling average :            9.85 ms ± 9.63 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    bottleneck.move_mean :              1.3 ms ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    Numpy array rolling :               31.3 ms ± 91.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    Numpy array rolling, edge handling : 61.1 ms ± 55.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Timing, Large window (n=1001)

    Direct "for" loop :                 4.67 s ± 34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    List comprehension :                4.46 s ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    scipy.convolve :                    103 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    scipy.convolve, edge handling :     272 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    numpy.cumsum :                      5.19 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    numpy.cumsum, edge handling :       8.7 ms ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    Pandas rolling average :            9.67 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    bottleneck.move_mean :              1.31 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Memory, Small window (n=3)

    The memory_profiler extension is already loaded. To reload it, use:
      %reload_ext memory_profiler

    scipy.convolve :                    peak memory: 362.66 MiB, increment: 73.61 MiB
    scipy.convolve, edge handling :     peak memory: 510.24 MiB, increment: 221.19 MiB
    numpy.cumsum :                      peak memory: 441.81 MiB, increment: 152.76 MiB
    numpy.cumsum, edge handling :       peak memory: 518.14 MiB, increment: 228.84 MiB
    Pandas rolling average :            peak memory: 449.34 MiB, increment: 160.02 MiB
    bottleneck.move_mean :              peak memory: 374.17 MiB, increment: 75.54 MiB
    Numpy array rolling :               peak memory: 661.29 MiB, increment: 362.65 MiB
    Numpy array rolling, edge handling : peak memory: 1111.25 MiB, increment: 812.61 MiB

Memory, Large window (n=1001)

    scipy.convolve :                    peak memory: 370.62 MiB, increment: 71.83 MiB
    scipy.convolve, edge handling :     peak memory: 521.98 MiB, increment: 223.18 MiB
    numpy.cumsum :                      peak memory: 451.32 MiB, increment: 152.52 MiB
    numpy.cumsum, edge handling :       peak memory: 527.51 MiB, increment: 228.71 MiB
    Pandas rolling average :            peak memory: 451.25 MiB, increment: 152.50 MiB
    bottleneck.move_mean :              peak memory: 374.64 MiB, increment: 75.85 MiB
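The note about non-centered methods shifting the data is easier to see on a tiny input. The sketch below is my own illustration using np.convolve only (not the benchmarked functions): it compares a centered window with edge handling against a trailing window on a single spike.

```python
import numpy as np

a = np.array([0., 0., 0., 9., 0., 0., 0.])   # single spike, illustrative data
n = 3

# Centered window with edge handling: divide the windowed sum by the number
# of elements that actually fall inside the array at each position.
sums = np.convolve(a, np.ones(n), mode='same')
counts = np.convolve(np.ones(len(a)), np.ones(n), mode='same')
centered = sums / counts

# Trailing window (each output uses the current and previous n-1 values),
# left-padded with NaN to keep the length.
trailing = np.full(len(a), np.nan)
trailing[n - 1:] = np.convolve(a, np.ones(n) / n, mode='valid')

print(centered)   # smoothed spike is symmetric around index 3
print(trailing)   # same averages, shifted right by n // 2
```

The averages themselves are identical; only their position relative to the input differs, which matters when the smoothed series is plotted or compared against the original.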

User · Answer

Here is a fast implementation using numba (mind the types). Note it does contain nans where shifted.

    import numpy as np
    import numba as nb

    @nb.jit(nb.float64[:](nb.float64[:],nb.int64),
            fastmath=True,nopython=True)
    def moving_average( array, window ):
        ret = np.cumsum(array)
        ret[window:] = ret[window:] - ret[:-window]
        ma = ret[window - 1:] / window
        n = np.empty(window-1); n.fill(np.nan)
        return np.concatenate((n.ravel(), ma.ravel()))

User · Answer

    for i in range(lookback, len(Data)):
        Data[i, 1] = Data[i-lookback:i, 0].sum() / lookback

Try this piece of code. I think it's simpler and does the job. lookback is the window of the moving average. The loop starts at lookback because earlier rows do not have a full window behind them.

In Data[i-lookback:i, 0].sum() I have put 0 to refer to the first column of the dataset, but you can put any column you like in case you have more than one column.
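A small runnable version of this loop, under assumptions of mine: Data is a 2-D NumPy array whose column 0 holds the values and whose column 1 starts out as NaN, and the loop begins at lookback so that only full windows are averaged.

```python
import numpy as np

values = np.array([1., 2., 3., 4., 5., 6.])   # illustrative data
lookback = 3

# Column 0 holds the data, column 1 receives the moving average.
Data = np.column_stack([values, np.full(len(values), np.nan)])

for i in range(lookback, len(Data)):
    # Mean of the `lookback` rows before row i (row i itself is excluded).
    Data[i, 1] = Data[i - lookback:i, 0].sum() / lookback

print(Data[:, 1])
```

Note that the slice i-lookback:i stops just before row i, so each average describes the preceding rows rather than a window ending at the current one.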

User · Answer

In case you want to take care of the edge conditions carefully (compute the mean only from available elements at the edges), the following function will do the trick.

    import numpy as np

    def running_mean(x, N):
        out = np.zeros_like(x, dtype=np.float64)
        dim_len = x.shape[0]
        for i in range(dim_len):
            if N%2 == 0:
                a, b = i - (N-1)//2, i + (N-1)//2 + 2
            else:
                a, b = i - (N-1)//2, i + (N-1)//2 + 1

            # cap indices to min and max indices
            a = max(0, a)
            b = min(dim_len, b)
            out[i] = np.mean(x[a:b])
        return out

    >>> running_mean(np.array([1,2,3,4]), 2)
    array([1.5, 2.5, 3.5, 4. ])

    >>> running_mean(np.array([1,2,3,4]), 3)
    array([1.5, 2. , 3. , 3.5])

User · Answer

A simple way to achieve this is by using np.convolve. The idea behind this is to leverage the way the discrete convolution is computed and use it to return a rolling mean. This can be done by convolving with a sequence of np.ones of a length equal to the sliding window length we want.

In order to do so we could define the following function:

    def moving_average(x, w):
        return np.convolve(x, np.ones(w), 'valid') / w

This function takes the convolution of the sequence x and a sequence of ones of length w. Note that the chosen mode is valid, so that the convolution product is only given for points where the sequences overlap completely.

Some examples:

    x = np.array([5,3,8,10,2,1,5,1,0,2])

For a moving average with a window of length 2 we would have:

    moving_average(x, 2)
    # array([4. , 5.5, 9. , 6. , 1.5, 3. , 3. , 0.5, 1. ])

And for a window of length 4:

    moving_average(x, 4)
    # array([6.5 , 5.75, 5.25, 4.5 , 2.25, 1.75, 2.  ])

How does convolve work?

Let's have a more in-depth look at the way the discrete convolution is being computed. The following function aims to replicate the way np.convolve is computing the output values:

    def mov_avg(x, w):
        for m in range(len(x)-(w-1)):
            yield sum(np.ones(w) * x[m:m+w]) / w

Which, for the same example above, would also yield:

    list(mov_avg(x, 2))
    # [4.0, 5.5, 9.0, 6.0, 1.5, 3.0, 3.0, 0.5, 1.0]

So what is being done at each step is to take the inner product between the array of ones and the current window. In this case the multiplication by np.ones(w) is superfluous, given that we are directly taking the sum of the sequence.

Below is an example of how the first outputs are computed so that it is a little clearer. Let's suppose we want a window of w=4:

    [1,1,1,1]
    [5,3,8,10,2,1,5,1,0,2]
    = (1*5 + 1*3 + 1*8 + 1*10) / w = 6.5

And the following output would be computed as:

      [1,1,1,1]
    [5,3,8,10,2,1,5,1,0,2]
    = (1*3 + 1*8 + 1*10 + 1*2) / w = 5.75

And so on, returning a moving average of the sequence once all overlaps have been performed.
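The choice of mode is what decides how many values come back, so it may help to see the three options that np.convolve accepts side by side. This is a minimal sketch using the same x and a window of 4.

```python
import numpy as np

x = np.array([5, 3, 8, 10, 2, 1, 5, 1, 0, 2])
w = 4
kernel = np.ones(w) / w

full = np.convolve(x, kernel, mode='full')    # every partial overlap
same = np.convolve(x, kernel, mode='same')    # same length as x
valid = np.convolve(x, kernel, mode='valid')  # complete overlaps only

for name, out in (('full', full), ('same', same), ('valid', valid)):
    print(name, len(out))

# 'valid' is the stretch of 'full' where the kernel lies entirely inside x.
print(np.allclose(full[w - 1:len(x)], valid))
```

With 'full' and 'same', the values near the ends are sums over fewer than w elements still divided by w, so they are biased toward zero and should not be read as true window means.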

User · Answer

Compared with the solution that uses numpy's cumsum, the one below takes almost half the time. This is because it does not need to go through the entire array to do the cumsum and then do all the subtraction. Moreover, the cumsum can be "dangerous" if the array is huge and the numbers are huge (possible overflow). Of course, the danger exists here too, but at least only the essential numbers are summed together.

    def moving_average(array_numbers, n):
        if n > len(array_numbers):
            return []
        temp_sum = sum(array_numbers[:n])
        averages = [temp_sum / float(n)]
        for first_index, item in enumerate(array_numbers[n:]):
            temp_sum += item - array_numbers[first_index]
            averages.append(temp_sum / float(n))
        return averages
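To illustrate the overflow concern, here is a deliberately exaggerated sketch of my own: the accumulator is forced to int8 so the wrap-around shows up after two elements. Real data would normally use a wider dtype, where the same effect needs far larger totals.

```python
import numpy as np

a = np.array([100, 100, 100, 100], dtype=np.int8)   # illustrative values
n = 2

# Forcing a narrow accumulator makes the wrap-around visible.
narrow = np.cumsum(a, dtype=np.int8)
wide = np.cumsum(a, dtype=np.int64)

print(narrow)   # running total wraps past 127
print(wide)

# A window sum only ever adds n values at a time.
window_sums = [int(a[i:i + n].astype(np.int64).sum())
               for i in range(len(a) - n + 1)]
print(window_sums)
```

The running total grows with the length of the array, while each window sum is bounded by n times the largest value, which is the sense in which summing only the essential numbers is safer. Passing a wide dtype to np.cumsum is the usual way to reduce the risk.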

User · Answer

I actually wanted a slightly different behavior than the accepted answer. I was building a moving average feature extractor for an sklearn pipeline, so I required that the output of the moving average have the same dimension as the input. What I want is for the moving average to assume the series stays constant, i.e. a moving average of [1,2,3,4,5] with window 2 would give [1.5,2.5,3.5,4.5,5.0].

For column vectors (my use case) we get

    def moving_average_col(X, n):
      z2 = np.cumsum(np.pad(X, ((n,0),(0,0)), 'constant', constant_values=0), axis=0)
      z1 = np.cumsum(np.pad(X, ((0,n),(0,0)), 'constant', constant_values=X[-1]), axis=0)
      return (z1-z2)[(n-1):-1]/n

And for arrays

    def moving_average_array(X, n):
      z2 = np.cumsum(np.pad(X, (n,0), 'constant', constant_values=0))
      z1 = np.cumsum(np.pad(X, (0,n), 'constant', constant_values=X[-1]))
      return (z1-z2)[(n-1):-1]/n

Of course, one doesn't have to assume constant values for the padding, but doing so should be adequate in most cases.
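The same "series stays constant" assumption can also be expressed with np.pad's edge mode followed by a valid convolution. This is an alternative sketch of mine for 1-D input (the helper name moving_average_edge is hypothetical), not the author's cumsum formulation.

```python
import numpy as np

def moving_average_edge(X, n):
    """Same-length moving average that repeats the last value past the end.
    Hypothetical alternative to the cumsum version, 1-D input only."""
    X = np.asarray(X, dtype=float)
    padded = np.pad(X, (0, n - 1), mode='edge')   # repeat X[-1] n-1 times
    return np.convolve(padded, np.ones(n) / n, mode='valid')

print(moving_average_edge([1, 2, 3, 4, 5], 2))
```

Padding with n-1 copies of the last value gives exactly len(X) full windows, so no slicing is needed afterwards.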

User · Answer

I use either the accepted answer's solution, slightly modified to have the same length for output as input, or the pandas version as mentioned in a comment of another answer. I summarize both here with a reproducible example for future reference:

    import numpy as np
    import pandas as pd

    def moving_average(a, n):
        ret = np.cumsum(a, dtype=float)
        ret[n:] = ret[n:] - ret[:-n]
        return ret / n

    def moving_average_centered(a, n):
        return pd.Series(a).rolling(window=n, center=True).mean().to_numpy()

    A = [0, 0, 1, 2, 4, 5, 4]
    print(moving_average(A, 3))
    # [0.         0.         0.33333333 1.         2.33333333 3.66666667 4.33333333]
    print(moving_average_centered(A, 3))
    # [nan        0.33333333 1.         2.33333333 3.66666667 4.33333333 nan       ]

User · Answer

I feel this can be easily solved using bottleneck. See the basic sample below:

    import numpy as np
    import bottleneck as bn

    a = np.random.randint(4, 1000, size=(5, 7))
    mm = bn.move_mean(a, window=2, min_count=1)

This gives the moving mean along the last axis, which is the default (axis=-1).

"mm" is the moving mean for "a".

"window" is the max number of entries to consider for the moving mean.

"min_count" is the min number of entries to consider for the moving mean (e.g. for the first element or if the array has nan values).

The good part is that Bottleneck helps to deal with nan values and it's also very efficient.
