Calculating moving average

Question

I'm trying to use R to calculate the moving average over a series of values in a matrix. The normal R mailing list search hasn't been very helpful, though. There doesn't seem to be a built-in function in R that will allow me to calculate moving averages. Do any packages provide one? Or do I need to write my own?

User · Answer

EDIT: took great joy in adding the side parameter, for a moving average (or sum, or ...) of e.g. the past 7 days of a Date vector.

For people just wanting to calculate this themselves, it's nothing more than:

# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))
for (i in seq_len(length(x))) {
  ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
  ind <- ind[ind %in% seq_len(length(x))]
  y[i] <- mean(x[ind])
}
y

But it gets fun to make it independent of mean(), so you can calculate any 'moving' function!

# our working horse:
moving_fn <- function(x, w, fun, side = "centre", ...) {
  # x = vector with numeric data
  # w = window length
  # fun = function to apply
  # side = side to take, (c)entre, (l)eft or (r)ight
  # ... = parameters passed on to 'fun'
  y <- numeric(length(x))
  for (i in seq_len(length(x))) {
    if (side %in% c("c", "centre", "center")) {
      ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
    } else if (side %in% c("l", "left")) {
      ind <- c((i - floor(w) + 1):i)
    } else if (side %in% c("r", "right")) {
      ind <- c(i:(i + floor(w) - 1))
    } else {
      stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
    }
    # keep only indices that fall inside the vector
    ind <- ind[ind %in% seq_len(length(x))]
    y[i] <- fun(x[ind], ...)
  }
  y
}

# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}
moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}
moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}
moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}
moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, probs = 0.25)
}
moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, probs = 0.75)
}
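
The centred, edge-clipped window used in this answer can also be written compactly with vapply. The following is only a sketch under that assumption; moving_centre is a hypothetical helper name, not part of the answer or of any package.

```r
# Hypothetical helper: apply `fun` over a centred window of width `w`,
# clipping the window where it would run past either end of `x`.
moving_centre <- function(x, w, fun = mean, ...) {
  half <- floor(w / 2)
  n <- length(x)
  vapply(seq_len(n), function(i) {
    idx <- max(1, i - half):min(n, i + half)  # clipped window indices
    fun(x[idx], ...)
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5)
ma3 <- moving_centre(x, 3)        # 1.5 2.0 3.0 4.0 4.5
mx3 <- moving_centre(x, 3, max)   # 2 3 4 5 5
```

Because the window is clipped rather than padded, the first and last values are averages over fewer than w elements.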

User · Answer

To complement the answers of cantdutchthis and Rodrigo Remedio:

moving_fun <- function(x, w, FUN, ...) {
  # x: a double vector
  # w: the length of the window, i.e., the section of the vector to which FUN is applied
  # FUN: a function that takes a vector and returns a summary value, e.g., mean, sum, etc.
  # Given a double vector, apply FUN over a moving window from left to right.
  # When a window boundary is not a legal section, i.e. lower_bound and i (upper bound)
  # are not both within the length of the vector, return NA_real_.
  if (w < 1) {
    stop("The length of the window 'w' must be greater than 0")
  }
  output <- x
  for (i in 1:length(x)) {
    # plus 1 because the index is inclusive with the upper bound 'i'
    lower_bound <- i - w + 1
    if (lower_bound < 1) {
      output[i] <- NA_real_
    } else {
      output[i] <- FUN(x[lower_bound:i], ...)
    }
  }
  output
}

# example
v <- seq(1, 10)

# compute a MA(2)
moving_fun(v, 2, mean)

# compute moving sum of two periods
moving_fun(v, 2, sum)

User · Answer

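You could use RcppRoll for very quick moving averages written in C++. Just call the roll_mean function. Docs can be found here.

Otherwise, this (slower) for loop should do the trick:

ma <- function(arr, n = 15) {
  res <- arr
  # note: (i - n):i covers n + 1 values for i > n;
  # positions before n keep their original values
  for (i in n:length(arr)) {
    res[i] <- mean(arr[(i - n):i])
  }
  res
}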

User · Answer

Although it is a bit slow, you can also use zoo::rollapply to perform calculations on matrices.

reqd_ma <- rollapply(x, FUN = mean, width = n)

Here x is the data set, FUN = mean is the function to apply (you can change it to min, max, sd, etc.), and width is the size of the rolling window.

User · Answer

The slider package can be used for this. It has an interface that has been specifically designed to feel similar to purrr. It accepts any arbitrary function, and can return any type of output. Data frames are even iterated over row wise. The pkgdown site is here.

library(slider)

x <- 1:3

# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5

df <- data.frame(x = x, y = x)

# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#>   x y
#> 1 1 1
#>
#> [[2]]
#>   x y
#> 1 1 1
#> 2 2 2
#>
#> [[3]]
#>   x y
#> 1 2 2
#> 2 3 3

The overhead of both slider and data.table's frollapply() should be pretty low (much faster than zoo). frollapply() looks to be a little faster for this simple example, but note that it only takes numeric input, and the output must be a scalar numeric value. slider functions are completely generic, and you can return any data type.

library(slider)
library(zoo)
library(data.table)

x <- 1:50000 + 0L

bench::mark(
  slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
  zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
  datatable = frollapply(x, n = 6, FUN = function(x) 1L),
  iterations = 200
)
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slider      19.82ms   26.4ms     38.4    829.8KB     19.0
#> 2 zoo        177.92ms  211.1ms      4.71    17.9MB     24.8
#> 3 datatable    7.78ms   10.9ms     87.9    807.1KB     38.7

User · Answer

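# average of a vector in which NA entries are skipped in the sum
# (the divisor is still the full length of x)
vector_avg <- function(x) {
  sum_x <- 0
  for (i in 1:length(x)) {
    if (!is.na(x[i])) {
      sum_x <- sum_x + x[i]
    }
  }
  return(sum_x / length(x))
}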

User · Answer

In fact, RcppRoll is very good.

The code posted by cantdutchthis must be corrected in the fourth line so that the window has a fixed width of n:

ma <- function(arr, n = 15) {
  res <- arr
  for (i in n:length(arr)) {
    res[i] <- mean(arr[(i - n + 1):i])
  }
  res
}

Another way, which handles missing values, is given here.

A third way, improving cantdutchthis's code so that partial averages can be kept or dropped, follows:

ma <- function(x, n = 2, parcial = TRUE) {
  res <- x  # set the first values
  if (parcial == TRUE) {
    for (i in 1:length(x)) {
      t <- max(i - n + 1, 1)
      res[i] <- mean(x[t:i])
    }
    res
  } else {
    for (i in 1:length(x)) {
      t <- max(i - n + 1, 1)
      res[i] <- mean(x[t:i])
    }
    res[-c(seq(1, n - 1, 1))]  # remove the first n - 1 values, i.e., res[c(-3, -4, ...)]
  }
}
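
The difference between partial and complete trailing windows can be seen on a tiny input. The sketch below is a variation on the idea, not the answer's code: trailing_mean is a hypothetical name, and it marks incomplete windows with NA instead of dropping them, so the output keeps the length of the input.

```r
# Hypothetical helper: trailing (right-aligned) moving mean of width `n`.
# With partial = TRUE the first n - 1 results average the values seen so far;
# with partial = FALSE those positions are set to NA.
trailing_mean <- function(x, n, partial = TRUE) {
  out <- vapply(seq_along(x), function(i) {
    mean(x[max(1, i - n + 1):i])
  }, numeric(1))
  if (!partial && n > 1) {
    out[seq_len(min(n - 1, length(x)))] <- NA
  }
  out
}

x <- c(2, 4, 6, 8)
with_partial    <- trailing_mean(x, 3)                   # 2 3 4 6
without_partial <- trailing_mean(x, 3, partial = FALSE)  # NA NA 4 6
```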

User · Answer

Options include: rolling means, maximums, and medians in the zoo package (rollmean); MovingAverages in TTR; ma in forecast.

User · Answer

In data.table 1.12.0 a new frollmean function has been added to compute fast and exact rolling means, carefully handling NA, NaN and +Inf, -Inf values.

As there is no reproducible example in the question, there is not much more to address here.

You can find more info in the manual under ?frollmean, which is also available online.

Examples from the manual below:

library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))

# rollmean of single vector and single window
frollmean(d[, V1], 3)

# multiple columns at once
frollmean(d, 3)

# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))

# multiple columns and multiple windows at once
frollmean(d, c(3, 4))

## three above are embarrassingly parallel using openmp

User · Answer

Here is a simple function with filter demonstrating one way to take care of beginning and ending NAs with padding, and computing a weighted average (supported by filter) using custom weights:

library(dplyr)  # provides first(), last() and %>%

wma <- function(x) {
  wts <- c(seq(0.5, 4, 0.5), seq(3.5, 0.5, -0.5))
  nside <- (length(wts) - 1) / 2
  # pad x with begin and end values for filter to avoid NAs
  xp <- c(rep(first(x), nside), x, rep(last(x), nside))
  z <- stats::filter(xp, wts / sum(wts), sides = 2) %>% as.vector
  z[(nside + 1):(nside + length(x))]
}

User · Answer

You may calculate the moving average of a vector x with a window width of k by:

apply(embed(x, k), 1, mean)
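
To see why this works, it helps to look at what embed returns: each row of the matrix is one window of k consecutive values, most recent first. The short sketch below uses made-up data and rowMeans, which gives the same result as apply(..., 1, mean).

```r
x <- c(1, 2, 3, 4, 5)
k <- 3

windows <- embed(x, k)
# Each row is one window, most recent value first:
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3

ma <- rowMeans(windows)   # 2 3 4
```

The result has length(x) - k + 1 elements, since only complete windows are formed.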

User · Answer

Or you can simply calculate it using filter. Here's the function I use:

ma <- function(x, n = 5){filter(x, rep(1 / n, n), sides = 2)}

If you use dplyr, be careful to specify stats::filter in the function above.
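
Since the question is about values in a matrix, here is a sketch of the same filter approach applied column by column. The sides argument on this wrapper and the example matrix m are illustrative assumptions, not part of the answer: sides = 2 centres the window, while sides = 1 uses only current and past values.

```r
# Moving average via stats::filter; the namespace is spelled out so that
# dplyr::filter cannot mask it.
ma <- function(x, n = 5, sides = 2) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = sides))
}

x <- c(1, 2, 3, 4, 5)
centred  <- ma(x, n = 3, sides = 2)  # NA 2 3 4 NA
trailing <- ma(x, n = 3, sides = 1)  # NA NA 2 3 4

# Column-wise moving average over a matrix (made-up data)
m <- cbind(a = 1:5, b = c(10, 20, 30, 40, 50))
m_ma <- apply(m, 2, ma, n = 3, sides = 1)
```

Positions where the window is incomplete come back as NA, so the output keeps the same number of rows as the input.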

User · Answer

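I use aggregate along with a vector created by rep(). This has the advantage of using cbind() to aggregate more than one column in your data frame at a time. Below is an example with an averaging period of 60 for a vector (v) of length 1000. Note that this averages consecutive, non-overlapping blocks of 60 values rather than sliding the window one value at a time: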

v=1:1000*0.002+rnorm(1000)
mrng=rep(1:round(length(v)/60+0.5), length.out=length(v), each=60)
aggregate(v~mrng, FUN=mean, na.rm=T)

Note the first argument in rep is to simply get enough unique values for the moving range, based on the length of the vector and the amount to be averaged; the second argument keeps the length equal to the vector length, and the last repeats the values of the first argument the same number of times as the averaging period.

In aggregate you could use several functions (median, max, min); mean is shown as an example. Again, you could use a formula with cbind to do this on more than one (or all) columns in a data frame.
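
Grouping with rep() produces one average per block of values, which is not the same as a sliding window. The sketch below contrasts the two on a small made-up vector; tapply stands in for aggregate here only to keep the example short.

```r
v <- c(1, 2, 3, 4, 5, 6)
k <- 2  # averaging period

# Block (non-overlapping) means: one group id per run of k values
block_id <- rep(seq_len(ceiling(length(v) / k)), each = k, length.out = length(v))
block_means <- as.numeric(tapply(v, block_id, mean))   # 1.5 3.5 5.5

# Sliding (overlapping) means over the same data, for comparison
moving_means <- rowMeans(embed(v, k))                   # 1.5 2.5 3.5 4.5 5.5
```

The block version returns about length(v) / k values, while the sliding version returns length(v) - k + 1.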

User · Answer

Using cumsum should be sufficient and efficient. Assuming you have a vector x and you want a running mean over n numbers (computed from a running sum):

cx <- c(0, cumsum(x))
rsum <- (cx[(n + 1):length(cx)] - cx[1:(length(cx) - n)]) / n

As pointed out in the comments by mzuther, this assumes that there are no NAs in the data. Dealing with those requires dividing each window by the number of non-NA values. Here's one way of doing that, incorporating the comment from Ricardo Cruz:

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n + 1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n + 1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn

This still has the issue that if all the values in a window are NA, the result is a division of zero by zero, which R returns as NaN.
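
The NA-aware version can be wrapped in a function and checked on a small input. This is a minimal sketch; roll_mean_cumsum is a hypothetical name and the test vectors are made up.

```r
# Hypothetical helper: running mean over `n` values using cumulative sums,
# ignoring NA entries within each window.
roll_mean_cumsum <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  ok <- !is.na(x)
  cx <- c(0, cumsum(ifelse(ok, x, 0)))   # cumulative sum, NAs counted as 0
  cn <- c(0, cumsum(as.integer(ok)))     # cumulative count of non-NA values
  hi <- (n + 1):length(cx)
  lo <- 1:(length(cx) - n)
  (cx[hi] - cx[lo]) / (cn[hi] - cn[lo])
}

r1 <- roll_mean_cumsum(c(1, 2, NA, 4, 5), 2)  # 1.5 2.0 4.0 4.5
r2 <- roll_mean_cumsum(c(NA, NA, 3), 2)       # NaN 3
```

A window containing only NAs yields NaN, so callers may want to test for that with is.nan().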

User · Answer

Here is example code showing how to compute a centered moving average and a trailing moving average using the rollmean function from the zoo package.

library(tidyverse)
library(zoo)

some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
    mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
    mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#>      day   cma   tma
#>    <int> <dbl> <dbl>
#>  1     1    NA    NA
#>  2     2     2    NA
#>  3     3     3     2
#>  4     4     4     3
#>  5     5     5     4
#>  6     6     6     5
#>  7     7     7     6
#>  8     8     8     7
#>  9     9     9     8
#> 10    10    NA     9

User · Answer

One can use the runner package for moving functions, in this case the mean_run function. The problem with cummean is that it doesn't handle NA values, but mean_run does. The runner package also supports irregular time series, and windows can depend on date:

library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA, 5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))

mean_run(x1)
#>  [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#>  [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809

mean_run(x2, na_rm = TRUE)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202

mean_run(x2, na_rm = FALSE)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7]          NA          NA          NA          NA          NA          NA
#> [13]          NA          NA          NA

mean_run(x2, na_rm = TRUE, k = 4)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.10546063 -0.16299272
#>  [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684  0.01103493
#> [13]  0.09609256  0.09738460  0.04740283

mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#>  [1] -0.187600111 -0.090220655 -0.004349696  0.168349653 -0.206571573 -0.494335093
#>  [7] -0.222969541 -0.187600111 -0.087636571  0.009742884  0.009742884  0.012326968
#> [13]  0.182442234  0.125737145  0.059094786

One can also specify other options like lag, and roll only at specific indexes. More in the package and function documentation.

User · Answer

The caTools package has very fast rolling mean/min/max/sd and a few other functions. I've only worked with runmean and runsd, and they are faster than the equivalents in any of the other packages mentioned so far.
