Is there a way in Pandas to use previous row value in dataframe apply when previous value is also calculated in the apply

Question

I have the following dataframe    Index Date    A    B    C    D                                   2015-01-31    10   10   Nan  10  2015-02-01     2    3   Nan  22   2015-02-02    10   60   Nan  280  2015-02-03    10   100   Nan  250   Require    Index Date    A    B    C    D                                   2015-01-31    10   10   10   10  2015-02-01     2    3   23   22  2015-02-02    10   60   290  280  2015-02-03    10   100  3000 250   Column C is derived for 2015-01-31 by taking value of D   Then I need to use the value of C for 2015-01-31 and multiply by the value of A on 2015-02-01 and add B   I have attempted an apply and a shift using an if else by this gives a key error

User · Answer

Given a column of numbers   lst      cols     A   for a in range 100  105       lst append  a   df   pd DataFrame lst  columns cols  index range 5   df      A 0   100 1   101 2   102 3   103 4   104   You can reference the previous row with shift   df  Change     df A - df A shift 1  df      A   Change 0   100 NaN 1   101 1 0 2   102 1 0 3   103 1 0 4   104 1 0

User · Answer

First  create the derived value   df loc 0   C     df loc 0   D     Then iterate through the remaining rows and fill the calculated values    for i in range 1  len df        df loc i   C     df loc i-1   C     df loc i   A     df loc i   B       Index Date   A   B    C    D 0 2015-01-31  10  10   10   10 1 2015-02-01   2   3   23   22 2 2015-02-02  10  60  290  280

User · Answer

Applying the recursive function on numpy arrays will be faster than the current answer   df   pd DataFrame np repeat np arange 2  6  3  reshape 4 3   columns   A    B    D    new    df D values 0   for i in range 1  len df index        new append new i-1  df A values i  df B values i   df  C     new   Output        A  B  D    C    0  1  1  1    1    1  2  2  2    4    2  3  3  3   15    3  4  4  4   64    4  5  5  5  325

User · Answer

Although it has been a while since this question was asked  I will post my answer hoping it helps somebody   Disclaimer  I know this solution is not standard  but I think it works well   import pandas as pd import numpy as np  data   np array   10  2  10  10                     10  3  60  100                     np nan    4                    10  22  280  250    T idx   pd date range  20150131   end  20150203   df   pd DataFrame data data  columns list  ABCD    index idx  df                A    B     C    D                                     2015-01-31    10   10    NaN  10  2015-02-01    2    3     NaN  22   2015-02-02    10   60    NaN  280  2015-02-03    10   100   NaN  250  def calculate mul  add       global value     value   value   mul   add     return value  value   df loc  2015-01-31    D   df loc  2015-01-31    C     value df loc  2015-02-01     C     df loc  2015-02-01    apply lambda row  calculate  row   A    B      axis 1  df                A    B     C     D                                     2015-01-31    10   10    10    10  2015-02-01    2    3     23    22   2015-02-02    10   60    290   280  2015-02-03    10   100   3000  250   So basically we use a apply from pandas and the help of a global variable that keeps track of the previous calculated value     Time comparison with a for loop   data   np random random size  1000  4   idx   pd date range  20150131   end  20171026   df   pd DataFrame data data  columns list  ABCD    index idx  df C   np nan  df loc  2015-01-31    C     df loc  2015-01-31    D      timeit for i in df loc  2015-02-01    index date      df loc i   C     df loc  i - pd DateOffset days 1   date     C     df loc i   A     df loc i   B     3 2 s    114 ms per loop  mean    std  dev  of 7 runs  1 loop each   data   np random random size  1000  4   idx   pd date range  20150131   end  20171026   df   pd DataFrame data data  columns list  ABCD    index idx  df C   np nan  def calculate mul  add       global value     value   value   mul   add     return value  value   df loc  2015-01-31    D   df loc  2015-01-31    C     value    timeit df loc  2015-02-01     C     df loc  2015-02-01    apply lambda row  calculate  row   A    B      axis 1    1 82 s    64 4 ms per loop  mean    std  dev  of 7 runs  1 loop each   So 0 57 times faster on average

User · Answer

In general  the key to avoiding an explicit loop would be to join  merge  2 instances of the dataframe on rowindex-1  rowindex  Then you would have a big dataframe containing rows of r and r-1  from where you could do a df apply   function  However the overhead of creating the large dataset may offset the benefits of parallel processing

User · Answer

It s an old question but the solution below  without a for loop  might be helpful  def new fun df       prev value   df iloc 0   quot C quot       def func2 row             non local variable    gt  will use pre value from the new fun function         nonlocal prev value         new value    prev value   row  A     row  B           prev value   row  C           return new value       This line might throw a SettingWithCopyWarning warning     df iloc 1    quot C quot     df iloc 1   apply func2  axis 1      return df  df   new fun df

User · Answer

numba  For recursive calculations which are not vectorisable  numba  which uses JIT-compilation and works with lower level objects  often yields large performance improvements  You need only define a regular for loop and use the decorator  njit or  for older versions   jit nopython True    For a reasonable size dataframe  this gives a  30x performance improvement versus a regular for loop   from numba import jit   jit nopython True  def calculator nb a  b  d       res   np empty d shape      res 0    d 0      for i in range 1  res shape 0            res i    res i-1    a i    b i      return res  df  C     calculator nb  df list  ABD    values T   n   10  5 df   pd concat  df  n  ignore index True     benchmarking on Python 3 6 0  Pandas 0 19 2  NumPy 1 11 3  Numba 0 30 1   calculator   is same as calculator nb   but without  jit decorator  timeit calculator nb  df list  ABD    values T     14 1 ms per loop  timeit calculator  df list  ABD    values T        444 ms per loop

[python] Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?

Examples related to python

Examples related to pandas

Examples related to dataframe

Examples related to for-loop

Examples related to iteration