Why isn't my Pandas 'apply' function referencing multiple columns working?

Question

I have some problems with the Pandas apply function when using multiple columns with the following dataframe:

    df = DataFrame({'a': np.random.randn(6),
                    'b': ['foo', 'bar'] * 3,
                    'c': np.random.randn(6)})

and the following function:

    def my_test(a, b):
        return a % b

When I try to apply this function with:

    df['Value'] = df.apply(lambda row: my_test(row[a], row[c]), axis=1)

I get the error message:

    NameError: ("global name 'a' is not defined", u'occurred at index 0')

I do not understand this message; I defined the name properly.

I would highly appreciate any help on this issue.

Update

Thanks for your help! I did indeed make some syntax mistakes in the code: the index should be put in quotes (''). However, I still get the same issue using a more complex function such as:

    def my_test(a):
        cum_diff = 0
        for ix in df.index:
            cum_diff = cum_diff + (a - df['a'][ix])
        return cum_diff

User · Answer

All of the other suggestions work, but if you want your computations to be more efficient, you should take advantage of NumPy vector operations.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'a': np.random.randn(6),
                       'b': ['foo', 'bar'] * 3,
                       'c': np.random.randn(6)})

Example 1: looping with pandas.apply():

    %%timeit
    def my_test2(row):
        return row['a'] % row['c']

    df['Value'] = df.apply(my_test2, axis=1)

    The slowest run took 7.49 times longer than the fastest. This could
    mean that an intermediate result is being cached.
    1000 loops, best of 3: 481 µs per loop

Example 2: vectorize using pandas Series operations:

    %%timeit
    df['a'] % df['c']

    The slowest run took 458.85 times longer than the fastest. This could
    mean that an intermediate result is being cached.
    10000 loops, best of 3: 70.9 µs per loop

Example 3: vectorize using numpy arrays:

    %%timeit
    df['a'].values % df['c'].values

    The slowest run took 7.98 times longer than the fastest. This could
    mean that an intermediate result is being cached.
    100000 loops, best of 3: 6.39 µs per loop

So vectorizing using numpy arrays improved the speed by almost two orders of magnitude (481 µs down to 6.39 µs per loop).
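The `%%timeit` cells above only run inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The helper names `row_wise`, `series_wise` and `array_wise`, the fixed seed, and the repeat count of 200 are choices made for this illustration, not part of the original answer, and the absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed is an arbitrary
# choice made here so that repeated runs use identical data.
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.standard_normal(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': rng.standard_normal(6)})


def row_wise():
    # One Python-level function call per row.
    return df.apply(lambda row: row['a'] % row['c'], axis=1)


def series_wise():
    # Vectorized on pandas Series.
    return df['a'] % df['c']


def array_wise():
    # Vectorized on the underlying NumPy arrays.
    return df['a'].to_numpy() % df['c'].to_numpy()


# All three approaches should agree on the values.
assert np.allclose(row_wise().astype(float).to_numpy(), array_wise())
assert np.allclose(series_wise().to_numpy(), array_wise())

for fn in (row_wise, series_wise, array_wise):
    seconds = timeit.timeit(fn, number=200) / 200
    print(f"{fn.__name__}: {seconds * 1e6:.1f} microseconds per call")
```

With only six rows, fixed overhead dominates every variant, so the gap between the approaches is likely to grow on larger frames.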

User · Answer

This is the same as the previous solution, but I have defined the function inside df.apply itself:

    df['Value'] = df.apply(lambda row: row['a'] % row['c'], axis=1)
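The same inline style also appears to fit the single-argument function from the question's update, as long as the column is referenced with a quoted label. The sketch below uses a small made-up dataframe, and the `Value_vec` column is a name introduced here only for illustration. It also shows a loop-free form that follows from the algebra: summing `a - df['a'][ix]` over n rows equals `n * a - df['a'].sum()`.

```python
import numpy as np
import pandas as pd

# Small made-up values, chosen only so the result is easy to check by hand.
df = pd.DataFrame({'a': [1.0, 2.0, 4.0],
                   'c': [0.5, 0.5, 0.5]})


def my_test(a):
    # Sum of (a - x) over every value x in column 'a',
    # as in the question's update.
    cum_diff = 0
    for ix in df.index:
        cum_diff = cum_diff + (a - df['a'][ix])
    return cum_diff


# Pass the column value through a quoted label.
df['Value'] = df.apply(lambda row: my_test(row['a']), axis=1)

# Algebraically equivalent, without the Python loop: n * a - sum(a).
df['Value_vec'] = len(df) * df['a'] - df['a'].sum()

print(df)
```

For the three rows here, both columns come out as -4.0, -1.0 and 5.0.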

User · Answer

If you just want to compute (column a) % (column b), you don't need apply, just do it directly:

    In [7]: df['a'] % df['c']
    Out[7]:
    0   -1.132022
    1   -0.939493
    2    0.201931
    3    0.511374
    4   -0.694647
    5   -0.023486
    Name: a
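As a small sketch of the direct approach with checkable numbers, the snippet below takes the first three rows of columns 'a' and 'c' from the table printed in another answer on this page and stores the result in a new column. The variable name `same` is introduced here for illustration; `np.mod` is the function form of the `%` operator for arrays and Series.

```python
import numpy as np
import pandas as pd

# First three rows of 'a' and 'c' from the table shown in another answer.
df = pd.DataFrame({'a': [-1.674308, -2.163236, -0.199115],
                   'c': [0.343801, -2.046438, -0.458050]})

# Element-wise modulo on whole columns, no apply needed.
df['Value'] = df['a'] % df['c']

# np.mod computes the same thing as the % operator.
same = np.mod(df['a'], df['c'])

print(df)
```

Note that the result takes the sign of the divisor, which is why the first row is positive even though its 'a' value is negative.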

User · Answer

Let's say we want to apply a function add5 to columns 'a' and 'b' of DataFrame df (both columns need to hold numbers for the addition to work):

    def add5(x):
        return x + 5

    df[['a', 'b']].apply(add5)
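A minimal sketch of this column-wise pattern, assuming a made-up dataframe shaped like the one in the question, where 'b' holds strings. Because of that, the numeric columns 'a' and 'c' are selected here instead; the name `shifted` is introduced only for this example.

```python
import pandas as pd

# Made-up data with the same column layout as the question's dataframe.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0],
                   'b': ['foo', 'bar', 'foo'],
                   'c': [10.0, 20.0, 30.0]})


def add5(x):
    return x + 5


# Without axis=1, apply passes each selected column to add5
# as a whole Series, so the addition is vectorized per column.
shifted = df[['a', 'c']].apply(add5)

print(shifted)
```

Selecting the string column 'b' as well would fail, since adding an integer to a string is not defined.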

User · Answer

Seems you forgot the quotes ('') around your strings:

    In [43]: df['Value'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)

    In [44]: df
    Out[44]:
              a    b         c     Value
    0 -1.674308  foo  0.343801  0.044698
    1 -2.163236  bar -2.046438 -0.116798
    2 -0.199115  foo -0.458050 -0.199115
    3  0.918646  bar -0.007185 -0.001006
    4  1.336830  foo  0.534292  0.268245
    5  0.976844  bar -0.773630 -0.570417

BTW, in my opinion, the following way is more elegant:

    In [53]: def my_test2(row):
       ....:     return row['a'] % row['c']
       ....:

    In [54]: df['Value'] = df.apply(my_test2, axis=1)
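To make the cause of the error concrete, here is a small sketch that triggers it and then applies the fix. Without quotes, `row[a]` asks Python for a variable named `a`, which does not exist, so a NameError is raised before pandas ever looks at the column labels. The dataframe values and the `error_name` variable are made up for this illustration.

```python
import pandas as pd

# Made-up values chosen so the modulo results are easy to verify.
df = pd.DataFrame({'a': [7.0, 8.0, 9.0],
                   'b': ['foo', 'bar', 'foo'],
                   'c': [2.0, 3.0, 4.0]})


def my_test(a, b):
    return a % b


# Unquoted labels: Python looks up variables named a and c,
# which are not defined anywhere, so this raises NameError.
try:
    df.apply(lambda row: my_test(row[a], row[c]), axis=1)
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

# Quoted labels: the strings 'a' and 'c' select the columns.
df['Value'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)

print(error_name)
print(df)
```

The exact wording of the error message depends on the Python and pandas versions in use, but the exception type stays the same.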

User · Answer

Here is a timing comparison of the three approaches discussed above.

Using values:

    %timeit df['value'] = df['a'].values % df['c'].values

    139 µs ± 1.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Without values:

    %timeit df['value'] = df['a'] % df['c']

    216 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Apply function:

    %timeit df['Value'] = df.apply(lambda row: row['a'] % row['c'], axis=1)

    474 µs ± 5.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
