summing two columns in a pandas dataframe

Question

when I use this syntax it creates a series rather than adding a column to my new dataframe  sum   Please help   My code    sum   data  variance     data budget   data actual   My Data  in dataframe df    currently has everything except the budget - actual  I want to create a variance column       cluster     date    budget  actual            budget - actual 0   a   2014-01-01 00 00 00     11000   10000       1000 1   a   2014-02-01 00 00 00     1200    1000 2   a   2014-03-01 00 00 00     200     100 3   b   2014-04-01 00 00 00     200     300 4   b   2014-05-01 00 00 00     400     450 5   c   2014-06-01 00 00 00     700     1000 6   c   2014-07-01 00 00 00     1200    1000 7   c   2014-08-01 00 00 00     200     100 8   c   2014-09-01 00 00 00     200     300

User · Answer

df  variance     df loc     budget   actual    sum axis 1

User · Answer

If  budget  has any NaN values but you don t want it to sum to NaN then try   def fun  b  a       if math isnan b           return a     else          return b   a  f   np vectorize fun  otypes  float    df  variance     f df  budget    df Lp  actual

User · Answer

I think you ve misunderstood some python syntax  the following does two assignments   In  11   a   b   1  In  12   a Out 12   1  In  13   b Out 13   1   So in your code it was as if you were doing   sum   df  budget     df  actual       a Series   and df  variance     df  budget     df  actual      assigned to a column   The latter creates a new column for df   In  21   df Out 21     cluster                 date  budget  actual 0       a  2014-01-01 00 00 00   11000   10000 1       a  2014-02-01 00 00 00    1200    1000 2       a  2014-03-01 00 00 00     200     100 3       b  2014-04-01 00 00 00     200     300 4       b  2014-05-01 00 00 00     400     450 5       c  2014-06-01 00 00 00     700    1000 6       c  2014-07-01 00 00 00    1200    1000 7       c  2014-08-01 00 00 00     200     100 8       c  2014-09-01 00 00 00     200     300  In  22   df  variance     df  budget     df  actual    In  23   df Out 23     cluster                 date  budget  actual  variance 0       a  2014-01-01 00 00 00   11000   10000     21000 1       a  2014-02-01 00 00 00    1200    1000      2200 2       a  2014-03-01 00 00 00     200     100       300 3       b  2014-04-01 00 00 00     200     300       500 4       b  2014-05-01 00 00 00     400     450       850 5       c  2014-06-01 00 00 00     700    1000      1700 6       c  2014-07-01 00 00 00    1200    1000      2200 7       c  2014-08-01 00 00 00     200     100       300 8       c  2014-09-01 00 00 00     200     300       500   As an aside  you shouldn t use sum as a variable name as the overrides the built-in sum function

User · Answer

You could also use the  add   function    df loc    variance     df loc    budget   add df loc    actual

User · Answer

Same think can be done using lambda function  Here I am reading the data from a xlsx file   import pandas as pd df   pd read excel  data xlsx   sheet name   4  print df   Output     cluster Unnamed  1      date  budget  actual 0       a 2014-01-01  00 00 00   11000   10000 1       a 2014-02-01  00 00 00    1200    1000 2       a 2014-03-01  00 00 00     200     100 3       b 2014-04-01  00 00 00     200     300 4       b 2014-05-01  00 00 00     400     450 5       c 2014-06-01  00 00 00     700    1000 6       c 2014-07-01  00 00 00    1200    1000 7       c 2014-08-01  00 00 00     200     100 8       c 2014-09-01  00 00 00     200     300   Sum two columns into 3rd new one   df  variance     df apply lambda x  x  budget     x  actual    axis 1  print df   Output     cluster Unnamed  1      date  budget  actual  variance 0       a 2014-01-01  00 00 00   11000   10000     21000 1       a 2014-02-01  00 00 00    1200    1000      2200 2       a 2014-03-01  00 00 00     200     100       300 3       b 2014-04-01  00 00 00     200     300       500 4       b 2014-05-01  00 00 00     400     450       850 5       c 2014-06-01  00 00 00     700    1000      1700 6       c 2014-07-01  00 00 00    1200    1000      2200 7       c 2014-08-01  00 00 00     200     100       300 8       c 2014-09-01  00 00 00     200     300       500

[python] summing two columns in a pandas dataframe

Examples related to python

Examples related to pandas