How to apply a function to two columns of Pandas dataframe

Question

Suppose I have a df which has columns of 'ID', 'col_1', 'col_2'. And I define a function:

    f = lambda x, y: my_function_expression

Now I want to apply f to df's two columns 'col_1' and 'col_2' to calculate a new column 'col_3' element-wise, somewhat like:

    df['col_3'] = df[['col_1', 'col_2']].apply(f)
    # Pandas gives: TypeError: <lambda>() takes exactly 2 arguments (1 given)

How can I do this?

Adding a detailed sample as below:

    import pandas as pd

    df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
    mylist = ['a', 'b', 'c', 'd', 'e', 'f']

    def get_sublist(sta, end):
        return mylist[sta:end+1]

    # df['col_3'] = df[['col_1', 'col_2']].apply(get_sublist, axis=1)
    # expect above to output df as below

      ID  col_1  col_2            col_3
    0  1      0      1       ['a', 'b']
    1  2      2      4  ['c', 'd', 'e']
    2  3      3      5  ['d', 'e', 'f']

User · Answer

The way you have written f it needs two inputs. If you look at the error message it says you are not providing two inputs to f, just one. The error message is correct.
The mismatch is because df[['col1','col2']] returns a single dataframe with two columns, not two separate columns.

You need to change your f so that it takes a single input: keep the above dataframe as the input to apply, then break that input up into x and y inside the function body. Then do whatever you need and return a single value.

You need this function signature because the syntax is .apply(f), so f has to take the single thing apply hands it (one row of the dataframe as a Series when you pass axis=1), not the two things your current f expects.

Since you haven't provided the body of f I can't help in any more detail, but this should provide a way out without fundamentally changing your code or using some method other than apply.
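As a rough sketch of that idea using the question's sample data: the wrapper name get_sublist_from_row is made up for illustration, and this assumes a reasonably recent pandas, where a list returned from a row-wise apply is stored as one object per row.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist_from_row(row):
    # Hypothetical single-argument wrapper: row is a Series holding
    # one row's col_1 and col_2 values.
    sta, end = row['col_1'], row['col_2']
    return mylist[sta:end + 1]

# axis=1 makes apply call the function once per row
df['col_3'] = df[['col_1', 'col_2']].apply(get_sublist_from_row, axis=1)
print(df)
```

The function signature is the only thing that changes; the slicing logic is the same as in the question's get_sublist.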

User · Answer

If you have a huge data-set, then you can use an easy but faster (in execution time) way of doing this using swifter:

    import pandas as pd
    import swifter

    def fnc(m, x, c):
        return m*x + c

    df = pd.DataFrame({"m": [1, 2, 3, 4, 5, 6], "c": [1, 1, 1, 1, 1, 1], "x": [5, 3, 6, 2, 6, 1]})
    df["y"] = df.swifter.apply(lambda x: fnc(x.m, x.x, x.c), axis=1)

User · Answer

Here's an example using apply on the dataframe, which I am calling with axis = 1.

Note the difference is that instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, and then index the Series to get the values needed.

    In [49]: df
    Out[49]:
              0         1
    0  1.000000  0.000000
    1 -0.494375  0.570994
    2  1.000000  0.000000
    3  1.876360 -0.229738
    4  1.000000  0.000000

    In [50]: def f(x):
       ....:     return x[0] + x[1]
       ....:

    In [51]: df.apply(f, axis=1)  # passes a Series object, row-wise
    Out[51]:
    0    1.000000
    1    0.076619
    2    1.000000
    3    1.646622
    4    1.000000

Depending on your use case, it is sometimes helpful to create a pandas group object, and then use apply on the group.

User · Answer

Returning a list from apply is a dangerous operation, as the resulting object is not guaranteed to be either a Series or a DataFrame, and exceptions might be raised in certain cases. Let's walk through a simple example:

    df = pd.DataFrame(data=np.random.randint(0, 5, (5, 3)),
                      columns=['a', 'b', 'c'])
    df
       a  b  c
    0  4  0  0
    1  2  0  1
    2  2  2  2
    3  1  2  2
    4  3  0  0

There are three possible outcomes with returning a list from apply.

1) If the length of the returned list is not equal to the number of columns, then a Series of lists is returned.

    df.apply(lambda x: list(range(2)), axis=1)  # returns a Series
    0    [0, 1]
    1    [0, 1]
    2    [0, 1]
    3    [0, 1]
    4    [0, 1]
    dtype: object

2) When the length of the returned list is equal to the number of columns, then a DataFrame is returned and each column gets the corresponding value in the list.

    df.apply(lambda x: list(range(3)), axis=1)  # returns a DataFrame
       a  b  c
    0  0  1  2
    1  0  1  2
    2  0  1  2
    3  0  1  2
    4  0  1  2

3) If the length of the returned list equals the number of columns for the first row, but at least one row returns a list with a different number of elements than the number of columns, a ValueError is raised.

    i = 0
    def f(x):
        global i
        if i == 0:
            i += 1
            return list(range(3))
        return list(range(4))

    df.apply(f, axis=1)
    ValueError: Shape of passed values is (5, 4), indices imply (5, 3)

Answering the problem without apply

Using apply with axis=1 is very slow. It is possible to get much better performance (especially on larger datasets) with basic iterative methods.

Create a larger dataframe (df here is the dataframe from the question):

    df1 = df.sample(100000, replace=True).reset_index(drop=True)

Timings:

    # apply is slow with axis=1
    %timeit df1.apply(lambda x: mylist[x['col_1']: x['col_2']+1], axis=1)
    2.59 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    # zip - similar to @Thomas
    %timeit [mylist[v1:v2+1] for v1, v2 in zip(df1.col_1, df1.col_2)]
    29.5 ms ± 534 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

    # @Thomas's answer
    %timeit list(map(get_sublist, df1['col_1'], df1['col_2']))
    34 ms ± 459 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
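A side note, as far as I know: the three outcomes above describe how older pandas inferred the result shape. Releases from 0.23 onward expose a result_type argument on DataFrame.apply, so you can ask for the shape explicitly. A small sketch, assuming such a version (the zero-filled frame is just stand-in data):

```python
import numpy as np
import pandas as pd

# Stand-in data: 5 rows, 3 columns (the values do not matter here)
df = pd.DataFrame(np.zeros((5, 3), dtype=int), columns=['a', 'b', 'c'])

# Default in pandas >= 0.23: each returned list is kept as one object,
# so the result is a Series of lists
as_lists = df.apply(lambda x: list(range(3)), axis=1)

# result_type='expand' turns each returned list into columns instead
expanded = df.apply(lambda x: list(range(3)), axis=1, result_type='expand')

print(type(as_lists).__name__, type(expanded).__name__)
```

Being explicit avoids relying on whether the list length happens to match the number of columns.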

User · Answer

I'm going to put in a vote for np.vectorize. It allows you to just shoot over any number of columns and not deal with the dataframe in the function, so it's great for functions you don't control, or for doing something like sending 2 columns and a constant into a function (i.e. col_1, col_2, 'foo').

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
    mylist = ['a', 'b', 'c', 'd', 'e', 'f']

    def get_sublist(sta, end):
        return mylist[sta:end+1]

    # df['col_3'] = df[['col_1', 'col_2']].apply(get_sublist, axis=1)
    # expect above to output df as below

    df.loc[:, 'col_3'] = np.vectorize(get_sublist, otypes=["O"])(df['col_1'], df['col_2'])

    df

      ID  col_1  col_2      col_3
    0  1      0      1     [a, b]
    1  2      2      4  [c, d, e]
    2  3      3      5  [d, e, f]
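To illustrate the "2 columns and a constant" case, here is a minimal sketch. The function describe_range and the 'foo' tag are made up for the example; np.vectorize broadcasts the scalar against the two columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})

def describe_range(sta, end, tag):
    # Hypothetical three-argument function that knows nothing about pandas
    return f"{tag}:{sta}-{end}"

# otypes=[object] tells NumPy the output dtype up front
vec = np.vectorize(describe_range, otypes=[object])

# Two columns plus a constant; the constant is broadcast to every row
df['col_3'] = vec(df['col_1'], df['col_2'], 'foo')
print(df)
```

Keep in mind that np.vectorize is a convenience wrapper around a Python-level loop, so it helps with calling conventions rather than raw speed.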

User · Answer

My example for your question:

    def get_sublist(row, col1, col2):
        return mylist[row[col1]:row[col2]+1]

    df.apply(get_sublist, axis=1, col1='col_1', col2='col_2')

User · Answer

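An interesting question! My answer is below:

    import pandas as pd

    def sublst(row):
        return lst[row['J1']:row['J2']]

    df = pd.DataFrame({'ID': ['1', '2', '3'], 'J1': [0, 2, 3], 'J2': [1, 4, 5]})
    print(df)
    lst = ['a', 'b', 'c', 'd', 'e', 'f']

    df['J3'] = df.apply(sublst, axis=1)
    print(df)

Output:

      ID  J1  J2
    0  1   0   1
    1  2   2   4
    2  3   3   5
      ID  J1  J2      J3
    0  1   0   1     [a]
    1  2   2   4  [c, d]
    2  3   3   5  [d, e]

I changed the column names to ID, J1, J2, J3 to ensure ID < J1 < J2 < J3, so the columns display in the right sequence. Note that this version slices up to, but not including, J2, which is why each sublist is one element shorter than in the question.

One more brief version:

    import pandas as pd

    df = pd.DataFrame({'ID': ['1', '2', '3'], 'J1': [0, 2, 3], 'J2': [1, 4, 5]})
    print(df)
    lst = ['a', 'b', 'c', 'd', 'e', 'f']

    df['J3'] = df.apply(lambda row: lst[row['J1']:row['J2']], axis=1)
    print(df)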

User · Answer

I'm sure this isn't as fast as the solutions using Pandas or Numpy operations, but if you don't want to rewrite your function you can use map. Using the original example data:

    import pandas as pd

    df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
    mylist = ['a', 'b', 'c', 'd', 'e', 'f']

    def get_sublist(sta, end):
        return mylist[sta:end+1]

    df['col_3'] = list(map(get_sublist, df['col_1'], df['col_2']))
    # In Python 2, don't convert the above to a list

We could pass as many arguments as we wanted into the function this way. The output is what we wanted:

      ID  col_1  col_2      col_3
    0  1      0      1     [a, b]
    1  2      2      4  [c, d, e]
    2  3      3      5  [d, e, f]

User · Answer

There is a clean, one-line way of doing this in Pandas:

    df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.

Example with data (based on the original question):

    import pandas as pd

    df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
    mylist = ['a', 'b', 'c', 'd', 'e', 'f']

    def get_sublist(sta, end):
        return mylist[sta:end+1]

    df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)

Output of print(df):

      ID  col_1  col_2      col_3
    0  1      0      1     [a, b]
    1  2      2      4  [c, d, e]
    2  3      3      5  [d, e, f]

If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:

    df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)

User · Answer

A simple solution is:

    df['col_3'] = df[['col_1', 'col_2']].apply(lambda x: f(*x), axis=1)
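One thing to watch with the star-unpacking approach is that the arguments are matched by position, so the order of the selected columns decides which value lands in which parameter. A small sketch with a made-up two-argument f (subtraction, chosen because the order is visible in the result):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})

# Hypothetical two-argument function
f = lambda x, y: x - y

# *x unpacks the row's values in column order: f(col_1, col_2)
df['diff'] = df[['col_1', 'col_2']].apply(lambda x: f(*x), axis=1)

# Swapping the column selection swaps the arguments: f(col_2, col_1)
df['diff_rev'] = df[['col_2', 'col_1']].apply(lambda x: f(*x), axis=1)
print(df)
```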

User · Answer

I suppose you don't want to change the get_sublist function, and just want to use DataFrame's apply method to do the job. To get the result you want, I've written two helper functions: get_sublist_list and unlist. As the function names suggest, the first gets a list containing the sublist, and the second extracts that sublist from the list. Finally, we call apply twice to run those two functions in turn on the df[['col_1', 'col_2']] DataFrame.

    import pandas as pd

    df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
    mylist = ['a', 'b', 'c', 'd', 'e', 'f']

    def get_sublist(sta, end):
        return mylist[sta:end+1]

    def get_sublist_list(cols):
        # cols is one row as a Series; take its two values by position
        return [get_sublist(cols.iloc[0], cols.iloc[1])]

    def unlist(list_of_lists):
        return list_of_lists[0]

    df['col_3'] = df[['col_1', 'col_2']].apply(get_sublist_list, axis=1).apply(unlist)

    df

If you don't use [] to enclose the get_sublist call, then get_sublist_list will return a plain list and it'll raise ValueError: could not broadcast input array from shape (3) into shape (2), as @Ted Petrou mentioned.

User · Answer

The method you are looking for is Series.combine. However, it seems some care has to be taken around datatypes. In your example, you would (as I did when testing the answer) naively call:

    df['col_3'] = df.col_1.combine(df.col_2, func=get_sublist)

However, this throws the error:

    ValueError: setting an array element with a sequence.

My best guess is that it expects the result to be of the same type as the series calling the method (df.col_1 here). However, the following works:

    df['col_3'] = df.col_1.astype(object).combine(df.col_2, func=get_sublist)

    df

      ID  col_1  col_2      col_3
    0  1      0      1     [a, b]
    1  2      2      4  [c, d, e]
    2  3      3      5  [d, e, f]
