How can I use the apply() function for a single column?

Question

I have a pandas data frame with two columns. I need to change the values of the first column without affecting the second one, and get back the whole data frame with just the first column's values changed. How can I do that using apply() in pandas?

User · Answer

You don't need a function at all. You can work on a whole column directly.

Example data:

    >>> df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})
    >>> df
          a     b     c
    0   100   200   300
    1  1000  2000  3000

Halve all the values in column a:

    >>> df.a = df.a / 2
    >>> df
           a     b     c
    0   50.0   200   300
    1  500.0  2000  3000

Note that / always produces floats in Python 3; use // if you want to keep integers.
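A self-contained version of the example above, which also shows the apply() variant the question asked about so the two can be compared. The data comes from the answer; the variable names vectorized and with_apply are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})

# Vectorized: operate on the whole column at once; b and c are untouched
vectorized = df.copy()
vectorized['a'] = vectorized['a'] / 2

# Same result with apply(): the lambda is called once per element of 'a'
with_apply = df.copy()
with_apply['a'] = with_apply['a'].apply(lambda x: x / 2)

print(vectorized)
```

Both variants return the whole dataframe with only column a changed; the vectorized form simply avoids one Python function call per element.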

User · Answer

Given the following dataframe df and the function complex_function,

    import pandas as pd

    def complex_function(x, y=0):
        if x > 5 and x > y:
            return 1
        else:
            return 2

    df = pd.DataFrame(data={'col1': [1, 4, 6, 2, 7], 'col2': [6, 7, 1, 2, 8]})

       col1  col2
    0     1     6
    1     4     7
    2     6     1
    3     2     2
    4     7     8

there are several solutions to use apply() on only one column. In the following I will explain them in detail.

I. Simple solution

The straightforward solution is the one from @Fabio Lamanna:

    df['col1'] = df['col1'].apply(complex_function)

Output:

       col1  col2
    0     2     6
    1     2     7
    2     1     1
    3     2     2
    4     1     8

Only the first column is modified; the second column is unchanged. The solution is beautiful. It is just one line of code and it reads almost like English: "Take 'col1' and apply the function complex_function to it."

However, if you need data from another column, e.g. 'col2', it won't work. If you want to pass the values of 'col2' to the variable y of complex_function, you need something else.

II. Solution using the whole dataframe

Alternatively, you could use the whole dataframe as described in this or this SO post:

    df['col1'] = df.apply(lambda x: complex_function(x['col1']), axis=1)

or, if you prefer (like me) a solution without a lambda function:

    def apply_complex_function(x):
        return complex_function(x['col1'])

    df['col1'] = df.apply(apply_complex_function, axis=1)

There is a lot going on in this solution that needs to be explained. The apply() function works on pd.Series and pd.DataFrame. But you cannot use df['col1'] = df.apply(complex_function).loc[:, 'col1'], because it would throw a ValueError. Hence, you need to specify which column to use. To complicate things, the apply() function only accepts callables. To solve this, you need to define a (lambda) function that takes the column x['col1'] as its argument, i.e., we wrap the column information in another function.

Unfortunately, the default value of the axis parameter is zero (axis=0), which means it will try executing column-wise and not row-wise. This wasn't a problem in the first solution, because we gave apply() a pd.Series. But now the input is a dataframe and we must be explicit (axis=1). (I marvel at how often I forget this.)

Whether you prefer the version with the lambda function or without is subjective. In my opinion, the line of code is complicated enough to read even without a lambda function thrown in. You only need the (lambda) function as a wrapper. It is just boilerplate code, and a reader should not be bothered with it.

Now you can easily modify this solution to take the second column into account:

    def apply_complex_function(x):
        return complex_function(x['col1'], x['col2'])

    df['col1'] = df.apply(apply_complex_function, axis=1)

Output (starting again from the original df):

       col1  col2
    0     2     6
    1     2     7
    2     1     1
    3     2     2
    4     2     8

At index 4 the value has changed from 1 to 2, because the first condition 7 > 5 is true but the second condition 7 > 8 is false. Note that you only needed to change the first line of code (i.e., the function) and not the second line.

Side note

Never put the column information into your function!

    def bad_idea(x):
        return x['col1'] ** 2

By doing this, you make a general function dependent on a column name! This is a bad idea, because the next time you want to use this function, you cannot. Worse: maybe you rename a column in a different dataframe just to make it work with your existing function. (Been there, done that. It is a slippery slope!)

III. Alternative solutions without using apply()

Although the OP specifically asked for a solution with apply(), alternative solutions were suggested. For example, the answer of @George Petrov suggested using map(); the answer of @Thibaut Dubernet proposed assign().

I fully agree that apply() is seldom the best solution, because apply() is not vectorized. It is an element-wise operation with expensive function calls and overhead from pd.Series. One reason to use apply() is that you want to use an existing function and performance is not an issue. Or your function is so complex that no vectorized version exists.

Another reason to use apply() is in combination with groupby(). Please note that DataFrame.apply() and GroupBy.apply() are different functions.

So it does make sense to consider some alternatives:

map() only works on pd.Series, but accepts a dict and a pd.Series as input. Using map() with a function is almost interchangeable with using apply(). It can be faster than apply(). See this SO post for more details.

    df['col1'] = df['col1'].map(complex_function)

applymap() is almost identical for dataframes. It does not support pd.Series and it will always return a dataframe. However, it can be faster. The documentation states: "In the current implementation applymap calls func twice on the first column/row to decide whether it can take a fast or slow code path." But if performance really counts, you should seek an alternative route. (Since pandas 2.1, applymap() is deprecated in favor of DataFrame.map().)

    df['col1'] = df.applymap(complex_function).loc[:, 'col1']

assign() is not a feasible replacement for apply(). It behaves similarly only in the most basic use cases. It does not work with complex_function on its own; you still need apply(), as you can see in the example below. The main use case for assign() is method chaining, because it gives back a new dataframe without changing the original dataframe.

    df = df.assign(col1=df.col1.apply(complex_function))

Annex: How to speed up apply()

I only mention it here because it was suggested by other answers, e.g. @durjoy. The list is not exhaustive.

Do not use apply(). This is no joke. For most numeric operations, a vectorized method exists in pandas. If/else blocks can often be refactored with a combination of boolean indexing and .loc. My example complex_function could be refactored in this way.

Refactor to Cython. If you have a complex equation and the parameters of the equation are in your dataframe, this might be a good idea. Check out the official pandas user guide for more information.

Use the raw=True parameter. Theoretically, this should improve the performance of apply() if you are just applying a NumPy reduction function, because the overhead of pd.Series is removed. Of course, your function has to accept an ndarray, so you have to refactor your function to NumPy. By doing this, you will get a huge performance boost.

Use 3rd-party packages. The first thing you should try is Numba. I do not know swifter, mentioned by @durjoy, and probably many other packages are worth mentioning here.

Try, fail, repeat. As mentioned above, map() and applymap() can be faster, depending on the use case. Just time the different versions and choose the fastest. This approach is the most tedious one, with the least performance increase.
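To make section II and the first annex point concrete, here is a self-contained sketch comparing the row-wise apply() with axis=1 against one possible vectorized refactoring of complex_function built from boolean indexing and .loc. The helper complex_vectorized is hypothetical and is our own illustration, not necessarily the refactoring the author had in mind.

```python
import pandas as pd

def complex_function(x, y=0):
    if x > 5 and x > y:
        return 1
    else:
        return 2

def complex_vectorized(df):
    # Hypothetical helper: same rule as complex_function(col1, col2),
    # evaluated on whole columns instead of one row at a time
    result = df.copy()
    mask = (df['col1'] > 5) & (df['col1'] > df['col2'])
    result['col1'] = 2               # the "else: return 2" branch
    result.loc[mask, 'col1'] = 1     # rows where both conditions hold
    return result

df = pd.DataFrame(data={'col1': [1, 4, 6, 2, 7], 'col2': [6, 7, 1, 2, 8]})

# Section II: row-wise apply, passing both columns to complex_function
row_wise = df.copy()
row_wise['col1'] = df.apply(lambda r: complex_function(r['col1'], r['col2']), axis=1)

# Annex: vectorized version, no Python call per row
vectorized = complex_vectorized(df)

print(row_wise)
print(vectorized)
```

Both produce the same col1 (2, 2, 1, 2, 2), including the change at index 4; the vectorized version builds a boolean mask once instead of calling a Python function for every row.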

User · Answer

Let me try a complex computation using datetime and considering nulls or empty spaces. I am subtracting 30 years from a datetime column (as 30*365 days, so leap days are ignored), using the apply method with a lambda and converting the datetime format. The part "if x != '' else x" takes care of all empty spaces or nulls accordingly. This assumes the dates are stored as strings in MM/DD/YYYY format.

    import datetime

    df['Date'] = df['Date'].fillna('')
    df['Date'] = df['Date'].apply(lambda x: (datetime.datetime.strptime(str(x), '%m/%d/%Y') - datetime.timedelta(days=30*365)).strftime('%Y%m%d') if x != '' else x)
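A self-contained sketch of the same approach, with the lambda pulled out into a named function for readability. It assumes, as the format string implies, that Date holds strings in MM/DD/YYYY format; the helper name shift_back_30_years and the sample dates are made up for illustration.

```python
import datetime

import pandas as pd

def shift_back_30_years(x):
    # Hypothetical helper: empty strings (former NaN/None) are passed through
    if x == '':
        return x
    parsed = datetime.datetime.strptime(str(x), '%m/%d/%Y')
    # 30 * 365 days approximates 30 years; leap days are not counted
    return (parsed - datetime.timedelta(days=30 * 365)).strftime('%Y%m%d')

df = pd.DataFrame({'Date': ['01/15/2020', None, '12/31/1999']})
df['Date'] = df['Date'].fillna('')
df['Date'] = df['Date'].apply(shift_back_30_years)
print(df)
```

For 01/15/2020 this yields 19900122 rather than 19900115: the seven leap days between the two dates are not subtracted. If exact calendar years matter, subtracting a calendar-aware offset instead of a fixed number of days would avoid that drift.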

User · Answer

Although the given responses are correct, they modify the initial data frame, which is not always desirable (and, given the OP asked for examples "using apply", it might be they wanted a version that returns a new data frame, as apply does).

This is possible using assign: it is valid to assign to existing columns, as the documentation states: "Assign new columns to a DataFrame. Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten."

In short:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])

    In [3]: df.assign(a=lambda df: df.a / 2)
    Out[3]:
          a   b  c
    0   7.5  15  5
    1  10.0  10  7
    2  12.5  30  9

    In [4]: df
    Out[4]:
        a   b  c
    0  15  15  5
    1  20  10  7
    2  25  30  9

Note that the function will be passed the whole dataframe, not only the column you want to modify, so you will need to make sure you select the right column in your lambda.
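A minimal self-contained check of the behaviour described in this answer: assign() returns a new dataframe and leaves the original untouched. It also shows that apply() can still be used inside the callable if an element-wise function is needed; the variable names halved and halved_apply are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5},
                   {'a': 20, 'b': 10, 'c': 7},
                   {'a': 25, 'b': 30, 'c': 9}])

# The callable receives the whole dataframe, so select the column inside it;
# re-assigning an existing column name overwrites it in the returned copy
halved = df.assign(a=lambda d: d['a'] / 2)

# Same thing, but with an element-wise function applied via apply()
halved_apply = df.assign(a=lambda d: d['a'].apply(lambda v: v / 2))

print(halved)
print(df)  # original values are unchanged
```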

User · Answer

If you are really concerned about the execution speed of your apply function and you have a huge dataset to work on, you could use swifter to make execution faster. Here is an example of swifter on a pandas dataframe:

    import pandas as pd
    import swifter

    def fnc(m):
        return m * 3 + 4

    df = pd.DataFrame({"m": [1, 2, 3, 4, 5, 6], "c": [1, 1, 1, 1, 1, 1], "x": [5, 3, 6, 2, 6, 1]})

    # apply a self-created function to a single column in a pandas df
    df["y"] = df.m.swifter.apply(fnc)

When the function cannot be vectorized and the data is large enough, swifter can spread the work across all your CPU cores, so it can be much faster than a normal apply. Try it and let me know if it is useful for you.
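Because fnc here only uses arithmetic, it already works on a whole Series; as far as we know, this vectorized path is the one swifter tries first. The sketch below avoids the third-party dependency and only checks that calling fnc on the column gives the same values as a plain apply(); it makes no claim about swifter's own performance. The column name y_apply is only illustrative.

```python
import pandas as pd

def fnc(m):
    return m * 3 + 4

df = pd.DataFrame({"m": [1, 2, 3, 4, 5, 6],
                   "c": [1, 1, 1, 1, 1, 1],
                   "x": [5, 3, 6, 2, 6, 1]})

# Element-wise: one Python call per value of column "m"
df["y_apply"] = df["m"].apply(fnc)

# Vectorized: a single call on the whole column; pandas does the arithmetic in bulk
df["y"] = fnc(df["m"])

print(df)
```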

User · Answer

Given a sample dataframe df as:

    a  b
    1  2
    2  3
    3  4
    4  5

what you want is:

    df['a'] = df['a'].apply(lambda x: x + 1)

which returns:

       a  b
    0  2  2
    1  3  3
    2  4  4
    3  5  5
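A runnable version of this answer with the sample dataframe built explicitly. It also shows the equivalent vectorized expression df['a'] + 1, which gives the same result without a per-element Python call; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 3, 4, 5]})

# apply(): the lambda runs once per value of column 'a'
with_apply = df.copy()
with_apply['a'] = with_apply['a'].apply(lambda x: x + 1)

# Vectorized equivalent on the whole column
vectorized = df.copy()
vectorized['a'] = vectorized['a'] + 1

print(with_apply)
```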

User · Answer

For a single column, it is better to use map(), like this:

    df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])

        a   b  c
    0  15  15  5
    1  20  10  7
    2  25  30  9

    df['a'] = df['a'].map(lambda a: a / 2)

          a   b  c
    0   7.5  15  5
    1  10.0  10  7
    2  12.5  30  9
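A short sketch of two kinds of input map() accepts: a function, as in this answer, and a dict used as a lookup table. The dict and the new column name label are made up for illustration; values with no matching key in the dict become NaN.

```python
import pandas as pd

df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5},
                   {'a': 20, 'b': 10, 'c': 7},
                   {'a': 25, 'b': 30, 'c': 9}])

# map() with a function: called once per element of column 'a'
df['a'] = df['a'].map(lambda a: a / 2)

# map() with a dict: each value of 'c' is looked up as a key;
# values without a matching key (here 9) become NaN
df['label'] = df['c'].map({5: 'low', 7: 'mid'})

print(df)
```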