Difference between map, applymap and apply methods in Pandas

Question

Can you tell me when to use these vectorization methods, with basic examples?

I see that map is a Series method, whereas the rest are DataFrame methods. I got confused about the apply and applymap methods, though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples that illustrate the usage would be great!

User · Answer

Based on the answer of cs95:

map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH

Here are some examples:

    In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

    In [4]: frame
    Out[4]:
                   b         d         e
    Utah    0.129885 -0.475957 -0.207679
    Ohio   -2.978331 -1.015918  0.784675
    Texas  -0.256689 -0.226366  2.262588
    Oregon  2.605526  1.139105 -0.927518

    In [5]: myformat = lambda x: f'{x:.2f}'

    In [6]: frame.d.map(myformat)
    Out[6]:
    Utah      -0.48
    Ohio      -1.02
    Texas     -0.23
    Oregon     1.14
    Name: d, dtype: object

    In [7]: frame.d.apply(myformat)
    Out[7]:
    Utah      -0.48
    Ohio      -1.02
    Texas     -0.23
    Oregon     1.14
    Name: d, dtype: object

    In [8]: frame.applymap(myformat)
    Out[8]:
                b      d      e
    Utah     0.13  -0.48  -0.21
    Ohio    -2.98  -1.02   0.78
    Texas   -0.26  -0.23   2.26
    Oregon   2.61   1.14  -0.93

    In [9]: frame.apply(lambda x: x.apply(myformat))
    Out[9]:
                b      d      e
    Utah     0.13  -0.48  -0.21
    Ohio    -2.98  -1.02   0.78
    Texas   -0.26  -0.23   2.26
    Oregon   2.61   1.14  -0.93

    In [10]: myfunc = lambda x: x**2

    In [11]: frame.applymap(myfunc)
    Out[11]:
                   b         d         e
    Utah    0.016870  0.226535  0.043131
    Ohio    8.870453  1.032089  0.615714
    Texas   0.065889  0.051242  5.119305
    Oregon  6.788766  1.297560  0.860289

    In [12]: frame.apply(myfunc)
    Out[12]:
                   b         d         e
    Utah    0.016870  0.226535  0.043131
    Ohio    8.870453  1.032089  0.615714
    Texas   0.065889  0.051242  5.119305
    Oregon  6.788766  1.297560  0.860289
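The session above can be condensed into a small reproducible sketch. The values are copied from the first two rows of the frame shown in the answer. The `elementwise` helper is an addition of mine, not part of the answer: it assumes that `DataFrame.applymap` was renamed to `DataFrame.map` in pandas 2.1 and simply picks whichever name the installed version provides.

```python
import pandas as pd

# Fixed values (first two rows of the frame above) so the results are reproducible.
frame = pd.DataFrame(
    [[0.129885, -0.475957, -0.207679],
     [-2.978331, -1.015918, 0.784675]],
    columns=list("bde"),
    index=["Utah", "Ohio"],
)


def myformat(x):
    """Format a single number with two decimals."""
    return f"{x:.2f}"


# DataFrame.applymap was renamed to DataFrame.map in pandas 2.1;
# use whichever one this pandas version offers.
elementwise = frame.map if hasattr(frame, "map") else frame.applymap

via_series_map = frame["d"].map(myformat)      # Series, one element at a time
via_series_apply = frame["d"].apply(myformat)  # Series, one element at a time
via_frame = elementwise(myformat)              # every cell of the DataFrame
via_nested = frame.apply(lambda col: col.map(myformat))  # column by column

print(via_frame)
```

All four calls format the same numbers; they differ only in which object they are called on and in what the function receives.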

User · Answer

My understanding, from the function's point of view:

If the function needs to compare values within a column or row, use apply, e.g. lambda x: x.max() - x.mean().

If the function is to be applied to each element:

1. If a single column or row is selected, use apply.
2. If it applies to the entire DataFrame, use applymap.

    majority = lambda x: x > 17
    df2['legal_drinker'] = df2['age'].apply(majority)

    def times10(x):
        if type(x) is int:
            x *= 10
        return x

    df2.applymap(times10)
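To make the two cases concrete, here is a minimal sketch. The data in `df2` and `mixed` is invented for illustration, and the `elementwise` helper assumes the pandas 2.1 rename of `applymap` to `DataFrame.map`.

```python
import pandas as pd

df2 = pd.DataFrame({"age": [15, 21, 30], "score": [1.5, 2.5, 4.0]})

# Column-level: the function receives a whole column (a Series),
# so it can compare values within that column.
spread = df2.apply(lambda col: col.max() - col.mean())

# Element-level on one selected column: Series.apply calls the function per value.
df2["legal_drinker"] = df2["age"].apply(lambda x: x > 17)


def times10(x):
    """Multiply Python ints by 10 and leave every other value untouched."""
    if type(x) is int:
        x *= 10
    return x


# Element-level on a whole DataFrame. Column "a" has object dtype,
# so its cells are plain Python objects; column "b" holds floats.
mixed = pd.DataFrame({"a": [1, "x"], "b": [2.5, 3.0]})
elementwise = mixed.map if hasattr(mixed, "map") else mixed.applymap
scaled = elementwise(times10)
```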

User · Answer

Just for additional context and intuition, here's an explicit and concrete example of the differences.

Assume you have the function below. (This label function will arbitrarily split the values into 'High' and 'Low', based upon the threshold you provide as the parameter x.)

    def label(element, x):
        if element > x:
            return 'High'
        else:
            return 'Low'

In this example, let's assume our dataframe has one column with random numbers.

If you tried mapping the label function with map:

    df['ColumnName'].map(label, x = 0.8)

you will get the following error:

    TypeError: map() got an unexpected keyword argument 'x'

Now take the same function and use apply, and you'll see that it works:

    df['ColumnName'].apply(label, x=0.8)

Series.apply() can take additional arguments element-wise, while the Series.map() method will return an error.

Now, if you're trying to apply the same function to several columns in your dataframe simultaneously, DataFrame.applymap() is used:

    df[['ColumnName','ColumnName2','ColumnName3','ColumnName4']].applymap(label)

Lastly, you can also use the apply() method on a dataframe, but the DataFrame.apply() method has different capabilities. Instead of applying functions element-wise, the df.apply() method applies functions along an axis, either column-wise or row-wise. When we create a function to use with df.apply(), we set it up to accept a series, most commonly a column.

Here is an example:

    df.apply(pd.value_counts)

When we applied the pd.value_counts function to the dataframe, it calculated the value counts for all the columns.

Notice, and this is very important, that we used the df.apply() method to transform multiple columns. This is only possible because the pd.value_counts function operates on a series. If we tried to use the df.apply() method to apply a function that works element-wise to multiple columns, we'd get an error.

For example:

    def label(element):
        if element > 1:
            return 'High'
        else:
            return 'Low'

    df[['ColumnName','ColumnName2','ColumnName3','ColumnName4']].apply(label)

This will result in the following error:

    ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index Economy')

In general, we should only use the apply() method when a vectorized function does not exist. Recall that pandas uses vectorization, the process of applying operations to whole series at once, to optimize performance. When we use the apply() method, we're actually looping through rows, so a vectorized method can perform an equivalent task faster than the apply() method.

Here are some examples of vectorized functions that already exist and that you do NOT want to recreate using any type of apply/map methods:

Series.str.split(): Splits each element in the Series.
Series.str.strip(): Strips whitespace from each string in the Series.
Series.str.lower(): Converts strings in the Series to lowercase.
Series.str.upper(): Converts strings in the Series to uppercase.
Series.str.get(): Retrieves the ith element of each element in the Series.
Series.str.replace(): Replaces a regex or string in the Series with another string.
Series.str.cat(): Concatenates strings in a Series.
Series.str.extract(): Extracts substrings from the Series matching a regex pattern.
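As a rough illustration of the advice above, the thresholding done by label can be written both with Series.apply and in vectorized form. This is only a sketch: the column values are made up, and using np.where as the vectorized replacement is my choice, not something the answer prescribes.

```python
import numpy as np
import pandas as pd


def label(element, x):
    """Return 'High' when element exceeds the threshold x, else 'Low'."""
    return "High" if element > x else "Low"


df = pd.DataFrame({"ColumnName": [0.2, 0.9, 0.8, 0.95]})

# Series.apply forwards extra keyword arguments (here x) to the function.
looped = df["ColumnName"].apply(label, x=0.8)

# Vectorized equivalent: compare the whole column at once, then pick labels.
vectorized = pd.Series(
    np.where(df["ColumnName"] > 0.8, "High", "Low"),
    index=df.index,
)

# For string clean-up, the .str accessor replaces apply/map with str methods.
names = pd.Series(["  Utah ", "OHIO", " texas"])
cleaned = names.str.strip().str.lower()
```

Both versions produce the same labels; the vectorized one avoids calling a Python function once per element.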

User · Answer

Adding to the other answers: a Series also has both map and apply.

apply can make a DataFrame out of a Series; map, however, will just put a Series in every cell of another Series, which is probably not what you want.

    In [40]: p = pd.Series([1, 2, 3])

    In [41]: p
    Out[41]:
    0    1
    1    2
    2    3
    dtype: int64

    In [42]: p.apply(lambda x: pd.Series([x, x]))
    Out[42]:
       0  1
    0  1  1
    1  2  2
    2  3  3

    In [43]: p.map(lambda x: pd.Series([x, x]))
    Out[43]:
    0    0    1
    1    1
    dtype: int64
    1    0    2
    1    2
    dtype: int64
    2    0    3
    1    3
    dtype: int64
    dtype: object

Also, if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.

    series.apply(download_file_for_every_element)

map can use not only a function, but also a dictionary or another Series. Let's say you want to manipulate permutations.

Take

    1 2 3 4 5
    2 1 4 5 3

The square of this permutation is

    1 2 3 4 5
    1 2 5 3 4

You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.

    In [39]: p = pd.Series([1, 0, 3, 4, 2])

    In [40]: p.map(p)
    Out[40]:
    0    0
    1    1
    2    4
    3    2
    4    3
    dtype: int64
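The permutation example can be replayed as a short sketch. It relies on Series.map accepting another Series as a lookup table, where each value is looked up in the index of the Series passed in; the variable names squared and cubed are mine.

```python
import pandas as pd

# The permutation 1 2 3 4 5 -> 2 1 4 5 3, shifted to 0-based positions.
p = pd.Series([1, 0, 3, 4, 2])

# Passing a Series to map uses it as a lookup table: each value v becomes p[v].
squared = p.map(p)       # p applied twice
cubed = squared.map(p)   # p applied three times

print(squared.tolist())
print(cubed.tolist())
```

Since the permutation swaps the first two positions and cycles the last three, applying it three times undoes the cycle and leaves only the swap.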

User · Answer

Quick Summary

DataFrame.apply operates on entire rows or columns at a time.

DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.

Series.apply and Series.map are similar and often interchangeable. Some of their slight differences are discussed in osa's answer below.

User · Answer

jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation:

    frame.apply(np.sqrt)
    Out[102]:
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN

    frame.applymap(np.sqrt)
    Out[103]:
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN
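One way to see why both calls agree: np.sqrt is a NumPy ufunc that accepts a whole Series, so apply can hand it entire columns while the element-wise method hands it single cells. The sketch below records what the function receives. The spy_sqrt wrapper and the sample values are hypothetical, and the elementwise helper assumes the pandas 2.1 rename of applymap to DataFrame.map.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]}, index=["Utah", "Ohio"])

received = []  # type names of the arguments that apply passes in


def spy_sqrt(col):
    """Record the argument type, then take the square root."""
    received.append(type(col).__name__)
    return np.sqrt(col)


by_column = frame.apply(spy_sqrt)   # function is called with whole columns
elementwise = frame.map if hasattr(frame, "map") else frame.applymap
by_cell = elementwise(np.sqrt)      # function is called with single values
whole = np.sqrt(frame)              # no apply needed at all

print(received)
```

The recorded names show that apply never sees individual cells here; the element-wise result comes from the ufunc itself.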

User · Answer

Probably the simplest explanation of the difference between apply and applymap:

apply takes the whole column as a parameter and then assigns the result to this column.

applymap takes the separate cell value as a parameter and assigns the result back to this cell.

NB: if apply returns a single value, you will have this value instead of the column after assigning, and eventually will have just a row instead of a matrix.

User · Answer

FOMO:

The following example shows apply and applymap applied to a DataFrame.

map is something you apply on a Series only. You cannot apply map on a DataFrame.

The thing to remember is that apply can do anything applymap can, but apply has eXtra options.

The X factor options are axis and result_type, where result_type only works when axis=1 (for columns).

    import numpy as np
    from pandas import DataFrame

    df = DataFrame(1, columns=list('abc'),
                      index=list('1234'))
    print(df)

    f = lambda x: np.log(x)
    print(df.applymap(f))       # apply to the whole dataframe
    print(np.log(df))           # applied to the whole dataframe
    print(df.applymap(np.sum))  # reducing can be applied for rows only

    # apply can take different options (vs. applymap cannot)
    print(df.apply(f))                 # same as applymap
    print(df.apply(sum, axis=1))       # reducing example
    print(df.apply(np.log, axis=1))    # cannot reduce
    print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand'))  # expand result

As a sidenote, the Series map function should not be confused with the Python map function. The first one is applied on a Series to map the values, and the second one is applied to every item of an iterable.

Lastly, don't confuse the dataframe apply method with the groupby apply method.
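The two extra options can be tried on the same all-ones frame. This is a minimal sketch of axis and result_type only; the variable names are mine, and I return a two-element list so that the list length does not coincide with the number of columns.

```python
import pandas as pd

df = pd.DataFrame(1, columns=list("abc"), index=list("1234"))

# axis=0 (the default): the function receives each column, one result per column.
col_sums = df.apply(lambda col: col.sum())

# axis=1: the function receives each row, one result per row.
row_sums = df.apply(lambda row: row.sum(), axis=1)

# A list-like return value stays a single cell by default...
as_lists = df.apply(lambda row: [1, 2], axis=1)

# ...while result_type="expand" spreads it over new columns.
expanded = df.apply(lambda row: [1, 2], axis=1, result_type="expand")

print(expanded)
```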

User · Answer

Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):

"Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's apply method does exactly this:"

    In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

    In [117]: frame
    Out[117]:
                   b         d         e
    Utah   -0.029638  1.081563  1.280300
    Ohio    0.647747  0.831136 -1.549481
    Texas   0.513416 -0.884417  0.195343
    Oregon -0.485454 -0.477388 -0.309548

    In [118]: f = lambda x: x.max() - x.min()

    In [119]: frame.apply(f)
    Out[119]:
    b    1.133201
    d    1.965980
    e    2.829781
    dtype: float64

"Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary."

"Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:"

    In [120]: format = lambda x: '%.2f' % x

    In [121]: frame.applymap(format)
    Out[121]:
                b      d      e
    Utah    -0.03   1.08   1.28
    Ohio     0.65   0.83  -1.55
    Texas    0.51  -0.88   0.20
    Oregon  -0.49  -0.48  -0.31

"The reason for the name applymap is that Series has a map method for applying an element-wise function:"

    In [122]: frame['e'].map(format)
    Out[122]:
    Utah       1.28
    Ohio      -1.55
    Texas      0.20
    Oregon    -0.31
    Name: e, dtype: object

Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
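The remark that common statistics are already DataFrame methods can be checked directly: the range computed through apply matches the one built from max and min. This is a sketch with rounded, hand-picked values rather than the random frame from the book.

```python
import pandas as pd

frame = pd.DataFrame(
    {"b": [-0.03, 0.65, 0.51], "d": [1.08, 0.83, -0.88]},
    index=["Utah", "Ohio", "Texas"],
)

f = lambda x: x.max() - x.min()

# Range per column: once through apply, once with built-in DataFrame methods.
via_apply = frame.apply(f)
via_methods = frame.max() - frame.min()

# The same comparison per row, using axis=1.
rows_via_apply = frame.apply(f, axis=1)
rows_via_methods = frame.max(axis=1) - frame.min(axis=1)

print(via_apply)
```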

User · Answer

Comparing map, applymap and apply: Context Matters

First major difference: DEFINITION

map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH

Second major difference: INPUT ARGUMENT

map accepts dicts, Series, or callable
applymap and apply accept callables only

Third major difference: BEHAVIOR

map is elementwise for Series
applymap is elementwise for DataFrames
apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depends on the function.

Fourth major difference (the most important one): USE CASE

map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))

Footnotes

1. map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
4. Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
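Two of the use cases above in runnable form, as a hedged sketch: the column contents are invented, the value 4 is deliberately left out of the dictionary to show the NaN described in footnote 1, and the elementwise helper assumes the pandas 2.1 rename of applymap to DataFrame.map.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 4], "B": [" x", "y ", " z "]})

# map with a dict: values are looked up as keys; 4 has no key, so it becomes NaN.
mapped = df["A"].map({1: "a", 2: "b", 3: "c"})

# Elementwise transformation over a selection of columns.
selection = df[["B"]]
elementwise = selection.map if hasattr(selection, "map") else selection.applymap
stripped_cells = elementwise(str.strip)

# The same clean-up through the vectorized .str accessor, column by column.
stripped_cols = selection.apply(lambda col: col.str.strip())

print(mapped)
```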

User · Answer

Just wanted to point out, as I struggled with this for a bit:

    def f(x):
        if x < 0:
            x = 0
        elif x > 100000:
            x = 100000
        return x

    df.applymap(f)
    df.describe()

This does not modify the dataframe itself; the result has to be reassigned:

    df = df.applymap(f)
    df.describe()
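A short sketch of the same pitfall with made-up data: the element-wise call returns a new DataFrame and leaves the original untouched. The last line uses DataFrame.clip, which is my suggestion for a vectorized way to apply the same bounds and is not part of the answer. The elementwise helper assumes the pandas 2.1 rename of applymap to DataFrame.map.

```python
import pandas as pd


def f(x):
    """Clamp a single value to the range [0, 100000]."""
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x


df = pd.DataFrame({"a": [-5, 50, 200000]})
elementwise = df.map if hasattr(df, "map") else df.applymap

result = elementwise(f)        # a new DataFrame is returned
unchanged = df["a"].tolist()   # the original still holds the old values

# Reassign to keep the result; clip applies the same bounds without a Python loop.
df = df.clip(lower=0, upper=100000)

print(df)
```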
