Pandas conditional creation of a series/dataframe column

Question

I have a dataframe along the lines of the below:

        Type       Set
    1    A          Z
    2    B          Z
    3    B          X
    4    C          Y

I want to add another column to the dataframe (or generate a series) of the same length as the dataframe (equal number of records/rows) which sets a colour 'green' if Set == 'Z' and 'red' if Set equals anything else.

What's the best way to do this?

User · Answer

If you're working with massive data, a memoized approach would be best:

    # First create a dictionary of manually stored values
    color_dict = {'Z': 'red'}

    # Second, build a dictionary of "other" values
    color_dict_other = {x: 'green' for x in df['Set'].unique() if x not in color_dict.keys()}

    # Next, merge the two
    color_dict.update(color_dict_other)

    # Finally, map it to your column
    df['color'] = df['Set'].map(color_dict)

This approach will be fastest when you have many repeated values. My general rule of thumb is to memoize when `data_size > 10**4` and `n_distinct < data_size/4`. E.g. memoize in a case with 10,000 rows and 2,500 or fewer distinct values.
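The rule of thumb above can be written down as a small check. This is only a sketch: `should_memoize` and `color_by_lookup` are hypothetical helper names, not pandas functions, and the thresholds are simply the ones quoted in this answer.

```python
import pandas as pd


def should_memoize(series, min_rows=10**4, max_distinct_ratio=0.25):
    """Hypothetical helper: apply the rule of thumb quoted above."""
    data_size = len(series)
    n_distinct = series.nunique()
    return bool(data_size > min_rows and n_distinct < data_size * max_distinct_ratio)


def color_by_lookup(series, known, default):
    """Hypothetical helper: build one lookup entry per distinct value, then map it."""
    lookup = {value: known.get(value, default) for value in series.unique()}
    return series.map(lookup)


df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
df['color'] = color_by_lookup(df['Set'], known={'Z': 'red'}, default='green')
```

The lookup is built once per distinct value, so the per-row work is reduced to a dictionary lookup inside `map`.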

User · Answer

You can simply use the powerful .loc method and use one condition or several depending on your need (tested with pandas=1.0.5).

Code Summary:

    import pandas as pd

    df = pd.DataFrame(dict(Type='A B B C'.split(), Set='Z Z X Y'.split()))
    df['Color'] = "red"
    df.loc[(df['Set']=="Z"), 'Color'] = "green"

    # practice!
    df.loc[(df['Set']=="Z")&(df['Type']=="B")|(df['Type']=="C"), 'Color'] = "purple"

Explanation:

    df = pd.DataFrame(dict(Type='A B B C'.split(), Set='Z Z X Y'.split()))

    # df so far:
      Type Set
    0    A   Z
    1    B   Z
    2    B   X
    3    C   Y

Add a 'Color' column and set all values to "red":

    df['Color'] = "red"

Apply your single condition:

    df.loc[(df['Set']=="Z"), 'Color'] = "green"

    # df:
      Type Set  Color
    0    A   Z  green
    1    B   Z  green
    2    B   X    red
    3    C   Y    red

or multiple conditions if you want:

    df.loc[(df['Set']=="Z")&(df['Type']=="B")|(df['Type']=="C"), 'Color'] = "purple"

You can read on Pandas logical operators and conditional selection here: Logical operators for boolean indexing in Pandas
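In the multi-condition line above, `&` binds more tightly than `|` in Python, so the mask reads as "(Set is Z and Type is B) or Type is C". A short sketch with that grouping made explicit, next to the alternative grouping for comparison (the variable names and the `Color_alt` column are illustrative):

```python
import pandas as pd

df = pd.DataFrame(dict(Type='A B B C'.split(), Set='Z Z X Y'.split()))

is_z = df['Set'] == 'Z'
is_b = df['Type'] == 'B'
is_c = df['Type'] == 'C'

# Same grouping Python applies to: is_z & is_b | is_c
df['Color'] = 'red'
df.loc[is_z, 'Color'] = 'green'
df.loc[(is_z & is_b) | is_c, 'Color'] = 'purple'

# A different grouping selects different rows
df['Color_alt'] = 'red'
df.loc[is_z, 'Color_alt'] = 'green'
df.loc[is_z & (is_b | is_c), 'Color_alt'] = 'purple'
```

Writing the parentheses out avoids relying on precedence when conditions are combined.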

User · Answer

A one-liner with the .apply() method is the following:

    df['color'] = df['Set'].apply(lambda set_: 'green' if set_ == 'Z' else 'red')

After that, the df data frame looks like this:

    >>> print(df)
      Type Set  color
    0    A   Z  green
    1    B   Z  green
    2    B   X    red
    3    C   Y    red

User · Answer

The following is slower than the approaches timed here, but we can compute the extra column based on the contents of more than one column, and more than two values can be computed for the extra column.

Simple example using just the "Set" column:

    def set_color(row):
        if row["Set"] == "Z":
            return "red"
        else:
            return "green"

    df = df.assign(color=df.apply(set_color, axis=1))

    print(df)

      Set Type  color
    0   Z    A    red
    1   Z    B    red
    2   X    B  green
    3   Y    C  green

Example with more colours and more columns taken into account:

    def set_color(row):
        if row["Set"] == "Z":
            return "red"
        elif row["Type"] == "C":
            return "blue"
        else:
            return "green"

    df = df.assign(color=df.apply(set_color, axis=1))

    print(df)

      Set Type  color
    0   Z    A    red
    1   Z    B    red
    2   X    B  green
    3   Y    C   blue

Edit (21/06/2019): Using plydata

It is also possible to use plydata to do this kind of thing (this seems even slower than using assign and apply, though).

    from plydata import define, if_else

Simple if_else:

    df = define(df, color=if_else('Set=="Z"', '"red"', '"green"'))

    print(df)

      Set Type  color
    0   Z    A    red
    1   Z    B    red
    2   X    B  green
    3   Y    C  green

Nested if_else:

    df = define(df, color=if_else(
        'Set=="Z"',
        '"red"',
        if_else('Type=="C"', '"green"', '"blue"')))

    print(df)

      Set Type  color
    0   Z    A    red
    1   Z    B    red
    2   X    B   blue
    3   Y    C  green

User · Answer

If you only have two choices to select from:

    df['color'] = np.where(df['Set']=='Z', 'green', 'red')

For example,

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
    df['color'] = np.where(df['Set']=='Z', 'green', 'red')
    print(df)

yields

      Set Type  color
    0   Z    A  green
    1   Z    B  green
    2   X    B    red
    3   Y    C    red

If you have more than two conditions then use np.select. For example, if you want color to be

- yellow when (df['Set'] == 'Z') & (df['Type'] == 'A')
- otherwise blue when (df['Set'] == 'Z') & (df['Type'] == 'B')
- otherwise purple when (df['Type'] == 'B')
- otherwise black,

then use

    df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
    conditions = [
        (df['Set'] == 'Z') & (df['Type'] == 'A'),
        (df['Set'] == 'Z') & (df['Type'] == 'B'),
        (df['Type'] == 'B')]
    choices = ['yellow', 'blue', 'purple']
    df['color'] = np.select(conditions, choices, default='black')
    print(df)

which yields

      Set Type   color
    0   Z    A  yellow
    1   Z    B    blue
    2   X    B  purple
    3   Y    C   black
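`np.select` takes the choice belonging to the first condition that matches, which is why the "otherwise" wording above works. When conditions overlap, their order therefore matters. A small sketch on the same frame, with the broad `Type == 'B'` test moved to the front (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

is_z_and_b = (df['Set'] == 'Z') & (df['Type'] == 'B')
is_b = df['Type'] == 'B'

# Specific condition first: row 1 (Z, B) gets 'blue'
specific_first = np.select([is_z_and_b, is_b], ['blue', 'purple'], default='black')

# Broad condition first: row 1 already matches is_b, so 'blue' is never used
broad_first = np.select([is_b, is_z_and_b], ['purple', 'blue'], default='black')
```

Listing the most specific conditions first keeps a broader one from shadowing them.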

User · Answer

Here's yet another way to skin this cat, using a dictionary to map new values onto the keys in the list:

    def map_values(row, values_dict):
        return values_dict[row]

    values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

    df = pd.DataFrame({'INDICATOR': ['A', 'B', 'C', 'D'], 'VALUE': [10, 9, 8, 7]})

    df['NEW_VALUE'] = df['INDICATOR'].apply(map_values, args=(values_dict,))

What's it look like?

    df
    Out[2]:
      INDICATOR  VALUE  NEW_VALUE
    0         A     10          1
    1         B      9          2
    2         C      8          3
    3         D      7          4

This approach can be very powerful when you have many ifelse-type statements to make (i.e. many unique values to replace).

And of course you could always do this:

    df['NEW_VALUE'] = df['INDICATOR'].map(values_dict)

But that approach is more than three times as slow as the apply approach from above, on my machine.

And you could also do this, using dict.get:

    df['NEW_VALUE'] = [values_dict.get(v, None) for v in df['INDICATOR']]
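Applied to the original question, the dictionary only needs the values that are special-cased: `Series.map` returns NaN for keys missing from the dictionary, and `fillna` can supply the fallback. A minimal sketch (the name `special_colors` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Only 'Z' is listed; every other value maps to NaN and is then filled with 'red'
special_colors = {'Z': 'green'}
df['color'] = df['Set'].map(special_colors).fillna('red')
```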

User · Answer

List comprehension is another way to create another column conditionally. If you are working with object dtypes in columns, like in your example, list comprehensions typically outperform most other methods.

Example list comprehension:

    df['color'] = ['red' if x == 'Z' else 'green' for x in df['Set']]

%timeit tests:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
    %timeit df['color'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
    %timeit df['color'] = np.where(df['Set']=='Z', 'green', 'red')
    %timeit df['color'] = df.Set.map(lambda x: 'red' if x == 'Z' else 'green')

    1000 loops, best of 3: 239 µs per loop
    1000 loops, best of 3: 523 µs per loop
    1000 loops, best of 3: 263 µs per loop
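The timings above come from a four-row frame, where fixed overhead dominates. A sketch for repeating the comparison on a larger frame with the standard `timeit` module follows. The row count, the random data, and the `candidates` dictionary are assumptions for illustration, and no particular ranking is claimed.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed size for illustration; adjust to match your own data
n_rows = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({'Set': rng.choice(list('XYZ'), size=n_rows)})

candidates = {
    'list comprehension': lambda: ['red' if x == 'Z' else 'green' for x in df['Set']],
    'np.where': lambda: np.where(df['Set'] == 'Z', 'red', 'green'),
    'Series.map': lambda: df['Set'].map(lambda x: 'red' if x == 'Z' else 'green'),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls each, reported as seconds per call
    timings[name] = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f'{name}: {timings[name] * 1e3:.2f} ms per call')
```

Results depend on the pandas and NumPy versions, the column dtype, and the number of rows, so it is worth measuring on data shaped like your own.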

User · Answer

You can use pandas methods where and mask:

    df['color'] = 'green'
    df['color'] = df['color'].where(df['Set']=='Z', other='red')
    # Replace values where the condition is False

or

    df['color'] = 'red'
    df['color'] = df['color'].mask(df['Set']=='Z', other='green')
    # Replace values where the condition is True

Output:

      Type Set  color
    1    A   Z  green
    2    B   Z  green
    3    B   X    red
    4    C   Y    red

User · Answer

Another way in which this could be achieved is:

    df['color'] = df.Set.map(lambda x: 'red' if x == 'Z' else 'green')
