[python] Change one value based on another value in pandas

I'm trying to reprogram my Stata code into Python for speed improvements, and I was pointed in the direction of PANDAS. I am, however, having a hard time wrapping my head around how to process the data.

Let's say I want to iterate over all values in the column 'ID'. If that ID matches a specific number, then I want to change the two corresponding values, FirstName and LastName.

In Stata it looks like this:

replace FirstName = "Matt" if ID==103
replace LastName =  "Jones" if ID==103

So this replaces all values in FirstName that correspond to ID == 103 with Matt.

In PANDAS, I'm trying something like this

df = read_csv("test.csv")
for i in df['ID']:
    if i ==103:
          ...

Not sure where to go from here. Any ideas?

Tags: python, pandas

The answers are below.


One option is to use Python's slicing and indexing features to logically evaluate the places where your condition holds and overwrite the data there.

Assuming you can load your data directly into pandas with pandas.read_csv, the following code might be helpful for you:

import pandas
df = pandas.read_csv("test.csv")
df.loc[df.ID == 103, 'FirstName'] = "Matt"
df.loc[df.ID == 103, 'LastName'] = "Jones"

As mentioned in the comments, you can also do the assignment to both columns in one shot:

df.loc[df.ID == 103, ['FirstName', 'LastName']] = 'Matt', 'Jones'

Note that you'll need pandas version 0.11 or newer to make use of loc for overwrite assignment operations.
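If you are not sure which version you have installed, a quick way to check from Python (simply reading pandas' own version string):

import pandas as pd
print(pd.__version__)   # loc-based assignment as shown above needs 0.11 or newer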


Another way to do it is to use what is called chained assignment. The behavior of this is less stable and so it is not considered the best solution (it is explicitly discouraged in the docs), but it is useful to know about:

import pandas
df = pandas.read_csv("test.csv")
df['FirstName'][df.ID == 103] = "Matt"
df['LastName'][df.ID == 103] = "Jones"
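If you want chained assignment to be hard to miss while debugging, pandas versions that use the SettingWithCopy machinery have a mode.chained_assignment option that turns the usual warning into an exception. A minimal sketch (whether a particular chained assignment actually triggers it depends on whether the intermediate object is a view or a copy):

import pandas as pd

pd.set_option('mode.chained_assignment', 'raise')   # default is 'warn'; None silences it

df = pd.DataFrame({'ID': [101, 103], 'FirstName': ['Ann', 'Bob']})
# selecting rows first produces a copy, so this assignment would be lost;
# with the option set to 'raise', pandas raises SettingWithCopyError here
df[df.ID == 103]['FirstName'] = "Matt"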

The original question addresses a specific, narrow use case. For those who need more generic answers, here are some examples:

Creating a new column using data from other columns

Given the dataframe below:

import pandas as pd
import numpy as np

df = pd.DataFrame([['dog', 'hound', 5],
                   ['cat', 'ragdoll', 1]],
                  columns=['animal', 'type', 'age'])

In [1]: df
Out[1]:
  animal     type  age
----------------------
0    dog    hound    5
1    cat  ragdoll    1

Below we are adding a new description column as a concatenation of the other columns, using the + operator, which is overridden for Series to work element-wise. Plain string formatting (str.format, f-strings, etc.) won't work here, because it operates on scalar values rather than element-wise on a whole Series:

df['description'] = 'A ' + df.age.astype(str) + ' years old ' \
                    + df.type + ' ' + df.animal

In [2]: df
Out[2]:
  animal     type  age                description
-------------------------------------------------
0    dog    hound    5    A 5 years old hound dog
1    cat  ragdoll    1  A 1 years old ragdoll cat

We get '1 years' for the cat (instead of '1 year'), which we will fix below using conditionals.

Modifying an existing column with conditionals

Here we are replacing the original animal column with values from other columns, and using np.where to set a conditional substring based on the value of age:

# append 's' to 'year' if age is greater than 1
df.animal = df.animal + ", " + df.type + ", " + \
    df.age.astype(str) + " year" + np.where(df.age > 1, 's', '')

In [3]: df
Out[3]:
                 animal     type  age
-------------------------------------
0   dog, hound, 5 years    hound    5
1  cat, ragdoll, 1 year  ragdoll    1

Modifying multiple columns with conditionals

A more flexible approach is to call .apply() on an entire dataframe rather than on a single column (starting again from the dataframe as originally created above):

def transform_row(r):
    animal = r.animal                      # remember the original value before overwriting it
    r.animal = 'wild ' + r.type            # e.g. 'wild hound'
    r.type = animal + ' creature'          # e.g. 'dog creature'
    r.age = "{} year{}".format(r.age, 's' if r.age > 1 else '')
    return r

In [4]: df.apply(transform_row, axis=1)
Out[4]:
         animal            type      age
----------------------------------------
0    wild hound    dog creature  5 years
1  wild ragdoll    cat creature   1 year

In the code above, the transform_row(r) function takes a Series object representing a given row (indicated by axis=1; the default, axis=0, would instead pass a Series for each column). This simplifies processing, since we can access the actual scalar values in the row by column name while still having visibility of the other cells in that row.
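To make the axis distinction concrete, here is a small sketch (re-using the animal/type/age dataframe from above, freshly created) showing what .apply() passes to the function for each axis value:

import pandas as pd

df = pd.DataFrame([['dog', 'hound', 5],
                   ['cat', 'ragdoll', 1]],
                  columns=['animal', 'type', 'age'])

# axis=0 (the default): the function receives each *column* as a Series
print(df.apply(lambda col: col.name, axis=0))
# animal    animal
# type        type
# age          age

# axis=1: the function receives each *row* as a Series indexed by the column names
print(df.apply(lambda row: row.animal + '/' + row.type, axis=1))
# 0      dog/hound
# 1    cat/ragdoll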


df['FirstName'] = df['ID'].apply(lambda x: 'Matt' if x == 103 else '')
df['LastName'] = df['ID'].apply(lambda x: 'Jones' if x == 103 else '')

This question might still be visited often enough that it's worth offering an addendum to Mr Kassies' answer. The built-in dict class can be subclassed so that a default is returned for missing keys. This mechanism works well with pandas (but see the simpler alternative further below).

In this way it's possible to avoid key errors.

>>> import pandas as pd
>>> data = { 'ID': [ 101, 201, 301, 401 ] }
>>> df = pd.DataFrame(data)
>>> class SurnameMap(dict):
...     def __missing__(self, key):
...         return ''
...     
>>> surnamemap = SurnameMap()
>>> surnamemap[101] = 'Mohanty'
>>> surnamemap[301] = 'Drake'
>>> df['Surname'] = df['ID'].apply(lambda x: surnamemap[x])
>>> df
    ID  Surname
0  101  Mohanty
1  201         
2  301    Drake
3  401         

The same thing can be done more simply in the following way. The use of the 'default' argument for the get method of a dict object makes it unnecessary to subclass a dict.

>>> import pandas as pd
>>> data = { 'ID': [ 101, 201, 301, 401 ] }
>>> df = pd.DataFrame(data)
>>> surnamemap = {}
>>> surnamemap[101] = 'Mohanty'
>>> surnamemap[301] = 'Drake'
>>> df['Surname'] = df['ID'].apply(lambda x: surnamemap.get(x, ''))
>>> df
    ID  Surname
0  101  Mohanty
1  201         
2  301    Drake
3  401         

I found it much easier to debug by printing out, for each column, where the condition is met:

for n in df.columns:
    if (df[n] == 103).any():              # does this column contain the value at all?
        print(n)                          # column name
        print(df[df[n] == 103].index)     # index of the matching rows

You can use map; it can map values from a dictionary or even apply a custom function.

Suppose this is your df:

    ID First_Name Last_Name
0  103          a         b
1  104          c         d

Create the dicts:

fnames = {103: "Matt", 104: "Mr"}
lnames = {103: "Jones", 104: "X"}

And map:

df['First_Name'] = df['ID'].map(fnames)
df['Last_Name'] = df['ID'].map(lnames)

The result will be:

    ID First_Name Last_Name
0  103       Matt     Jones
1  104         Mr         X

Or use a custom function:

names = {103: ("Matt", "Jones"), 104: ("Mr", "X")}
df['First_Name'] = df['ID'].map(lambda x: names[x][0])
df['Last_Name'] = df['ID'].map(lambda x: names[x][1])
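A caveat worth adding (not part of the original answer): when map is given a dict, any ID missing from the dict becomes NaN, and the lambda version above raises a KeyError instead. If you only want to overwrite the rows you actually mapped and keep the existing names elsewhere, one option is to fall back to the original column with fillna:

# keep the existing value wherever the ID is not in the mapping dict
df['First_Name'] = df['ID'].map(fnames).fillna(df['First_Name'])
df['Last_Name'] = df['ID'].map(lnames).fillna(df['Last_Name'])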