Remap values in pandas column with a dict

Question

I have a dictionary which looks like this:

di = {1: "A", 2: "B"}

I would like to apply it to the col1 column of a dataframe similar to:

  col1 col2
0    w    a
1    1    2
2    2  NaN

to get:

  col1 col2
0    w    a
1    A    2
2    B  NaN

How can I best do this? For some reason, googling terms relating to this only shows me links about how to make columns from dicts and vice versa.

User · Answer

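Another approach is to apply a regex-based replace function, as below. Note that it works on text: keys and cell values are converted to strings, and any occurrence of a key inside a value is replaced (so "12" would become "AB").

import re

def multiple_replace(mapping, text):
  # Regular expressions only work on strings, so normalise the keys first
  lookup = {str(key): value for key, value in mapping.items()}

  # Create a regular expression from the dictionary keys
  regex = re.compile("|".join(map(re.escape, lookup)))

  # For each match, look up the corresponding value in the dictionary
  return regex.sub(lambda mo: lookup[mo.group(0)], str(text))

Once you have defined the function, you can apply it to your dataframe.

di = {1: "A", 2: "B"}
df['col1'] = df.apply(lambda row: multiple_replace(di, row['col1']), axis=1)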

User · Answer

DSM has the accepted answer, but the code doesn't seem to work for everyone. Here is one that works with the current version of pandas (0.23.4 as of 8/2018):

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 2, 3, 1],
                   'col2': ['negative', 'positive', 'neutral', 'neutral', 'positive']})

conversion_dict = {'negative': -1, 'neutral': 0, 'positive': 1}
df['converted_column'] = df['col2'].replace(conversion_dict)

print(df.head())

You'll see it looks like:

   col1      col2  converted_column
0     1  negative                -1
1     2  positive                 1
2     2   neutral                 0
3     3   neutral                 0
4     1  positive                 1

See the documentation for pandas.DataFrame.replace.

User · Answer

As an extension to what has been proposed by Nico Coallier (apply to multiple columns) and U10-Forward (using apply-style methods), and summarising it into a one-liner, I propose:

df.loc[:, ['col1', 'col2']].transform(lambda x: x.map(lambda x: {1: "A", 2: "B"}.get(x, x)))

The .transform() call processes each column as a Series, so you can apply the Series method .map() to it. (DataFrame.apply() with its default axis=0 also passes each column as a Series, so it would work here as well.)

Finally, and I discovered this behaviour thanks to U10, .map() calls the inner lambda once per element, so the .get() expression receives a single value rather than the whole Series. The .get(x, x) accounts for the values you did not mention in your mapping dictionary, which would otherwise be turned into NaN by the .map() method.
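To make the one-liner concrete, here is a minimal sketch on the question's sample data. The frame contents and the names df, di and remapped are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question (an assumption, not the poster's data)
df = pd.DataFrame({"col1": ["w", 1, 2], "col2": ["a", 2, np.nan]})
di = {1: "A", 2: "B"}

# transform() hands each selected column to the outer lambda as a Series;
# Series.map() then calls di.get(value, value) once per element, so values
# missing from the dictionary are returned unchanged instead of becoming NaN.
remapped = df[["col1", "col2"]].transform(
    lambda col: col.map(lambda value: di.get(value, value))
)
print(remapped)
```

The result is a new DataFrame; assign it back (for example to df[["col1", "col2"]]) if the original frame should be updated.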

User · Answer

There is a bit of ambiguity in your question. There are at least three interpretations:

1. the keys in di refer to index values
2. the keys in di refer to df['col1'] values
3. the keys in di refer to index locations (not the OP's question, but thrown in for fun)

Below is a solution for each case.

Case 1: If the keys of di are meant to refer to index values, then you could use the update method:

df['col1'].update(pd.Series(di))

For example,

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {0: "A", 2: "B"}

# The value at the 0-index is mapped to 'A', the value at the 2-index is mapped to 'B'
df['col1'].update(pd.Series(di))
print(df)

yields

  col1 col2
1    w    a
2    B   30
0    A  NaN

I've modified the values from your original post so it is clearer what update is doing. Note how the keys in di are associated with index values. The order of the index values (that is, the index locations) does not matter.

Case 2: If the keys in di refer to df['col1'] values, then @DanAllan and @DSM show how to achieve this with replace:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
print(df)
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {10: "A", 20: "B"}

# The values 10 and 20 are replaced by 'A' and 'B'
df['col1'].replace(di, inplace=True)
print(df)

yields

  col1 col2
1    w    a
2    A   30
0    B  NaN

Note how in this case the keys in di were changed to match values in df['col1'].

Case 3: If the keys in di refer to index locations, then you could use

df['col1'].put(di.keys(), di.values())

since

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
di = {0: "A", 2: "B"}

# The values at the 0 and 2 index locations are replaced by 'A' and 'B'
df['col1'].put(di.keys(), di.values())
print(df)

yields

  col1 col2
1    A    a
2   10   30
0    B  NaN

Here, the first and third rows were altered, because the keys in di are 0 and 2, which with Python's 0-based indexing refer to the first and third locations.
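The three cases can also be written with explicit assignment. This is a hedged sketch, assuming a recent pandas release where in-place calls on a selected column are discouraged and Series.put may no longer be available. The helper make_df and the names df1, df2 and df3 are hypothetical, and the .loc and .iloc forms are swapped in for update and put rather than being the original answer's method.

```python
import numpy as np
import pandas as pd

def make_df():
    # Same sample data as in the answer above
    return pd.DataFrame({"col1": ["w", 10, 20],
                         "col2": ["a", 30, np.nan]},
                        index=[1, 2, 0])

di = {0: "A", 2: "B"}

# Case 1: keys are index labels, so select rows by label with .loc
df1 = make_df()
df1.loc[list(di.keys()), "col1"] = list(di.values())

# Case 2: keys are col1 values, so replace and assign the result back
df2 = make_df()
df2["col1"] = df2["col1"].replace({10: "A", 20: "B"})

# Case 3: keys are positions, so select rows by position with .iloc
df3 = make_df()
df3.iloc[list(di.keys()), df3.columns.get_loc("col1")] = list(di.values())

print(df1, df2, df3, sep="\n\n")
```

Note that the .loc form in Case 1 raises a KeyError if a key is missing from the index, whereas update silently ignores such keys.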

User · Answer

Adding to this question, if you ever have more than one column to remap in a dataframe:

def remap(data, dict_labels):
    """
    This function takes in a dictionary of labels (dict_labels)
    and replaces the values (previously label-encoded) with the strings.

    ex: dict_labels = {'col1': {1: 'A', 2: 'B'}}
    """
    for field, values in dict_labels.items():
        print("I am remapping %s" % field)
        data.replace({field: values}, inplace=True)
    print("DONE")

    return data

Hope it can be useful to someone.

Cheers

User · Answer

Or do apply:

df['col1'].apply(lambda x: {1: "A", 2: "B"}.get(x, x))

Demo:

>>> df['col1'] = df['col1'].apply(lambda x: {1: "A", 2: "B"}.get(x, x))
>>> df
  col1 col2
0    w    a
1    A    2
2    B  NaN
>>>

User · Answer

You can use .replace. For example:

>>> df = pd.DataFrame({'col1': {0: 'w', 1: 1, 2: 2}, 'col2': {0: 'a', 1: 2, 2: np.nan}})
>>> di = {1: "A", 2: "B"}
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> df.replace({"col1": di})
  col1 col2
0    w    a
1    A    2
2    B  NaN

or directly on the Series, i.e. df["col1"].replace(di, inplace=True).

User · Answer

Given that map is faster than replace (@JohnE's solution), you need to be careful with non-exhaustive mappings where you intend to map specific values to NaN. The proper method in this case requires that you mask the Series when you .fillna, else you undo the mapping to NaN.

import pandas as pd
import numpy as np

d = {'m': 'Male', 'f': 'Female', 'missing': np.nan}
df = pd.DataFrame({'gender': ['m', 'f', 'missing', 'Male', 'U']})

# Keys that are deliberately mapped to NaN must not be filled back in
keep_nan = [k for k, v in d.items() if pd.isnull(v)]
s = df['gender']

df['mapped'] = s.map(d).fillna(s.mask(s.isin(keep_nan)))

    gender  mapped
0        m    Male
1        f  Female
2  missing     NaN
3     Male    Male
4        U       U

User · Answer

map can be much faster than replace

If your dictionary has more than a couple of keys, using map can be much faster than replace. There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values (and also whether you want non-matches to keep their values or be converted to NaNs):

Exhaustive Mapping

In this case, the form is very simple:

df['col1'].map(di)       # note: if the dictionary does not exhaustively map all
                         # entries then non-matched entries are changed to NaNs

Although map most commonly takes a function as its argument, it can alternatively take a dictionary or series; see the documentation for pandas.Series.map.

Non-Exhaustive Mapping

If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add fillna:

df['col1'].map(di).fillna(df['col1'])

as in @jpp's answer here: Replace values in a pandas series via dictionary efficiently

Benchmarks

Using the following data with pandas version 0.23.1:

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
df = pd.DataFrame({'col1': np.random.choice(range(1, 9), 100000)})

and testing with %timeit, it appears that map is approximately 10x faster than replace.

Note that your speedup with map will vary with your data. The largest speedup appears to be with large dictionaries and exhaustive replaces. See @jpp's answer (linked above) for more extensive benchmarks and discussion.
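The comparison can be reproduced outside IPython with the standard timeit module. This is only a sketch: the seed, the repetition count and the names via_map, via_replace, t_map and t_replace are assumptions, and the timings will differ by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
rng = np.random.default_rng(0)  # fixed seed so reruns use the same data
df = pd.DataFrame({"col1": rng.integers(1, 9, size=100_000)})

# Both calls should produce the same values for an exhaustive mapping
via_map = df["col1"].map(di)
via_replace = df["col1"].replace(di)

# Time each approach over a few repetitions
t_map = timeit.timeit(lambda: df["col1"].map(di), number=5)
t_replace = timeit.timeit(lambda: df["col1"].replace(di), number=5)
print(f"map: {t_map:.4f}s  replace: {t_replace:.4f}s")
```

Checking that both results agree before timing guards against comparing two operations that do different work.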

User · Answer

A nice complete solution that keeps a map of your class labels:

labels = features['col1'].unique()
labels_dict = dict(zip(labels, range(len(labels))))
features = features.replace({"col1": labels_dict})

This way, you can at any point refer to the original class label from labels_dict.
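Going back from the encoded values to the original labels needs the inverse of labels_dict. The following is a rough sketch with made-up sample data; the names inverse_dict, encoded and decoded are hypothetical, and map is used here in place of replace.

```python
import pandas as pd

# Made-up sample data for illustration
features = pd.DataFrame({"col1": ["cat", "dog", "cat", "bird"]})

# Build the label -> integer code mapping in order of first appearance
labels = features["col1"].unique()
labels_dict = dict(zip(labels, range(len(labels))))
encoded = features["col1"].map(labels_dict)

# Invert the mapping to recover the original class labels
inverse_dict = {code: label for label, code in labels_dict.items()}
decoded = encoded.map(inverse_dict)
print(decoded.tolist())
```

Because unique() returns each label once, the codes are distinct and the inverse dictionary loses no information.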
