Modifying a subset of rows in a pandas dataframe

Question

Assume I have a pandas DataFrame with two columns, A and B. I'd like to modify this DataFrame (or create a copy) so that B is always NaN whenever A is 0. How would I achieve that?

I tried the following:

    df['A'==0]['B'] = np.nan

and

    df['A'==0]['B'].values.fill(np.nan)

without success.
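The two attempts fail for two separate reasons, and a small sketch makes both visible. The toy frame is an assumption for illustration, not the asker's data. First, 'A'==0 is evaluated by Python before pandas ever sees it. It compares the string 'A' with the integer 0, so it is just False and never refers to column A. Second, even with a real mask, df[mask]['B'] = ... is chained indexing. Boolean indexing returns a new DataFrame, so the assignment lands in that temporary object and df is left unchanged. Depending on the pandas version, you will see a SettingWithCopyWarning or a chained-assignment warning, which the sketch silences.

```python
import warnings

import numpy as np
import pandas as pd

# Toy frame standing in for the asker's data (an assumption for illustration).
df = pd.DataFrame({"A": [0, 1, 0], "B": [2.0, 0.0, 5.0]})

# 'A' == 0 is plain Python: a string compared with an int, which is just False.
# It never refers to the column A, so it cannot select any rows.
literal_test = 'A' == 0
print(literal_test)  # False

# Even with a real mask, chained indexing writes to a temporary object:
# df[mask] (boolean indexing) builds a new DataFrame, and ['B'] = ... targets that.
mask = df["A"] == 0
with warnings.catch_warnings():
    # pandas warns about this pattern; the exact warning class depends on the version.
    warnings.simplefilter("ignore")
    df[mask]["B"] = np.nan

print(df["B"].isna().any())  # False: df itself was not modified
```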

User · Accepted Answer

Use .loc for label-based indexing:

    df.loc[df.A==0, 'B'] = np.nan

The df.A==0 expression creates a boolean Series that indexes the rows, and 'B' selects the column. You can also use this to transform a subset of a column, e.g.:

    df.loc[df.A==0, 'B'] = df.loc[df.A==0, 'B'] / 2

I don't know enough about pandas internals to know exactly why that works, but the basic issue is that sometimes indexing into a DataFrame returns a copy of the result, and sometimes it returns a view on the original object. According to the pandas documentation, this behavior depends on the underlying numpy behavior. I've found that accessing everything in one operation (rather than [one][two]) is more likely to work for setting.
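Here is a minimal runnable version of both snippets from this answer, using a small made-up frame. The key point is that the boolean row mask and the column label go into a single .loc call, so the write targets df itself.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are arbitrary).
df = pd.DataFrame({"A": [0, 1, 0, 2], "B": [4.0, 6.0, 8.0, 10.0]})

# Row selector (boolean Series) and column label go into ONE .loc call,
# so pandas performs the assignment on df itself rather than on a copy.
df.loc[df.A == 0, "B"] = np.nan
print(df)

# Same pattern, transforming the selected cells instead of overwriting them.
halved = pd.DataFrame({"A": [0, 1, 0, 2], "B": [4.0, 6.0, 8.0, 10.0]})
halved.loc[halved.A == 0, "B"] = halved.loc[halved.A == 0, "B"] / 2
print(halved["B"].tolist())  # [2.0, 6.0, 4.0, 10.0]
```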

User · Answer

For a massive speed increase, use NumPy's where function.

Setup

Create a two-column DataFrame with 100,000 rows with some zeros:

    df = pd.DataFrame(np.random.randint(0, 3, (100000, 2)), columns=list('ab'))

Fast solution with numpy.where

    df['b'] = np.where(df.a.values == 0, np.nan, df.b.values)

Timings

    %timeit df['b'] = np.where(df.a.values == 0, np.nan, df.b.values)
    685 µs ± 6.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit df.loc[df['a'] == 0, 'b'] = np.nan
    3.11 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In this run, NumPy's where is about 4.5x faster (685 µs vs. 3.11 ms).
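A sketch for checking that the np.where result matches the .loc result on your own machine. It also includes Series.mask, a pandas method that returns a new Series, which covers the "or create a copy" part of the question. The seeded generator and the helper names with_where, with_loc and with_mask are illustrative assumptions. Each helper copies the frame first, so the absolute times it prints are not directly comparable with the %timeit numbers above.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the answer's setup; a seeded generator makes the run repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, size=(100_000, 2)), columns=list("ab"))

def with_where(frame):
    out = frame.copy()
    # Pick NaN where a == 0, otherwise keep b; the NaN makes the result float64.
    out["b"] = np.where(out["a"].to_numpy() == 0, np.nan, out["b"].to_numpy())
    return out

def with_loc(frame):
    out = frame.copy()
    out["b"] = out["b"].astype(float)  # cast first so NaN fits the column's dtype
    out.loc[out["a"] == 0, "b"] = np.nan
    return out

def with_mask(frame):
    # Series.mask replaces values where the condition is True (NaN by default)
    # and returns a new Series; assign returns a new DataFrame, frame is untouched.
    return frame.assign(b=frame["b"].mask(frame["a"] == 0))

results = [f(df) for f in (with_where, with_loc, with_mask)]
print(all(r["b"].equals(results[0]["b"]) for r in results))  # True

for f in (with_where, with_loc, with_mask):
    seconds = timeit.timeit(lambda: f(df), number=20)
    print(f"{f.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```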

User · Answer

To replace multiple columns, convert to a numpy array using .values:

    df.loc[df.A==0, ['B', 'C']] = df.loc[df.A==0, ['B', 'C']].values / 2
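A runnable sketch of the multi-column case, with a made-up third column C. It follows the answer and uses .values. Current pandas documentation recommends .to_numpy() as the preferred spelling for the same conversion.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 0], "B": [2.0, 4.0, 6.0], "C": [10.0, 20.0, 30.0]})
mask = df.A == 0
cols = ["B", "C"]

# .values turns the selected block into a plain 2-D NumPy array of shape
# (number of selected rows, len(cols)); .loc then writes it back positionally.
df.loc[mask, cols] = df.loc[mask, cols].values / 2
print(df)
```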

User · Answer

The pandas docs on advanced indexing explain exactly what you need. It turns out that df.loc (.ix has been deprecated, as many have pointed out) can be used for slicing and dicing a DataFrame. It can also be used to set values:

    df.loc[selection criteria, columns I want] = value

So Bren's answer is saying: find all the places where df.A == 0, select column B, and set it to np.nan.
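The template above maps directly onto code. In this sketch, selection_criteria, columns_i_want and value are hypothetical variable names chosen to mirror the template's wording, and the frame is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 0], "B": [2.0, 0.0, 5.0], "C": [7.0, 8.0, 9.0]})

# df.loc[selection criteria, columns I want] = value
selection_criteria = df["A"] == 0   # boolean Series: one True/False per row
columns_i_want = ["B", "C"]         # a single label or a list of labels
value = np.nan                      # a scalar is broadcast to every selected cell
df.loc[selection_criteria, columns_i_want] = value
print(df)
```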

User · Answer

Starting from pandas 0.20, .ix is deprecated. The right way is to use df.loc. Here is a working example:

    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({"A":[0,1,0], "B":[2,0,5]}, columns=list('AB'))
    >>> df.loc[df.A == 0, 'B'] = np.nan
    >>> df
       A    B
    0  0  NaN
    1  1  0.0
    2  0  NaN
    >>>

Explanation:

As explained in the pandas indexing documentation, .loc is primarily label based, but may also be used with a boolean array.

So, what we are doing above is applying df.loc[row_index, column_index] by:

- Exploiting the fact that loc can take a boolean array as a mask that tells pandas which subset of rows we want to change in row_index.
- Exploiting the fact that loc is also label based, to select the column using the label 'B' in column_index.

We can use a logical condition, or any operation that returns a Series of booleans, to construct the boolean array. In the above example we want the rows where A is 0, so we use df.A == 0. As you can see in the example below, this returns a Series of booleans:

    >>> df = pd.DataFrame({"A":[0,1,0], "B":[2,0,5]}, columns=list('AB'))
    >>> df
       A  B
    0  0  2
    1  1  0
    2  0  5
    >>> df.A == 0
    0     True
    1    False
    2     True
    Name: A, dtype: bool
    >>>

Then we use the above array of booleans to select and modify the necessary rows:

    >>> df.loc[df.A == 0, 'B'] = np.nan
    >>> df
       A    B
    0  0  NaN
    1  1  0.0
    2  0  NaN

For more information, see the pandas advanced indexing documentation.
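Since any boolean Series can serve as the row mask, conditions can also be combined or built with methods. This sketch uses a made-up frame. It combines two comparisons with & and then uses isin, which are standard pandas idioms rather than something taken from this answer. Note that the element-wise operators are &, | and ~ (not Python's and/or/not), and each comparison needs its own parentheses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 0, 3], "B": [2.0, 0.0, 5.0, 7.0]})

# Any expression yielding a boolean Series aligned with df's index works as the row mask.
# Combine element-wise with & (and), | (or), ~ (not); wrap each comparison in parentheses
# because & and | bind more tightly than == and >.
mask = (df["A"] == 0) & (df["B"] > 3)
print(mask.tolist())  # [False, False, True, False]

df.loc[mask, "B"] = np.nan

# isin is another method that returns a boolean Series.
df.loc[df["A"].isin([1, 3]), "B"] = -1.0
print(df["B"].tolist())
```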
