[python] Pandas dataframe fillna() only some columns in place

I am trying to fill none values in a Pandas dataframe with 0's for only some subset of columns.

When I do:

import pandas as pd
df = pd.DataFrame(data={'a':[1,2,3,None],'b':[4,5,None,6],'c':[None,None,7,8]})
print df
df.fillna(value=0, inplace=True)
print df

The output:

     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  NaN  7.0
3  NaN  6.0  8.0
     a    b    c
0  1.0  4.0  0.0
1  2.0  5.0  0.0
2  3.0  0.0  7.0
3  0.0  6.0  8.0

It replaces every None with 0's. What I want to do is, only replace Nones in columns a and b, but not c.

What is the best way of doing this?

This question is related to python pandas dataframe

The answer is


using the top answer produces a warning about making changes to a copy of a df slice. Assuming that you have other columns, a better way to do this is to pass a dictionary:
df.fillna({'A': 'NA', 'B': 'NA'}, inplace=True)


Here's how you can do it all in one line:

df[['a', 'b']].fillna(value=0, inplace=True)

Breakdown: df[['a', 'b']] selects the columns you want to fill NaN values for, value=0 tells it to fill NaNs with zero, and inplace=True will make the changes permanent, without having to make a copy of the object.


Sometimes this syntax wont work:

df[['col1','col2']] = df[['col1','col2']].fillna()

Use the following instead:

df['col1','col2']

You can avoid making a copy of the object using Wen's solution and inplace=True:

df.fillna({'a':0, 'b':0}, inplace=True)
print(df)

Which yields:

     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  0.0  7.0
3  0.0  6.0  8.0

For some odd reason this DID NOT work (using Pandas: '0.25.1')

df[['col1', 'col2']].fillna(value=0, inplace=True)

Another solution:

subset_cols = ['col1','col2']
[df[col].fillna(0, inplace=True) for col in subset_cols]

Example:

df = pd.DataFrame(data={'col1':[1,2,np.nan,], 'col2':[1,np.nan,3], 'col3':[np.nan,2,3]})

output:

   col1  col2  col3
0  1.00  1.00   nan
1  2.00   nan  2.00
2   nan  3.00  3.00

Apply list comp. to fillna values:

subset_cols = ['col1','col2']
[df[col].fillna(0, inplace=True) for col in subset_cols]

Output:

   col1  col2  col3
0  1.00  1.00   nan
1  2.00  0.00  2.00
2  0.00  3.00  3.00

You can using dict , fillna with different value for different column

df.fillna({'a':0,'b':0})
Out[829]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  0.0  7.0
3  0.0  6.0  8.0

After assign it back

df=df.fillna({'a':0,'b':0})
df
Out[831]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  0.0  7.0
3  0.0  6.0  8.0

Or something like:

df.loc[df['a'].isnull(),'a']=0
df.loc[df['b'].isnull(),'b']=0

and if there is more:

for i in your_list:
    df.loc[df[i].isnull(),i]=0

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe