[python] Understanding inplace=True

The inplace parameter:

df.dropna(axis='index', how='all', inplace=True)

in Pandas and in general means:

1. Pandas creates a copy of the original data

2. ... does some computation on it

3. ... assigns the results to the original data.

4. ... deletes the copy.

As you can read in the rest of my answer's further below, we still can have good reason to use this parameter i.e. the inplace operations, but we should avoid it if we can, as it generate more issues, as:

1. Your code will be harder to debug (Actually SettingwithCopyWarning stands for warning you to this possible problem)

2. Conflict with method chaining


So there is even case when we should use it yet?

Definitely yes. If we use pandas or any tool for handeling huge dataset, we can easily face the situation, where some big data can consume our entire memory. To avoid this unwanted effect we can use some technics like method chaining:

(
    wine.rename(columns={"color_intensity": "ci"})
    .assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
    .query("alcohol > 14 and color_filter == 1")
    .sort_values("alcohol", ascending=False)
    .reset_index(drop=True)
    .loc[:, ["alcohol", "ci", "hue"]]
)

which make our code more compact (though harder to interpret and debug too) and consumes less memory as the chained methods works with the other method's returned values, thus resulting in only one copy of the input data. We can see clearly, that we will have 2 x original data memory consumption after this operations.

Or we can use inplace parameter (though harder to interpret and debug too) our memory consumption will be 2 x original data, but our memory consumption after this operation remains 1 x original data, which if somebody whenever worked with huge datasets exactly knows can be a big benefit.


Final conclusion:

Avoid using inplace parameter unless you don't work with huge data and be aware of its possible issues in case of still using of it.

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to in-place

Understanding inplace=True What is the difference between `sorted(list)` vs `list.sort()`? How can I convert tabs to spaces in every file of a directory? Python Math - TypeError: 'NoneType' object is not subscriptable In-place edits with sed on OS X How to delete from a text file, all lines that contain a specific string? How to sort in-place using the merge sort algorithm? Updating a java map entry