[python] Strip / trim all strings of a dataframe

Cleaning the values of a multitype data frame in python/pandas, I want to trim the strings. I am currently doing it in two instructions :

import pandas as pd

df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])

df.replace('^\s+', '', regex=True, inplace=True) #front
df.replace('\s+$', '', regex=True, inplace=True) #end

df.values

This is quite slow, what could I improve ?

This question is related to python regex pandas dataframe trim

The answer is


You can use DataFrame.select_dtypes to select string columns and then apply function str.strip.

Notice: Values cannot be types like dicts or lists, because their dtypes is object.

df_obj = df.select_dtypes(['object'])
print (df_obj)
0    a  
1    c  

df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)

   0   1
0  a  10
1  c   5

But if there are only a few columns use str.strip:

df[0] = df[0].str.strip()

You can try:

df[0] = df[0].str.strip()

or more specifically for all string columns

non_numeric_columns = list(set(df.columns)-set(df._get_numeric_data().columns))
df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x : str(x).strip())

def trim(x):
    if x.dtype == object:
        x = x.str.split(' ').str[0]
    return(x)

df = df.apply(trim)

Money Shot

Here's a compact version of using applymap with a straightforward lambda expression to call strip only when the value is of a string type:

df.applymap(lambda x: x.strip() if isinstance(x, str) else x)

Full Example

A more complete example:

import pandas as pd


def trim_all_columns(df):
    """
    Trim whitespace from ends of each value across all series in dataframe
    """
    trim_strings = lambda x: x.strip() if isinstance(x, str) else x
    return df.applymap(trim_strings)


# simple example of trimming whitespace from data elements
df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
df = trim_all_columns(df)
print(df)


>>>
   0   1
0  a  10
1  c   5

Working Example

Here's a working example hosted by trinket: https://trinket.io/python3/e6ab7fb4ab


You can use the apply function of the Series object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df[0][0]
'  a  '
>>> df[0] = df[0].apply(lambda x: x.strip())
>>> df[0][0]
'a'

Note the usage of strip and not the regex which is much faster

Another option - use the apply function of the DataFrame object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df.apply(lambda x: x.apply(lambda y: y.strip() if type(y) == type('') else y), axis=0)

   0   1
0  a  10
1  c   5

If you really want to use regex, then

>>> df.replace('(^\s+|\s+$)', '', regex=True, inplace=True)
>>> df
   0   1
0  a  10
1  c   5

But it should be faster to do it like this:

>>> df[0] = df[0].str.strip()

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to trim

How to remove whitespace from a string in typescript? Strip / trim all strings of a dataframe Best way to verify string is empty or null Does swift have a trim method on String? Trim specific character from a string Trim whitespace from a String Remove last specific character in a string c# How to delete specific characters from a string in Ruby? How to Remove the last char of String in C#? How to use a TRIM function in SQL Server