[python] Find column whose name contains a specific string

I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I'm searching for 'spike' in column names like 'spike-2', 'hey spike', 'spiked-in' (the 'spike' part is always continuous).

I want the column name to be returned as a string or a variable, so I access the column later with df['name'] or df[name] as normal. I've tried to find ways to do this, to no avail. Any tips?

This question is related to python python-3.x string pandas dataframe

The answer is


Just iterate over DataFrame.columns, now this is an example in which you will end up with a list of column names that match:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)

Output:

['hey spke', 'no', 'spike-2', 'spiked-in']
['spike-2', 'spiked-in']

Explanation:

  1. df.columns returns a list of column names
  2. [col for col in df.columns if 'spike' in col] iterates over the list df.columns with the variable col and adds it to the resulting list if col contains 'spike'. This syntax is list comprehension.

If you only want the resulting data set with the columns that match you can do this:

df2 = df.filter(regex='spike')
print(df2)

Output:

   spike-2  spiked-in
0        1          7
1        2          8
2        3          9

Getting name and subsetting based on Start, Contains, and Ends:

# from: https://stackoverflow.com/questions/21285380/find-column-whose-name-contains-a-specific-string
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
# from: https://cmdlinetips.com/2019/04/how-to-select-columns-using-prefix-suffix-of-column-names-in-pandas/
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html




import pandas as pd



data = {'spike_starts': [1,2,3], 'ends_spike_starts': [4,5,6], 'ends_spike': [7,8,9], 'not': [10,11,12]}
df = pd.DataFrame(data)



print("\n")
print("----------------------------------------")
colNames_contains = df.columns[df.columns.str.contains(pat = 'spike')].tolist() 
print("Contains")
print(colNames_contains)



print("\n")
print("----------------------------------------")
colNames_starts = df.columns[df.columns.str.contains(pat = '^spike')].tolist() 
print("Starts")
print(colNames_starts)



print("\n")
print("----------------------------------------")
colNames_ends = df.columns[df.columns.str.contains(pat = 'spike$')].tolist() 
print("Ends")
print(colNames_ends)



print("\n")
print("----------------------------------------")
df_subset_start = df.filter(regex='^spike',axis=1)
print("Starts")
print(df_subset_start)



print("\n")
print("----------------------------------------")
df_subset_contains = df.filter(regex='spike',axis=1)
print("Contains")
print(df_subset_contains)



print("\n")
print("----------------------------------------")
df_subset_ends = df.filter(regex='spike$',axis=1)
print("Ends")
print(df_subset_ends)

This answer uses the DataFrame.filter method to do this without list comprehension:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6]}
df = pd.DataFrame(data)

print(df.filter(like='spike').columns)

Will output just 'spike-2'. You can also use regex, as some people suggested in comments above:

print(df.filter(regex='spike|spke').columns)

Will output both columns: ['spike-2', 'hey spke']


You can also use df.columns[df.columns.str.contains(pat = 'spike')]

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

colNames = df.columns[df.columns.str.contains(pat = 'spike')] 

print(colNames)

This will output the column names: 'spike-2', 'spiked-in'

More about pandas.Series.str.contains.


You also can use this code:

spike_cols =[x for x in df.columns[df.columns.str.contains('spike')]]

# select columns containing 'spike'
df.filter(like='spike', axis=1)

You can also select by name, regular expression. Refer to: pandas.DataFrame.filter


df.loc[:,df.columns.str.contains("spike")]

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to python-3.x

Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation Replace specific text with a redacted version using Python Upgrade to python 3.8 using conda "Permission Denied" trying to run Python on Windows 10 Python: 'ModuleNotFoundError' when trying to import module from imported package What is the meaning of "Failed building wheel for X" in pip install? How to downgrade python from 3.7 to 3.6 I can't install pyaudio on Windows? How to solve "error: Microsoft Visual C++ 14.0 is required."? Iterating over arrays in Python 3 How to upgrade Python version to 3.7?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe