[python] Pandas/Python: Set value of one column based on value in another column

I need to set the value of one column based on the value of another in a Pandas dataframe. This is the logic:

if df['c1'] == 'Value':
    df['c2'] = 10
else:
    df['c2'] = df['c3']

I am unable to get this to do what I want, which is to simply create a column with new values (or change the value of an existing column: either one works for me).

If I try to run the code above or if I write it as a function and use the apply method, I get the following:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This question is related to python pandas conditional

The answer is


Try:

df['c2'] = df.apply(lambda row: 10 if row['c1'] == 'Value' else row['c3'], axis=1)


Note the tilde (~), which inverts the selection. This uses pandas boolean indexing, which is faster than a row-by-row if/else.

df.loc[(df['c1'] == 'Value'), 'c2'] = 10
df.loc[~(df['c1'] == 'Value'), 'c2'] = df['c3']
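
For a quick check of the two lines above, here is a minimal sketch on made-up data. The sample values in df are assumptions; only the column names and the literal 'Value' come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "c1": ["Value", "other", "Value", "x"],
    "c3": [1, 2, 3, 4],
})

is_value = df["c1"] == "Value"  # boolean mask, computed once and reused

df.loc[is_value, "c2"] = 10                        # rows where the condition holds
df.loc[~is_value, "c2"] = df.loc[~is_value, "c3"]  # remaining rows take c3

print(df)
```

Because c2 is created partially filled (the other rows are NaN until the second line runs), it ends up as a float column; call .astype(int) afterwards if you need integers.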

I suggest doing it in two steps:

# set fixed value to 'c2' where the condition is met
df.loc[df['c1'] == 'Value', 'c2'] = 10

# copy value from 'c3' to 'c2' where the condition is NOT met
df.loc[df['c1'] != 'Value', 'c2'] = df.loc[df['c1'] != 'Value', 'c3']

I had a big dataset and .loc[] was taking too long, so I looked for a faster way to do it. Recall that you can assign the result of a comparison directly to a column, so this works:

file['Flag'] = (file['Claim_Amount'] > 0)

This gives a Boolean column, which is what I wanted, but you can multiply it by 1 (or call .astype(int)) to get integers instead.
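
A small sketch of the Boolean flag and its integer form. The data and the Flag_int column name are made up for illustration; Claim_Amount and Flag follow the snippet above.

```python
import pandas as pd

# Hypothetical data; the column name follows the snippet above.
file = pd.DataFrame({"Claim_Amount": [0, 120, 0, 35]})

file["Flag"] = file["Claim_Amount"] > 0      # Boolean column
file["Flag_int"] = file["Flag"].astype(int)  # same values as file["Flag"] * 1

print(file)
```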


You can use np.where() to set values based on a specified condition:

#df
   c1  c2  c3
0   4   2   1
1   8   7   9
2   1   5   8
3   3   3   5
4   3   6   8

Now set the values in column 'c2' based on your condition:

df['c2'] = np.where(df.c1 == 8, 'X', df.c3)

   c1  c2  c3
0   4   1   1
1   8   X   9
2   1   8   8
3   3   5   5
4   3   8   8
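
Applied to the logic in the question, the same call might look like the sketch below. The sample rows are made up for illustration; only the column names and the literal 'Value' come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "c1": ["Value", "other", "Value"],
    "c3": [7, 8, 9],
})

# 10 where c1 equals 'Value', otherwise the value from c3
df["c2"] = np.where(df["c1"] == "Value", 10, df["c3"])

print(df)
```

Both branches are numeric here, so c2 stays a numeric column.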

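You can use the mask method (pandas.Series.mask here) to add virtually as many conditions as you need:

import numpy as np
import pandas as pd

data = {'a': [1,2,3,4,5], 'b': [6,8,9,10,11]}

d = pd.DataFrame.from_dict(data, orient='columns')
# each entry is (value to match in 'a', replacement for 'new')
c = {'c1': (2, 'Value1'), 'c2': (3, 'Value2'), 'c3': (5, d['b'])}

d['new'] = np.nan
for match, replacement in c.values():
    # assign the result back; mask(..., inplace=True) on d['new'] may not update d
    d['new'] = d['new'].mask(d['a'] == match, replacement)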

d['new'] = d['new'].fillna('Else')
d

Output:

    a   b   new
0   1   6   Else
1   2   8   Value1
2   3   9   Value2
3   4   10  Else
4   5   11  11
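
As an alternative to the loop (a different technique, not the mask method used above), NumPy's np.select takes a list of conditions and a matching list of choices. This is only a sketch on the same data; column b is cast to strings here as an assumption, so that every choice has a compatible type.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [6, 8, 9, 10, 11]})

# Conditions are checked in order; the first one that matches wins.
conditions = [d["a"] == 2, d["a"] == 3, d["a"] == 5]
# 'b' is cast to str so that all choices are strings.
choices = ["Value1", "Value2", d["b"].astype(str)]

d["new"] = np.select(conditions, choices, default="Else")

print(d)
```

Note that the last row holds the string '11' rather than the integer 11.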

Try df.apply() if you have a small or medium-sized dataframe:

df['c2'] = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c3'], axis=1)

Otherwise, if you have a big dataframe, use the indexing techniques from the answers above.

