try this,
df.loc[df['eri_white']==1,'race_label'] = 'White'
df.loc[df['eri_hawaiian']==1,'race_label'] = 'Haw/Pac Isl.'
df.loc[df['eri_afr_amer']==1,'race_label'] = 'Black/AA'
df.loc[df['eri_asian']==1,'race_label'] = 'Asian'
df.loc[df['eri_nat_amer']==1,'race_label'] = 'A/I AK Native'
df.loc[(df['eri_afr_amer'] + df['eri_asian'] + df['eri_hawaiian'] + df['eri_nat_amer'] + df['eri_white']) > 1,'race_label'] = 'Two Or More'
df.loc[df['eri_hispanic']==1,'race_label'] = 'Hispanic'
df['race_label'].fillna('Other', inplace=True)
O/P:
lname fname rno_cd eri_afr_amer eri_asian eri_hawaiian \
0 MOST JEFF E 0 0 0
1 CRUISE TOM E 0 0 0
2 DEPP JOHNNY NaN 0 0 0
3 DICAP LEO NaN 0 0 0
4 BRANDO MARLON E 0 0 0
5 HANKS TOM NaN 0 0 0
6 DENIRO ROBERT E 0 1 0
7 PACINO AL E 0 0 0
8 WILLIAMS ROBIN E 0 0 1
9 EASTWOOD CLINT E 0 0 0
eri_hispanic eri_nat_amer eri_white rno_defined race_label
0 0 0 1 White White
1 1 0 0 White Hispanic
2 0 0 1 Unknown White
3 0 0 1 Unknown White
4 0 0 0 White Other
5 0 0 1 Unknown White
6 0 0 1 White Two Or More
7 0 0 1 White White
8 0 0 0 White Haw/Pac Isl.
9 0 0 1 White White
use .loc
instead of apply
.
it improves vectorization.
.loc
works in simple manner, mask rows based on the condition, apply values to the freeze rows.
for more details visit, .loc docs
Performance metrics:
Accepted Answer:
def label_race (row):
if row['eri_hispanic'] == 1 :
return 'Hispanic'
if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1 :
return 'Two Or More'
if row['eri_nat_amer'] == 1 :
return 'A/I AK Native'
if row['eri_asian'] == 1:
return 'Asian'
if row['eri_afr_amer'] == 1:
return 'Black/AA'
if row['eri_hawaiian'] == 1:
return 'Haw/Pac Isl.'
if row['eri_white'] == 1:
return 'White'
return 'Other'
df=pd.read_csv('dataser.csv')
df = pd.concat([df]*1000)
%timeit df.apply(lambda row: label_race(row), axis=1)
1.15 s ± 46.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
My Proposed Answer:
def label_race(df):
df.loc[df['eri_white']==1,'race_label'] = 'White'
df.loc[df['eri_hawaiian']==1,'race_label'] = 'Haw/Pac Isl.'
df.loc[df['eri_afr_amer']==1,'race_label'] = 'Black/AA'
df.loc[df['eri_asian']==1,'race_label'] = 'Asian'
df.loc[df['eri_nat_amer']==1,'race_label'] = 'A/I AK Native'
df.loc[(df['eri_afr_amer'] + df['eri_asian'] + df['eri_hawaiian'] + df['eri_nat_amer'] + df['eri_white']) > 1,'race_label'] = 'Two Or More'
df.loc[df['eri_hispanic']==1,'race_label'] = 'Hispanic'
df['race_label'].fillna('Other', inplace=True)
df=pd.read_csv('s22.csv')
df = pd.concat([df]*1000)
%timeit label_race(df)
24.7 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)