pandas: create new column based on values from other columns / apply a function of multiple columns, row-wise

Question

I want to apply my custom function (it uses an if-else ladder) to these six columns (ERI_Hispanic, ERI_AmerInd_AKNatv, ERI_Asian, ERI_Black_Afr.Amer, ERI_HI_PacIsl, ERI_White) in each row of my dataframe.

I've tried different methods from other questions but still can't seem to find the right answer for my problem. The critical piece of this is that if the person is counted as Hispanic they can't be counted as anything else. Even if they have a "1" in another ethnicity column they are still counted as Hispanic, not as two or more races. Similarly, if the sum of all the ERI columns is greater than 1 they are counted as two or more races and can't be counted as a unique ethnicity (except for Hispanic). Hopefully this makes sense. Any help will be greatly appreciated.

It's almost like doing a for loop through each row, and if each record meets a criterion they are added to one list and eliminated from the original.

From the dataframe below I need to calculate a new column based on the following spec in SQL:

    ========================= CRITERIA ===============================
    IF [ERI_Hispanic] = 1 THEN RETURN "Hispanic"
    ELSE IF SUM([ERI_AmerInd_AKNatv] + [ERI_Asian] + [ERI_Black_Afr.Amer] + [ERI_HI_PacIsl] + [ERI_White]) > 1 THEN RETURN "Two or More"
    ELSE IF [ERI_AmerInd_AKNatv] = 1 THEN RETURN "A/I AK Native"
    ELSE IF [ERI_Asian] = 1 THEN RETURN "Asian"
    ELSE IF [ERI_Black_Afr.Amer] = 1 THEN RETURN "Black/AA"
    ELSE IF [ERI_HI_PacIsl] = 1 THEN RETURN "Haw/Pac Isl."
    ELSE IF [ERI_White] = 1 THEN RETURN "White"

Comment: If the ERI Flag for Hispanic is True (1), the employee is classified as "Hispanic".

Comment: If more than 1 non-Hispanic ERI Flag is true, return "Two or More".

    ========================= DATAFRAME ===============================
          lname   fname rno_cd  eri_afr_amer  eri_asian  eri_hawaiian  eri_hispanic  eri_nat_amer  eri_white rno_defined
    0      MOST    JEFF      E             0          0             0             0             0          1       White
    1    CRUISE     TOM      E             0          0             0             1             0          0       White
    2      DEPP  JOHNNY                    0          0             0             0             0          1     Unknown
    3     DICAP     LEO                    0          0             0             0             0          1     Unknown
    4    BRANDO  MARLON      E             0          0             0             0             0          0       White
    5     HANKS     TOM                    0          0             0             0             0          1     Unknown
    6    DENIRO  ROBERT      E             0          1             0             0             0          1       White
    7    PACINO      AL      E             0          0             0             0             0          1       White
    8  WILLIAMS   ROBIN      E             0          0             1             0             0          0       White
    9  EASTWOOD   CLINT      E             0          0             0             0             0          1       White

User · Answer

As @user3483203 pointed out, numpy.select is the best approach.

Store your conditional statements and the corresponding actions in two lists:

    import numpy as np

    conds = [
        (df['eri_hispanic'] == 1),
        (df[['eri_afr_amer', 'eri_asian', 'eri_hawaiian', 'eri_nat_amer', 'eri_white']].sum(axis=1).gt(1)),
        (df['eri_nat_amer'] == 1),
        (df['eri_asian'] == 1),
        (df['eri_afr_amer'] == 1),
        (df['eri_hawaiian'] == 1),
        (df['eri_white'] == 1),
    ]

    actions = ['Hispanic', 'Two Or More', 'A/I AK Native', 'Asian', 'Black/AA', 'Haw/Pac Isl.', 'White']

You can now call np.select with these lists as its arguments:

    df['label_race'] = np.select(conds, actions, default='Other')

Reference: https://numpy.org/doc/stable/reference/generated/numpy.select.html
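One detail worth spelling out: when several conditions are true for the same row, np.select uses the first one in the list, which is why the Hispanic test has to come first. A minimal sketch on a made-up two-row frame (the frame is hypothetical; only the column names come from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; row 0 has both flags set.
df = pd.DataFrame({
    "eri_hispanic": [1, 0],
    "eri_white": [1, 1],
})

conds = [df["eri_hispanic"] == 1, df["eri_white"] == 1]
actions = ["Hispanic", "White"]

# Row 0 matches both conditions; the first one in the list wins.
labels = np.select(conds, actions, default="Other")
print(labels.tolist())  # ['Hispanic', 'White']

# Reversing both lists changes the label for row 0.
swapped = np.select(conds[::-1], actions[::-1], default="Other")
print(swapped.tolist())  # ['White', 'White']
```

So the order of conds encodes the priority of the rules, the same way the order of the if statements does in a row-wise function.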

User · Answer

Since this is the first Google result for 'pandas new column from others', here's a simple example:

    import pandas as pd

    # make a simple dataframe
    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    df
    #    a  b
    # 0  1  3
    # 1  2  4

    # create an unattached column with an index
    df.apply(lambda row: row.a + row.b, axis=1)
    # 0    4
    # 1    6

    # do same but attach it to the dataframe
    df['c'] = df.apply(lambda row: row.a + row.b, axis=1)
    df
    #    a  b  c
    # 0  1  3  4
    # 1  2  4  6

If you get the SettingWithCopyWarning you can do it this way also:

    fn = lambda row: row.a + row.b  # define a function for the new column
    col = df.apply(fn, axis=1)      # get column data with an index
    df = df.assign(c=col.values)    # assign values to column 'c'

Source: https://stackoverflow.com/a/12555510/243392

And if your column name includes spaces you can use syntax like this:

    df = df.assign(**{'some column name': col.values})

And here's the documentation for apply, and assign.
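For a plain sum like this one, the row-wise apply is optional. As a possible alternative (not part of the answer above), assign also accepts a callable that receives the whole frame, so the new column can be computed column-wise in one step:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# assign returns a new frame; the callable is given the frame itself,
# so the addition runs on whole columns rather than row by row.
out = df.assign(c=lambda d: d["a"] + d["b"])

print(out["c"].tolist())   # [4, 6]
print("c" in df.columns)   # False: the original frame is not modified
```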

User · Answer

apply() takes in a function as the first parameter; pass in the label_race function like so:

    df['race_label'] = df.apply(label_race, axis=1)

You don't need to make a lambda function to pass in a function.
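A quick way to convince yourself that the two spellings are interchangeable, using a small stand-in function (total is a made-up name for illustration, not something from the question):

```python
import pandas as pd

def total(row):
    # row is a Series holding one row's values, indexed by column name
    return row["a"] + row["b"]

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

direct = df.apply(total, axis=1)                    # function passed as-is
wrapped = df.apply(lambda row: total(row), axis=1)  # same call behind a lambda

print(direct.equals(wrapped))  # True
```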

User · Answer

OK, two steps to this. First, write a function that does the translation you want. I've put an example together based on your pseudo-code:

    def label_race(row):
        if row['eri_hispanic'] == 1:
            return 'Hispanic'
        if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
            return 'Two Or More'
        if row['eri_nat_amer'] == 1:
            return 'A/I AK Native'
        if row['eri_asian'] == 1:
            return 'Asian'
        if row['eri_afr_amer'] == 1:
            return 'Black/AA'
        if row['eri_hawaiian'] == 1:
            return 'Haw/Pac Isl.'
        if row['eri_white'] == 1:
            return 'White'
        return 'Other'

You may want to go over this, but it seems to do the trick. Notice that the parameter going into the function is considered to be a Series object labelled "row".

Next, use the apply function in pandas to apply the function, e.g.

    df.apply(lambda row: label_race(row), axis=1)

Note the axis=1 specifier; that means that the application is done at a row, rather than a column level. The results are here:

    0           White
    1        Hispanic
    2           White
    3           White
    4           Other
    5           White
    6     Two Or More
    7           White
    8    Haw/Pac Isl.
    9           White

If you're happy with those results, then run it again, saving the results into a new column in your original dataframe:

    df['race_label'] = df.apply(lambda row: label_race(row), axis=1)

The resultant dataframe looks like this (scroll to the right to see the new column):

          lname   fname rno_cd  eri_afr_amer  eri_asian  eri_hawaiian  eri_hispanic  eri_nat_amer  eri_white rno_defined    race_label
    0      MOST    JEFF      E             0          0             0             0             0          1       White         White
    1    CRUISE     TOM      E             0          0             0             1             0          0       White      Hispanic
    2      DEPP  JOHNNY    NaN             0          0             0             0             0          1     Unknown         White
    3     DICAP     LEO    NaN             0          0             0             0             0          1     Unknown         White
    4    BRANDO  MARLON      E             0          0             0             0             0          0       White         Other
    5     HANKS     TOM    NaN             0          0             0             0             0          1     Unknown         White
    6    DENIRO  ROBERT      E             0          1             0             0             0          1       White   Two Or More
    7    PACINO      AL      E             0          0             0             0             0          1       White         White
    8  WILLIAMS   ROBIN      E             0          0             1             0             0          0       White  Haw/Pac Isl.
    9  EASTWOOD   CLINT      E             0          0             0             0             0          1       White         White
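To try this without the original file, the six flag columns from the question's table can be typed in by hand. The sketch below does that (name and rno columns are left out because the function never reads them) and should reproduce the labels listed above:

```python
import pandas as pd

flag_cols = ["eri_afr_amer", "eri_asian", "eri_hawaiian",
             "eri_hispanic", "eri_nat_amer", "eri_white"]

# Flag values copied from the question's table, rows 0..9
rows = [
    [0, 0, 0, 0, 0, 1],  # MOST
    [0, 0, 0, 1, 0, 0],  # CRUISE
    [0, 0, 0, 0, 0, 1],  # DEPP
    [0, 0, 0, 0, 0, 1],  # DICAP
    [0, 0, 0, 0, 0, 0],  # BRANDO
    [0, 0, 0, 0, 0, 1],  # HANKS
    [0, 1, 0, 0, 0, 1],  # DENIRO
    [0, 0, 0, 0, 0, 1],  # PACINO
    [0, 0, 1, 0, 0, 0],  # WILLIAMS
    [0, 0, 0, 0, 0, 1],  # EASTWOOD
]
df = pd.DataFrame(rows, columns=flag_cols)

def label_race(row):
    # Hispanic takes precedence over every other flag
    if row["eri_hispanic"] == 1:
        return "Hispanic"
    # More than one non-Hispanic flag set
    if (row["eri_afr_amer"] + row["eri_asian"] + row["eri_hawaiian"]
            + row["eri_nat_amer"] + row["eri_white"]) > 1:
        return "Two Or More"
    if row["eri_nat_amer"] == 1:
        return "A/I AK Native"
    if row["eri_asian"] == 1:
        return "Asian"
    if row["eri_afr_amer"] == 1:
        return "Black/AA"
    if row["eri_hawaiian"] == 1:
        return "Haw/Pac Isl."
    if row["eri_white"] == 1:
        return "White"
    return "Other"

df["race_label"] = df.apply(label_race, axis=1)
print(df["race_label"].tolist())
```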

User · Answer

The answers above are perfectly valid, but a vectorized solution exists, in the form of numpy.select. This allows you to define conditions, then define outputs for those conditions, much more efficiently than using apply.

First, define conditions:

    conditions = [
        df['eri_hispanic'] == 1,
        df[['eri_afr_amer', 'eri_asian', 'eri_hawaiian', 'eri_nat_amer', 'eri_white']].sum(axis=1).gt(1),
        df['eri_nat_amer'] == 1,
        df['eri_asian'] == 1,
        df['eri_afr_amer'] == 1,
        df['eri_hawaiian'] == 1,
        df['eri_white'] == 1,
    ]

Now, define the corresponding outputs:

    outputs = [
        'Hispanic', 'Two Or More', 'A/I AK Native', 'Asian', 'Black/AA', 'Haw/Pac Isl.', 'White'
    ]

Finally, using numpy.select:

    res = np.select(conditions, outputs, 'Other')
    pd.Series(res)

    0           White
    1        Hispanic
    2           White
    3           White
    4           Other
    5           White
    6     Two Or More
    7           White
    8    Haw/Pac Isl.
    9           White
    dtype: object

Why should numpy.select be used over apply? Here are some performance checks:

    df = pd.concat([df]*1000)

    In [42]: %timeit df.apply(lambda row: label_race(row), axis=1)
    1.07 s ± 4.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [44]: %%timeit
        ...: conditions = [
        ...:     df['eri_hispanic'] == 1,
        ...:     df[['eri_afr_amer', 'eri_asian', 'eri_hawaiian', 'eri_nat_amer', 'eri_white']].sum(axis=1).gt(1),
        ...:     df['eri_nat_amer'] == 1,
        ...:     df['eri_asian'] == 1,
        ...:     df['eri_afr_amer'] == 1,
        ...:     df['eri_hawaiian'] == 1,
        ...:     df['eri_white'] == 1,
        ...: ]
        ...:
        ...: outputs = [
        ...:     'Hispanic', 'Two Or More', 'A/I AK Native', 'Asian', 'Black/AA', 'Haw/Pac Isl.', 'White'
        ...: ]
        ...:
        ...: np.select(conditions, outputs, 'Other')
        ...:
    3.09 ms ± 17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using numpy.select gives us vastly improved performance, and the discrepancy will only increase as the data grows.
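Before swapping one approach for the other, it can be worth checking that both produce identical labels on data with many flag combinations. The following is only a sketch under assumptions: the random 0/1 frame, the seed, and the PRIORITY list are made up for the check, and the row-wise function is a compact restatement of the if ladder rather than the exact code from the other answer.

```python
import numpy as np
import pandas as pd

cols = ["eri_hispanic", "eri_nat_amer", "eri_asian",
        "eri_afr_amer", "eri_hawaiian", "eri_white"]
non_hisp = [c for c in cols if c != "eri_hispanic"]

# Single-flag rules in priority order (hypothetical helper list)
PRIORITY = [
    ("eri_nat_amer", "A/I AK Native"),
    ("eri_asian", "Asian"),
    ("eri_afr_amer", "Black/AA"),
    ("eri_hawaiian", "Haw/Pac Isl."),
    ("eri_white", "White"),
]

# Random 0/1 flags; the seed is arbitrary and only makes the run repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(200, len(cols))), columns=cols)

def label_race(row):
    if row["eri_hispanic"] == 1:
        return "Hispanic"
    if row[non_hisp].sum() > 1:
        return "Two Or More"
    for col, label in PRIORITY:
        if row[col] == 1:
            return label
    return "Other"

conditions = ([df["eri_hispanic"] == 1, df[non_hisp].sum(axis=1) > 1]
              + [df[col] == 1 for col, _ in PRIORITY])
outputs = ["Hispanic", "Two Or More"] + [label for _, label in PRIORITY]

vectorized = np.select(conditions, outputs, default="Other")
rowwise = df.apply(label_race, axis=1)

# Compare element by element, independent of the string dtype pandas picks
print(vectorized.tolist() == rowwise.tolist())
```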

User · Answer

Try this:

    df.loc[df['eri_white']==1, 'race_label'] = 'White'
    df.loc[df['eri_hawaiian']==1, 'race_label'] = 'Haw/Pac Isl.'
    df.loc[df['eri_afr_amer']==1, 'race_label'] = 'Black/AA'
    df.loc[df['eri_asian']==1, 'race_label'] = 'Asian'
    df.loc[df['eri_nat_amer']==1, 'race_label'] = 'A/I AK Native'
    df.loc[(df['eri_afr_amer'] + df['eri_asian'] + df['eri_hawaiian'] + df['eri_nat_amer'] + df['eri_white']) > 1, 'race_label'] = 'Two Or More'
    df.loc[df['eri_hispanic']==1, 'race_label'] = 'Hispanic'
    # assign the result back; calling fillna(..., inplace=True) on a selected column is not reliable in recent pandas
    df['race_label'] = df['race_label'].fillna('Other')

Output:

          lname   fname rno_cd  eri_afr_amer  eri_asian  eri_hawaiian
    0      MOST    JEFF      E             0          0             0
    1    CRUISE     TOM      E             0          0             0
    2      DEPP  JOHNNY    NaN             0          0             0
    3     DICAP     LEO    NaN             0          0             0
    4    BRANDO  MARLON      E             0          0             0
    5     HANKS     TOM    NaN             0          0             0
    6    DENIRO  ROBERT      E             0          1             0
    7    PACINO      AL      E             0          0             0
    8  WILLIAMS   ROBIN      E             0          0             1
    9  EASTWOOD   CLINT      E             0          0             0

       eri_hispanic  eri_nat_amer  eri_white rno_defined    race_label
    0             0             0          1       White         White
    1             1             0          0       White      Hispanic
    2             0             0          1     Unknown         White
    3             0             0          1     Unknown         White
    4             0             0          0       White         Other
    5             0             0          1     Unknown         White
    6             0             0          1       White   Two Or More
    7             0             0          1       White         White
    8             0             0          0       White  Haw/Pac Isl.
    9             0             0          1       White         White

Use .loc instead of apply: the assignments are vectorized, so they avoid the per-row Python function call. .loc works in a simple manner: it masks the rows that meet the condition and assigns the value to the masked rows. For more details, see the .loc docs.

Performance metrics

Accepted answer:

    def label_race(row):
        if row['eri_hispanic'] == 1:
            return 'Hispanic'
        if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
            return 'Two Or More'
        if row['eri_nat_amer'] == 1:
            return 'A/I AK Native'
        if row['eri_asian'] == 1:
            return 'Asian'
        if row['eri_afr_amer'] == 1:
            return 'Black/AA'
        if row['eri_hawaiian'] == 1:
            return 'Haw/Pac Isl.'
        if row['eri_white'] == 1:
            return 'White'
        return 'Other'

    df = pd.read_csv('dataser.csv')
    df = pd.concat([df]*1000)

    %timeit df.apply(lambda row: label_race(row), axis=1)

    1.15 s ± 46.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My proposed answer (shown as it was timed):

    def label_race(df):
        df.loc[df['eri_white']==1, 'race_label'] = 'White'
        df.loc[df['eri_hawaiian']==1, 'race_label'] = 'Haw/Pac Isl.'
        df.loc[df['eri_afr_amer']==1, 'race_label'] = 'Black/AA'
        df.loc[df['eri_asian']==1, 'race_label'] = 'Asian'
        df.loc[df['eri_nat_amer']==1, 'race_label'] = 'A/I AK Native'
        df.loc[(df['eri_afr_amer'] + df['eri_asian'] + df['eri_hawaiian'] + df['eri_nat_amer'] + df['eri_white']) > 1, 'race_label'] = 'Two Or More'
        df.loc[df['eri_hispanic']==1, 'race_label'] = 'Hispanic'
        df['race_label'].fillna('Other', inplace=True)

    df = pd.read_csv('s22.csv')
    df = pd.concat([df]*1000)

    %timeit label_race(df)

    24.7 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
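Note that the .loc statements run in the reverse of the rule priority: each later assignment overwrites whatever an earlier one wrote, so the highest-priority rule (Hispanic) has to come last. A small sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical frame: row 0 has both flags set, row 2 has none
df = pd.DataFrame({"eri_hispanic": [1, 0, 0], "eri_white": [1, 1, 0]})

# Lowest-priority rule first
df.loc[df["eri_white"] == 1, "race_label"] = "White"
# Highest-priority rule last: this overwrites the "White" written to row 0
df.loc[df["eri_hispanic"] == 1, "race_label"] = "Hispanic"
# Rows that matched nothing are still missing; fill them in
df["race_label"] = df["race_label"].fillna("Other")

print(df["race_label"].tolist())  # ['Hispanic', 'White', 'Other']
```

If the two .loc lines were swapped, row 0 would end up as "White", which is the opposite of what the spec asks for.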
