Find the max of two or more columns with pandas

Question

I have a dataframe with columns A, B. I need to create a column C such that for every record / row: C = max(A, B). How should I go about doing this?

User · Answer

DSM's answer is perfectly fine in almost any normal scenario. But if you're the type of programmer who wants to go a little deeper than the surface level, you might be interested to know that it is a little faster to call numpy functions on the underlying .to_numpy() (or .values for pandas < 0.24) array instead of directly calling the (cythonized) functions defined on the DataFrame/Series objects.

For example, you can use ndarray.max() along axis 1, which takes the maximum across the columns of each row.

    import numpy as np
    import pandas as pd

    # Data borrowed from DSM's post.
    df = pd.DataFrame({"A": [1, 2, 3], "B": [-2, 8, 1]})
    df
    #    A  B
    # 0  1 -2
    # 1  2  8
    # 2  3  1

    df["C"] = df[["A", "B"]].values.max(1)
    # Or, assuming "A" and "B" are the only columns,
    # df["C"] = df.values.max(1)
    df
    #    A  B  C
    # 0  1 -2  1
    # 1  2  8  8
    # 2  3  1  3

If your data has NaNs, you will need numpy.nanmax:

    df["C"] = np.nanmax(df.values, axis=1)
    df
    #    A  B  C
    # 0  1 -2  1
    # 1  2  8  8
    # 2  3  1  3

You can also use numpy.maximum.reduce. numpy.maximum is a ufunc (Universal Function), and every ufunc has a reduce:

    df["C"] = np.maximum.reduce(df[["A", "B"]].values, axis=1)
    # df["C"] = np.maximum.reduce(df[["A", "B"]], axis=1)
    # df["C"] = np.maximum.reduce(df, axis=1)
    df
    #    A  B  C
    # 0  1 -2  1
    # 1  2  8  8
    # 2  3  1  3

np.maximum.reduce and np.max appear to be more or less the same (for most normal sized DataFrames), and happen to be a shade faster than DataFrame.max. I imagine this difference roughly remains constant, and is due to internal overhead (indexing alignment, handling NaNs, etc.).

The timing graph in the original answer was generated using perfplot. Benchmarking code, for reference:

    import numpy as np
    import pandas as pd
    import perfplot

    np.random.seed(0)
    df_ = pd.DataFrame(np.random.randn(5, 1000))

    perfplot.show(
        # Stack n copies of the base frame to scale the number of rows.
        setup=lambda n: pd.concat([df_] * n, ignore_index=True),
        kernels=[
            lambda df: df.assign(new=df.max(axis=1)),
            lambda df: df.assign(new=df.values.max(1)),
            lambda df: df.assign(new=np.nanmax(df.values, axis=1)),
            lambda df: df.assign(new=np.maximum.reduce(df.values, axis=1)),
        ],
        # Labels are listed in the same order as the kernels above.
        labels=["df.max", "np.max", "np.nanmax", "np.maximum.reduce"],
        n_range=[2**k for k in range(0, 15)],
        xlabel="N (* len(df))",
        logx=True,
        logy=True,
    )
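To make the NaN remark above concrete, here is a minimal sketch of how the three approaches behave when a row contains a missing value. The sample data is hypothetical (it is not from the original post), and the variable names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value in column "B".
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [-2.0, np.nan, 1.0]})

arr = df[["A", "B"]].to_numpy()

# ndarray.max propagates NaN: a row containing NaN yields NaN.
plain_max = arr.max(axis=1)

# np.nanmax ignores NaN as long as the row has at least one valid value.
nan_aware_max = np.nanmax(arr, axis=1)

# DataFrame.max skips NaN by default (skipna=True).
pandas_max = df[["A", "B"]].max(axis=1).to_numpy()

print(plain_max)
print(nan_aware_max)
print(pandas_max)
```

In this sketch only the plain ndarray.max result contains a NaN in the second row, which is why switching from DataFrame.max to the raw array for speed also means choosing np.nanmax if missing values are possible.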

User · Answer

You can get the maximum like this:

    >>> import pandas as pd
    >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [-2, 8, 1]})
    >>> df
       A  B
    0  1 -2
    1  2  8
    2  3  1
    >>> df[["A", "B"]]
       A  B
    0  1 -2
    1  2  8
    2  3  1
    >>> df[["A", "B"]].max(axis=1)
    0    1
    1    8
    2    3

and so:

    >>> df["C"] = df[["A", "B"]].max(axis=1)
    >>> df
       A  B  C
    0  1 -2  1
    1  2  8  8
    2  3  1  3

If you know that "A" and "B" are the only columns, you could even get away with

    >>> df["C"] = df.max(axis=1)

And you could use .apply(max, axis=1) too, I guess.
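As a quick sanity check, the variants mentioned in this thread can be compared side by side. This is only a sketch on the same three-row frame; the names via_max, via_apply and via_numpy are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [-2, 8, 1]})
cols = df[["A", "B"]]

# Row-wise maximum via the DataFrame method.
via_max = cols.max(axis=1)

# Row-wise maximum via apply; the built-in max is called once per row,
# so this is expected to be the slowest option on large frames.
via_apply = cols.apply(max, axis=1)

# Row-wise maximum on the underlying NumPy array, re-wrapped with the index.
via_numpy = pd.Series(cols.to_numpy().max(axis=1), index=df.index)

# All three variants agree on this data.
assert via_max.tolist() == via_apply.tolist() == via_numpy.tolist()

df["C"] = via_max
print(df)
```

Selecting the columns explicitly, rather than calling df.max(axis=1) on the whole frame, keeps the result stable if other columns are added later.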
