For each row return the column name of the largest value

Question

I have a roster of employees, and I need to know which department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier to return the department name, rather than the number of roster counts, from the frequency table. A simple example below (column names = departments, row names = employee ids).

    DF <- matrix(sample(1:9, 9), ncol = 3, nrow = 3)
    DF <- as.data.frame.matrix(DF)
    > DF
      V1 V2 V3
    1  2  7  9
    2  8  3  6
    3  1  5  4

Now how do I get

    > DF2
      RE
    1 V3
    2 V1
    3 V2

User · Accepted Answer

One option using your data (for future reference, use set.seed() to make examples using sample reproducible):

    DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

    colnames(DF)[apply(DF, 1, which.max)]
    [1] "V3" "V1" "V2"

A faster solution than using apply might be max.col:

    colnames(DF)[max.col(DF, ties.method = "first")]
    # [1] "V3" "V1" "V2"

...where ties.method can be any of "random", "first" or "last".

This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance, as you will have more than one result for some rows. E.g.:

    DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(7, 6, 4))
    apply(DF, 1, function(x) which(x == max(x)))

    [[1]]
    V2 V3
     2  3

    [[2]]
    V1
     1

    [[3]]
    V2
     2
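
If every tied column should be reported, one possible sketch (not part of the answer above) collapses the tied names into a single string per row. The "," separator and the name tied_names are illustrative choices, not anything the question asked for.

```r
# Example data with a tie in row 1 (V2 and V3 are both 7)
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(7, 6, 4))

# For each row, keep the names of all columns equal to the row maximum
# and paste them into one comma-separated string
tied_names <- apply(DF, 1, function(x) {
  paste(names(x)[x == max(x)], collapse = ",")
})
tied_names
```

This keeps one value per row, so the result can still be stored as a single column next to the original data.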

User · Answer

If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!

Here I've generated data of your dimensions (26746 * 18).

Data

    set.seed(45)
    DF <- data.frame(matrix(sample(10, 26746 * 18, TRUE), ncol = 18))

data.table answer:

    require(data.table)
    DT <- data.table(value = unlist(DF, use.names = FALSE),
                     colid = 1:nrow(DF), rowid = rep(names(DF), each = nrow(DF)))
    setkey(DT, colid, value)
    t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult = "last"]), rowid, mult = "first"]

Benchmarking:

    # data.table solution
    system.time({
    DT <- data.table(value = unlist(DF, use.names = FALSE),
                     colid = 1:nrow(DF), rowid = rep(names(DF), each = nrow(DF)))
    setkey(DT, colid, value)
    t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult = "last"]), rowid, mult = "first"]
    })
    #   user  system elapsed
    #  0.174   0.029   0.227

    # apply solution from @thelatemail
    system.time(t2 <- colnames(DF)[apply(DF, 1, which.max)])
    #   user  system elapsed
    #  2.322   0.036   2.602

    identical(t1, t2)
    # [1] TRUE

It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.

Edit: if any of the max ids is okay, then:

    DT <- data.table(value = unlist(DF, use.names = FALSE),
                     colid = 1:nrow(DF), rowid = rep(names(DF), each = nrow(DF)))
    setkey(DT, colid, value)
    t1 <- DT[J(unique(colid)), rowid, mult = "last"]
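
The benchmark relies on identical() to confirm that two approaches agree. A similar sanity check can be sketched in base R alone, here comparing apply with which.max against max.col with ties.method = "first" on a smaller data set that contains many ties. The 1000-row size and the names by_apply and by_maxcol are assumptions made for illustration.

```r
set.seed(45)
# Small random data set with many ties (values 1..10, 1000 rows, 18 columns)
DF <- data.frame(matrix(sample(10, 1000 * 18, TRUE), ncol = 18))

# First maximum per row via apply + which.max
by_apply <- colnames(DF)[apply(DF, 1, which.max)]

# First maximum per row via max.col
by_maxcol <- colnames(DF)[max.col(DF, ties.method = "first")]

# Compare the two character vectors element by element
identical(by_apply, by_maxcol)
```

Running a check like this on a sample of your own data is a cheap way to confirm that a faster method handles ties the way you expect before switching to it.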

User · Answer

A dplyr solution:

Idea:

- add rowids as a column
- reshape to long format
- filter for max in each group

Code:

    library(dplyr)
    library(tidyr)   # gather()
    library(tibble)  # rownames_to_column()

    DF = data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))
    DF %>%
      rownames_to_column() %>%
      gather(column, value, -rowname) %>%
      group_by(rowname) %>%
      filter(rank(-value) == 1)

Result:

    # A tibble: 3 x 3
    # Groups:   rowname [3]
      rowname column value
      <chr>   <chr>  <dbl>
    1 2       V1         8
    2 3       V2         5
    3 1       V3         9

This approach can be easily extended to get the top n columns. Example for n = 2:

    DF %>%
      rownames_to_column() %>%
      gather(column, value, -rowname) %>%
      group_by(rowname) %>%
      mutate(rk = rank(-value)) %>%
      filter(rk <= 2) %>%
      arrange(rowname, rk)

Result:

    # A tibble: 6 x 4
    # Groups:   rowname [3]
      rowname column value    rk
      <chr>   <chr>  <dbl> <dbl>
    1 1       V3         9     1
    2 1       V2         7     2
    3 2       V1         8     1
    4 2       V3         6     2
    5 3       V2         5     1
    6 3       V3         4     2
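
For comparison, the top-n idea can also be sketched in base R without reshaping. This is an alternative to the dplyr pipeline, not a translation of it; the name top_n_cols is illustrative, and the sketch assumes n is at least 2 (with n = 1, apply returns a plain vector and the transpose would have the wrong shape).

```r
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))
n <- 2  # number of top columns to keep per row (assumed >= 2)

# Sort each row in decreasing order and keep the names of the first n entries;
# apply() returns one column per input row, so transpose to get one row per input row
top_n_cols <- t(apply(DF, 1, function(x) {
  names(sort(x, decreasing = TRUE))[seq_len(n)]
}))
top_n_cols
```

The result is a character matrix with one row per input row and one column per rank, which matches the ordering shown in the n = 2 result.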

User · Answer

Based on the above suggestions, the following data.table solution worked very fast for me:

    library(data.table)

    set.seed(45)
    DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol = 10))

    system.time(
      DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
    )
    #>    user  system elapsed
    #>    0.15    0.06    0.21
    DT[]
    #>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
    #>       1:  7  4  1  2  3  7  6  6  6   1      V1
    #>       2:  4  6  9 10  6  2  7  7  1   3      V4
    #>       3:  3  4  9  8  9  9  8  8  6   7      V3
    #>       4:  4  8  8  9  7  5  9  2  7   1      V4
    #>       5:  4  3  9 10  2  7  9  6  6   9      V4
    #>      ---
    #>  999996:  4  6 10  5  4  7  3  8  2   8      V3
    #>  999997:  8  7  6  6  3 10  2  3 10   1      V6
    #>  999998:  2  3  2  7  4  7  5  2  7   3      V4
    #>  999999:  8 10  3  2  3  4  5  1  1   4      V2
    #> 1000000: 10  4  2  6  6  2  8  4  7   4      V1

It also comes with the advantage that you can always specify which columns .SD should consider by mentioning them in .SDcols:

    DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method = "first")], .SDcols = c("V9", "V10")]

In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:

    DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]
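
The negation trick for the smallest value does not depend on data.table. As a small base R sketch on the 3 x 3 example from the question (the names col_min and col_min_check are illustrative), negating the data turns the row minimum into the row maximum, and which.min gives an independent cross-check:

```r
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

# Negating the values makes the smallest entry of each row the largest one
col_min <- colnames(DF)[max.col(-DF, ties.method = "first")]

# Cross-check with which.min, which also returns the first minimum
col_min_check <- colnames(DF)[apply(DF, 1, which.min)]

col_min
```

This assumes all columns are numeric; negation is not defined for character or factor columns.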

User · Answer

One option from dplyr 1.0.0 could be:

    DF %>%
     rowwise() %>%
     mutate(row_max = names(.)[which.max(c_across(everything()))])

         V1    V2    V3 row_max
      <dbl> <dbl> <dbl> <chr>
    1     2     7     9 V3
    2     8     3     6 V1
    3     1     5     4 V2

Sample data:

    DF <- structure(list(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6,
    4)), class = "data.frame", row.names = c(NA, -3L))

User · Answer

A simple for loop can also be handy:

    > df <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))
    > df
      V1 V2 V3
    1  2  7  9
    2  8  3  6
    3  1  5  4
    > df2 <- data.frame()
    > for (i in 1:nrow(df)) {
    +   df2[i, 1] <- colnames(df[which.max(df[i, ])])
    + }
    > df2
      V1
    1 V3
    2 V1
    3 V2
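
The same row-by-row logic can be sketched with vapply, which avoids growing a data frame inside the loop and guarantees one character value per row. This is an alternative formulation rather than the loop above; the names max_col and the RE column (borrowed from the desired output in the question) are illustrative.

```r
df <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

# For each row index, flatten the row to a named vector and
# return the name of its first maximum
max_col <- vapply(
  seq_len(nrow(df)),
  function(i) names(df)[which.max(unlist(df[i, ]))],
  character(1)
)

# Collect the result in a one-column data frame
df2 <- data.frame(RE = max_col, stringsAsFactors = FALSE)
df2
```

Using seq_len(nrow(df)) instead of 1:nrow(df) also keeps the code safe for a data frame with zero rows.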

User · Answer

One solution could be to reshape the data from wide to long, putting all the departments in one column and the counts in another, group by the employee id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too.

    library(tidyverse)

    # sample data frame with a tie
    # (tibble() replaces the deprecated data_frame())
    df <- tibble(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 5))

    # If you aren't worried about ties:
    df %>%
      rownames_to_column('id') %>%  # creates an ID number
      gather(dept, cnt, V1:V3) %>%
      group_by(id) %>%
      slice(which.max(cnt))

    # A tibble: 3 x 3
    # Groups:   id [3]
      id    dept    cnt
      <chr> <chr> <dbl>
    1 1     V3        9
    2 2     V1        8
    3 3     V2        5

    # If you're worried about keeping ties:
    df %>%
      rownames_to_column('id') %>%
      gather(dept, cnt, V1:V3) %>%
      group_by(id) %>%
      filter(cnt == max(cnt)) %>%  # top_n(cnt, n = 1) also works
      arrange(id)

    # A tibble: 4 x 3
    # Groups:   id [3]
      id    dept    cnt
      <chr> <chr> <dbl>
    1 1     V3        9
    2 2     V1        8
    3 3     V2        5
    4 3     V3        5

    # If you're worried about ties, but only want a certain department,
    # you could use rank() and choose 'first' or 'last'
    df %>%
      rownames_to_column('id') %>%
      gather(dept, cnt, V1:V3) %>%
      group_by(id) %>%
      mutate(dept_rank = rank(-cnt, ties.method = "first")) %>%  # or 'last'
      filter(dept_rank == 1) %>%
      select(-dept_rank)

    # A tibble: 3 x 3
    # Groups:   id [3]
      id    dept    cnt
      <chr> <chr> <dbl>
    1 2     V1        8
    2 3     V2        5
    3 1     V3        9

    # if you wanted to keep the original wide data frame
    df %>%
      rownames_to_column('id') %>%
      left_join(
        df %>%
          rownames_to_column('id') %>%
          gather(max_dept, max_cnt, V1:V3) %>%
          group_by(id) %>%
          slice(which.max(max_cnt)),
        by = 'id'
      )

    # A tibble: 3 x 6
      id       V1    V2    V3 max_dept max_cnt
      <chr> <dbl> <dbl> <dbl> <chr>      <dbl>
    1 1         2     7     9 V3             9
    2 2         8     3     6 V1             8
    3 3         1     5     5 V2             5
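
If the only goal is the wide result with the winning department and its count attached, a base R sketch can reach the same table without reshaping or joining. This is an alternative under the assumption that all count columns are numeric and that the first maximum should win on ties; counts and idx are illustrative names.

```r
# Same sample data, with a tie in row 3 (V2 and V3 are both 5)
df <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 5))

# Work on a numeric matrix copy so that adding columns to df
# does not affect the lookup
counts <- as.matrix(df)

# Column position of the first maximum in each row
idx <- max.col(counts, ties.method = "first")

# Attach the department name and the corresponding count
df$max_dept <- colnames(counts)[idx]
df$max_cnt  <- counts[cbind(seq_len(nrow(counts)), idx)]
df
```

The matrix indexing with cbind(row, column) picks exactly one cell per row, which is why the counts are extracted before the character column max_dept is added.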

User · Answer

Here is an answer that works with data.table and is simpler. This assumes your data.table is named yourDF:

    j1 <- max.col(yourDF[, .(V1, V2, V3, V4)], "first")
    yourDF$newCol <- c("V1", "V2", "V3", "V4")[j1]

Replace ("V1", "V2", "V3", "V4") and (V1, V2, V3, V4) with your column names.
