Based on the above suggestions, the following data.table
solution worked very fast for me:
library(data.table)
set.seed(45)
DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))
system.time(
DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
)
#> user system elapsed
#> 0.15 0.06 0.21
DT[]
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
#> 1: 7 4 1 2 3 7 6 6 6 1 V1
#> 2: 4 6 9 10 6 2 7 7 1 3 V4
#> 3: 3 4 9 8 9 9 8 8 6 7 V3
#> 4: 4 8 8 9 7 5 9 2 7 1 V4
#> 5: 4 3 9 10 2 7 9 6 6 9 V4
#> ---
#> 999996: 4 6 10 5 4 7 3 8 2 8 V3
#> 999997: 8 7 6 6 3 10 2 3 10 1 V6
#> 999998: 2 3 2 7 4 7 5 2 7 3 V4
#> 999999: 8 10 3 2 3 4 5 1 1 4 V2
#> 1000000: 10 4 2 6 6 2 8 4 7 4 V1
And also comes with the advantage that can always specify what columns .SD
should consider by mentioning them in .SDcols
:
DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]
In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD
:
DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]