In my opinion the sprintf function deserves a place among these answers as well. You can use sprintf as follows:
do.call(sprintf, c(d[cols], '%s-%s-%s'))
which gives:
[1] "a-d-g" "b-e-h" "c-f-i"
And to create the required dataframe:
data.frame(a = d$a, x = do.call(sprintf, c(d[cols], '%s-%s-%s')))
giving:
  a     x
1 1 a-d-g
2 2 b-e-h
3 3 c-f-i
Although sprintf doesn't have a clear advantage over the do.call/paste combination of @BrianDiggs, it is especially useful when you also want to pad certain parts of the desired string or when you want to specify the number of digits. See ?sprintf for the several options.
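As a minimal sketch of the padding and digit control mentioned here: the data frame ex and its columns id, grp and val are made up for illustration and are not part of the question's data.

```r
# Hypothetical example data (not from the question)
ex <- data.frame(id  = c(7L, 42L, 123L),
                 grp = c('a', 'b', 'c'),
                 val = c(1.5, 22.25, 3),
                 stringsAsFactors = FALSE)

# %03d zero-pads the integer to width 3, %s inserts the string as-is,
# %.2f prints the number with exactly two decimals
x <- do.call(sprintf, c(ex, fmt = '%03d-%s-%.2f'))
x
# [1] "007-a-1.50"  "042-b-22.25" "123-c-3.00"
```

With paste you would have to format each column separately (for example with formatC) before combining them.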
Another variant would be to use pmap from purrr:
pmap(d[2:4], paste, sep = '-')
Note: this pmap solution only works when the columns aren't factors.
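If the columns are factors, one possible workaround is to convert them to character first. The sketch below assumes a copy of the data built with stringsAsFactors = TRUE (the names d_fac and d_chr are made up here) and uses pmap_chr, which returns a character vector instead of a list:

```r
# Assumed setup: the same data as in this answer, but with factor columns
d_fac <- data.frame(a = 1:3, b = c('a','b','c'), c = c('d','e','f'),
                    d = c('g','h','i'), stringsAsFactors = TRUE)
cols <- c('b', 'c', 'd')

# convert only the columns that will be pasted together
d_chr <- d_fac
d_chr[cols] <- lapply(d_chr[cols], as.character)

# run the purrr variant only when purrr is installed
if (requireNamespace('purrr', quietly = TRUE)) {
  x <- purrr::pmap_chr(d_chr[cols], paste, sep = '-')
  print(x)
}
```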
A benchmark on a larger dataset:
# create a larger dataset
d2 <- d[sample(1:3,1e6,TRUE),]
# benchmark
library(microbenchmark)
microbenchmark(
docp = do.call(paste, c(d2[cols], sep="-")),
appl = apply( d2[, cols ] , 1 , paste , collapse = "-" ),
tidr = tidyr::unite_(d2, "x", cols, sep="-")$x,
docs = do.call(sprintf, c(d2[cols], '%s-%s-%s')),
times=10)
results in:
Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval cld
 docp  214.1786  226.2835  297.1487  241.6150  409.2495  493.5036    10 a
 appl 3832.3252 4048.9320 4131.6906 4072.4235 4255.1347 4486.9787    10 c
 tidr  206.9326  216.8619  275.4556  252.1381  318.4249  407.9816    10 a
 docs  413.9073  443.1550  490.6520  453.1635  530.1318  659.8400    10 b
Used data:
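d <- data.frame(a = 1:3, b = c('a','b','c'), c = c('d','e','f'), d = c('g','h','i'))
# columns to combine; not defined in this answer, inferred from the output above
cols <- c('b', 'c', 'd')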