Paste multiple columns together

Question

I have a bunch of columns in a dataframe which I want to paste together  seperated by  -   as follows   data  lt - data frame  a    1 3                       b    c  a   b   c                         c    c  d    e    f                         d    c  g    h    i    i e            a   b   c  d        1   a   d   g        2   b   e   h        3   c   f   i     Which I want to become     a x   1 a-d-g   2 b-e-h   3 c-f-i     I could normally do this with   within data  x  lt - paste b c d sep  -      and then removing the old columns  but unfortunately I do not know the names of the columns specifically  only a collective name for all of the columns  e g  I would know that cols  lt - c  b   c   d      Does anyone know a way of doing this

User · Answer

Using tidyr package  this can be easily handled in 1 function call   data  lt - data frame  a    1 3                       b    c  a   b   c                         c    c  d    e    f                         d    c  g    h    i     tidyr  unite  data  paste colnames data  -1   collapse       colnames data  -1      a b c d 1 1 a d g 2 2 b e h 3 3 c f i     Edit  Exclude first column  everything else gets pasted     tidyr 0 6 3  unite data  newCol  -a     or by column index unite data  newCol  -1       a newCol   1 1  a d g   2 2  b e h   3 3  c f i

User · Answer

I know this is an old question  but thought that I should anyway present the simple solution using the paste   function as suggested to by the questioner  data 1 lt -data frame a data a  quot x quot  paste data b data c data d sep  quot - quot     data 1   a     x 1 1 a-d-g 2 2 b-e-h 3 3 c-f-i

User · Answer

In my opinion the sprintf-function deserves a place among these answers as well  You can use sprintf as follows   do call sprintf  c d cols     s- s- s      which gives     1   a-d-g   b-e-h   c-f-i    And to create the required dataframe   data frame a   d a  x   do call sprintf  c d cols     s- s- s       giving     a     x 1 1 a-d-g 2 2 b-e-h 3 3 c-f-i   Although sprintf doesn t have a clear advantage over the do call paste combination of  BrianDiggs  it is especially usefull when you also want to pad certain parts of desired string or when you want to specify the number of digit  See  sprintf for the several options   Another variant would be to use pmap from purrr   pmap d 2 4   paste  sep    -     Note  this pmap solution only works when the columns aren t factors     A benchmark on a larger dataset     create a larger dataset d2  lt - d sample 1 3 1e6 TRUE      benchmark library microbenchmark  microbenchmark    docp   do call paste  c d2 cols   sep  -       appl   apply  d2   cols     1   paste   collapse    -       tidr   tidyr  unite  d2   x   cols  sep  -   x    docs   do call sprintf  c d2 cols     s- s- s       times 10    results in   Unit  milliseconds  expr       min        lq      mean    median        uq       max neval cld  docp  214 1786  226 2835  297 1487  241 6150  409 2495  493 5036    10 a    appl 3832 3252 4048 9320 4131 6906 4072 4235 4255 1347 4486 9787    10   c  tidr  206 9326  216 8619  275 4556  252 1381  318 4249  407 9816    10 a    docs  413 9073  443 1550  490 6520  453 1635  530 1318  659 8400    10  b      Used data   d  lt - data frame a   1 3  b   c  a   b   c    c   c  d   e   f    d   c  g   h   i

User · Answer

As a variant on baptiste s answer  with data defined as you have and the columns that you want to put together defined in cols  cols  lt - c  b    c    d     You can add the new column to data and delete the old ones with  data x  lt - do call paste  c data cols   sep  -    for  co in cols  data co   lt - NULL   which gives   gt  data   a     x 1 1 a-d-g 2 2 b-e-h 3 3 c-f-i

User · Answer

Here s a fairly unconventional  but fast  approach  use fwrite from data table to  quot paste quot  the columns together  and fread to read it back in  For convenience  I ve written the steps as a function called fpaste  fpaste  lt - function dt  sep    quot   quot       x  lt - tempfile     fwrite dt  file   x  sep   sep  col names   FALSE    fread x  sep    quot  n quot   header   FALSE     Here s an example  d  lt - data frame a   1 3  b   c  a   b   c    c   c  d   e   f    d   c  g   h   i     cols   c  quot b quot    quot c quot    quot d quot    fpaste d cols    quot - quot           V1   1  a-d-g   2  b-e-h   3  c-f-i  How does it perform  d2  lt - d sample 1 3 1e6 TRUE       library microbenchmark  microbenchmark    docp   do call paste  c d2 cols   sep  quot - quot       tidr   tidyr  unite  d2   quot x quot   cols  sep  quot - quot   x    docs   do call sprintf  c d2 cols     s- s- s       appl   apply  d2   cols     1   paste   collapse    quot - quot       fpaste   fpaste d2 cols    quot - quot   V1    dt2   as data table d2    x    Reduce function      paste      sep    quot - quot     SD    SDcols   cols       times 10    Unit  milliseconds      expr        min         lq      mean     median         uq       max neval      docp  215 34536  217 22102  220 3603  221 44104  223 27224  225 0906    10      tidr  215 19907  215 81210  220 7131  220 09636  225 32717  229 6822    10      docs  281 16679  285 49786  289 4514  286 68738  290 17249  312 5484    10      appl 2816 61899 3106 19944 3259 3924 3266 45186 3401 80291 3804 7263    10    fpaste   88 57108   89 67795  101 1524   90 59217   91 76415  197 1555    10       dt2  301 95508  310 79082  384 8247  316 29807  383 94993  874 4472    10

User · Answer

library plyr   ldply apply data  1  function x  data frame                        x   paste x 2 4  sep    collapse  -              x  1 a-d-g  2 b-e-h  3 c-f-i     and with just the vector of names you have   ldply apply data  1  function x  data frame                        x   paste x c  b   c   d    sep    collapse  -         or equally  mynames  lt -c  b   c   d   ldply apply data  1  function x  data frame                        x   paste x mynames  sep    collapse  -

User · Answer

your starting data   data  lt - data frame  a    1 3   b    c  a   b   c     c    c  d    e    f     d    c  g    h    i        columns to paste together cols  lt - c   b     c     d       create a new column  x  with the three columns collapsed together data x  lt - apply  data    cols     1   paste   collapse    -       remove the unnecessary columns data  lt - data       names  data    in  cols

User · Answer

I benchmarked the answers of Anthony Damico  Brian Diggs and data steve on a small sample tbl df and got the following results    gt  data  lt - data frame  a    1 3                         b    c  a   b   c                           c    c  d    e    f                           d    c  g    h    i     gt  data  lt - tbl df data   gt  cols  lt - c  b    c    d    gt  microbenchmark        do call paste  c data cols   sep  -           apply  data    cols     1   paste   collapse    -           tidyr  unite  data   x   cols  sep  -   x        times 1000     Unit  microseconds                                          expr     min      lq      mean  median       uq       max neval do call paste  c data cols   sep    -          65 248  78 380  93 90888  86 177  99 3090   436 220  1000 apply data   cols   1  paste  collapse    -   223 239 263 044 313 11977 289 514 338 5520   743 583  1000 tidyr  unite  data   x   cols  sep    -   x   376 716 448 120 556 65424 501 877 606 9315 11537 846  1000   However  when I evaluated on my own tbl df with  1 million rows and 10 columns the results were quite different     gt  microbenchmark        do call paste  c data c  a    b     sep  -           apply  data    c  a    b       1   paste   collapse    -           tidyr  unite  data   c   c  a    b    sep  -   c        times 25     Unit  milliseconds                                                        expr        min         lq      mean     median        uq       max neval do call paste  c data c  a    b     sep  -                    930 7208   951 3048  1129 334   997 2744  1066 084  2169 147    25 apply  data    c  a    b       1   paste   collapse    -     9368 2800 10948 0124 11678 393 11136 3756 11878 308 17587 617    25 tidyr  unite  data   c   c  a    b    sep  -   c              968 5861  1008 4716  1095 886  1035 8348  1082 726  1759 349    25

User · Answer

I d construct a new data frame   d  lt - data frame  a    1 3   b    c  a   b   c     c    c  d    e    f     d    c  g    h    i      cols  lt - c   b     c     d     data frame a   d    a    x   do call paste  c d    cols   list sep    -

User · Answer

Just to add additional solution with Reduce which probably is slower than do call but probebly better than apply because it will avoid the matrix conversion  Also  instead a for loop we could just use setdiff in order to remove unwanted columns  cols  lt - c  b   c   d   data x  lt - Reduce function      paste      sep    -    data cols   data setdiff names data   cols       a     x   1 1 a-d-g   2 2 b-e-h   3 3 c-f-i     Alternatively we could update data in place using the data table package  assuming fresh data   library data table  setDT data    x    Reduce function      paste      sep    -     SD   mget cols     data    cols     NULL  data      a     x   1  1 a-d-g   2  2 b-e-h   3  3 c-f-i   Another option is to use  SDcols instead of mget as in  setDT data    x    Reduce function      paste      sep    -     SD    SDcols   cols

[r] Paste multiple columns together

Examples related to r

Examples related to paste

Examples related to r-faq