Gather multiple sets of columns

Question

I have data from an online survey where respondents go through a loop of questions 1-3 times. The survey software (Qualtrics) records this data in multiple columns; that is, Q3.2 in the survey will have columns Q3.2.1., Q3.2.2., and Q3.2.3.:

    df <- data.frame(
      id = 1:10,
      time = as.Date("2009-01-01") + 0:9,
      Q3.2.1. = rnorm(10, 0, 1),
      Q3.2.2. = rnorm(10, 0, 1),
      Q3.2.3. = rnorm(10, 0, 1),
      Q3.3.1. = rnorm(10, 0, 1),
      Q3.3.2. = rnorm(10, 0, 1),
      Q3.3.3. = rnorm(10, 0, 1)
    )

    # Sample data

       id       time    Q3.2.1.     Q3.2.2.    Q3.2.3.     Q3.3.1.    Q3.3.2.     Q3.3.3.
    1   1 2009-01-01 -0.2059165 -0.29177677 -0.7107192  1.52718069 -0.4484351 -1.21550600
    2   2 2009-01-02 -0.1981136 -1.19813815  1.1750200 -0.40380049 -1.8376094  1.03588482
    3   3 2009-01-03  0.3514795 -0.27425539  1.1171712 -1.02641801 -2.0646661 -0.35353058
    ...

I want to combine all the QN.N* columns into tidy individual QN.N columns, ultimately ending up with something like this:

       id       time loop_number        Q3.2        Q3.3
    1   1 2009-01-01           1 -0.20591649  1.52718069
    2   2 2009-01-02           1 -0.19811357 -0.40380049
    3   3 2009-01-03           1  0.35147949 -1.02641801
    ...
    11  1 2009-01-01           2 -0.29177677  -0.4484351
    12  2 2009-01-02           2 -1.19813815  -1.8376094
    13  3 2009-01-03           2 -0.27425539  -2.0646661
    ...
    21  1 2009-01-01           3 -0.71071921 -1.21550600
    22  2 2009-01-02           3  1.17501999  1.03588482
    23  3 2009-01-03           3  1.11717121 -0.35353058
    ...

The tidyr library has the gather() function, which works great for combining one set of columns:

    library(dplyr)
    library(tidyr)
    library(stringr)

    df %>% gather(loop_number, Q3.2, starts_with("Q3.2")) %>%
      mutate(loop_number = str_sub(loop_number, -2, -2)) %>%
      select(id, time, loop_number, Q3.2)

       id       time loop_number        Q3.2
    1   1 2009-01-01           1 -0.20591649
    2   2 2009-01-02           1 -0.19811357
    3   3 2009-01-03           1  0.35147949
    ...
    29  9 2009-01-09           3 -0.58581232
    30 10 2009-01-10           3 -2.33393981

The resultant data frame has 30 rows, as expected (10 individuals, 3 loops each). However, gathering a second set of columns does not work correctly. It successfully makes the two combined columns Q3.2 and Q3.3, but ends up with 90 rows instead of 30 (all combinations of 10 individuals, 3 loops of Q3.2, and 3 loops of Q3.3; the combinations will increase substantially for each group of columns in the actual data):

    df %>% gather(loop_number, Q3.2, starts_with("Q3.2")) %>%
      gather(loop_number, Q3.3, starts_with("Q3.3")) %>%
      mutate(loop_number = str_sub(loop_number, -2, -2))

       id       time loop_number        Q3.2        Q3.3
    1   1 2009-01-01           1 -0.20591649  1.52718069
    2   2 2009-01-02           1 -0.19811357 -0.40380049
    3   3 2009-01-03           1  0.35147949 -1.02641801
    ...
    89  9 2009-01-09           3 -0.58581232 -0.13187024
    90 10 2009-01-10           3 -2.33393981 -0.48502131

Is there a way to use multiple calls to gather() like this, combining small subsets of columns while maintaining the correct number of rows?
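Editorial sketch (not part of the original question): the 90 rows arise because each gather() call stacks its own set of columns independently, so every Q3.2 loop is paired with every Q3.3 loop. The code below is a minimal illustration under some assumptions: the key columns are given distinct, hypothetical names (loop_a, loop_b) instead of reusing loop_number, a seed is set only for reproducibility, and loop_of is a hypothetical helper that reads the second-to-last character of the key.

```r
library(tidyr)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(
  id = 1:10,
  time = as.Date("2009-01-01") + 0:9,
  Q3.2.1. = rnorm(10), Q3.2.2. = rnorm(10), Q3.2.3. = rnorm(10),
  Q3.3.1. = rnorm(10), Q3.3.2. = rnorm(10), Q3.3.3. = rnorm(10)
)

# First gather: 10 ids x 3 loops of Q3.2 = 30 rows
step1 <- gather(df, loop_a, Q3.2, starts_with("Q3.2"))

# Second gather: each of those 30 rows is repeated for 3 loops of Q3.3 = 90 rows
crossed <- gather(step1, loop_b, Q3.3, starts_with("Q3.3"))

# Hypothetical helper: the loop digit is the second-to-last character of the key
loop_of <- function(key) substr(key, nchar(key) - 1, nchar(key) - 1)

# Keeping only rows where both keys refer to the same loop restores 30 rows
aligned <- crossed[loop_of(crossed$loop_a) == loop_of(crossed$loop_b), ]

nrow(step1)
nrow(crossed)
nrow(aligned)
```

Filtering after the fact works for two question groups but still builds the full cross product first, so it scales poorly as more groups are added.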

User · Answer

It's not at all related to "tidyr" and "dplyr", but here's another option to consider: merged.stack from my "splitstackshape" package, V1.4.0 and above.

    library(splitstackshape)
    merged.stack(df, id.vars = c("id", "time"),
                 var.stubs = c("Q3.2.", "Q3.3."),
                 sep = "var.stubs")
    #     id       time .time_1       Q3.2.       Q3.3.
    #  1:  1 2009-01-01      1. -0.62645381  1.35867955
    #  2:  1 2009-01-01      2.  1.51178117 -0.16452360
    #  3:  1 2009-01-01      3.  0.91897737  0.39810588
    #  4:  2 2009-01-02      1.  0.18364332 -0.10278773
    #  5:  2 2009-01-02      2.  0.38984324 -0.25336168
    #  6:  2 2009-01-02      3.  0.78213630 -0.61202639
    #  7:  3 2009-01-03      1. -0.83562861  0.38767161
    # <<:::SNIP:::>>
    # 24:  8 2009-01-08      3. -1.47075238 -1.04413463
    # 25:  9 2009-01-09      1.  0.57578135  1.10002537
    # 26:  9 2009-01-09      2.  0.82122120 -0.11234621
    # 27:  9 2009-01-09      3. -0.47815006  0.56971963
    # 28: 10 2009-01-10      1. -0.30538839  0.76317575
    # 29: 10 2009-01-10      2.  0.59390132  0.88110773
    # 30: 10 2009-01-10      3.  0.41794156 -0.13505460
    #     id       time .time_1       Q3.2.       Q3.3.

User · Answer

With the recent update to melt.data.table, we can now melt multiple columns. With that, we can do:

    require(data.table) ## 1.9.5
    melt(setDT(df), id = 1:2, measure = patterns("^Q3.2", "^Q3.3"),
         value.name = c("Q3.2", "Q3.3"), variable.name = "loop_number")
    #     id       time loop_number         Q3.2        Q3.3
    #  1:  1 2009-01-01           1 -0.433978480  0.41227209
    #  2:  2 2009-01-02           1 -0.567995351  0.30701144
    #  3:  3 2009-01-03           1 -0.092041353 -0.96024077
    #  4:  4 2009-01-04           1  1.137433487  0.60603396
    #  5:  5 2009-01-05           1 -1.071498263 -0.01655584
    #  6:  6 2009-01-06           1 -0.048376809  0.55889996
    #  7:  7 2009-01-07           1 -0.007312176  0.69872938

This requires data.table 1.9.5 or later, which was the development version when this answer was written.

User · Answer

This can be done with base R's reshape(). It is also possible with dplyr and tidyr.

    colnames(df) <- gsub("\\.(.{2})$", "_\\1", colnames(df))
    colnames(df)[2] <- "Date"
    res <- reshape(df, idvar = c("id", "Date"), varying = 3:8, direction = "long", sep = "_")
    row.names(res) <- 1:nrow(res)

    head(res)
    #   id       Date time       Q3.2       Q3.3
    # 1  1 2009-01-01    1  1.3709584  0.4554501
    # 2  2 2009-01-02    1 -0.5646982  0.7048373
    # 3  3 2009-01-03    1  0.3631284  1.0351035
    # 4  4 2009-01-04    1  0.6328626 -0.6089264
    # 5  5 2009-01-05    1  0.4042683  0.5049551
    # 6  6 2009-01-06    1 -0.1061245 -1.7170087

Or, using dplyr:

    library(tidyr)
    library(dplyr)
    colnames(df) <- gsub("\\.(.{2})$", "_\\1", colnames(df))

    df %>%
        gather(loop_number, "Q3", starts_with("Q3")) %>%
        separate(loop_number, c("L1", "L2"), sep = "_") %>%
        spread(L1, Q3) %>%
        select(-L2) %>%
        head()
    #   id       time       Q3.2       Q3.3
    # 1  1 2009-01-01  1.3709584  0.4554501
    # 2  1 2009-01-01  1.3048697  0.2059986
    # 3  1 2009-01-01 -0.3066386  0.3219253
    # 4  2 2009-01-02 -0.5646982  0.7048373
    # 5  2 2009-01-02  2.2866454 -0.3610573
    # 6  2 2009-01-02 -1.7813084 -0.7838389

Update

With the new version of tidyr, we can use pivot_longer to reshape multiple columns. (Using the changed column names from gsub above.)

    library(dplyr)
    library(tidyr)
    df %>%
        pivot_longer(cols = starts_with("Q3"),
                     names_to = c(".value", "Q3"), names_sep = "_") %>%
        select(-Q3)
    # A tibble: 30 x 4
    #       id time         Q3.2    Q3.3
    #    <int> <date>      <dbl>   <dbl>
    #  1     1 2009-01-01  0.974  1.47
    #  2     1 2009-01-01 -0.849 -0.513
    #  3     1 2009-01-01  0.894  0.0442
    #  4     2 2009-01-02  2.04  -0.553
    #  5     2 2009-01-02  0.694  0.0972
    #  6     2 2009-01-02 -1.11   1.85
    #  7     3 2009-01-03  0.413  0.733
    #  8     3 2009-01-03 -0.896 -0.271
    #  9     3 2009-01-03  0.509 -0.0512
    # 10     4 2009-01-04  1.81   0.668
    # … with 20 more rows

NOTE: Values are different because there was no set.seed in creating the input dataset.
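As a hedged variation on the pivot_longer approach, the sketch below works on the original column names (Q3.2.1. and so on) without renaming them first, and keeps the loop number as a column. It assumes tidyr 1.0.0 or later; the regular expression, the loop_number column name, and the fixed seed are illustrative choices rather than part of the answer above.

```r
library(tidyr)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(
  id = 1:10,
  time = as.Date("2009-01-01") + 0:9,
  Q3.2.1. = rnorm(10), Q3.2.2. = rnorm(10), Q3.2.3. = rnorm(10),
  Q3.3.1. = rnorm(10), Q3.3.2. = rnorm(10), Q3.3.3. = rnorm(10)
)

long <- pivot_longer(
  df,
  cols = starts_with("Q3"),
  # ".value" turns the first capture group into output column names
  # (Q3.2, Q3.3); the second capture group becomes loop_number
  names_to = c(".value", "loop_number"),
  names_pattern = "^(Q3\\.[0-9])\\.([0-9])\\.$"
)

head(long)
```

Because the question stub and the loop digit are captured separately, each id keeps one row per loop, so the result has 30 rows rather than 90.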

User · Answer

In case you are like me, and cannot work out how to use "regular expression with capturing groups" for extract, the following code replicates the extract(...) line in Hadley's answer:

    df %>%
        gather(question_number, value, starts_with("Q3.")) %>%
        mutate(loop_number = str_sub(question_number, -2, -2),
               question_number = str_sub(question_number, 1, 4)) %>%
        select(id, time, loop_number, question_number, value) %>%
        spread(key = question_number, value = value)

The problem here is that the initial gather forms a key column that is actually a combination of two keys. I chose to use mutate in my original solution in the comments to split this column into two columns with equivalent info: a loop_number column and a question_number column. spread can then be used to transform the long-form data, which are key-value pairs (question_number, value), to wide-form data.
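To make the "combination of two keys" concrete, here is a small base R sketch of the same positional split, applied to a few example key strings. The keys vector is made up for illustration, and substr is used in place of stringr's str_sub so that the snippet has no dependencies.

```r
# Example compound keys, in the form that gather() puts in the key column
keys <- c("Q3.2.1.", "Q3.2.2.", "Q3.3.1.", "Q3.3.3.")
n <- nchar(keys)

# Characters 1 to 4 hold the question, e.g. "Q3.2"
question_number <- substr(keys, 1, 4)

# The second-to-last character holds the loop, e.g. "1"
# (the base R equivalent of str_sub(x, -2, -2))
loop_number <- substr(keys, n - 1, n - 1)

data.frame(key = keys, question_number, loop_number)
```

Positional splitting like this relies on every question stub being exactly four characters and every loop number being a single digit; names such as Q3.10.1. would need a pattern-based split instead.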

User · Answer

This approach seems pretty natural to me:

    df %>%
      gather(key, value, -id, -time) %>%
      extract(key, c("question", "loop_number"), "(Q.\\..)\\.(.)") %>%
      spread(question, value)

First gather all question columns, use extract() to separate into question and loop_number, then spread() question back into the columns.

    #>    id       time loop_number         Q3.2        Q3.3
    #> 1   1 2009-01-01           1  0.142259203 -0.35842736
    #> 2   1 2009-01-01           2  0.061034802  0.79354061
    #> 3   1 2009-01-01           3 -0.525686204 -0.67456611
    #> 4   2 2009-01-02           1 -1.044461185 -1.19662936
    #> 5   2 2009-01-02           2  0.393808163  0.42384717
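For readers unfamiliar with capture groups, the following base R sketch shows what a pattern of the form `(Q.\\..)\\.(.)` pulls out of a column name. It uses regexec and regmatches rather than tidyr's extract(), and the keys vector is made up for illustration, so treat it as an explanation of the regular expression rather than of extract() itself.

```r
keys <- c("Q3.2.1.", "Q3.3.3.")

# Group 1: "Q", any character, a literal dot, any character  -> e.g. "Q3.2"
# then a literal dot
# Group 2: any single character                              -> e.g. "1"
pattern <- "(Q.\\..)\\.(.)"

# Each list element is c(full match, group 1, group 2)
matches <- regmatches(keys, regexec(pattern, keys))

# Keep only the two capture groups, one row per key
parts <- do.call(rbind, lapply(matches, function(m) m[2:3]))
colnames(parts) <- c("question", "loop_number")
parts
```

The first group becomes the question column and the second becomes loop_number, which is the split that lets spread() produce one Q3.2 and one Q3.3 column per loop.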
