How to reshape data from long to wide format

Question

I'm having trouble rearranging the following data frame:

    set.seed(45)
    dat1 <- data.frame(
        name = rep(c("firstName", "secondName"), each = 4),
        numbers = rep(1:4, 2),
        value = rnorm(8)
    )

    dat1
            name  numbers      value
    1  firstName        1  0.3407997
    2  firstName        2 -0.7033403
    3  firstName        3 -0.3795377
    4  firstName        4 -0.7460474
    5 secondName        1 -0.8981073
    6 secondName        2 -0.3347941
    7 secondName        3 -0.5013782
    8 secondName        4 -0.1745357

I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:

            name          1          2          3          4
    1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
    5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.

User · Answer

There's a very powerful new package called cdata from the data scientists at Win-Vector (the folks who made vtreat, seplyr and replyr). It implements the "coordinated data" principles described in this document and also in this blog post. The idea is that regardless of how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's an excerpt from the recent blog post by John Mount:

"The whole system is based on two primitives or operators, cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These operators have pivot, un-pivot, one-hot encode, transpose, moving multiple rows and columns, and many other transforms as simple special cases.

It is easy to write many different operations in terms of the cdata primitives. These operators can work in memory or at big data scale (with databases and Apache Spark; for big data use the cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN() variants). The transforms are controlled by a control table that itself is a diagram of (or picture of) the transform."

We will first build the control table (see the blog post for details) and then perform the move of data from rows to columns.

    library(cdata)

    # first build the control table
    pivotControlTable <- buildPivotControlTableD(
        table = dat1,                      # reference to dataset
        columnToTakeKeysFrom = 'numbers',  # this will become column headers
        columnToTakeValuesFrom = 'value',  # this contains data
        sep = "_"                          # optional, for making column names
    )

    # perform the move of data to columns
    dat_wide <- moveValuesToColumnsD(
        tallTable = dat1,                  # reference to dataset
        keyColumns = c('name'),            # this column (or these columns) should stay untouched
        controlTable = pivotControlTable   # control table above
    )
    dat_wide

    #>         name  numbers_1  numbers_2  numbers_3  numbers_4
    #> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
    #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

User · Answer

Using the base R aggregate function:

    aggregate(value ~ name, dat1, I)

    #         name  value.1  value.2  value.3  value.4
    # 1  firstName   0.4145  -0.4747   0.0659  -0.5024
    # 2 secondName  -0.8259   0.1669  -0.8962   0.1681
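One thing worth knowing about the aggregate() approach: the result stores the wide values in a single matrix column, and the column positions follow the row order of the input rather than the numbers column. A minimal sketch, assuming the question's dat1 and sorting first so that position and numbers line up (the names sorted, agg and flat are only illustrative):

```r
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Sort so that the position within each group matches `numbers`
sorted <- dat1[order(dat1$name, dat1$numbers), ]

# `value` comes back as one matrix column with a row per name
agg <- aggregate(value ~ name, sorted, I)

# Expand the matrix column into ordinary columns value.1 ... value.4
flat <- do.call(data.frame, agg)
names(flat)
```

This only lines up cleanly when every name has the same set of numbers; with unequal group sizes aggregate() cannot simplify the result to a matrix.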

User · Answer

Another option, if performance is a concern, is to use data.table's extension of reshape2's melt & dcast functions.

(Reference: Efficient reshaping using data.tables)

    library(data.table)

    setDT(dat1)
    dcast(dat1, name ~ numbers, value.var = "value")

    #          name          1          2         3         4
    # 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078
    # 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6, we can cast on multiple columns:

    ## add an extra column
    dat1[, value2 := value * 2]

    ## cast multiple value columns
    dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

    #          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4
    # 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155
    # 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

User · Answer

With the devel version of tidyr ('0.8.3.9000'), there are pivot_wider and pivot_longer, which are generalized to do the reshaping (long -> wide and wide -> long, respectively) from one to multiple columns. Using the OP's data:

Single column, long -> wide:

    library(dplyr)
    library(tidyr)
    dat1 %>%
        pivot_wider(names_from = numbers, values_from = value)
    # A tibble: 2 x 5
    #   name          `1`    `2`    `3`    `4`
    #   <fct>       <dbl>  <dbl>  <dbl>  <dbl>
    # 1 firstName   0.341 -0.703 -0.380 -0.746
    # 2 secondName -0.898 -0.335 -0.501 -0.175

Created another column to show the multi-column functionality:

    dat1 %>%
        mutate(value2 = value * 2) %>%
        pivot_wider(names_from = numbers, values_from = c("value", "value2"))
    # A tibble: 2 x 9
    #   name       value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
    #   <fct>        <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
    # 1 firstName    0.341  -0.703  -0.380  -0.746    0.682   -1.41    -0.759   -1.49
    # 2 secondName  -0.898  -0.335  -0.501  -0.175   -1.80    -0.670   -1.00    -0.349

User · Answer

Using the reshape function:

    reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
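For reference, reshape() names the new columns value.1 to value.4 and keeps the row names of the first long row in each group (1 and 5). A minimal sketch of tidying that up to match the layout asked for in the question, assuming the question's dat1 (the name wide is only illustrative):

```r
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# reshape() prefixes each new column with the value column's name ("value.1", ...)
names(wide) <- sub("^value\\.", "", names(wide))

# Row names are inherited from the long data (1 and 5); reset them to 1, 2
rownames(wide) <- NULL

wide
```

Column names such as "1" are not syntactic, so they need backticks or [[ ]] to access, for example wide[["1"]].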

User · Answer

The base reshape function works perfectly fine:

    df <- data.frame(
        year   = c(rep(2000, 12), rep(2001, 12)),
        month  = rep(1:12, 2),
        values = rnorm(24)
    )
    df_wide <- reshape(df, idvar = "year", timevar = "month", v.names = "values", direction = "wide", sep = "_")
    df_wide

Where:

- idvar is the column of classes that separates rows
- timevar is the column of classes to cast wide
- v.names is the column containing numeric values
- direction specifies wide or long format
- the optional sep argument is the separator used between the v.names and the timevar class names in the output data.frame

If no idvar exists, create one before using the reshape() function:

    df$id   <- c(rep("year1", 12), rep("year2", 12))
    df_wide <- reshape(df, idvar = "id", timevar = "month", v.names = "values", direction = "wide", sep = "_")
    df_wide

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.
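To illustrate the "explicitly defined" point, here is a small sketch of what reshape() does when an idvar/timevar combination is missing: the corresponding wide cell is filled with NA. The toy data frame and the "_" separator below are assumptions made up for the illustration, not part of the original example.

```r
# Toy data: year 2001 has no row for month 2
df <- data.frame(
  year   = c(2000, 2000, 2001),
  month  = c(1, 2, 1),
  values = c(10, 20, 30)
)

df_wide <- reshape(df, idvar = "year", timevar = "month",
                   v.names = "values", direction = "wide", sep = "_")

# One row per year, one column per month; the missing cell is NA
df_wide
```

Columns are created in the order the timevar values first appear in the data, so sorting the long data beforehand gives a predictable column order.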

User · Answer

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

Edit: Now, in 2019, tidyr v1.0 has launched and set spread and gather on a deprecation path, preferring instead pivot_wider and pivot_longer, which you can find described in this answer. Read on if you want a brief glimpse into the brief life of spread/gather.

    library(tidyr)
    spread(dat1, key = numbers, value = value)

From GitHub:

"tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation."

User · Answer

Using your example data frame, we could:

    xtabs(value ~ name + numbers, data = dat1)
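Note that xtabs() returns a contingency table rather than a data frame, and that it sums value within each name/numbers cell. If a plain data frame with name as row names is wanted, one possible follow-up is as.data.frame.matrix(); a minimal sketch, assuming the question's dat1 (the names tab and wide are only illustrative):

```r
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Cross-tabulate: one row per name, one column per number
tab <- xtabs(value ~ name + numbers, data = dat1)

# Convert the 2-d table into a data frame, keeping its layout
wide <- as.data.frame.matrix(tab)
wide
```

Because the cells are sums, a name/numbers combination that is absent from the data shows up as 0 rather than NA, and duplicated combinations are added together.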

User · Answer

A much easier way:

    devtools::install_github("yikeshu0611/onetree")  # install the onetree package

    library(onetree)
    widedata <- reshape_toWide(data = dat1, id = "name", j = "numbers", value.var.prefix = "value")
    widedata

            name     value1     value2     value3     value4
       firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
      secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

If you want to go back from wide to long, only change Wide to Long; the arguments stay the same.

    reshape_toLong(data = widedata, id = "name", j = "numbers", value.var.prefix = "value")

            name numbers      value
       firstName       1  0.3407997
      secondName       1 -0.8981073
       firstName       2 -0.7033403
      secondName       2 -0.3347941
       firstName       3 -0.3795377
      secondName       3 -0.5013782
       firstName       4 -0.7460474
      secondName       4 -0.1745357

User · Answer

Two other options:

Base package:

    df <- unstack(dat1, form = value ~ numbers)
    rownames(df) <- unique(dat1$name)
    df

sqldf package:

    library(sqldf)
    sqldf('SELECT name,
          MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
          MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
          MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
          MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
          FROM dat1
          GROUP BY name')

User · Answer

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is:

    library(reshape)
    cast(dat1, name ~ numbers)

Or using reshape2:

    library(reshape2)
    dcast(dat1, name ~ numbers)
