Reshaping data.frame from wide to long format

Question

I have some trouble converting my data.frame from a wide table to a long table. At the moment it looks like this:

    Code Country        1950    1951    1952    1953    1954
    AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
    ALB  Albania        8,097   8,986   10,058  11,123  12,246

Now I would like to transform this data.frame into a long data.frame. Something like this:

    Code Country        Year    Value
    AFG  Afghanistan    1950    20,249
    AFG  Afghanistan    1951    21,352
    AFG  Afghanistan    1952    22,532
    AFG  Afghanistan    1953    23,557
    AFG  Afghanistan    1954    24,555
    ALB  Albania        1950    8,097
    ALB  Albania        1951    8,986
    ALB  Albania        1952    10,058
    ALB  Albania        1953    11,123
    ALB  Albania        1954    12,246

I have looked at and already tried the melt() and reshape() functions, as some people suggested in similar questions. However, so far I only get messy results.

If possible, I would like to do it with the reshape() function, since it looks a little nicer to handle.

User · Answer

You can also see many examples in the R Cookbook. It uses the following wide and long versions of the same data:

    olddata_wide <- read.table(header=TRUE, text='
     subject sex control cond1 cond2
           1   M     7.9  12.3  10.7
           2   F     6.3  10.6  11.1
           3   F     9.5  13.1  13.8
           4   M    11.5  13.4  12.9
    ')
    # Make sure the subject column is a factor
    olddata_wide$subject <- factor(olddata_wide$subject)

    olddata_long <- read.table(header=TRUE, text='
     subject sex condition measurement
           1   M   control         7.9
           1   M     cond1        12.3
           1   M     cond2        10.7
           2   F   control         6.3
           2   F     cond1        10.6
           2   F     cond2        11.1
           3   F   control         9.5
           3   F     cond1        13.1
           3   F     cond2        13.8
           4   M   control        11.5
           4   M     cond1        13.4
           4   M     cond2        12.9
    ')
    # Make sure the subject column is a factor
    olddata_long$subject <- factor(olddata_long$subject)
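The answer above lists the wide and long versions of the data but not the conversion itself. As a minimal sketch (not necessarily the method the R Cookbook uses), base R's reshape() can turn the wide table into the long one. The names `data_long`, `condition`, and `measurement` are chosen here to mirror the long table above.

```r
# Wide data, as in the answer above
olddata_wide <- read.table(header = TRUE, text = "
 subject sex control cond1 cond2
       1   M     7.9  12.3  10.7
       2   F     6.3  10.6  11.1
       3   F     9.5  13.1  13.8
       4   M    11.5  13.4  12.9
")
olddata_wide$subject <- factor(olddata_wide$subject)

# Collapse the three measurement columns into one "measurement" column;
# the original column names become the values of "condition"
data_long <- reshape(olddata_wide,
                     direction = "long",
                     varying   = c("control", "cond1", "cond2"),
                     v.names   = "measurement",
                     timevar   = "condition",
                     times     = c("control", "cond1", "cond2"),
                     idvar     = c("subject", "sex"))

# reshape() stacks the data condition by condition; sort by subject
# and drop the generated row names to match the long table above
data_long <- data_long[order(data_long$subject), ]
rownames(data_long) <- NULL
print(data_long)
```

Passing `times` explicitly keeps the condition labels as text; without it, reshape() would number the conditions 1 to 3.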

User · Answer

With tidyr 1.0.0, another option is pivot_longer:

    library(tidyr)
    pivot_longer(df1, -c(Code, Country), values_to = "Value", names_to = "Year")
    # A tibble: 10 x 4
    #   Code  Country     Year  Value
    #   <fct> <fct>       <chr> <fct>
    # 1 AFG   Afghanistan 1950  20,249
    # 2 AFG   Afghanistan 1951  21,352
    # 3 AFG   Afghanistan 1952  22,532
    # 4 AFG   Afghanistan 1953  23,557
    # 5 AFG   Afghanistan 1954  24,555
    # 6 ALB   Albania     1950  8,097
    # 7 ALB   Albania     1951  8,986
    # 8 ALB   Albania     1952  10,058
    # 9 ALB   Albania     1953  11,123
    #10 ALB   Albania     1954  12,246

data

    df1 <- structure(list(Code = structure(1:2, .Label = c("AFG", "ALB"), class = "factor"),
        Country = structure(1:2, .Label = c("Afghanistan", "Albania"), class = "factor"),
        `1950` = structure(1:2, .Label = c("20,249", "8,097"), class = "factor"),
        `1951` = structure(1:2, .Label = c("21,352", "8,986"), class = "factor"),
        `1952` = structure(2:1, .Label = c("10,058", "22,532"), class = "factor"),
        `1953` = structure(2:1, .Label = c("11,123", "23,557"), class = "factor"),
        `1954` = structure(2:1, .Label = c("12,246", "24,555"), class = "factor")),
        class = "data.frame", row.names = c(NA, -2L))

User · Answer

You can also use the cdata package, which uses the concept of a (transformation) control table:

    # data
    wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
    AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
    ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)

    library(cdata)

    # build control table
    drec <- data.frame(
        Year=as.character(1950:1954),
        Value=as.character(1950:1954),
        stringsAsFactors=FALSE
    )
    drec <- cdata::rowrecs_to_blocks_spec(drec, recordKeys=c("Code", "Country"))

    # apply control table
    cdata::layout_by(drec, wide)

I am currently exploring that package and find it quite accessible. It is designed for much more complicated transformations and includes the back-transformation. There is a tutorial available.

User · Answer

reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape(), assuming your data frame is called d:

    reshape(d,
            direction = "long",
            varying = list(names(d)[3:7]),
            v.names = "Value",
            idvar = c("Code", "Country"),
            timevar = "Year",
            times = 1950:1954)
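To make the reshape() call above reproducible, here is a self-contained sketch that reads the question's data into `d` first. The final clean-up steps (converting `Value` to numeric, sorting, resetting row names) are additions for illustration and not part of the original answer; `long` is just a name picked here.

```r
# The question's data; check.names = FALSE keeps "1950" etc. as column names
d <- read.table(text = "
Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

long <- reshape(d,
                direction = "long",
                varying   = list(names(d)[3:7]),  # the five year columns
                v.names   = "Value",              # name of the stacked value column
                idvar     = c("Code", "Country"), # columns identifying a row
                timevar   = "Year",               # name of the new year column
                times     = 1950:1954)            # values to store in Year

# The values were read as text because of the thousands separator
long$Value <- as.numeric(gsub(",", "", long$Value))

# Order by country and year, and drop row names such as "AFG.Afghanistan.1950"
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL
print(long)
```

The `times` vector has to be given in the same order as the columns listed in `varying`, since reshape() pairs them by position.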

User · Answer

Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here), or by including the years you want explicitly.

Note that, to handle the commas (and the X's added if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse and so can be loaded together with library(tidyverse).

    wide %>%
      gather(Year, Value, -Code, -Country) %>%
      mutate(Year = parse_number(Year),
             Value = parse_number(Value))

Returns:

       Code     Country Year Value
    1   AFG Afghanistan 1950 20249
    2   ALB     Albania 1950  8097
    3   AFG Afghanistan 1951 21352
    4   ALB     Albania 1951  8986
    5   AFG Afghanistan 1952 22532
    6   ALB     Albania 1952 10058
    7   AFG Afghanistan 1953 23557
    8   ALB     Albania 1953 11123
    9   AFG Afghanistan 1954 24555
    10  ALB     Albania 1954 12246

User · Answer

Here's a sqldf solution:

    sqldf("Select Code, Country, '1950' As Year, `1950` As Value From wide
            Union All
           Select Code, Country, '1951' As Year, `1951` As Value From wide
            Union All
           Select Code, Country, '1952' As Year, `1952` As Value From wide
            Union All
           Select Code, Country, '1953' As Year, `1953` As Value From wide
            Union All
           Select Code, Country, '1954' As Year, `1954` As Value From wide;")

To build the query without typing in everything, you can use the following (thanks to G. Grothendieck for implementing it):

    ValCol <- tail(names(wide), -2)

    s <- sprintf("Select Code, Country, '%s' As Year, `%s` As Value from wide", ValCol, ValCol)
    mquery <- paste(s, collapse = "\n Union All\n")

    cat(mquery) # just to show the query
    #> Select Code, Country, '1950' As Year, `1950` As Value from wide
    #>  Union All
    #> Select Code, Country, '1951' As Year, `1951` As Value from wide
    #>  Union All
    #> Select Code, Country, '1952' As Year, `1952` As Value from wide
    #>  Union All
    #> Select Code, Country, '1953' As Year, `1953` As Value from wide
    #>  Union All
    #> Select Code, Country, '1954' As Year, `1954` As Value from wide

    sqldf(mquery)
    #>    Code     Country Year  Value
    #> 1   AFG Afghanistan 1950 20,249
    #> 2   ALB     Albania 1950  8,097
    #> 3   AFG Afghanistan 1951 21,352
    #> 4   ALB     Albania 1951  8,986
    #> 5   AFG Afghanistan 1952 22,532
    #> 6   ALB     Albania 1952 10,058
    #> 7   AFG Afghanistan 1953 23,557
    #> 8   ALB     Albania 1953 11,123
    #> 9   AFG Afghanistan 1954 24,555
    #> 10  ALB     Albania 1954 12,246

Unfortunately, I don't think that PIVOT and UNPIVOT would work for R SQLite. If you want to write your query in a more sophisticated manner, you can also take a look at these posts: "Using sprintf writing up sql queries" or "Pass variables to sqldf".

User · Answer

Using the reshape package:

    # data
    x <- read.table(textConnection(
    "Code Country        1950    1951    1952    1953    1954
    AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
    ALB  Albania        8,097   8,986   10,058  11,123  12,246"), header=TRUE)

    library(reshape)

    x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
    x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))

User · Answer

Three alternative solutions:

1) With data.table:

You can use the same melt function as in the reshape2 package (which is an extended & improved implementation). melt from data.table also has more parameters than the melt function from reshape2. You can, for example, also specify the name of the variable column:

    library(data.table)
    long <- melt(setDT(wide), id.vars = c("Code","Country"), variable.name = "year")

which gives:

    > long
        Code     Country year  value
     1:  AFG Afghanistan 1950 20,249
     2:  ALB     Albania 1950  8,097
     3:  AFG Afghanistan 1951 21,352
     4:  ALB     Albania 1951  8,986
     5:  AFG Afghanistan 1952 22,532
     6:  ALB     Albania 1952 10,058
     7:  AFG Afghanistan 1953 23,557
     8:  ALB     Albania 1953 11,123
     9:  AFG Afghanistan 1954 24,555
    10:  ALB     Albania 1954 12,246

Some alternative notations:

    melt(setDT(wide), id.vars = 1:2, variable.name = "year")
    melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
    melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")

2) With tidyr:

    library(tidyr)
    long <- wide %>% gather(year, value, -c(Code, Country))

Some alternative notations:

    wide %>% gather(year, value, -Code, -Country)
    wide %>% gather(year, value, -1:-2)
    wide %>% gather(year, value, -(1:2))
    wide %>% gather(year, value, -1, -2)
    wide %>% gather(year, value, 3:7)
    wide %>% gather(year, value, `1950`:`1954`)

3) With reshape2:

    library(reshape2)
    long <- melt(wide, id.vars = c("Code", "Country"))

Some alternative notations that give the same result:

    # you can also define the id-variables by column number
    melt(wide, id.vars = 1:2)

    # as an alternative you can also specify the measure-variables
    # all other variables will then be used as id-variables
    melt(wide, measure.vars = 3:7)
    melt(wide, measure.vars = as.character(1950:1954))

NOTES:

- reshape2 is retired. Only changes necessary to keep it on CRAN will be made.
- If you want to exclude NA values, you can add na.rm = TRUE to the melt as well as the gather functions.

Another problem with the data is that the values will be read by R as character values (as a result of the , in the numbers). You can repair that with gsub and as.numeric:

    long$value <- as.numeric(gsub(",", "", long$value))

Or directly with data.table or dplyr:

    # data.table
    long <- melt(setDT(wide),
                 id.vars = c("Code","Country"),
                 variable.name = "year")[, value := as.numeric(gsub(",", "", value))]

    # tidyr and dplyr
    long <- wide %>% gather(year, value, -c(Code,Country)) %>%
      mutate(value = as.numeric(gsub(",", "", value)))

Data:

    wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
    AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
    ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)

User · Answer

Since this question is tagged with r-faq, I felt it would be useful to share another alternative from base R: stack.

Note, however, that stack does not work with factors; it only works if is.vector is TRUE, and from the documentation for is.vector, we find that:

    "is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise."

I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.

Here's the stack approach:

    cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
    ##    Code     Country values  ind
    ## 1   AFG Afghanistan 20,249 1950
    ## 2   ALB     Albania  8,097 1950
    ## 3   AFG Afghanistan 21,352 1951
    ## 4   ALB     Albania  8,986 1951
    ## 5   AFG Afghanistan 22,532 1952
    ## 6   ALB     Albania 10,058 1952
    ## 7   AFG Afghanistan 23,557 1953
    ## 8   ALB     Albania 11,123 1953
    ## 9   AFG Afghanistan 24,555 1954
    ## 10  ALB     Albania 12,246 1954
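The stack() result keeps the generic column names `values` and `ind`, and the values are still text. A possible follow-up, sketched here under the assumption that the year columns were read as factors (forced with `stringsAsFactors = TRUE`), renames the columns and converts them to numbers. The names `stacked`, `long`, `Year`, and `Value` are illustrative choices, not part of the original answer.

```r
# Sample data with the year columns read as factors
wide <- read.table(text = "
Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246
", header = TRUE, check.names = FALSE, stringsAsFactors = TRUE)

# Convert the factor columns to character so that stack() accepts them
stacked <- stack(lapply(wide[-c(1, 2)], as.character))

# The two id rows are recycled to line up with the ten stacked rows
long <- cbind(wide[1:2], stacked)

# Turn "ind" (a factor of former column names) into an integer Year,
# and strip the thousands separator from the values
long$Year  <- as.integer(as.character(long$ind))
long$Value <- as.numeric(gsub(",", "", long$values))
long <- long[c("Code", "Country", "Year", "Value")]
print(long)
```

The cbind() step relies on the number of stacked rows being a whole multiple of the number of id rows, which holds here because every year column has one value per country.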
