[r] Plotting two variables as lines using ggplot2 on the same graph

A very newbish question, but say I have data like this:

test_data <-
  data.frame(
    var0 = 100 + c(0, cumsum(runif(49, -20, 20))),
    var1 = 150 + c(0, cumsum(runif(49, -10, 10))),
    date = seq(as.Date("2002-01-01"), by="1 month", length.out=100)
  )

How can I plot both time series var0 and var1 on the same graph, with date on the x-axis, using ggplot2? Bonus points if you make var0 and var1 different colours, and can include a legend!

I'm sure this is very simple, but I can't find any examples out there.

This question is related to r ggplot2 graph time-series r-faq

The answer is


You need the data to be in "tall" format instead of "wide" for ggplot2. "wide" means having an observation per row with each variable as a different column (like you have now). You need to convert it to a "tall" format where you have a column that tells you the name of the variable and another column that tells you the value of the variable. The process of passing from wide to tall is usually called "melting". You can use tidyr::gather to melt your data frame:

library(ggplot2)
library(tidyr)

test_data <-
  data.frame(
    var0 = 100 + c(0, cumsum(runif(49, -20, 20))),
    var1 = 150 + c(0, cumsum(runif(49, -10, 10))),
    date = seq(as.Date("2002-01-01"), by="1 month", length.out=100)
  )
test_data %>%
    gather(key,value, var0, var1) %>%
    ggplot(aes(x=date, y=value, colour=key)) +
    geom_line()

multiple series ggplot2

Just to be clear the data that ggplot is consuming after piping it via gather looks like this:

date        key     value
2002-01-01  var0    100.00000
2002-02-01  var0    115.16388 
...
2007-11-01  var1    114.86302
2007-12-01  var1    119.30996

I am also new to R but trying to understand how ggplot works I think I get another way to do it. I just share probably not as a complete perfect solution but to add some different points of view.

I know ggplot is made to work with dataframes better but maybe it can be also sometimes useful to know that you can directly plot two vectors without using a dataframe.

Loading data. Original date vector length is 100 while var0 and var1 have length 50 so I only plot the available data (first 50 dates).

var0 <- 100 + c(0, cumsum(runif(49, -20, 20)))
var1 <- 150 + c(0, cumsum(runif(49, -10, 10)))
date <- seq(as.Date("2002-01-01"), by="1 month", length.out=50)    

Plotting

ggplot() + geom_line(aes(x=date,y=var0),color='red') + 
           geom_line(aes(x=date,y=var1),color='blue') + 
           ylab('Values')+xlab('date')

enter image description here

However I was not able to add a correct legend using this format. Does anyone know how?


Using your data:

test_data <- data.frame(
var0 = 100 + c(0, cumsum(runif(49, -20, 20))),
var1 = 150 + c(0, cumsum(runif(49, -10, 10))),
Dates = seq.Date(as.Date("2002-01-01"), by="1 month", length.out=100))

I create a stacked version which is what ggplot() would like to work with:

stacked <- with(test_data,
                data.frame(value = c(var0, var1),
                           variable = factor(rep(c("Var0","Var1"),
                                                 each = NROW(test_data))),
                           Dates = rep(Dates, 2)))

In this case producing stacked was quite easy as we only had to do a couple of manipulations, but reshape() and the reshape and reshape2 might be useful if you have a more complex real data set to manipulate.

Once the data are in this stacked form, it only requires a simple ggplot() call to produce the plot you wanted with all the extras (one reason why higher-level plotting packages like lattice and ggplot2 are so useful):

require(ggplot2)
p <- ggplot(stacked, aes(Dates, value, colour = variable))
p + geom_line()

I'll leave it to you to tidy up the axis labels, legend title etc.

HTH


The general approach is to convert the data to long format (using melt() from package reshape or reshape2) or gather()/pivot_longer() from the tidyr package:

library("reshape2")
library("ggplot2")

test_data_long <- melt(test_data, id="date")  # convert to long format

ggplot(data=test_data_long,
       aes(x=date, y=value, colour=variable)) +
       geom_line()

ggplot2 output

Also see this question on reshaping data from wide to long.


Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to ggplot2

Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph Saving a high resolution image in R Change bar plot colour in geom_bar with ggplot2 in r Remove legend ggplot 2.2 Remove all of x axis labels in ggplot Changing fonts in ggplot2 Explain ggplot2 warning: "Removed k rows containing missing values" Error: package or namespace load failed for ggplot2 and for data.table In R, dealing with Error: ggplot2 doesn't know how to deal with data of class numeric

Examples related to graph

How to plot multiple functions on the same figure, in Matplotlib? Python equivalent to 'hold on' in Matlab How to combine 2 plots (ggplot) into one plot? how to draw directed graphs using networkx in python? What is the difference between dynamic programming and greedy approach? Plotting using a CSV file Python equivalent of D3.js Count number of times a date occurs and make a graph out of it How do I create a chart with multiple series using different X values for each series? Rotating x axis labels in R for barplot

Examples related to time-series

How to convert dataframe into time series? Peak signal detection in realtime timeseries data Pandas: rolling mean by time interval Excel plot time series frequency with continuous xaxis How to calculate rolling / moving average using NumPy / SciPy? Plotting two variables as lines using ggplot2 on the same graph

Examples related to r-faq

What does "The following object is masked from 'package:xxx'" mean? What does "Error: object '<myvariable>' not found" mean? How do I deal with special characters like \^$.?*|+()[{ in my regex? What does %>% function mean in R? How to plot a function curve in R Use dynamic variable names in `dplyr` Error: unexpected symbol/input/string constant/numeric constant/SPECIAL in my code How should I deal with "package 'xxx' is not available (for R version x.y.z)" warning? How to select the row with the maximum value in each group R data formats: RData, Rda, Rds etc