[r] Subset and ggplot2

I have a problem to plot a subset of a data frame with ggplot2. My df is like:

df = data.frame(ID = c('P1', 'P1', 'P2', 'P2', 'P3', 'P3'),
                Value1 = c(100, 120, 300, 400, 130, 140),
                Value2 = c(12, 13, 11, 16, 15, 12))

How can I now plot Value1 vs Value2 only for IDs 'P1' and 'P3'? For example I tried:

ggplot(subset(df,ID=="P1 & P3") +
  geom_line(aes(Value1, Value2, group=ID, colour=ID)))

but I always receive an error.

This question is related to r ggplot2 subset

The answer is


Your formulation is almost correct. You want:

subset(dat, ID=="P1" | ID=="P3") 

Where the | ('pipe') means 'or'. Your solution, ID=="P1 & P3", is looking for a case where ID is literally "P1 & P3"


Try filter to subset only the rows of P1 and P3

df2 <- filter(df, ID == "P1" | ID == "P3")

Than yo can plot Value1. vs Value2.


With option 2 in @agstudy's answer now deprecated, defining data with a function can be handy.

library(plyr)
ggplot(data=dat) + 
  geom_line(aes(Value1, Value2, group=ID, colour=ID),
            data=function(x){x$ID %in% c("P1", "P3"))

This approach comes in handy if you wish to reuse a dataset in the same plot, e.g. you don't want to specify a new column in the data.frame, or you want to explicitly plot one dataset in a layer above the other.:

library(plyr)
ggplot(data=dat, aes(Value1, Value2, group=ID, colour=ID)) + 
  geom_line(data=function(x){x[!x$ID %in% c("P1", "P3"), ]}, alpha=0.5) +
  geom_line(data=function(x){x[x$ID %in% c("P1", "P3"), ]})

Are you looking for the following plot:

library(ggplot2) 
l<-df[df$ID %in% c("P1","P3"),]
myplot<-ggplot(l)+geom_line(aes(Value1, Value2, group=ID, colour=ID))

enter image description here


@agstudy's answer didn't work for me with the latest version of ggplot2, but this did, using maggritr pipes:

ggplot(data=dat)+ 
  geom_line(aes(Value1, Value2, group=ID, colour=ID),
                data = . %>% filter(ID %in% c("P1" , "P3")))

It works because if geom_line sees that data is a function, it will call that function with the inherited version of data and use the output of that function as data.


Similar to @nicolaskruchten s answer you could do the following:

require(ggplot2)

df = data.frame(ID = c('P1', 'P1', 'P2', 'P2', 'P3', 'P3'),
                Value1 = c(100, 120, 300, 400, 130, 140),
                Value2 = c(12, 13, 11, 16, 15, 12))

ggplot(df) + 
  geom_line(data = ~.x[.x$ID %in% c("P1" , "P3"), ],
            aes(Value1, Value2, group = ID, colour = ID))

There's another solution that I find useful, especially when I want to plot multiple subsets of the same object:

myplot<-ggplot(df)+geom_line(aes(Value1, Value2, group=ID, colour=ID))
myplot %+% subset(df, ID %in% c("P1","P3"))
myplot %+% subset(df, ID %in% c("P2"))

Use subset within ggplot

ggplot(data = subset(df, ID == "P1" | ID == "P2") +
   aes(Value1, Value2, group=ID, colour=ID) +
   geom_line()

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to ggplot2

Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph Saving a high resolution image in R Change bar plot colour in geom_bar with ggplot2 in r Remove legend ggplot 2.2 Remove all of x axis labels in ggplot Changing fonts in ggplot2 Explain ggplot2 warning: "Removed k rows containing missing values" Error: package or namespace load failed for ggplot2 and for data.table In R, dealing with Error: ggplot2 doesn't know how to deal with data of class numeric

Examples related to subset

how to remove multiple columns in r dataframe? Using grep to help subset a data frame in R Subset a dataframe by multiple factor levels creating a new list with subset of list using index in python subsetting a Python DataFrame Undefined columns selected when subsetting data frame How to select some rows with specific rownames from a dataframe? Subset data to contain only columns whose names match a condition find all subsets that sum to a particular value Subset and ggplot2