[r] How to give color to each class in scatter plot in R?

In a dataset, I want to take two attributes and create supervised scatter plot. Does anyone know how to give different color to each class ?

I am trying to use col == c("red","blue","yellow") in the plot command but not sure if it is right as if I include one more color, that color also comes in the scatter plot even though I have only 3 classes.

Thanks

This question is related to r plot

The answer is


This article is old, but I spent a hot minute trying to figure this out so I figured I would post an updated response. My main source is this wonderful PowerPoint: http://www.lrdc.pitt.edu/maplelab/slides/14-Plotting.pdf. Okay, here's what I did:

In this example, my data set is called 'Data' and I was comparing 'Touch' data against 'Gaze' data. The subjects were divided into two groups: 'Red' and 'Blue'.

`plot(Data$Touch[Data$Category == "Blue"], Data$Gaze[Data$Category == "Blue"], main = "Touch v Gaze", xlab = "Gaze(s)", ylab = "Touch (s)", col = "blue", pch = 20)`
  • This set of code creates a scatterplot of Touch v Gaze of my Blue group

    par(new = TRUE)

  • This tells R to create a new plot. This second plot is laid over the first automatically by R when you run all the code together

    plot(Data$Touch[Data$Category == "Red"], Data$Gaze[Data$Category == "Red"], axes = FALSE, xlab = "", ylab = "", col = "red", pch = 2)

  • This is the second plot. I found when I was coding these that R didn't just lay over the data points onto the Blue plot, but it also lay the axes, axes titles, and main title.

  • To get rid of the annoying overlap problem, I used the axes function to get rid of the axes themselves and set the titles to be blank.

    legend(x = 60, y = 50, legend = c("Blue", "Red"), col = c("blue", "red"), pch = c(20, 2))

  • Adding a pretty legend to round out the project

This way may be a bit longer than the pretty ggplots but I did not want to learn something completely new today, hope this helps someone!


Here is how I do it in 2018. Who knows, maybe an R newbie will see it one day and fall in love with ggplot2.

library(ggplot2)

ggplot(data = iris, aes(Petal.Length, Petal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = c("setosa" = "red", "versicolor" = "blue", "virginica" = "yellow"))

Here is an example that I built based on this page.

library(e1071); library(ggplot2)

mysvm      <- svm(Species ~ ., iris)
Predicted  <- predict(mysvm, iris)

mydf = cbind(iris, Predicted)
qplot(Petal.Length, Petal.Width, colour = Species, shape = Predicted, 
   data = iris)

This gives you the output. You can easily spot the misclassified species from this figure.

enter image description here


If you have the classes separated in a data frame or a matrix, then you can use matplot. For example, if we have

dat<-as.data.frame(cbind(c(1,2,5,7),c(2.1,4.2,-0.5,1),c(9,3,6,2.718)))

plot.new()
plot.window(c(0,nrow(dat)),range(dat))
matplot(dat,col=c("red","blue","yellow"),pch=20)

Then you'll get a scatterplot where the first column of dat is plotted in red, the second in blue, and the third in yellow. Of course, if you want separate x and y values for your color classes, then you can have datx and daty, etc.

An alternate approach would be to tack on an extra column specifying what color you want (or keeping an extra vector of colors, filling it iteratively with a for loop and some if branches). For example, this will get you the same plot:

dat<-as.data.frame(
    cbind(c(1,2,5,7,2.1,4.2,-0.5,1,9,3,6,2.718)
    ,c(rep("red",4),rep("blue",4),rep("yellow",4))))

dat[,1]=as.numeric(dat[,1]) #This is necessary because
                            #the second column consisting of strings confuses R
                            #into thinking that the first column must consist of strings, too
plot(dat[,1],pch=20,col=dat[,2])

One way is to use the lattice package and xyplot():

R> DF <- data.frame(x=1:10, y=rnorm(10)+5, 
+>                  z=sample(letters[1:3], 10, replace=TRUE))
R> DF
    x       y z
1   1 3.91191 c
2   2 4.57506 a
3   3 3.16771 b
4   4 5.37539 c
5   5 4.99113 c
6   6 5.41421 a
7   7 6.68071 b
8   8 5.58991 c
9   9 5.03851 a
10 10 4.59293 b
R> with(DF, xyplot(y ~ x, group=z))

By giving explicit grouping information via variable z, you obtain different colors. You can specify colors etc, see the lattice documentation.

Because z here is a factor variable for which we obtain the levels (== numeric indices), you can also do

R> with(DF, plot(x, y, col=z))

but that is less transparent (to me, at least :) then xyplot() et al.


Here is a solution using traditional graphics (and Dirk's data):

> DF <- data.frame(x=1:10, y=rnorm(10)+5, z=sample(letters[1:3], 10, replace=TRUE)) 
> DF
    x        y z
1   1 6.628380 c
2   2 6.403279 b
3   3 6.708716 a
4   4 7.011677 c
5   5 6.363794 a
6   6 5.912945 b
7   7 2.996335 a
8   8 5.242786 c
9   9 4.455582 c
10 10 4.362427 a
> attach(DF); plot(x, y, col=c("red","blue","green")[z]); detach(DF)

This relies on the fact that DF$z is a factor, so when subsetting by it, its values will be treated as integers. So the elements of the color vector will vary with z as follows:

> c("red","blue","green")[DF$z]
 [1] "green" "blue"  "red"   "green" "red"   "blue"  "red"   "green" "green" "red"    

You can add a legend using the legend function:

legend(x="topright", legend = levels(DF$z), col=c("red","blue","green"), pch=1)

Assuming the class variable is z, you can use:

with(df, plot(x, y, col = z))

however, it's important that z is a factor variable, as R internally stores factors as integers.

This way, 1 is 'black', 2 is 'red', 3 is 'green, ....