In a dataset, I want to take two attributes and create a supervised scatter plot. Does anyone know how to give a different color to each class?
I am trying to use col = c("red","blue","yellow")
in the plot command, but I am not sure it is right: if I include one more color, that color also shows up in the scatter plot even though I have only 3 classes.
Thanks
This article is old, but I spent a hot minute trying to figure this out so I figured I would post an updated response. My main source is this wonderful PowerPoint: http://www.lrdc.pitt.edu/maplelab/slides/14-Plotting.pdf. Okay, here's what I did:
In this example, my data set is called 'Data' and I was comparing 'Touch' data against 'Gaze' data. The subjects were divided into two groups: 'Red' and 'Blue'.
`plot(Data$Touch[Data$Category == "Blue"], Data$Gaze[Data$Category == "Blue"], main = "Touch v Gaze", xlab = "Touch (s)", ylab = "Gaze (s)", col = "blue", pch = 20)`
This set of code creates a scatterplot of Touch v Gaze for my Blue group, with Touch on the x-axis and Gaze on the y-axis.
par(new = TRUE)
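This tells R not to clear the plotting device before the next plot() call, so the second plot is drawn on top of the first. Note that each plot() call picks its own axis ranges, so the two sets of points only line up if both calls are given the same xlim and ylim.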
plot(Data$Touch[Data$Category == "Red"], Data$Gaze[Data$Category == "Red"], axes = FALSE, xlab = "", ylab = "", col = "red", pch = 2)
This is the second plot. I found when I was coding these that R didn't just lay the data points over the Blue plot; it also laid down the axes, axis titles, and main title a second time.
To get rid of the annoying overlap problem, I used the axes = FALSE argument to suppress the axes and set the titles to be blank.
legend(x = 60, y = 50, legend = c("Blue", "Red"), col = c("blue", "red"), pch = c(20, 2))
Adding a pretty legend to round out the project
This way may be a bit longer than the pretty ggplots but I did not want to learn something completely new today, hope this helps someone!
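For anyone who wants to stay in base graphics but skip par(new = TRUE), here is a minimal sketch of the same figure using points(). The Data frame below is a made-up stand-in (the original data is not shown); only its column names mirror the ones used above.

```r
# Hypothetical stand-in for the 'Data' frame described above
set.seed(1)
Data <- data.frame(
  Touch    = runif(20, 0, 60),
  Gaze     = runif(20, 0, 50),
  Category = rep(c("Blue", "Red"), each = 10)
)

blue <- Data[Data$Category == "Blue", ]
red  <- Data[Data$Category == "Red", ]

# Draw the first group, but size the axes to fit every point in Data
plot(blue$Touch, blue$Gaze,
     xlim = range(Data$Touch), ylim = range(Data$Gaze),
     main = "Touch v Gaze", xlab = "Touch (s)", ylab = "Gaze (s)",
     col = "blue", pch = 20)

# points() adds to the existing plot, so both groups share one set of axes
points(red$Touch, red$Gaze, col = "red", pch = 2)

legend("topright", legend = c("Blue", "Red"),
       col = c("blue", "red"), pch = c(20, 2))
```

Because points() never redraws the axes or titles, there is nothing to blank out, and both groups are guaranteed to be on the same scale.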
Here is how I do it in 2018. Who knows, maybe an R newbie will see it one day and fall in love with ggplot2.
library(ggplot2)
ggplot(data = iris, aes(Petal.Length, Petal.Width, color = Species)) +
geom_point() +
scale_color_manual(values = c("setosa" = "red", "versicolor" = "blue", "virginica" = "yellow"))
Here is an example that I built based on this page.
library(e1071); library(ggplot2)
mysvm <- svm(Species ~ ., iris)
Predicted <- predict(mysvm, iris)
mydf <- cbind(iris, Predicted)
# Plot from mydf so that the Predicted column comes from the data frame
qplot(Petal.Length, Petal.Width, colour = Species, shape = Predicted,
data = mydf)
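In the resulting figure, colour shows the true species and shape shows the predicted one, so you can easily spot the misclassified points: they are the ones whose shape does not match their colour.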
If you have the classes separated in a data frame or a matrix, then you can use matplot. For example, if we have
dat <- as.data.frame(cbind(c(1,2,5,7), c(2.1,4.2,-0.5,1), c(9,3,6,2.718)))
# matplot() starts its own plot, so plot.new() and plot.window() are not needed
matplot(dat, col=c("red","blue","yellow"), pch=20)
Then you'll get a scatterplot where the first column of dat is plotted in red, the second in blue, and the third in yellow, each against its row index. Of course, if you want separate x and y values for your color classes, then you can have datx and daty, etc.
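As a rough sketch of the datx and daty variant, matplot() accepts two matrices and pairs them column by column. The values in datx are invented for illustration; daty reuses the numbers from the example above.

```r
# Illustrative values: one column per color class, one row per point
datx <- cbind(c(1, 2, 3, 4), c(2, 3, 4, 5), c(6, 7, 8, 9))
daty <- cbind(c(1, 2, 5, 7), c(2.1, 4.2, -0.5, 1), c(9, 3, 6, 2.718))

# Column i of datx is plotted against column i of daty in the i-th color
matplot(datx, daty, col = c("red", "blue", "yellow"), pch = 20,
        xlab = "x", ylab = "y")
```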
An alternate approach would be to tack on an extra column specifying what color you want (or keeping an extra vector of colors, filling it iteratively with a for loop and some if branches). For example, this will get you the same plot:
dat <- data.frame(
value = c(1,2,5,7,2.1,4.2,-0.5,1,9,3,6,2.718),
color = rep(c("red","blue","yellow"), each=4),
stringsAsFactors = FALSE) # keep the colors as plain strings
# Building the columns with data.frame() instead of cbind() keeps the
# first column numeric, so no as.numeric() conversion is needed
plot(rep(1:4, times=3), dat$value, pch=20, col=dat$color)
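The for loop with if branches mentioned above could look roughly like this. The class labels "A", "B" and "C" are hypothetical, chosen only to have something to branch on.

```r
values  <- c(1, 2, 5, 7, 2.1, 4.2, -0.5, 1, 9, 3, 6, 2.718)
# Hypothetical class labels, used only for this illustration
classes <- rep(c("A", "B", "C"), each = 4)

# Fill a vector of colors, one entry per point
colors <- character(length(classes))
for (i in seq_along(classes)) {
  if (classes[i] == "A") {
    colors[i] <- "red"
  } else if (classes[i] == "B") {
    colors[i] <- "blue"
  } else {
    colors[i] <- "yellow"
  }
}

plot(values, pch = 20, col = colors)
```

The important part is that colors ends up with exactly one entry per point. A color vector shorter than the data is recycled point by point rather than matched to classes, which is why an extra color can show up unexpectedly.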
One way is to use the lattice package and xyplot():
R> library(lattice)
R> DF <- data.frame(x=1:10, y=rnorm(10)+5,
+>                  z=factor(sample(letters[1:3], 10, replace=TRUE)))
R> DF
    x       y z
1   1 3.91191 c
2   2 4.57506 a
3   3 3.16771 b
4   4 5.37539 c
5   5 4.99113 c
6   6 5.41421 a
7   7 6.68071 b
8   8 5.58991 c
9   9 5.03851 a
10 10 4.59293 b
R> xyplot(y ~ x, data=DF, groups=z)
By giving explicit grouping information via variable z, you obtain different colors. You can specify colors etc, see the lattice documentation.
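As one possible way to pick the colors yourself, here is a small sketch that sets them through par.settings so that the key uses the same colors as the points. The color choices and the seed are arbitrary, and the levels of z are fixed explicitly so that there are always three groups.

```r
library(lattice)

set.seed(42)
DF <- data.frame(x = 1:10, y = rnorm(10) + 5,
                 z = factor(sample(letters[1:3], 10, replace = TRUE),
                            levels = letters[1:3]))

# One color per level of z, in the order given by levels(DF$z)
class_cols <- c("red", "blue", "darkgreen")

p <- xyplot(y ~ x, data = DF, groups = z,
            par.settings = list(superpose.symbol = list(col = class_cols, pch = 16)),
            auto.key = list(columns = 3))
print(p)  # lattice plots are drawn when printed
```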
Because z here is a factor variable for which we obtain the levels (== numeric indices), you can also do
R> with(DF, plot(x, y, col=z))
but that is less transparent (to me, at least :) than xyplot() et al.
Here is a solution using traditional graphics (and Dirk's data):
> DF <- data.frame(x=1:10, y=rnorm(10)+5, z=factor(sample(letters[1:3], 10, replace=TRUE)))
> DF
    x        y z
1   1 6.628380 c
2   2 6.403279 b
3   3 6.708716 a
4   4 7.011677 c
5   5 6.363794 a
6   6 5.912945 b
7   7 2.996335 a
8   8 5.242786 c
9   9 4.455582 c
10 10 4.362427 a
> with(DF, plot(x, y, col=c("red","blue","green")[z]))
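This relies on the fact that DF$z is a factor (since R 4.0.0, data.frame() leaves strings as character vectors unless they are wrapped in factor() or stringsAsFactors = TRUE is set), so when subsetting by it, its values will be treated as integers. So the elements of the color vector will vary with z as follows: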
> c("red","blue","green")[DF$z]
[1] "green" "blue" "red" "green" "red" "blue" "red" "green" "green" "red"
You can add a legend using the legend function:
legend(x="topright", legend = levels(DF$z), col=c("red","blue","green"), pch=1)
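If you would rather not depend on z being a factor, one alternative sketch is to look colors up by class name in a named vector. The seed and the name class_cols are just for illustration.

```r
set.seed(1)
DF <- data.frame(x = 1:10, y = rnorm(10) + 5,
                 z = sample(letters[1:3], 10, replace = TRUE))

# Names are the class labels, values are the colors
class_cols <- c(a = "red", b = "blue", c = "green")

# Look up each point's color by its class label
point_cols <- unname(class_cols[as.character(DF$z)])

plot(DF$x, DF$y, col = point_cols)
legend("topright", legend = names(class_cols), col = class_cols, pch = 1)
```

Since every point gets its color from its own label, a color listed in class_cols that has no matching class never appears among the points.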
Assuming the class variable is z, you can use:
with(df, plot(x, y, col = z))
However, it's important that z is a factor variable, as R internally stores factors as integers.
This way, level 1 is drawn in 'black', level 2 in 'red', level 3 in 'green', and so on, following the order of the current palette().
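To see which color each level will get, a quick sketch is to index palette() with the factor's integer codes yourself. The factor z here is a toy example.

```r
z <- factor(c("a", "b", "c", "a", "c"))

# A factor is stored as integer codes, one per level
codes <- as.integer(z)

# col = z uses those codes to index the current palette
cols <- palette()[codes]

print(codes)
print(cols)
```

The exact color names printed depend on the palette in effect in your R session.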
Source: Stackoverflow.com