I need to replace the levels of a factor column in a dataframe. Using the iris
dataset as an example, how would I replace any cells which contain virginica
with setosa in the Species column?
I expected the following to work, but it generates a warning message and simply inserts NAs:
iris$Species[iris$Species == 'virginica'] <- 'setosa'
Using dplyr::mutate and forcats::fct_recode:
library(dplyr)
library(forcats)
iris <- iris %>%
  mutate(Species = fct_recode(Species,
    "Virginica" = "virginica",
    "Versicolor" = "versicolor"
  ))
iris %>%
  count(Species)
# A tibble: 3 x 2
  Species        n
  <fctr>     <int>
1 setosa        50
2 Versicolor    50
3 Virginica     50
For what you are describing, you can just change the levels using the levels function:
levels(iris$Species)[3] <- 'new'
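For the original question (turning virginica into setosa), the same assignment can be given the name of a level that already exists, in which case R merges the two levels. A minimal sketch, working on a copy called sp (a name chosen here for illustration) so that iris itself is left untouched:

```r
# Work on a copy of the column; `sp` is a hypothetical name.
sp <- datasets::iris$Species

# Rename the third level ("virginica") to an existing level name.
# Duplicate level names are merged, so the factor ends up with two levels.
levels(sp)[3] <- "setosa"

levels(sp)   # "setosa" "versicolor"
table(sp)    # setosa: 100, versicolor: 50
```

Indexing by position assumes the levels are in their default order; matching by name, as in levels(sp)[levels(sp) == "virginica"], avoids that assumption.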
A more general solution that works on the whole data frame at once, and where you don't have to add new factor levels, is:
data.mtx <- as.matrix(data.df)
data.mtx[which(data.mtx == "old.value.to.replace")] <- "new.value"
data.df <- as.data.frame(data.mtx)
A nice feature of this code is that you can assign as many values at once as there are cells in your original data frame, not only one "new.value", and the new values can be random. Thus you can create a completely new random data frame with the same size as the original.
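One thing to keep in mind with the matrix round trip is that as.matrix turns a data frame with any non-numeric column into a character matrix, so numeric columns do not come back as numbers. A small sketch on a made-up data frame (toy.df and its columns are hypothetical, not from the original answer):

```r
# Toy data frame with one factor column and one numeric column.
toy.df <- data.frame(a = c("x", "y"), n = c(1, 2), stringsAsFactors = TRUE)

# Mixed column types force a character matrix.
toy.mtx <- as.matrix(toy.df)
toy.mtx[toy.mtx == "y"] <- "new"

# Convert back; every column is now a factor built from character data.
toy.df <- as.data.frame(toy.mtx, stringsAsFactors = TRUE)

as.character(toy.df$a)   # "x" "new"
is.numeric(toy.df$n)     # FALSE: the numeric column was coerced
```

If the numeric columns matter, restricting the replacement to the factor columns is likely the safer route.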
I had the same problem. This worked better:
Identify which level you want to modify:
levels(iris$Species)
"setosa" "versicolor" "virginica"
So, setosa is the first. Then, write this:
levels(iris$Species)[1] <- "new name"
You want to replace the values in a dataset column, but you're getting a warning like this:
invalid factor level, NA generated
Try this instead:
levels(dataframe$column)[levels(dataframe$column)=='old_value'] <- 'new_value'
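To see why this works, it may help to compare the two approaches side by side. The sketch below uses copies of iris$Species named bad and fixed, and the value "new_value", all chosen here for illustration. Assigning a string that is not an existing level produces NA, while renaming the level changes every matching cell at once:

```r
# Direct assignment of a value that is not an existing level.
bad <- datasets::iris$Species
# This normally warns "invalid factor level, NA generated";
# the warning is suppressed here only to keep the example quiet.
suppressWarnings(bad[bad == "virginica"] <- "new_value")
sum(is.na(bad))   # 50: every virginica cell became NA

# Renaming the level instead of assigning into the cells.
fixed <- datasets::iris$Species
levels(fixed)[levels(fixed) == "virginica"] <- "new_value"
levels(fixed)     # "setosa" "versicolor" "new_value"
sum(is.na(fixed)) # 0
```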
You can use the function revalue from the package plyr to replace values in a factor vector.
In your example, to replace the factor level virginica with setosa:
data(iris)
library(plyr)
revalue(iris$Species, c("virginica" = "setosa")) -> iris$Species
In case you have to replace multiple values and if you don't mind "refactoring" your variable with as.factor(as.character(...)) you could try the following:
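replace.values <- function(search, replace, x){
  # search and replace must pair up one-to-one
  stopifnot(length(search) == length(replace))
  # look up each element of x in search; unmatched elements become NA
  xnew <- replace[ match(x, search) ]
  # keep the original value wherever nothing matched
  takeOld <- is.na(xnew) & !is.na(x)
  xnew[takeOld] <- x[takeOld]
  return(xnew)
}
iris$Species <- as.factor(replace.values(search=c("oldValue1","oldValue2"),
                                         replace=c("newValue1","newValue2"),
                                         x=as.character(iris$Species)))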
Source: Stackoverflow.com