[r] Replace a value in a data frame based on a conditional (`if`) statement

In the R data frame coded for below, I would like to replace all of the times that B appears with b.

junk <- data.frame(x <- rep(LETTERS[1:4], 3), y <- letters[1:12])
colnames(junk) <- c("nm", "val")

this provides:

   nm val
1   A   a
2   B   b
3   C   c
4   D   d
5   A   e
6   B   f
7   C   g
8   D   h
9   A   i
10  B   j
11  C   k
12  D   l

My initial attempt was to use a for and if statements like so:

for(i in junk$nm) if(i %in% "B") junk$nm <- "b"

but as I am sure you can see, this replaces ALL of the values of junk$nm with b. I can see why this is doing this but I can't seem to get it to replace only those cases of junk$nm where the original value was B.

NOTE: I managed to solve the problem with gsub but in the interest of learning R I still would like to know how to get my original approach to work (if it is possible)

This question is related to r recode

The answer is


stata.replace<-function(data,replacevar,replacevalue,ifs) {
  ifs=parse(text=ifs)
  yy=as.numeric(eval(ifs,data,parent.frame()))
  x=sum(yy)
  data=cbind(data,yy)
  data[yy==1,replacevar]=replacevalue
  message=noquote(paste0(x, " replacement are made"))
  print(message)
  return(data[,1:(ncol(data)-1)])
}

Call this function using below line.

d=stata.replace(d,"under20",1,"age<20")

another useful way to replace values

library(plyr)
junk$nm <- revalue(junk$nm, c("B"="b"))

Short answer is:

junk$nm[junk$nm %in% "B"] <- "b"

Take a look at Index vectors in R Introduction (if you don't read it yet).


EDIT. As noticed in comments this solution works for character vectors so fail on your data.

For factor best way is to change level:

levels(junk$nm)[levels(junk$nm)=="B"] <- "b"

If you are working with character variables (note that stringsAsFactors is false here) you can use replace:

junk <- data.frame(x <- rep(LETTERS[1:4], 3), y <- letters[1:12], stringsAsFactors = FALSE)
colnames(junk) <- c("nm", "val")

junk$nm <- replace(junk$nm, junk$nm == "B", "b")
junk
#    nm val
# 1   A   a
# 2   b   b
# 3   C   c
# 4   D   d
# ...

You have created a factor variable in nm so you either need to avoid doing so or add an additional level to the factor attributes. You should also avoid using <- in the arguments to data.frame()

Option 1:

junk <- data.frame(x = rep(LETTERS[1:4], 3), y =letters[1:12], stringsAsFactors=FALSE)
junk$nm[junk$nm == "B"] <- "b"

Option 2:

levels(junk$nm) <- c(levels(junk$nm), "b")
junk$nm[junk$nm == "B"] <- "b"
junk

The easiest way to do this in one command is to use which command and also need not to change the factors into character by doing this:

junk$nm[which(junk$nm=="B")]<-"b"

As the data you show are factors, it complicates things a little bit. @diliop's Answer approaches the problem by converting to nm to a character variable. To get back to the original factors a further step is required.

An alternative is to manipulate the levels of the factor in place.

> lev <- with(junk, levels(nm))
> lev[lev == "B"] <- "b"
> junk2 <- within(junk, levels(nm) <- lev)
> junk2
   nm val
1   A   a
2   b   b
3   C   c
4   D   d
5   A   e
6   b   f
7   C   g
8   D   h
9   A   i
10  b   j
11  C   k
12  D   l

That is quite simple and I often forget that there is a replacement function for levels().

Edit: As noted by @Seth in the comments, this can be done in a one-liner, without loss of clarity:

within(junk, levels(nm)[levels(nm) == "B"] <- "b")