I am trying to understand how to conditionally replace values in a data frame without using a loop. My data frame is structured as follows:
> df
a b est
1 11.77000 2 0
2 10.90000 3 0
3 10.32000 2 0
4 10.96000 0 0
5 9.90600 0 0
6 10.70000 0 0
7 11.43000 1 0
8 11.41000 2 0
9 10.48512 4 0
10 11.19000 0 0
and the dput output is this:
structure(list(a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7,
11.43, 11.41, 10.48512, 11.19), b = c(2, 3, 2, 0, 0, 0, 1, 2,
4, 0), est = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("a",
"b", "est"), row.names = c(NA, -10L), class = "data.frame")
What I want to do is check the value of b. If b is 0, I want to set est to a value computed from a. I understand that df$est[df$b == 0] <- 23 will set est to 23 in every row where b == 0. What I don't understand is how to set est to a value based on a when that condition is true. For example:
df$est[df$b == 0] <- (df$a - 5)/2.533
gives the following warning:
Warning message:
In df$est[df$b == 0] <- (df$a - 5)/2.533 :
number of items to replace is not a multiple of replacement length
Is there a way that I can pass the relevant cell, rather than vector?
The R-inferno, or the basic R-documentation will explain why using df$* is not the best approach here. From the help page for "[" :
"Indexing by [ is similar to atomic vectors and selects a list of the specified element(s). Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument. "
I recommend using the [row,col] notation instead. Example:
Rgames: foo
x y z
[1,] 1e+00 1 0
[2,] 2e+00 2 0
[3,] 3e+00 1 0
[4,] 4e+00 2 0
[5,] 5e+00 1 0
[6,] 6e+00 2 0
[7,] 7e+00 1 0
[8,] 8e+00 2 0
[9,] 9e+00 1 0
[10,] 1e+01 2 0
Rgames: foo<-as.data.frame(foo)
Rgames: foo[foo$y==2,3]<-foo[foo$y==2,1]
Rgames: foo
x y z
1 1e+00 1 0e+00
2 2e+00 2 2e+00
3 3e+00 1 0e+00
4 4e+00 2 4e+00
5 5e+00 1 0e+00
6 6e+00 2 6e+00
7 7e+00 1 0e+00
8 8e+00 2 8e+00
9 9e+00 1 0e+00
10 1e+01 2 1e+01
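Applied to the df from the question, the same idea might look like the sketch below. The point it tries to show is that the warning comes from a length mismatch: the left side selects only the rows where b is 0, while (df$a - 5)/2.533 has one value per row of df. Subsetting the right-hand side with the same logical index makes both sides the same length. The name zero is only an illustrative helper, not something from the question.

```r
# Rebuild the data frame from the question's dput output
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

# Logical index of the rows to update (hypothetical helper name)
zero <- df$b == 0

# Use the same index on both sides so the lengths match
df[zero, "est"] <- (df[zero, "a"] - 5) / 2.533
```

The same index works with the $ form as well, e.g. df$est[zero] <- (df$a[zero] - 5)/2.533, as long as both sides are subset identically.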
Another option would be to use case_when
library(dplyr)
mutate(df, est = case_when(
  b == 0 ~ (a - 5)/2.533,  # divisor taken from the question
  TRUE ~ est               # all other rows keep their current est
))
This solution becomes even more handy if more than two cases need to be distinguished, as it avoids nested if_else constructs.
Here is one approach. ifelse is vectorized: it checks every row for a zero value of b and replaces est with (a - 5)/2.533 where that is the case, leaving est unchanged elsewhere.
df <- transform(df, est = ifelse(b == 0, (a - 5)/2.533, est))
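To see why this form does not trigger the recycling warning from the question, here is a small sketch, again assuming the df from the dput output. ifelse returns a vector as long as its test, so the replacement always has one value per row. The name new_est is only illustrative.

```r
# Data frame from the question's dput output
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

# One result per row: the formula where b == 0, the old est otherwise
new_est <- ifelse(df$b == 0, (df$a - 5) / 2.533, df$est)

# Full-length assignment, so nothing needs to be recycled
df$est <- new_est
```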
Try data.table's := operator:
library(data.table)
DT = as.data.table(df)
DT[b==0, est := (a-5)/2.533]  # updates est by reference, only in rows where b == 0
It's fast and short. See this linked question for more information on :=:
When should I use the := operator in data.table
Source: Stackoverflow.com