If the columns of a data.frame have different types, apply() has a problem.
A subtlety of row iteration is that apply(a.data.frame, 1, ...) first converts the data frame to a matrix, so when the columns have different types (e.g. a factor and a numeric column), every value is implicitly converted to character. Here's an example, using a factor in one column to modify a numeric column:
mean.height = list(BOY=69.5, GIRL=64.0)
subjects = data.frame(gender = factor(c("BOY", "GIRL", "GIRL", "BOY")),
                      height = c(71.0, 59.3, 62.1, 62.1))
apply(subjects, 1, function(x) x[2] - mean.height[[x[1]]])
The subtraction fails with a "non-numeric argument to binary operator" error because every column has been converted to character.
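To see the conversion directly, here is a minimal sketch using the same subjects data frame. The names m and row.types are illustrative only and not part of the original example.

```r
subjects <- data.frame(gender = factor(c("BOY", "GIRL", "GIRL", "BOY")),
                       height = c(71.0, 59.3, 62.1, 62.1))

# apply() coerces a data frame with as.matrix() before iterating;
# with mixed column types the result is a character matrix.
m <- as.matrix(subjects)
is.character(m)

# Each row therefore reaches the function as a character vector.
row.types <- apply(subjects, 1, function(x) class(x))
unique(unname(row.types))
```

Both checks should report character, so x[2] inside the function is a string such as "71.0" rather than a number.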
One fix is to back-convert the second column to a number:
apply(subjects, 1, function(x) as.numeric(x[2]) - mean.height[[x[1]]])
But the conversions can be avoided by keeping the columns separate and using mapply():
mapply(function(x,y) y - mean.height[[x]], subjects$gender, subjects$height)
mapply() is needed because [[ ]] extracts a single element and does not accept a vector of keys. So the iteration over the column can instead be done before the subtraction by passing a vector to [ ], at the cost of slightly uglier code:
subjects$height - unlist(mean.height[subjects$gender])
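One caveat with the lookups above: subjects$gender is a factor, and R indexes by a factor's integer codes rather than its labels, so the results are right here only because the order of mean.height (BOY, GIRL) happens to match the order of the factor levels. A more defensive sketch, assuming a named numeric vector in place of the list (mean.height.vec, diffs and wrong are hypothetical names), converts the factor to character first so the lookup is by name:

```r
# Deliberately listed in a different order from the factor levels (BOY, GIRL).
mean.height.vec <- c(GIRL = 64.0, BOY = 69.5)
subjects <- data.frame(gender = factor(c("BOY", "GIRL", "GIRL", "BOY")),
                       height = c(71.0, 59.3, 62.1, 62.1))

# as.character() makes [ ] match on names instead of factor codes;
# unname() drops the names picked up from the lookup.
diffs <- subjects$height - unname(mean.height.vec[as.character(subjects$gender)])

# Indexing by the factor itself uses the codes 1, 2, 2, 1,
# which selects the wrong mean for every row in this ordering.
wrong <- subjects$height - unname(mean.height.vec[subjects$gender])
```

Using a plain named vector also removes the need for unlist(), since [ ] on an atomic vector already returns a vector.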