[r] Error - replacement has [x] rows, data has [y]

I have a numeric column ("value") in a dataframe ("df"), and I would like to generate a new column ("valueBin") based on "value." I have the following conditional code to define df$valueBin:

df$valueBin[which(df$value<=250)] <- "<=250"
df$valueBin[which(df$value>250 & df$value<=500)] <- "250-500"
df$valueBin[which(df$value>500 & df$value<=1000)] <- "500-1,000"
df$valueBin[which(df$value>1000 & df$value<=2000)] <- "1,000 - 2,000"
df$valueBin[which(df$value>2000)] <- ">2,000"

I'm getting the following error:

"Error in `$<-.data.frame`(`*tmp*`, "valueBin", value = c(NA, NA, NA, : replacement has 6530 rows, data has 6532"

Every element of df$value should fit into one of my which() statements, and there are no missing values in df$value. Yet even if I run just the first conditional statement (<=250), I get the exact same error, with "...replacement has 6530 rows...", although there are far fewer than 6530 records with value<=250 and value is never NA.

This SO link notes that a similar error when using aggregate() was a bug, but it recommends installing the version of R I already have. Plus, the bug report says it's fixed. R aggregate error: "replacement has <foo> rows, data has <bar>"

This SO link seems more related to my issue. There, the problem was conditional logic that caused fewer elements of the replacement array to be generated. I guess that must be my issue as well, and I figured at first that I must have a "<=" instead of a "<" or vice versa, but after checking I'm pretty sure they're all correct and cover every value of "value" without overlaps. R error in '[<-.data.frame'... replacement has # items, need #


The answer is


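You could use cut(), whose intervals are closed on the right by default, which matches the "<=" upper bounds in the question: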

 df$valueBin <- cut(df$value, c(-Inf, 250, 500, 1000, 2000, Inf), 
    labels=c('<=250', '250-500', '500-1,000', '1,000-2,000', '>2,000'))

data

 set.seed(24)
 df <- data.frame(value= sample(0:2500, 100, replace=TRUE))

The answer by @akrun certainly does the trick. For future googlers who want to understand why, here is an explanation...

The new variable needs to be created first.

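The variable "valueBin" needs to already be in the df for the conditional assignment to work. When it is not, df$valueBin is NULL, and assigning into NULL by index builds a vector only as long as the largest matched row number, which is shorter than the data frame whenever the last row does not match. The syntax of the code is otherwise correct. Just add one line in front of the code chunk to create the column: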

df$newVariableName <- NA

Then you continue with whatever conditional assignment rules you have, like

df$newVariableName[which(df$oldVariableName<=250)] <- "<=250"
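To see why the column has to exist first, here is a minimal sketch on a made-up five-row data frame (the values are illustrative, not the asker's data). It shows that assigning into a missing column by index produces a vector only as long as the largest matched index, which is presumably why the replacement in the question had 6530 rows rather than 6532.

```r
# Hypothetical toy data; the values are made up for illustration.
df <- data.frame(value = c(100, 300, 200, 600, 700))

# The column does not exist yet, so df$valueBin is NULL.
idx <- which(df$value <= 250)   # rows 1 and 3

# Assigning into NULL by index grows a vector only up to max(idx).
tmp <- NULL
tmp[idx] <- "<=250"
length(tmp)                     # 3, not nrow(df) == 5

# Hence the original pattern fails; capture the message instead of stopping.
msg <- tryCatch({
  df$valueBin[which(df$value <= 250)] <- "<=250"
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Creating the column first gives the assignment a full-length target.
df$valueBin <- NA
df$valueBin[which(df$value <= 250)] <- "<=250"
df$valueBin
```

Rows that match no condition simply stay NA, so the remaining conditional assignments can then be run one after another.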

I blame whoever wrote that error message... It made the debugging especially confusing, because it suggests that two columns in the df have different lengths, which points away from the actual fix: simply create the new column first. For more details, consult this post https://www.r-bloggers.com/translating-weird-r-errors/


TL;DR ...and late to the party, but this short explanation might help future googlers.

In general, that error message means that the replacement has a different number of rows than the data frame, so it doesn't fit into the corresponding column.

A minimal example:

df <- data.frame(a = 1:2); df$a <- 1:3

throws the error

Error in `$<-.data.frame`(`*tmp*`, a, value = 1:3) : replacement has 3 rows, data has 2

which is clear, because the vector a of df has 2 entries (rows) whilst the vector we try to replace it with has 3 entries (rows).
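One caveat worth knowing, shown here as a small sketch with a made-up data frame: a replacement that is shorter than the data is not always rejected. If its length divides the number of rows evenly, R recycles it silently; otherwise the same error is raised.

```r
# Hypothetical four-row data frame for illustration.
df <- data.frame(a = 1:4)

# Length 2 divides 4 rows evenly, so the values are recycled without a warning.
df$b <- 1:2
df$b

# Length 3 does not divide 4, so the assignment fails; capture the message.
msg <- tryCatch({
  df$c <- 1:3
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

So a replacement of the wrong length can also slip through unnoticed, which is another reason to create the column at full length before filling it.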