Error in contrasts when defining a linear model in R

Question

When I try to define my linear model in R as follows:

    lm1 <- lm(predictorvariable ~ x1 + x2 + x3, data = dataframe.df)

I get the following error message:

    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
      contrasts can be applied only to factors with 2 or more levels

Is there any way to ignore this or fix it? Some of the variables are factors and some are not.

User · Answer

It appears that at least one of your predictors, x1, x2, or x3, has only one factor level and hence is a constant.

Have a look at

lapply(dataframe.df[c("x1", "x2", "x3")], unique)

to find the different values.
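To check every column at once instead of naming the predictors, the same idea can be sketched as below. The helper `count_distinct` and the toy data frame `df` are hypothetical names used only for illustration; they are not from the question.

```r
# Toy data frame (hypothetical) in which x2 is constant.
df <- data.frame(
  y  = c(1.2, 2.3, 3.1, 4.8),
  x1 = factor(c("a", "b", "a", "b")),
  x2 = factor(c("k", "k", "k", "k")),
  x3 = c(10, 20, 30, 40)
)

# Number of distinct non-missing values in each column.
count_distinct <- function(data) {
  vapply(data, function(col) length(unique(col[!is.na(col)])), integer(1))
}

n_distinct <- count_distinct(df)

# Columns with fewer than two distinct values cannot be used as predictors.
constant_cols <- names(n_distinct)[n_distinct < 2]
print(constant_cols)
```

Counting distinct values rather than calling `levels()` also covers character and numeric columns.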

User · Answer

From my experience ten minutes ago, this situation can happen when there is more than one category but a lot of NAs. Taking the Kaggle House Prices dataset as an example, if you load the data and run a simple regression:

    train.df = read.csv('train.csv')
    lm1 = lm(SalePrice ~ ., data = train.df)

you will get the same error. I also tried testing the number of levels of each factor, but none of them reported fewer than 2 levels:

    cols = colnames(train.df)
    for (col in cols) {
      if (is.factor(train.df[, col])) {
        cat(col, ' has ', length(levels(train.df[, col])), '\n')
      }
    }

So after a long time I used summary(train.df) to see the details of each column, removed some of them, and it finally worked:

    train.df = subset(train.df, select = -c(Id, PoolQC, Fence, MiscFeature, Alley, Utilities))
    lm1 = lm(SalePrice ~ ., data = train.df)

If any one of them is taken off that list, the regression fails again with the same error (which I have tested myself).

The attributes above generally have 1400+ NAs and about 10 useful values, so you might want to remove these garbage attributes even if they have 3 or 4 levels. I guess a function counting how many NAs there are in each column will help.
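A minimal sketch of such an NA count, using a small made-up data frame in place of the Kaggle file. The column names and the 0.5 cutoff are assumptions chosen only for illustration.

```r
# Toy stand-in (hypothetical) for a data set with a mostly-missing factor.
df <- data.frame(
  price = c(100, 120, 90, 150, 130),
  zone  = factor(c("A", "B", "A", "B", "A")),
  pool  = factor(c(NA, "Gd", NA, NA, NA))
)

na_count <- colSums(is.na(df))     # number of NAs per column
na_share <- na_count / nrow(df)    # fraction of rows that are missing

# Columns that are more than half missing; the cutoff is arbitrary.
mostly_missing <- names(na_share)[na_share > 0.5]

print(na_count)
print(mostly_missing)
```

Columns flagged this way are candidates for removal before fitting, since dropping their incomplete rows can leave other factors with a single level.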

User · Answer

If the error happens to be because your data has NAs, then you need to set the glm() function options for how you would like to treat the NA cases. More information on this is found in a relevant post here: https://stats.stackexchange.com/questions/46692/how-the-na-values-are-treated-in-glm-in-r
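As a rough illustration of those options, the sketch below uses `lm()`, which takes the same `na.action` argument as `glm()`. The data frame is made up. Note that `na.omit` and `na.exclude` both drop incomplete rows before fitting, so neither one by itself keeps a factor from losing levels.

```r
# Made-up data with one missing predictor value.
df <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.1),
  x = c(1, 2, NA, 4, 5)
)

getOption("na.action")   # the session default, normally "na.omit"

fit_omit    <- lm(y ~ x, data = df, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = df, na.action = na.exclude)

length(residuals(fit_omit))      # 4: the incomplete row is dropped
length(residuals(fit_exclude))   # 5: the dropped row is padded with NA

# na.fail stops with an error instead of silently dropping rows.
res <- tryCatch(
  lm(y ~ x, data = df, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
print(res)
```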

User · Answer

This error message may also happen when the data contains NAs.

In this case, the behaviour depends on the defaults (see documentation), and maybe all cases with NAs in the columns mentioned in the variables are silently dropped. So it may be that a factor does indeed have several outcomes, but the factor only has one outcome when restricting to the cases without NAs.

In this case, to fix the error, either change the model (remove the problematic factor from the formula) or change the data (i.e. complete the cases).
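A small reproduction of that situation, with a made-up data frame in which the missing values of `z` line up with one level of `g`. All names here are hypothetical.

```r
df <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  g = factor(c("a", "a", "a", "b", "b", "b")),
  z = c(0.5, 1.5, 2.5, NA, NA, NA)   # missing exactly where g == "b"
)

nlevels(df$g)   # 2 on the full data

# Rows that survive the default NA handling for this formula.
used <- df[complete.cases(df[, c("y", "g", "z")]), ]
nlevels(droplevels(used)$g)   # 1 on the rows that are actually used

# Fitting on the full data triggers the contrasts error.
msg <- tryCatch(
  {
    lm(y ~ g + z, data = df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking levels on the complete cases, rather than on the raw data, is what reveals the problem.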

User · Answer

This is a variation on the answer provided by @Metrics and edited by @Max Ghenis:

    l <- sapply(iris, function(x) is.factor(x))
    m <- iris[, l, drop = FALSE]   # drop = FALSE keeps a data frame even with one factor column

    n <- sapply(m, function(x) {
      y <- summary(x) / length(x)                   # share of records in each level
      len <- length(y[y < 0.005 | y > 0.995])       # levels outside the thresholds
      cbind(len, t(y))
    })

    drop_cols_df <- data.frame(var = names(l[l]),
                               status = ifelse(as.vector(t(n)[, 1]) == 0, "NODROP", "DROP"),
                               level1 = as.vector(t(n)[, 2]),
                               level2 = as.vector(t(n)[, 3]))

Here, after identifying the factor variables, the second sapply computes what percentage of records belongs to each level (category) of the variable. Then it counts the levels with an incidence rate over 99.5% or below 0.5% (my arbitrary thresholds).

It then returns the number of levels outside the thresholds and the incidence rate of each level in each categorical variable.

Variables with zero levels crossing the thresholds should not be dropped, while the others should be dropped from the linear model.

The last data frame makes viewing the results easy. It is hard-coded to show two levels per variable, which fits a data set in which all factor variables are binomial; it can be made generic easily enough.
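One possible generic form is sketched below. It is an assumption about what "generic" could look like, not the author's code: the function name `level_shares` is made up, the result is in long format with one row per variable and level, and `table()` is used instead of `summary()`, so NAs are excluded from the shares.

```r
# Share of records in each level of every factor column, with a flag for
# levels outside the (arbitrary) thresholds.
level_shares <- function(data, lo = 0.005, hi = 0.995) {
  is_fac <- vapply(data, is.factor, logical(1))
  rows <- lapply(names(data)[is_fac], function(v) {
    share <- prop.table(table(data[[v]]))   # NAs are not counted here
    data.frame(
      var     = v,
      level   = names(share),
      share   = as.vector(share),
      flagged = as.vector(share < lo | share > hi)
    )
  })
  do.call(rbind, rows)
}

res <- level_shares(iris)
print(res)
```

A variable with any flagged level is a candidate for dropping, whatever its number of levels.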

User · Answer

If your independent variable (RHS variable) is a factor or a character taking only one value, then that type of error occurs.

Example: iris data in R

    (model1 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris))

    Call:
    lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)

    Coefficients:
          (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica  
               2.2514             0.8036             1.4587             1.9468  

Now, if your data consists of only one species:

    (model1 <- lm(Sepal.Length ~ Sepal.Width + Species,
                  data = iris[iris$Species == "setosa", ]))
    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
      contrasts can be applied only to factors with 2 or more levels

If the variable is numeric (Sepal.Width) but takes only a single value, say 3, then the model runs but you will get NA as the coefficient of that variable, as follows:

    (model2 <- lm(Sepal.Length ~ Sepal.Width + Species,
                  data = iris[iris$Sepal.Width == 3, ]))

    Call:
    lm(formula = Sepal.Length ~ Sepal.Width + Species,
        data = iris[iris$Sepal.Width == 3, ])

    Coefficients:
          (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica  
                4.700                 NA              1.250              2.017  

Solution: There is not enough variation in an independent variable that takes only one value. So you need to drop that variable, irrespective of whether it is a numeric, character, or factor variable.

Updated as per comments: Since you know that the error will only occur with factor/character variables, you can focus only on those and see whether the number of levels of those factor variables is 1 (DROP) or greater than 1 (NODROP).

To see whether a variable is a factor or not, use the following code:

    (l <- sapply(iris, function(x) is.factor(x)))
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
           FALSE        FALSE        FALSE        FALSE         TRUE 

Then you can get the data frame of factor variables only:

    m <- iris[, l, drop = FALSE]   # drop = FALSE keeps a data frame even with one factor column

Now, find the number of levels of the factor variables; if this is one, you need to drop that variable:

    n <- sapply(m, function(x) length(levels(x)))
    ifelse(n == 1, "DROP", "NODROP")

Note: If a factor variable has only one level, then that is the variable you have to drop.
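One caveat with counting `levels()`: a factor keeps its declared levels after subsetting, so the count can stay above 1 even when only one level is present in the rows being modelled. A minimal sketch of a check on the levels actually observed, using `droplevels()`:

```r
setosa_only <- iris[iris$Species == "setosa", ]

nlevels(setosa_only$Species)               # 3: declared levels survive subsetting
nlevels(droplevels(setosa_only$Species))   # 1: levels actually present

# Count observed levels per factor column and flag single-level ones.
is_fac   <- vapply(setosa_only, is.factor, logical(1))
observed <- vapply(setosa_only[is_fac],
                   function(x) nlevels(droplevels(x)), integer(1))
status <- ifelse(observed < 2, "DROP", "NODROP")
print(status)
```

This matches what `lm()` sees, because unused levels are dropped when the model frame is built.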

User · Answer

The answers by the other authors have already addressed the problem of factors with only one level or NAs.

Today, I stumbled upon the same error when using the rstatix::anova_test() function, but my factors were okay (more than one level, no NAs, no character vectors, ...). Instead, I could fix the error by dropping all variables in the dataframe that are not included in the model. I don't know what the reason for this behavior is, but just knowing about it might also be helpful when encountering this error.
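One way to do that kind of column dropping is to subset the data frame to the variables named in the formula with `all.vars()`. The sketch below uses `lm()` and iris only for illustration; whether it resolves the rstatix case is not verified here, and it does not apply to formulas that use `.` on the right-hand side.

```r
fml <- Sepal.Length ~ Sepal.Width + Species

# Keep only the columns that the formula refers to.
model_data <- iris[, all.vars(fml), drop = FALSE]
names(model_data)

fit <- lm(fml, data = model_data)
print(coef(fit))
```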

User · Answer

Metrics' and Sven's answers deal with the usual situation, but for those of us who work in non-English environments: if you have exotic characters (å, ä, ö) in your character variable you will get the same result, even if you have multiple factor levels.

Levels <- c("Pri", "För") gives the contrast error, while Levels <- c("Pri", "For") doesn't.

This is probably a bug.
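To check whether a character or factor variable contains such characters, a small detection helper can be sketched as follows. The name `has_non_ascii` is made up, and the sketch only finds the affected values; it does not reproduce or explain the error itself.

```r
# TRUE for strings that contain any code point above 127 (non-ASCII).
has_non_ascii <- function(x) {
  vapply(as.character(x), function(s) {
    !is.na(s) && any(utf8ToInt(enc2utf8(s)) > 127L)
  }, logical(1), USE.NAMES = FALSE)
}

# "\u00f6" is the letter o with diaeresis, written as an escape so the
# script itself stays ASCII.
Levels <- c("Pri", "F\u00f6r")
flags <- has_non_ascii(Levels)
print(flags)
```

Levels flagged this way can then be relabelled with ASCII-only names before fitting if the error turns out to depend on them.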
