What is the difference between Multiple R-squared and Adjusted R-squared in a single-variate least squares regression?

Question

Could someone explain to the statistically naive what the difference between Multiple R-squared and Adjusted R-squared is? I am doing a single-variate regression analysis as follows:

    v.lm <- lm(epm ~ n_days, data=v)
    print(summary(v.lm))

Results:

    Call:
    lm(formula = epm ~ n_days, data = v)

    Residuals:
        Min      1Q  Median      3Q     Max
    -693.59 -325.79   53.34  302.46  964.95

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  2550.39      92.15  27.677   <2e-16 ***
    n_days        -13.12       5.39  -2.433   0.0216 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 410.1 on 28 degrees of freedom
    Multiple R-squared: 0.1746,     Adjusted R-squared: 0.1451
    F-statistic: 5.921 on 1 and 28 DF,  p-value: 0.0216

User · Answer

The formula for R-squared does not take the number of variables in the model into account. The adjusted R-squared does.

The adjusted R-squared adds a penalty for adding variables to the model that are uncorrelated with the variable you're trying to explain. You can use it to check whether a variable is relevant to the thing you're trying to explain.

Adjusted R-squared is R-squared with some divisions added to make it dependent on the number of variables in the model.
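To make the penalty concrete, here is a minimal sketch on simulated data (the data, the `noise` predictor, and the helper `adj_r2` are assumptions for illustration, not part of the question). It uses the usual textbook form of the adjustment, which should agree with what `summary()` reports for a model with an intercept:

```r
# Simulated data (an assumption for illustration, not the asker's data)
set.seed(1)
n <- 30
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)
noise <- rnorm(n)  # predictor unrelated to y by construction

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + noise)

# Hypothetical helper: adjusted R-squared, with k = number of predictors
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2_1 <- summary(fit1)$r.squared
r2_2 <- summary(fit2)$r.squared

# Plain R-squared cannot go down when a predictor is added
print(c(one_predictor = r2_1, two_predictors = r2_2))

# Adjusted R-squared can go down, because of the (n - 1) / (n - k - 1) factor
print(c(one_predictor = adj_r2(r2_1, n, 1), two_predictors = adj_r2(r2_2, n, 2)))
```

Comparing the two printed vectors shows how much of the gain in plain R-squared survives the penalty.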

User · Answer

Note that, in addition to the number of predictive variables, the Adjusted R-squared formula above also adjusts for sample size. A small sample will give a deceptively large R-squared.

Ping Yin & Xitao Fan, J. of Experimental Education 69(2): 203-224, "Estimating R-squared shrinkage in multiple regression", compares different methods for adjusting r-squared and concludes that the commonly-used ones quoted above are not good. They recommend the Olkin & Pratt formula.

However, I've seen some indication that population size has a much larger effect than any of these formulas indicate. I am not convinced that any of these formulas are good enough to allow you to compare regressions done with very different sample sizes (e.g., 2,000 vs. 200,000 samples; the standard formulas would make almost no sample-size-based adjustment). I would do some cross-validation to check the r-squared on each sample.
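One way the cross-validation check could look is sketched below in base R. The helper name `cv_r2`, the choice of 5 folds, and the simulated data are all assumptions for illustration; the answer does not specify a particular scheme.

```r
# Hypothetical helper: k-fold cross-validated R-squared for a linear model
cv_r2 <- function(formula, data, k = 5) {
  response <- all.vars(formula)[1]
  # Randomly assign each row to one of k folds
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  pred <- numeric(nrow(data))
  for (i in seq_len(k)) {
    held_out <- folds == i
    # Fit on the other folds, predict the held-out rows
    fit <- lm(formula, data = data[!held_out, , drop = FALSE])
    pred[held_out] <- predict(fit, newdata = data[held_out, , drop = FALSE])
  }
  y <- data[[response]]
  # Out-of-sample R-squared; it can be negative if the model
  # predicts worse than the overall mean
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}

# Simulated data, for illustration only
set.seed(42)
d <- data.frame(x = rnorm(200))
d$y <- 1 + 0.4 * d$x + rnorm(200)

in_sample <- summary(lm(y ~ x, data = d))$r.squared
out_of_sample <- cv_r2(y ~ x, d, k = 5)
print(c(in_sample = in_sample, cross_validated = out_of_sample))
```

The gap between the two numbers gives a rough, data-driven sense of how optimistic the in-sample R-squared is for a given sample size.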

User · Answer

The "adjustment" in adjusted R-squared is related to the number of variables and the number of observations.

If you keep adding variables (predictors) to your model, R-squared will improve - that is, the predictors will appear to explain the variance - but some of that improvement may be due to chance alone. So adjusted R-squared tries to correct for this, by taking into account the ratio (N-1)/(N-k-1), where N = number of observations and k = number of variables (predictors).

It's probably not a concern in your case, since you have a single variate.

Some references:

How high, R-squared?
Goodness of fit statistics
Multiple regression
Re: What is "Adjusted R^2" in Multiple Regression
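As a quick check of the ratio described in this answer, the numbers from the question's `summary()` output can be plugged in directly. This is only a sketch: N = 30 is inferred here from the 28 residual degrees of freedom plus the two estimated coefficients, and the standard form 1 - (1 - R-squared) * (N-1)/(N-k-1) is assumed.

```r
# Values read from the summary() output in the question
r2 <- 0.1746      # Multiple R-squared
N  <- 28 + 2      # 28 residual degrees of freedom + 2 estimated coefficients
k  <- 1           # one predictor (n_days)

adj_r2 <- 1 - (1 - r2) * (N - 1) / (N - k - 1)
print(round(adj_r2, 4))  # 0.1451, matching the reported Adjusted R-squared
```

With a single predictor and 30 observations the ratio is 29/28, so the adjustment is small.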

User · Answer

The Adjusted R-squared is close to, but different from, the value of R-squared. Instead of being based on the explained sum of squares SSR and the total sum of squares SSY, it is based on the overall variance (a quantity we do not typically calculate), s2T = SSY/(n - 1), and the error variance MSE (from the ANOVA table), and is worked out like this: adjusted R-squared = (s2T - MSE) / s2T.

This approach provides a better basis for judging the improvement in a fit due to adding an explanatory variable, but it does not have the simple summarizing interpretation that R-squared has.

If I haven't made a mistake, you should be able to verify the values of adjusted R-squared and R-squared as follows:

    # Total sum of squares divided by total degrees of freedom (n - 1)
    s2T <- sum(anova(v.lm)[[2]]) / sum(anova(v.lm)[[1]])
    # Residual mean square from the ANOVA table
    MSE <- anova(v.lm)[[3]][2]
    adj.R2 <- (s2T - MSE) / s2T

On the other side, R-squared is SSR/SSY, where SSR = SSY - SSE:

    attach(v)
    SSE <- deviance(v.lm)          # or SSE <- sum((epm - predict(v.lm, list(n_days)))^2)
    SSY <- deviance(lm(epm ~ 1))   # or SSY <- sum((epm - mean(epm))^2)
    SSR <- (SSY - SSE)             # or SSR <- sum((predict(v.lm, list(n_days)) - mean(epm))^2)
    R2 <- SSR / SSY
