pull out p-values and r-squared from a linear regression

Question

How do you pull out the p-value  for the significance of the coefficient of the single explanatory variable being non-zero  and R-squared value from a simple linear regression model  For example     x   cumsum c 0  runif 100  -1   1    y   cumsum c 0  runif 100  -1   1    fit   lm y   x  summary fit    I know that summary fit  displays the p-value and R-squared value  but I want to be able to stick these into other variables

User · Answer

Use     summary fit   coefficients    num    4    where num is a number which denotes the row of the coefficients matrix  It will depend on how many features you have in your model and which one you want to pull out the p-value for  For example  if you have only one variable there will be one p-value for the intercept which will be  1 4  and the next one for your actual variable which will be  2 4   So your num will be 2

User · Answer

Notice that summary fit  generates an object with all the information you need  The beta  se  t and p vectors are stored in it  Get the p-values by selecting the 4th column of the coefficients matrix  stored in the summary object     summary fit  coefficients  4   summary fit  r squared   Try str summary fit   to see all the info that this object contains    Edit  I had misread Chase s answer which basically tells you how to get to what I give here

User · Answer

While both of the answers above are good  the procedure for extracting parts of objects is more general   In many cases  functions return lists  and the individual components can be accessed using str   which will print the components along with their names  You can then access them using the   operator  i e  myobject componentname    In the case of lm objects  there are a number of predefined methods one can use such as coef    resid    summary   etc  but you won t always be so lucky

User · Answer

You can see the structure of the object returned by summary   by calling str summary fit    Each piece can be accessed using    The p-value for the F statistic is more easily had from the object returned by anova   Concisely  you can do this   rSquared  lt - summary fit  r squared pVal  lt - anova fit   Pr  gt F   1

User · Answer

Another option is to use the cor test function  instead of lm    gt  x  lt - c 44 4  45 9  41 9  53 3  44 7  44 1  50 7  45 2  60 1   gt  y  lt - c  2 6   3 1   2 5   5 0   3 6   4 0   5 2   2 8   3 8    gt  mycor   cor test x y   gt  mylm   lm x y     r and rsquared   gt  cor test x y  estimate    2       cor  0 3262484   gt  summary lm x y   r squared  1  0 3262484    P value    gt  lmp lm x y      Using the lmp function defined in Chase s answer  1  0 1081731  gt  cor test x y  p value  1  0 1081731

User · Answer

I used this lmp function quite a lot of times    And at one point I decided to add new features to enhance data analysis  I am not in expert in R or statistics but people are usually looking at different information of a linear regression      p-value a and b  r   and of course the aspect of the point distribution   Let s have an example  You have here  Here a reproducible example with different variables   Ex lt -structure list X1   c -36 8598  -37 1726  -36 4343  -36 8644   -37 0599  -34 8818  -31 9907  -37 8304  -34 3367  -31 2984  -33 5731    X2   c 64 26  63 085  66 36  61 08  61 57  65 04  72 69  63 83   67 555  76 06  68 61   Y1   c 493 81544  493 81544  494 54173   494 61364  494 61381  494 38717  494 64122  493 73265  494 04246   494 92989  494 98384   Y2   c 489 704166  489 704166  490 710962   490 653212  490 710612  489 822928  488 160904  489 747776  490 600579   488 946738  490 398958   Y3   c -19L  -19L  -19L  -23L  -30L   -43L  -43L  -2L  -58L  -47L  -61L     Names   c  X1    X2    Y1     Y2    Y3    row names   c NA  11L   class    data frame     library reshape2  library ggplot2  Ex2 lt -melt Ex id c  X1   X2    colnames Ex2  3 4  lt -c  Y   Yvalue   Ex3 lt -melt Ex2 id c  Y   Yvalue    colnames Ex3  3 4  lt -c  X   Xvalue    ggplot Ex3 aes Xvalue Yvalue              geom smooth method  lm  alpha 0 2 size 1 color  grey              geom point size 2             facet grid Y X scales  free      Use the lmp function  lmp  lt - function  modelobject      if  class modelobject      lm   stop  Not an object of class  lm       f  lt - summary modelobject  fstatistic     p  lt - pf f 1  f 2  f 3  lower tail F      attributes p   lt - NULL     return p           create function to extract different informations from lm  lmtable lt -function  var1 var2 data signi NULL      var1  y data   colnames of data as character  so  Y1  or c  Y1   Y2   for example    var2  x data   colnames of data as character  so  X1  or c  X1   X2   for example    data  data in dataframe  variables in columns     if signi TRUE  round p-value with 2 digits and add     if  lt 0 001     if  lt  0 01    if  lt  0 05     if  class data      data frame   stop  Not an object of class  data frame       Tabtemp lt -data frame matrix NA ncol 6 nrow length var1  length var2      for  i in 1 length var2              Tabtemp   length var1  i - length var1 -1    length var1  i  1  lt -var1   Tabtemp   length var1  i - length var1 -1    length var1  i  2  lt -var2 i    colnames Tabtemp  lt -c  Var y   Var x   p-value   a   b   r 2      for  n in 1 length var1         Tabtemp    length var1  i - length var1 -1   n-1  3  lt -lmp lm data  var1 n   data  var2 i   data      Tabtemp    length var1  i - length var1 -1   n-1  4  lt -coef lm data  var1 n   data  var2 i   data   1     Tabtemp    length var1  i - length var1 -1   n-1  5  lt -coef lm data  var1 n   data  var2 i   data   2     Tabtemp    length var1  i - length var1 -1   n-1  6  lt -summary lm data  var1 n   data  var2 i   data   r squared            signi2 lt -data frame matrix NA ncol 3 nrow nrow Tabtemp      signi2  1  lt -ifelse Tabtemp  3  lt 0 001 paste0        ifelse Tabtemp  3  lt 0 01 paste0       ifelse Tabtemp  3  lt 0 05 paste0      paste0          signi2  2  lt -round Tabtemp  3  2    signi2  3  lt -paste0 format signi2  2  digits 2  signi2  1      for  l in 1 nrow Tabtemp           Tabtemp  p-value  l  lt -ifelse is null signi            Tabtemp  p-value  l            ifelse isTRUE signi                   paste0 signi2  3  l                    Tabtemp  p-value  l            Tabtemp      ------- EXAMPLES ------  lmtable  Y1   X1  Ex  lmtable c  Y1   Y2   Y3   c  X1   X2   Ex  lmtable c  Y1   Y2   Y3   c  X1   X2   Ex signi TRUE    There is certainly a faster solution than this function but it works

User · Answer

This is the easiest way to pull the p-values   coef summary modelname      Pr  gt  t

User · Answer

I came across this question while exploring suggested solutions for a similar problem  I presume that for future reference it may be worthwhile to update the available list of answer with a solution utilising the broom package   Sample code  x   cumsum c 0  runif 100  -1   1    y   cumsum c 0  runif 100  -1   1    fit   lm y   x  require broom  glance fit    Results   gt  gt  glance fit    r squared adj r squared    sigma statistic    p value df    logLik      AIC      BIC deviance df residual 1 0 5442762     0 5396729 1 502943  118 2368 1 3719e-18  2 -183 4527 372 9055 380 7508 223 6251          99     Side notes  I find the glance function is useful as it neatly summarises the key values  The results are stored as a data frame which makes further manipulation easy    gt  gt  class glance fit    1   data frame

User · Answer

Extension of  Vincent  s answer   For lm   generated models   summary fit  coefficients  4      P-values  summary fit  r squared            R squared values   For gls   generated models   summary fit  tTable  4            P-values   R-squared values are not generated b c gls uses max-likelihood not Sums of Squares   To isolate an individual p-value itself  you d add a row number to the code    For example to access the p-value of the intercept in both model summaries   summary fit  coefficients 1 4  summary fit  tTable 1 4       Note  you can replace the column number with the column name in each of the above instances   summary fit  coefficients 1  Pr  gt  t        lm  summary fit  tTable 1  p-value             gls     If you re still unsure of how to access a value form the summary table use str   to figure out the structure of the summary table   str summary fit

User · Answer

x   cumsum c 0  runif 100  -1   1    y   cumsum c 0  runif 100  -1   1    fit   lm y   x   gt  names summary fit    1   call            terms            3   residuals       coefficients     5   aliased         sigma            7   df              r squared        9   adj r squared   fstatistic      11   cov unscaled       summary fit  r squared

User · Answer

r-squared  You can return the r-squared value directly from the summary object summary fit  r squared  See names summary fit   for a list of all the items you can extract directly   Model p-value  If you want to obtain the p-value of the overall regression model   this blog post outlines a function to return the p-value   lmp  lt - function  modelobject        if  class modelobject      lm   stop  Not an object of class  lm         f  lt - summary modelobject  fstatistic     p  lt - pf f 1  f 2  f 3  lower tail F      attributes p   lt - NULL     return p      gt  lmp fit   1  1 622665e-05   In the case of a simple regression with one predictor  the model p-value and the p-value for the coefficient will be the same    Coefficient p-values  If you have more than one predictor  then the above will return the model p-value  and the p-value for coefficients can be extracted using   summary fit  coefficients  4      Alternatively  you can grab the p-value of coefficients from the anova fit  object in a similar fashion to the summary object above

User · Answer

For the final p-value displayed at the end of summary    the function uses pf   to calculate from the summary fit  fstatistic values   fstat  lt - summary fit  fstatistic pf fstat 1   fstat 2   fstat 3   lower tail FALSE    Source   1    2

[r] pull out p-values and r-squared from a linear regression

Examples related to r