Pseudo R squared measures for non-linear models
Confusion surrounds pseudo R squared measures; this post aims to alleviate that confusion.
The R2 statistic, or coefficient of determination, is a statistical measure used to assess the proportion of variance in the dependent variable that can be explained by the independent variables. In other words, it provides an estimate of model fit. The purpose of R2 becomes slightly harder to understand in a non-linear context. Instead of stating plainly that 56 per cent of the variance can be explained by the model, a non-linear R2 statistic provides a more general guideline of model fit. In a non-linear context the R2 becomes the pseudo R2. This post seeks to outline the most common pseudo R2 measures and provide relevant critiques of each.
There are several pseudo R2 statistics to choose from, but there appears to be no consensus on which is best or most appropriate to use (Allison, 2013). Previous empirical work has demonstrated that, for the same model, different measures produce wildly different pseudo R2 values (Smith and McKenna, 2013). Four common pseudo R2 measures are McFadden's R2 and adjusted R2 (McFadden, 1972), the Nagelkerke R2 (Nagelkerke, 1991) and the Cox-Snell R2 (Cox and Snell, 1989). For a linear model, the R2 statistic represents the proportion of variance in the dependent variable that can be explained by the independent variables in an ordinary least squares regression model; an R2 of 0.4 would represent 40 per cent of the variance being explained. It is defined as:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
For non-linear models, the R2 becomes slightly more difficult to interpret; for logistic-based regression, the estimator maximises the likelihood function, and there is no 'true' measure of R2 for a non-linear model. However, the proportion of unaccounted-for variance that is reduced by adding variables to the model is the same idea as the proportion of variance accounted for, or R2. All four pseudo R2 statistics use this general logic to construct their variations of R2. The interpretation of a pseudo R2 differs from its linear regression counterpart due to the limits placed upon a logistic or multinomial pseudo-based measure. Whilst the pseudo R2 shares with the R2 the property that its value tends to increase as the absolute value of beta increases with the other parameters held fixed, the two increase at different rates, with pseudo R2 measures increasing more slowly than their linear counterparts even when the associations are strong (Hu, Shao and Palta, 2006). Four pseudo R2 measures are presented below.
McFadden’s R2 is defined as:

$$R^2_{\mathrm{McFadden}} = 1 - \frac{\ln L_M}{\ln L_0}$$
Where L0 is the value of the likelihood function for a model with zero predictors and LM is the likelihood of the model being estimated. The log-likelihood of the fitted model plays a role analogous to the residual sum of squares (SSRes) in an OLS regression. McFadden’s R2 is the simplest non-linear statistic on offer, as it uses only the log-likelihood function. The major critique of McFadden’s R2 is that each additional parameter added to a given non-linear model will increase the R2 statistic.
To deal with this critique, McFadden’s adjusted R2 penalises additional parameters. It is defined as:

$$R^2_{\mathrm{McFadden,\,adj}} = 1 - \frac{\ln L_M - K}{\ln L_0}$$
Where K is the number of estimated parameters in the model. The adjusted version of McFadden’s R2 penalises the R2 as more parameters are added to the model, making it an attractive option to use.
The Cox-Snell R2 (also known as the maximum likelihood R2) offers an alternative calculation of pseudo R2 and is calculated as:

$$R^2_{\mathrm{CS}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n} = 1 - \exp(-G_M/n)$$
Where n is the sample size and GM is the likelihood ratio chi-square statistic for the model. The Cox-Snell R2 can be calculated for both linear and non-linear models – the equation is identical. This makes it an attractive measure if there is a desire for uniformity across analyses. As Allison states, this R2 is more appropriately termed a ‘generalised’ rather than ‘pseudo’ R2, because the usual R2 of linear regression can be obtained from this formula using the likelihoods of the models with and without predictors (Allison, 2013). The Cox-Snell R2 is very attractive as it is consistent with linear R2 measures, is consistent with maximum likelihood as an estimation method, is asymptotically independent of the sample size n, and has an interpretation as explained variation (Nagelkerke, 1991).
The major issue with the Cox-Snell R2, however, is that it has an upper bound of less than 1.0, and that bound depends on the marginal proportion of cases within events – the upper bound for a given model can therefore be a lot less than 1.0 or very close to it. This makes the Cox-Snell R2 much less attractive than first thought.
A solution to this, presented by Nagelkerke, is to divide the Cox-Snell R2 by its upper bound. The Nagelkerke R2 (also known as the Cragg and Uhler R2) is defined as:

$$R^2_{\mathrm{N}} = \frac{1 - (L_0/L_M)^{2/n}}{1 - L_0^{2/n}}$$
However, this ‘solution’ is ad hoc. This R2 also tends to produce the highest value of all the pseudo methods.
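As a concrete sketch of how the three likelihood-based measures above relate, the following illustrative Python function computes them from the null and fitted log-likelihoods (the function name and example values are my own, not from any package):

```python
import math

def likelihood_pseudo_r2(ll_null, ll_model, n, k):
    """Illustrative pseudo R2 calculations from the null-model log-likelihood
    (ll_null), the fitted-model log-likelihood (ll_model), sample size n and
    number of estimated parameters k."""
    mcfadden = 1 - ll_model / ll_null
    mcfadden_adj = 1 - (ll_model - k) / ll_null      # penalises extra parameters
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    upper_bound = 1 - math.exp(2 * ll_null / n)      # Cox-Snell's maximum attainable value
    nagelkerke = cox_snell / upper_bound             # rescaled to an upper bound of 1.0
    return mcfadden, mcfadden_adj, cox_snell, nagelkerke

# Hypothetical log-likelihoods from a logistic model
print(likelihood_pseudo_r2(ll_null=-100.0, ll_model=-80.0, n=200, k=3))
```

Note that the Nagelkerke value always exceeds the Cox-Snell value for the same model, since the latter is divided by an upper bound below 1.0.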
Each of these pseudo R2 measures presents certain issues. Following the advice of Allison (2013), the Tjur R2, or as Tjur calls it, the coefficient of discrimination (Tjur, 2009), appears to be the best R2 measure for interpreting logistic regression models. The Tjur R2 measure is defined as:

$$R^2_{\mathrm{Tjur}} = \bar{\hat{\pi}}_1 - \bar{\hat{\pi}}_0$$
It is the difference between the average fitted probability for the binary outcome coded to 1 (success level) and the average fitted probability for the binary outcome coded to 0 (the failure level).
The Tjur R2 measure has an upper bound of 1.0 and is very similar to the linear R2 estimation: for each category of the dependent variable, calculate the mean of the predicted probabilities of an event, then take the difference between the two means. The Tjur R2 is equal to the arithmetic mean of two R2 formulas based on squared residuals, and equal to the geometric mean of two other R2 formulas based on squared residuals (Allison, 2013). Whilst there is no automatic output for this R2 measure in Stata, it can be obtained after running a regression by running the predict command on the estimation sample (e(sample)) and then taking the difference in means from a ttest.
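Tjur's difference-of-means calculation is also straightforward to reproduce outside Stata; as a minimal sketch, the outcomes and fitted probabilities below are hypothetical values standing in for a logistic model's output:

```python
def tjur_r2(y, p):
    """Tjur's coefficient of discrimination: the mean fitted probability
    among observed successes minus the mean among observed failures."""
    p1 = [pi for yi, pi in zip(y, p) if yi == 1]
    p0 = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(p1) / len(p1) - sum(p0) / len(p0)

# Hypothetical outcomes and fitted probabilities from a logistic model
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
print(round(tjur_r2(y, p), 3))  # 0.533
```

A model that discriminates perfectly (fitted probability 1 for every success, 0 for every failure) attains the upper bound of 1.0.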
The Tjur R2 is not linked to the likelihood function, and as a result adding additional variables to the model could result in a decline in the overall R2. This is a benefit rather than a detriment to the measure, as it allows for a better comparison of predictive potential during model building. A major issue with the Tjur R2 is that it cannot readily be applied to an ordinal or multinomial logistic regression.
Whilst no single pseudo R2 measure appears perfect, two seem to attract the fewest criticisms. McFadden's adjusted R2 and the Tjur R2 offer the most practical non-linear pseudo R2 measures, without the baggage that accompanies the other measures, and I would advise using both when possible when fitting non-linear models. Ultimately, the convenience of most software packages means that producing all of the R2 measures mentioned here would take no time at all. All except the Tjur R2 can be produced using the 'fitstat' command in Stata, and the Tjur R2 can be produced with a predict and a ttest. All measures should be included within the analysis – though McFadden's adjusted R2 or the Tjur R2 should be primarily reported.
Note: This post was partly inspired by Allison’s wonderful series on R2 measures. The point of this post was to update and provide my own views on the matter.
References:
Allison, P., 2013. What’s the best R-squared for logistic regression. Statistical Horizons, 13.
Cox, D.R. and Snell, E.J. (1989) Analysis of Binary Data. 2nd Edition, Chapman and Hall/CRC, London.
Hu, B., Shao, J. and Palta, M., 2006. Pseudo-R2 in logistic regression model. Statistica Sinica, pp.847-860.
McFadden, D., 1972. Conditional logit analysis of qualitative choice behavior.
Nagelkerke, N.J., 1991. A note on a general definition of the coefficient of determination. Biometrika, 78(3), pp.691-692.
Smith, T.J. and McKenna, C.M., 2013. A comparison of logistic regression pseudo R2 indices. Multiple Linear Regression Viewpoints, 39(2), pp.17-26.
Tjur, T., 2009. Coefficients of determination in logistic regression models—A new proposal: The coefficient of discrimination. The American Statistician, 63(4), pp.366-372.