Brushing Up on R-Squared
Reading time: 2 minutes
R-squared… it rings a bell… but what exactly is it, again?
For many of us, it’s been a few years (at least) since we took a statistics course. When talking to customers about some of the statistical concepts that factor into predictive models, I’ve found that while many topics are “kind of familiar”, most take some explanation or revisiting. To help revisit the topics that are relevant to modeling, we’ve created a new Brown Bag Learning Series to explore statistical topics one at a time in bite-sized (15-minute) segments.
The first Brown Bag, given by Mike Laracy, focused on the R-squared statistic, aka the Coefficient of Determination. So, what is R-squared? And how is it helpful? Here’s a quick crash course:
The R-squared statistic tells you how well your model fits your data: it measures how much of the variation in your Y-values (predicted values) is explained by the variation in your X-variables (predictor variables) under a given model.
R-squared values range from zero to one. An R-squared value of zero means the model explains none of the variance in Y; a value of one means the model explains all of the variance in Y perfectly. So, in general, the closer R-squared is to one, the better the model describes the input data.
In its most basic form, the equation is the explained variance divided by the total variance, where “explained variance” means “explained by the model”. For a visual explanation of explained and unexplained variance, see the graphic below. Alternatively, you can express R-squared in terms of the unexplained variance: write it as one minus the unexplained variance divided by the total variance, as shown below:
More scientifically: R² = 1 − SSE/CTSS, where SSE = the Sum of Squared Errors from the model, and CTSS = the Corrected Total Sum of Squares.
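To make the formula concrete, here is a minimal sketch in Python that computes R-squared as 1 − SSE/CTSS for a simple least-squares line. The sample data here is made up purely for illustration; it is not from the Brown Bag session.

```python
import numpy as np

# Hypothetical sample data: x is the predictor, y is the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Fit a simple least-squares line (the "model").
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# Unexplained variance: SSE, the Sum of Squared Errors from the model.
sse = np.sum((y - y_hat) ** 2)

# Total variance: CTSS, the Corrected Total Sum of Squares
# (squared deviations of y from its own mean).
ctss = np.sum((y - y.mean()) ** 2)

r_squared = 1 - sse / ctss
print(r_squared)
```

Because this toy data lies almost exactly on a line, the model leaves very little unexplained variance, and R-squared comes out close to one; with noisier data, SSE grows relative to CTSS and R-squared falls toward zero.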
Wait, there’s more!
For more information about R-squared and a deeper definition of how R-squared is calculated, check out the recording of our first Brown Bag session, “A Crash Course on R2”.
PS: Upcoming Brown Bag sessions include “What the Heck is Multicollinearity?”, “Outliers and Their Impact on a Predictive Model”, and “Hypothesis Testing and Variable Significance”. Besides the Brown Bag series, we have lots of other education-based events coming up; you can find them all here.
Any tips of your own? Leave them in the comments below!