Mitigating the Mortal Danger to a Predictive Model
Richard Clark recently contributed to an article in the Chronicle of Higher Education. In it, he mentioned that the “‘unprecedented’ is the mortal enemy of a predictive model.” He’s not wrong, but I can’t help but ruminate on the ways we (as data consumers and practitioners) can work around that. I would claim that it is the unprecedented and unaccounted-for that is the true mortal danger to a predictive model.
For the record, that means I agree with Richard Clark. I just see an underlying assumption that we have the ability to challenge.
In the higher education industry, any given predictive model has a long career. The model you build to score your earliest applications may, in time, be the same model that sees action through to the “late applicants” stage. That means there is a lot of time to gut-check the model you have in place. It’s good practice to monitor the expectation of your model against the reality of your deposit count anyway.
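As a rough illustration of that kind of gut check, the sketch below compares a model's expected cumulative deposits against the actual count, week by week, and flags the weeks where the two diverge by more than a chosen tolerance. The function name, the 10% tolerance, and the deposit counts are all hypothetical and only meant to show the shape of the check, not a recommended threshold.

```python
def flag_drift(expected, actual, tolerance=0.10):
    """Return (week, relative_gap) pairs where actual deposits deviate
    from the model's expectation by more than `tolerance`."""
    flagged = []
    for week, (exp, act) in enumerate(zip(expected, actual), start=1):
        if exp == 0:
            continue  # no expectation yet, so nothing to compare against
        gap = (act - exp) / exp
        if abs(gap) > tolerance:
            flagged.append((week, round(gap, 3)))
    return flagged


# Hypothetical cumulative deposit counts by week, for illustration only.
expected_deposits = [40, 95, 160, 240, 310]
actual_deposits = [42, 90, 150, 198, 245]

drift = flag_drift(expected_deposits, actual_deposits)
for week, gap in drift:
    print(f"Week {week}: actual deposits are {gap:+.1%} versus expectation")
```

With these made-up numbers, the first three weeks stay within tolerance and the gap only becomes visible in weeks 4 and 5. Running a check like this every week is what lets you notice the problem long before September.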
The primary instances in which uncertainty or unprecedented events kill your enrollment model are:
- When you don’t recognize the unprecedented occurrence until September
- When you cannot pivot your predictive modeling approach in response to it
Recognizing unprecedented patterns and events means you can account for them. Two approaches come to mind: tuning your model, and using methods outside of modeling that deliver more general results.
Tuning Your Model
There’s no way around it: if the target group for your model is behaving entirely unlike any of the historical data you have, you can’t reasonably expect to use an algorithm to pinpoint your upcoming headcount (for instance). In fact, Clark mentions several outcomes that all become unreasonable targets in such a case: number of deposits, attendance at orientation events, and melt among them.
It may be the case that you can carefully remove predictors which no longer appear to be valid for your model. However, tuning your model is not always limited to predicting your outcome using different variables. Occasionally, it is a process of rethinking your outcomes as well.
The idea of an intermediary variable is really critical for a process like this. Often, we think of predicting a critical outcome using a set of predictors. But some variables can be considered as taking place between your base predictors and your outcome.
That is to say, you might not be able to predict enrollment because there are too many uncertain steps between someone’s high school GPA and their depositing with your institution. However, you might be able to predict their likelihood of establishing contact with an advisor. Advisors are immensely capable of helping students sort through uncertainty, so right now, your “new” critical outcome might be a student participating in a conversation.
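To make the idea of an intermediary outcome concrete, here is a minimal sketch that estimates how likely students in each GPA band are to contact an advisor, using nothing more than historical rates. The GPA bands, the records, and the function name are hypothetical, and a grouped rate is only a stand-in for whatever model you would actually fit.

```python
from collections import defaultdict


def contact_rate_by_band(records):
    """Estimate the share of students in each GPA band who contacted an advisor."""
    totals = defaultdict(int)
    contacts = defaultdict(int)
    for gpa_band, contacted_advisor in records:
        totals[gpa_band] += 1
        contacts[gpa_band] += int(contacted_advisor)
    return {band: contacts[band] / totals[band] for band in totals}


# Hypothetical records: (GPA band, whether the student contacted an advisor).
history = [
    ("3.5+", True), ("3.5+", True), ("3.5+", False), ("3.5+", True),
    ("3.0-3.49", True), ("3.0-3.49", False), ("3.0-3.49", False), ("3.0-3.49", True),
    ("<3.0", False), ("<3.0", False), ("<3.0", True), ("<3.0", False),
]

rates = contact_rate_by_band(history)

# Bands with the lowest contact rates are candidates for proactive outreach.
outreach_first = sorted(rates, key=rates.get)
print(rates)
print(outreach_first)
```

The point of the sketch is the change of target: the outcome being estimated is a conversation with an advisor, which sits between the base predictors and enrollment, and which the institution can act on directly.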
It’s easy to see this as a concession, because really, the institution needs to know how many students will arrive. With upheaval decreasing confidence and posing a danger to the predictive model as you know it, though, why not try to gather as much control of the situation as you can?
The other approach to salvaging your data-informed view of what’s to come need not even involve predictive modeling. I’ve gleaned in my professional experience that people lump statistical methods as a whole under the term “predictive modeling.” I’m leaning on that to suggest that, within the statistical realm, you can still support decisions in unprecedented circumstances by leveraging descriptive statistics.
For example, I’ve worked with several enrollment professionals recently who are looking at the differences in behaviors and outcomes for students who take online versus in-person courses. Looking at engagement, grades, and pass rates, you can start to see where digital course delivery will have the largest impacts. It may turn out, for instance, that seminar courses demonstrate no statistically significant difference in engagement rates, while physics courses appear to have significantly higher pass rates when delivered in person.
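One common way to check whether a difference in rates like these is statistically significant is a two-proportion z-test. The sketch below is one possible approach under simple assumptions (independent students, reasonably large groups), not necessarily the test those enrollment professionals used. The counts are invented to mirror the seminar and physics scenario above.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value). Uses the pooled proportion for the standard error.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in outcomes; the test is undefined.")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: (in-person successes, in-person n, online successes, online n).
physics_z, physics_p = two_proportion_z_test(168, 200, 141, 200)   # pass rates
seminar_z, seminar_p = two_proportion_z_test(150, 200, 147, 200)   # engagement rates

print(f"Physics: z = {physics_z:.2f}, p = {physics_p:.4f}")
print(f"Seminar: z = {seminar_z:.2f}, p = {seminar_p:.4f}")
```

With these made-up counts, the physics difference clears a conventional 0.05 threshold while the seminar difference does not. A real analysis would also need to consider who chooses each format, since students are rarely assigned to online or in-person sections at random.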
Another hot topic? Completion time. Do students who supplement with online courses, or who primarily take them, finish earlier, or even complete their degree on time more frequently?
In working to identify the primary characteristics that influence the successful delivery of courses online, you can start to build outward communication to both prospective students and current students who might need it. This can improve your ability to defend the viability of your programs, and also help you support the students already enrolled at your institution.
In other words…
I agree with Richard Clark. The unprecedented is a mortal danger to a predictive model.
Even so, statistics can help inform and support institutions in virtually limitless ways. When you pair contextual knowledge of the institution and its needs with the ability to arrive at objective answers, you can pave a path forward no matter how unprecedented the circumstances.
In my view, predictive modeling stands in for analysis as an endeavor, and analysis is not predicated on tidy, reliable problems. It is predicated on messy, confusing, and obscure challenges that can be made more manageable and objective with creative exploration.