6 Common Mistakes in Predictive Modeling for Business Analysis
Reading time: 3 minutes
Business data analysis professionals take note: you’ll want to avoid these predictive modeling mistakes. During the predictive modeling process, there are many steps where it’s easy to make errors. We’ve compiled this list of the most common slip-ups and oversights so you can avoid them in your own analyses.
Failing to consider enough variables
When deciding which variables to include in a model, start with anything you have on hand that could possibly be predictive, then pare variables back until you discover which are most relevant. Many modern business analytics and predictive modeling tools can automate much of this variable selection, so don’t be afraid to throw the kitchen sink at it on your first pass.
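One way to picture the "start wide, then pare back" step is a simple screening pass that ranks every candidate variable by how strongly it correlates with a 0/1 response. The sketch below is only an illustration under stated assumptions: the column names and data are hypothetical, and ranking by absolute Pearson correlation is a stand-in for whatever selection method your modeling tool actually uses.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)

def rank_variables(columns, response):
    """Return (name, |r|) pairs sorted from strongest to weakest."""
    scores = {name: abs(pearson(values, response))
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: three candidate variables and a 0/1 response.
columns = {
    "visits_last_year": [1, 2, 3, 4, 5, 6],
    "constant_flag": [1, 1, 1, 1, 1, 1],
    "random_noise": [3, 1, 3, 1, 3, 1],
}
response = [0, 0, 0, 1, 1, 1]

ranking = rank_variables(columns, response)
for name, strength in ranking:
    print(f"{name}: {strength:.3f}")
```

A screen like this looks at one variable at a time, so it can miss variables that only matter in combination. Treat it as a first cut, not a replacement for the selection your modeling tool performs.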
Not adding unique additional variables
Any list of suggested variables should be used as just that: a guide. Enrich it with other variables that may be unique to your organization. If few unique variables are available, consider creating some to augment your dataset. Try adding new fields like “distance from retail location” or creating derivations of variables you already have.
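As an illustration of a derived field, a "distance from retail location" variable could be computed from latitude and longitude with the haversine (great-circle) formula. This is a minimal sketch: the function names, the coordinate tuples, and the choice of kilometers are assumptions for the example, not part of any particular tool.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def distance_to_nearest_store(customer, stores):
    """Distance from one (lat, lon) customer to the closest (lat, lon) store."""
    return min(haversine_km(customer[0], customer[1], s[0], s[1])
               for s in stores)

# Hypothetical coordinates: one customer and two store locations.
customer = (0.0, 0.0)
stores = [(0.0, 90.0), (0.0, 1.0)]
nearest_km = distance_to_nearest_store(customer, stores)
print(f"Nearest store: {nearest_km:.1f} km")
```

The resulting number can be added to each customer record as a new candidate variable.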
Selecting the wrong Y-variable
When building your dataset for a logistic regression model, you’ll want to choose the less frequent outcome as your Y-variable. A great example of this is building a customer retention model. In most cases, you’ll actually want to model attrition, identifying those customers who are likely to leave (hopefully the smaller group!) rather than those who are likely to stay.
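In code, this amounts to counting the two outcomes and coding the rarer one as 1. The sketch below assumes outcomes arrive as a plain list of labels. The helper name and the "stayed"/"left" labels are hypothetical.

```python
from collections import Counter

def encode_minority_as_target(outcomes):
    """Code the less frequent of two outcomes as 1 and the other as 0."""
    counts = Counter(outcomes)
    if len(counts) != 2:
        raise ValueError("expected exactly two distinct outcomes")
    # On an exact tie, min() keeps whichever label was seen first.
    minority_label = min(counts, key=counts.get)
    y = [1 if outcome == minority_label else 0 for outcome in outcomes]
    return minority_label, y

# Hypothetical retention data: most customers stayed, a few left.
outcomes = ["stayed"] * 8 + ["left"] * 2
target_label, y = encode_minority_as_target(outcomes)
print(target_label, sum(y), "of", len(y))
```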
Not enough Y-variable responses
Along with ensuring that your model population is large enough (1,000 records minimum) and spans enough time (3 years is good), you’ll want to make sure that there are enough Y-variable responses to model. Generally, you’ll want to shoot for at least 100 instances of the response you’d like to model.
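These rules of thumb are easy to check before any modeling starts. The following sketch encodes the thresholds mentioned above (1,000 records, 100 responses, roughly 3 years of history) as constants. The function name, the input layout, and the use of 365.25 days per year are assumptions made for illustration.

```python
from datetime import date

# Rule-of-thumb thresholds taken from the text above.
MIN_RECORDS = 1000
MIN_RESPONSES = 100
MIN_YEARS = 3

def check_modeling_dataset(y, record_dates):
    """Report whether a dataset meets each rule of thumb.

    y            -- list of 0/1 responses, one per record
    record_dates -- list of datetime.date values, one per record
    """
    span_years = (max(record_dates) - min(record_dates)).days / 365.25
    return {
        "enough_records": len(y) >= MIN_RECORDS,
        "enough_responses": sum(y) >= MIN_RESPONSES,
        "enough_history": span_years >= MIN_YEARS,
    }

# Hypothetical dataset: 1,200 records, 120 responses, about 3.5 years.
y = [1] * 120 + [0] * 1080
record_dates = [date(2020, 1, 1)] * 600 + [date(2023, 6, 30)] * 600
report = check_modeling_dataset(y, record_dates)
print(report)
```

Any check that comes back False is a prompt to gather more data or rethink the response definition before fitting a model.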
Building a model on the wrong population
To borrow an example from the world of fundraising, a model built to predict future giving will look a lot different for someone with a donor history than for someone who has never donated before. Consider which population you’d eventually like to score with the model and tailor the model to that population, or consider building two models, one for each sub-group.
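If you go the two-model route, the first step is simply partitioning the records before any fitting happens. A minimal sketch, assuming each record is a dictionary with a hypothetical "lifetime_gifts" count:

```python
def split_by_donor_history(records):
    """Separate records into prior donors and never-donors."""
    donors = [r for r in records if r["lifetime_gifts"] > 0]
    non_donors = [r for r in records if r["lifetime_gifts"] == 0]
    return donors, non_donors

# Hypothetical constituent records.
records = [
    {"id": 1, "lifetime_gifts": 4},
    {"id": 2, "lifetime_gifts": 0},
    {"id": 3, "lifetime_gifts": 1},
]
donors, non_donors = split_by_donor_history(records)
print(len(donors), "donors;", len(non_donors), "non-donors")
```

Each group would then get its own model, and new records would be scored with the model that matches their group.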
Judging the quality of a model using one measure
It’s difficult to capture the quality of a model in a single number. That’s why modeling outputs provide so many model fit measures. Beyond the numbers, graphic outputs like decile analysis and lift analysis can provide visual insight into how well the model fits your data and what the gains from using a model are likely to be.
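For readers who have not built one, a decile and lift table can be sketched in a few lines: sort records by model score, cut them into ten equal groups, and compare each group's response rate to the overall rate. The function name, the dictionary layout, and the toy scores below are assumptions for illustration. Modeling tools typically produce this output for you.

```python
def decile_lift_table(scores, actuals, n_bins=10):
    """Response rate and lift per score decile (decile 1 = highest scores)."""
    n = len(scores)
    if n < n_bins:
        raise ValueError("need at least one record per bin")
    overall_rate = sum(actuals) / n
    if overall_rate == 0:
        raise ValueError("no positive responses to measure lift against")
    # Sort by score only, highest first.
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    table = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        bin_actuals = [actual for _, actual in ranked[start:end]]
        rate = sum(bin_actuals) / len(bin_actuals)
        table.append({
            "decile": i + 1,
            "count": len(bin_actuals),
            "response_rate": rate,
            "lift": rate / overall_rate,
        })
    return table

# Hypothetical scored records: 20 customers, 4 of whom responded.
scores = [i / 20 for i in range(20, 0, -1)]
actuals = [1, 1, 1, 0, 1, 0] + [0] * 14
table = decile_lift_table(scores, actuals)
for row in table[:3]:
    print(row["decile"], row["response_rate"], round(row["lift"], 2))
```

A lift above 1 in the top deciles means the model concentrates responders among its highest-scored records. A lift near 1 everywhere means the model ranks records no better than chance.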
If you’re not sure which model measures to focus on, ask around. Do you know someone building models similar to yours? Ask which measures they rely on and what ranges they shoot for. There’s an abundance of information available on model outputs. Consider multiple gauges before deciding whether your model is worth moving forward with.
Reliable business analysis and predictive modeling tools certainly give you an upper hand in heading these issues off. Rapid Insight offers business data analysts the tools they need to support and enhance their business intelligence and analytics efforts. Read more on Rapid Insight for Business Analysts here.