“Treble” Variables – a Humorous Look at Music and Predictive Modeling to Start the New Year
Editor’s Note: As we begin 2015, we thought we would start with this humorous look at variables used in predictive modeling, courtesy of our data analyst, Jon MacMillan. Jon is well known around the office for his love of music, predictive modeling and New England dry humor. Here is his list of treble variables where he combines those topics together…
Part of my job is making bad puns for blog articles and the rest of the time I am working with customers who are using our software. A lot of the questions they have revolve around the predictive models that they are building. From those interactions I have put together what could be considered both the greatest and worst musical lineup of treble (aka “trouble”) variables and their equivalent musical artist.
U2 – “Forced Variables”
If you haven’t heard of the Apple-U2 debacle, it’s probably because your BlackBerry is still frozen. Either way, the firestorm that Apple received after forcing the new U2 album upon its customers is similar to the one you will face when you force variables into a model. Most often these are financial aid variables forced into enrollment and retention models: ‘FAFSA choice position’, ‘Estimated Family Contribution’, etc. These variables are frequently statistically significant and great to include, but when they are not significant, you have to accept it and move on rather than force them in.
Radiohead – “Variables Ahead of Their Time”
Love them or hate them, Radiohead is a band ahead of its time. These are our anachronistic variables, and they are our biggest offenders when it comes to predictive modeling. The typical example of an anachronistic variable is including a ‘Deposit Flag’ when predicting enrollment. Obviously, those who deposit are going to enroll and probably already have. The ones we tend to forget about, however, are the time-sensitive variables. If you include something like ‘Campus Visits’ in your enrollment model, you must apply a date cutoff to those visits. Let’s say that you want to score your current applicant pool on December 1st. If you are using a campus visits variable, then you should only flag those who visited prior to December 1st. This obviously requires a different snapshot in time for each scoring interval, but Rapid Insight software makes this an easily attainable feat.
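For anyone who wants to see the date cutoff idea outside of our software, here is a minimal Python sketch. The applicant IDs, visit dates and the `visited_before` helper are all made up for illustration; the only thing taken from the example above is the rule itself, which is to flag a visit only if it happened before the scoring date.

```python
from datetime import date


def visited_before(visit_dates, cutoff):
    """Return 1 if any campus visit happened strictly before the cutoff, else 0."""
    return int(any(d < cutoff for d in visit_dates))


# Hypothetical visit history keyed by applicant ID (illustrative data only).
visits = {
    "A001": [date(2014, 10, 15)],                     # visited before the cutoff
    "A002": [date(2014, 12, 10)],                     # visited after the cutoff
    "A003": [date(2014, 11, 20), date(2015, 1, 8)],   # one visit on each side
    "A004": [],                                       # never visited
}

# Scoring the applicant pool on December 1st, as in the example above.
cutoff = date(2014, 12, 1)

visit_flag = {sid: visited_before(dates, cutoff) for sid, dates in visits.items()}
print(visit_flag)
```

The same rule would need to be applied to the historical data used to build the model, with the cutoff set to the matching date in each past cycle, so that the model never sees a visit it could not have known about at scoring time.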
Taylor Swift – “Guilty Pleasure Variables”
Let me publicly state that I am not a fan of Taylor Swift. As a 28-year-old male, I don’t find much to relate to in her music, but that does not mean that I don’t rock out in my car to “Shake It Off”. It is probably a good thing I am not a fan anyway, as I use Spotify. What are these guilty pleasure variables I speak of? Things like ‘Term’, ‘Term code’, ‘Student ID’, etc. These are things that you want in your dataset for reference, but you don’t want the model to see or use them. I always like to include term in my modeling files so that I can look at historical trends in the data, but you need to remember to exclude these variables from the analysis. If you had higher enrollment rates in Fall 2012, then you might see term show up in a model, but since it is never going to be Fall 2012 again (as much as you want it to be), that isn’t going to help you moving forward.
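One way to keep these variables around without letting the model see them is to split each record into a reference part and a feature part before modeling. The sketch below is only an illustration in plain Python: the column names and the `split_reference` helper are assumptions of mine, not anything built into a particular tool.

```python
# Hypothetical column names for fields kept for reference only.
REFERENCE_COLUMNS = {"student_id", "term", "term_code"}


def split_reference(row):
    """Split one record into (reference fields, model features)."""
    reference = {k: v for k, v in row.items() if k in REFERENCE_COLUMNS}
    features = {k: v for k, v in row.items() if k not in REFERENCE_COLUMNS}
    return reference, features


# Illustrative record; the values are made up.
row = {
    "student_id": "A001",
    "term": "Fall 2012",
    "term_code": 201209,
    "hs_gpa": 3.4,
    "campus_visit_flag": 1,
}

reference, features = split_reference(row)
print(reference)  # kept for lookups and trend reports
print(features)   # the only fields handed to the model
```

Keeping the reference fields in a separate structure means you can still join scores back to a student or a term afterwards, while the model only ever sees the features.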
Kanye West – “The Remixed Variable”
I am not a huge Kanye West fan, but he does make a mean remix. There are variables used often in Higher Ed that offer little predictive value on their own but turn out to be highly predictive when remixed, or transformed. Our classic example is changing ‘Zip Code’ to a ‘Distance From Campus’ variable, which is easily done within Construct. Another example would be to add up all scholarship money offered to each student and, instead of just using the sum, convert it to a percentage of tuition. Your tuition is most likely increasing, and you might also be offering more money. This new variable will account for those changes and help to identify the actual relationship between scholarships awarded and enrollment.
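To make the scholarship remix concrete, here is a small Python sketch of the percentage-of-tuition calculation. The dollar amounts and the `pct_of_tuition` helper are invented for illustration; they simply show how a larger award in a later year can be the same share of a larger tuition bill.

```python
def pct_of_tuition(scholarships, tuition):
    """Total scholarship money offered, expressed as a percentage of tuition."""
    if tuition <= 0:
        raise ValueError("tuition must be positive")
    return 100.0 * sum(scholarships) / tuition


# Hypothetical offers in two different years (made-up figures).
offer_2012 = pct_of_tuition([5000, 2500], 30000)  # $7,500 against $30,000 tuition
offer_2014 = pct_of_tuition([6000, 2500], 34000)  # $8,500 against $34,000 tuition

print(offer_2012, offer_2014)
```

In raw dollars the second offer looks more generous, but as a share of tuition the two offers are identical, which is the relationship the remixed variable is meant to capture.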
Musical Acts You Regret Ever Liking – “The Worst Variables”
In my first draft of this post I was going to pick on a certain band, which will now go unnamed so that my opinion of “The Worst” does not offend anyone. Instead, we did an unscientific poll of bands that I, along with some of my coworkers, once liked but now regret, much like you will regret using the variables below in a predictive model.
The bands some of us regret liking? Two votes for The Monkees (which produced shock and anger from other office mates who still love them), one for Sisqo, one for the Hanson Brothers (not to be confused with hockey greats, The Hanson Brothers), one for Bobby Sherman and one for the Barenaked Ladies.
The variables everyone would regret using, and that should never be employed in a predictive model, include Student ID, Student Name, SSN, etc. We have this information available and are inclined to include it in our dataset, but just like some of our original music purchases, these variables should be thrown out.
Here’s to a successful year of predictive modeling in 2015! If you are just getting started with predictive modeling or if you just want to learn more about what goes on under the hood, check out our new online predictive modeling education series videos. They are all free to watch, kind of like Radiohead’s In Rainbows album back in 2007. We cover the concepts of modeling, along with detailed explanations of the process.