
I reloaded my data into Predict and saw different variables enter the model – Why?

Predict can build models on a random sample of your data: one part of the data is used to build the model, and the rest (the holdout sample) is used to validate it. The random split is drawn when you load the data into an analysis, so reloading the same dataset will very likely give you a different sample. This affects the model you build only if you are using a holdout sample, which is the tool's default behavior. Most of the time you will see the same variables, even if the coefficients change slightly. If some variables swap in and out entirely, it most likely means that two or more variables compete closely to explain the outcome. Each random sample provides evidence relating certain characteristics to your outcome. When the sample changes because you reloaded the data, that evidence changes too, and a closely competing variable can edge out the one that was chosen before.
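To make the effect concrete, here is a small simulation written in plain Python. It is not Predict's actual selection algorithm: the selection rule ("keep the single variable most correlated with the outcome on the training part of the split"), the toy dataset, and the function names `make_data` and `pick_variable` are all hypothetical stand-ins. Variables `x1` and `x2` are two noisy copies of the same underlying driver, so they compete closely; `x3` is pure noise. Each value of `split_seed` plays the role of one reload of the data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def make_data(n=500, seed=0):
    """Hypothetical toy dataset: x1 and x2 are noisy copies of the same
    driver z, so they compete closely to explain y; x3 is pure noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        rows.append({
            "x1": z + rng.gauss(0, 0.5),
            "x2": z + rng.gauss(0, 0.5),
            "x3": rng.gauss(0, 1),
            "y": z + rng.gauss(0, 0.5),
        })
    return rows


def pick_variable(rows, holdout_fraction=0.3, split_seed=None):
    """Hypothetical selection rule (not Predict's): on the training part of
    a random split, keep the variable most correlated with y.
    split_seed stands in for the random draw made when data is loaded."""
    rng = random.Random(split_seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * (1 - holdout_fraction))
    train = shuffled[:n_train]  # remaining rows would be the holdout sample
    y = [r["y"] for r in train]
    scores = {
        v: abs(pearson([r[v] for r in train], y)) for v in ("x1", "x2", "x3")
    }
    return max(scores, key=scores.get)


data = make_data()
for load in range(5):
    # Each "reload" draws a new random split of the same data.
    print(f"reload {load}: picked {pick_variable(data, split_seed=load)}")

# With no holdout sample, every row is used, so the split no longer matters.
print("no holdout:", pick_variable(data, holdout_fraction=0.0, split_seed=1))
```

In this sketch the noise variable `x3` should essentially never be picked, while the choice between `x1` and `x2` can flip from one reload to the next. With `holdout_fraction=0.0` the pick depends only on the data, not on the random split.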

This is nothing to worry about. We recommend keeping track of which variables swap in and out in cases like these. You can then investigate why they compete so closely and decide which one you would prefer to have in the model. If you want to avoid this variation entirely, turn off the holdout sample in the modeling options: a model built without a holdout sample always finds the same relationships and will not change unless the data itself is updated.
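One lightweight way to keep track is to note down the variables that entered the model after each reload and compare the lists. The Python sketch below assumes you record each run as a set of variable names; the names used here (`tenure`, `region`, `monthly_spend`, `total_spend`) and the helper `summarize_runs` are hypothetical examples, not Predict output.

```python
from collections import Counter


def summarize_runs(runs):
    """Split variables into those that entered every model and those that
    swapped in and out, with how many runs each swapping variable appeared in."""
    counts = Counter(v for run in runs for v in set(run))
    stable = sorted(v for v, c in counts.items() if c == len(runs))
    swapping = {v: c for v, c in sorted(counts.items()) if c < len(runs)}
    return stable, swapping


# Hypothetical variable lists noted down after three reloads of the same data.
runs = [
    {"tenure", "region", "monthly_spend"},
    {"tenure", "region", "total_spend"},
    {"tenure", "region", "monthly_spend"},
]
stable, swapping = summarize_runs(runs)
print("in every model:", stable)  # ['region', 'tenure']
print("swapping:", swapping)      # {'monthly_spend': 2, 'total_spend': 1}
```

Variables that show up in the "swapping" group together are good candidates for the closer comparison described above.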
