
Clustering your data

Information is very often stored either as a number on a continuous scale or as a category. Predictive modeling benefits from both kinds of data. One neat, often underused trick is to use clusters to help your model capture behaviors even better. Simply put, imagine that a characteristic in your dataset, like SAT scores, has a pretty straightforward relationship with retention: the higher the score, the more likely a student is to be retained.


But in many cases the data will deviate from this “pretty straightforward” relationship, and these are the cases where clusters are a big help. A linear term on its own can only describe a single straight-line trend, so it can’t do much to capture those deviations. By grouping the values into a series of clusters, we can fine-tune our model so that each range of values gets its own adjustment. The nice part is that, once the clusters are formed, the computer takes care of this correction for us.
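To make the idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the SAT scores and retention rates are invented, the cutoffs of 1150 and 1350 are picked by hand, and the two-step approach (fit a line, then add each cluster's average leftover error) is a simplified stand-in for what modeling software does, not a description of how Predict works.

```python
# Hypothetical data: (SAT score, observed retention rate) pairs.
# The values are invented for illustration only.
data = [
    (900, 0.60), (1000, 0.66), (1100, 0.72),
    (1200, 0.64), (1300, 0.66),  # this range dips below the overall trend
    (1400, 0.90), (1500, 0.96),
]

# Assumed, hand-picked cutoffs that split the scores into three clusters.
CUTOFFS = [1150, 1350]


def fit_line(points):
    """Ordinary least squares for y = a + b * x (closed form)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def cluster_of(score, cutoffs):
    """Index of the cluster a score falls into, given ascending cutoffs."""
    return sum(score >= c for c in cutoffs)


def cluster_offsets(points, a, b, cutoffs):
    """Average error left over by the straight line within each cluster."""
    sums, counts = {}, {}
    for x, y in points:
        k = cluster_of(x, cutoffs)
        sums[k] = sums.get(k, 0.0) + (y - (a + b * x))
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def sse(points, predict):
    """Sum of squared errors of a prediction function on the data."""
    return sum((y - predict(x)) ** 2 for x, y in points)


a, b = fit_line(data)
offsets = cluster_offsets(data, a, b, CUTOFFS)


def linear_only(x):
    return a + b * x


def with_clusters(x):
    # Straight-line prediction plus the correction for the score's cluster.
    return a + b * x + offsets[cluster_of(x, CUTOFFS)]


sse_linear = sse(data, linear_only)
sse_clustered = sse(data, with_clusters)
print(f"SSE, linear term only: {sse_linear:.4f}")
print(f"SSE, with clusters:    {sse_clustered:.4f}")
```

In this made-up data the middle cluster sits below the straight line, so its correction is negative, and adding the cluster corrections lowers the total squared error compared with the linear term alone.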


The other nice part is that Predict can help you take care of the clustering too, so you don’t need to investigate cutoff points or run the calculations yourself.

To learn more about clustering in Predict, watch the short video below!
