The Aggregation of Marginal Data Prep Gains
Senior Statistical Analyst
Data measures what we do day to day, so it’s no surprise that the strategies and habits that help us in daily life can also help us with our data. There are countless ways to make your life easier by adopting small habits and optimizing how you do certain things. Some of those same “life hacks” help with data cleanup, reporting, and analysis, too!
There is a relatively well-known concept referred to as “the aggregation of marginal gains.” Even if you haven’t heard the phrase, you’ll probably recognize the principle: tiny improvements made along a complete cycle of events can add up to an outsized improvement in the final stage of your process. Most famously, this has worked for sports teams that reviewed and improved the minor elements of athletes’ travel and training routines. But until eSports grows to include competitive data cleanup, I want to apply this concept to our professional data-driven lives.
However many steps your data passes through, making any one of them more automatic or systematic improves the whole process. For example, you might schedule a process to start on its own instead of requiring your direct attention, or set up an early alert that detects mismatches between records. These simple, small improvements to a single step stack on top of the next step you streamline, and you end up with an aggregation of small performance increases. And typically, that is big news.
Imagine you’re gathering data, cleaning it, and then compiling a report. This might be exactly what you do on a daily basis. You likely manage each of these steps just fine but consider the increases in the output that the following small changes to your process could create.
Gathering Data

Chances are you deal with multiple sources when gathering data, and you may need to manually initiate each data pull. Two small changes you can make here are scheduling the pulls and establishing a single process that kicks off every pull at once. If you can make either of those changes, gathering the data that begins your work becomes a faster and more consistent process.
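As a minimal sketch of the “single process” idea, the script below registers every data pull in one place and runs them all with a single call. The `pull_*` functions here are hypothetical stand-ins for your real sources (a database query, an API call, a file export), not any particular tool’s API.

```python
# A hypothetical driver that kicks off several data pulls in one step.
# Each pull_* function stands in for a real source you would query.

def pull_crm_records():
    # Stand-in for, e.g., a database query.
    return [{"id": 1, "region": "west"}, {"id": 2, "region": "east"}]

def pull_web_metrics():
    # Stand-in for, e.g., an analytics API call.
    return [{"id": 1, "visits": 40}, {"id": 2, "visits": 25}]

def gather_all():
    """Run every registered pull in one step and report what arrived."""
    sources = {"crm": pull_crm_records, "web": pull_web_metrics}
    results = {}
    for name, pull in sources.items():
        rows = pull()
        print(f"{name}: pulled {len(rows)} rows")
        results[name] = rows
    return results

data = gather_all()
```

A driver like this is also easy to schedule, for example from cron or any job scheduler, which covers the other small change: the pulls start on time without your direct attention.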
Cleaning Your Data

How do you know what needs to happen to your data? Maybe you remember all the little hiccups in your source data that need addressing each time, such as recoding variables, imputing missing fields, or even removing columns with duplicate information. Ideally, you have documentation of those steps. These days there are countless ways to automate them, and automating your data cleaning not only saves you time but also reduces the risk of error.
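To make this concrete, here is one way the three hiccups named above could be scripted with pandas. The column names and recoding values are invented for illustration; the point is that the fixes live in a function you rerun, not in your memory.

```python
import pandas as pd

# Hypothetical raw pull: a coded status flag, a gap in revenue,
# and a column that duplicates another column's contents.
raw = pd.DataFrame({
    "status": ["Y", "N", "Y", "N"],
    "revenue": [100.0, None, 250.0, 80.0],
    "region": ["west", "east", "west", "east"],
    "region_copy": ["west", "east", "west", "east"],
})

def clean(df):
    df = df.copy()
    # 1. Recode variables: map coded values to readable labels.
    df["status"] = df["status"].map({"Y": "active", "N": "inactive"})
    # 2. Impute missing fields: fill numeric gaps with the column median.
    df["revenue"] = df["revenue"].fillna(df["revenue"].median())
    # 3. Remove columns whose contents duplicate an earlier column.
    df = df.loc[:, ~df.T.duplicated()]
    return df

cleaned = clean(raw)
```

Because every fix is written down in `clean`, the function doubles as the documentation of what needs to happen to your data, and it applies the steps the same way every time.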
Compiling a Report
A report demonstrates how successful your data prep has been by communicating your results, but it takes time to put the aggregations and the visualizations together. If you are populating charts or cross-tabs manually, imagine how much easier the process would be if you could simply refresh the data. And if you already link your reports to data, making that refresh a routine process gets you to the analysis even faster!
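As a sketch of what “just refresh the data” can look like, the report below rebuilds its cross-tab from whatever data it is handed, so refreshing the source refreshes the report. The column names and figures are hypothetical.

```python
import pandas as pd

def build_report(df):
    # Rebuild the cross-tab from the current data, so a data refresh
    # automatically refreshes the report.
    return pd.pivot_table(df, values="revenue", index="region",
                          columns="status", aggfunc="sum")

# First run of the report...
day1 = pd.DataFrame({"region": ["west", "east"],
                     "status": ["active", "active"],
                     "revenue": [100.0, 80.0]})
report = build_report(day1)

# ...and the very same call after a refresh pulls in new rows.
day2 = pd.concat([day1, pd.DataFrame({"region": ["west"],
                                      "status": ["inactive"],
                                      "revenue": [40.0]})])
report = build_report(day2)
```

Nothing about the cross-tab is hand-populated, so rerunning `build_report` on refreshed data is the whole update step.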
Scheduling data pulls, establishing a single process, automating data cleansing steps, and refreshing report data are just a few ways you can maximize your output and create efficiencies with data prep. If you’re spending less time to achieve the same results, you can expand your analysis further. Process consistency will lead to higher confidence in results, too.
We’d love to hear how you’ve “hacked” your data preparation process in the comments below. If you want to learn more about how we help customers save time, increase output, and improve consistency, watch this on-demand video of a webinar I recently presented.
On-demand Video: Expert Tips for Better Data Prep
In this on-demand video, Senior Statistical Analyst James Cousins discusses some of the most common data prep projects. He explores ways you can make your data tasks more reliable, accurate, and faster.