
Holiday Deep Dive Recap & Reflections

Reading time: 10 minutes

To wrap up 2020 on a lighthearted (yet festive) note, our support team recently presented a Deep Dive like no other! The session centered on holiday-themed data analyses, with each member of the support team giving their own mini-presentation. While the presentations themselves were festive and fun, each one also highlighted a technique or product feature that may be lesser known among our users.

In this blog post, we’ll recap the presentations they shared, the processes behind their analyses, and the product features each presenter set out to highlight.

 

James Cousins, Analyst Manager

Topic: 🎮 Video Game Sales

 

The dataset & why I chose it:

My analysis concerned video game sales. I play a fair amount of video games in my free time, so I thought it would be neat to see which video games sold the most copies, or whether there was some interesting pattern explaining sales. The data I was able to find only contained the calendar year when each game was sold and gross sales broken down by geographical region. That meant I couldn’t test whether holiday-released games sold better, but I could do some New Year’s-inspired exploration. I found out, for instance, that the “Call of Duty” franchise has a strong history of topping the charts on both PlayStation and Xbox consoles!

A chart of the video game sales data in Construct.

Strategy & product features showcased:

The main feature I showed off during the presentation was scripting support. When I was looking for suitable data, I ultimately ended up at Kaggle. In the process of manually downloading an extract of the data, I noticed there was an API, and further research indicated that the API could be called from a Python script. So instead of downloading the data manually, I set up a script to streamline the process. Outside of fun projects like this, there are dozens of reasons you might want to scrape the web for data, and that’s why I wanted to use a script in my Construct job.

The Kaggle user hosting the dataset mentioned that they originally used a web-scraping script to generate the data they posted. I skipped that step and downloaded the ready-built dataset, but my “present” to Construct users this Christmas is the reminder that you can absolutely incorporate your own scripts within Construct jobs.
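
For anyone curious what that kind of script can look like, here is a minimal Python sketch of pulling a Kaggle dataset into a workflow. It assumes the kaggle command-line tool is installed and configured with an API token; the dataset slug and filename below are placeholders, not the exact ones used in the presentation.

    # Minimal sketch: download a Kaggle dataset and load it for analysis.
    # Assumes `pip install kaggle pandas` and a configured ~/.kaggle/kaggle.json token.
    import subprocess
    import pandas as pd

    DATASET = "example-user/video-game-sales"  # placeholder slug, not the actual dataset

    subprocess.run(
        ["kaggle", "datasets", "download", "-d", DATASET, "-p", "data", "--unzip"],
        check=True,
    )

    sales = pd.read_csv("data/vgsales.csv")  # filename depends on the dataset's contents
    print(sales.head())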


Jeffrey Collis, Data Analyst

Topic: 🎶 Holiday Music


The dataset & why I chose it:

Anyone up for a dance-off webinar? I analyzed music data; more specifically, the top 50 played tracks in 20 different countries on Spotify during December 2019. I chose this dataset because music is always infused in my day-to-day routines. I use music throughout my workday to drive my analyses, and I use it to transition from work to preparing dinner, which almost always involves me moving to a beat. While I usually like to follow the story being told in a song, I have never been averse to just feeling a beat and letting my body get into the experience. So, I wanted to know which countries were listening to positive tracks, and who was dancing!

Strategy & product features showcased:

I did not get especially technical in my process. While I had a download of 1,000 songs with 17 fields each, I wanted to make my Construct job easily shareable, so I pasted the CSV contents into the Embed Data node. I had learned ahead of time that, alongside expected fields like Title, Artist, and Length, the data also includes Valence (or Positivity) and Danceability, as well as Liveness, Energy, and more.

Having noticed repeating song titles, my first foray into the data was to identify the tracks that charted in multiple countries. I used the Aggregate Node, aggregating by Title and Artist, counting distinct Country, and I got to learn about music I didn’t already have in my personal playlists.

I then used the Aggregate node again, this time aggregating by Country and computing the Max value of Danceability, which did not yield much usable information. Apparently, every country has at least one song you can jam to, so the Max by country didn’t tell me much. However, computing the Mean Danceability by Country taught me that Latin American countries own the lead.
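
For anyone who wants to replicate those two Aggregate node passes outside of Construct, here is a rough pandas sketch. The column names (Title, Artist, Country, Danceability) and the filename are assumptions about how the Spotify export is laid out.

    # Rough pandas analog of the two Aggregate passes described above.
    import pandas as pd

    tracks = pd.read_csv("spotify_top50_by_country.csv")  # placeholder filename

    # Pass 1: which tracks charted in the most countries?
    chart_reach = (
        tracks.groupby(["Title", "Artist"])["Country"]
              .nunique()
              .sort_values(ascending=False)
    )
    print(chart_reach.head(10))

    # Pass 2: max vs. mean Danceability by country.
    danceability = tracks.groupby("Country")["Danceability"].agg(["max", "mean"])
    print(danceability.sort_values("mean", ascending=False).head(10))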

Jon MacMillan, Product Manager

Topic: 🥧 Holiday Recipes


The dataset & why I chose it:

I decided to analyze some Christmas recipe data, not for my love of sweets, but because the data presented quite a challenge. The data was only available as JSON, which can be a difficult file format to work with when trying to analyze data in terms of columns and rows. The difficulty with JSON files is that they tend to have nested values, so a single “column” can contain multiple values. In this particular file, the ingredients happened to be nested, with the “Enchanted Forest Christmas Cake” requiring 37 different ingredients. So, first, I needed to convert this file into a format Construct could read. I then decided to look at which ingredients are most common, both in terms of mentions and in terms of physical amount. However, that required some pretty elaborate transformations.

Strategy & product features showcased:

Once I was able to convert the JSON file into a readable text format, the next step was analyzing the data. I found many references to the same ingredients throughout the ingredient lists; however, they were rarely identical. Working out how many times olive oil was used across all the recipes proved to be a hard question to answer, since I was comparing values like ‘2 tbsp olive oil’ to ‘1 tsp olive oil’.
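
As a rough illustration of that conversion step, here is a short Python sketch that flattens nested recipe JSON into one row per recipe-and-ingredient pair. The field names ("name" and "ingredients") are assumptions about the file's structure, not the actual schema from the presentation.

    # Sketch: flatten nested recipe JSON into one row per recipe/ingredient pair.
    import json
    import pandas as pd

    with open("christmas_recipes.json") as f:   # placeholder filename
        recipes = json.load(f)

    rows = [
        {"recipe": r["name"], "ingredient": ing}   # assumed field names
        for r in recipes
        for ing in r["ingredients"]
    ]
    flat = pd.DataFrame(rows)
    print(flat.head())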

I had to use a series of cleanse operations and transforms to create a more unified value. A lesser-known feature of the Cleanse node lives in the Modify tab, where you can perform tasks like substring replacements. This feature helped me remove all numbers from the list (which avoided any issues with differences between ingredient amounts).

Once the amounts were removed from the ingredients, I went through a series of steps using a Transform node to remove other unnecessary common text, such as abbreviations for measurement units (tbsp, tsp, g). To simplify the steps and keep them clear in my head (and in Construct), I used local variables and comments in my Transform node so that the functions were easier to follow.
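
The cleanup itself can be approximated with a couple of regular expressions. This is only a sketch of the idea behind those Cleanse and Transform steps, using an assumed (and deliberately incomplete) list of unit abbreviations.

    # Sketch: strip quantities and unit abbreviations so that
    # '2 tbsp olive oil' and '1 tsp olive oil' both become 'olive oil'.
    import re

    UNITS = r"\b(tbsp|tsp|g|kg|ml|oz|cups?)\b"    # assumed, incomplete list of units

    def normalise(ingredient: str) -> str:
        text = ingredient.lower()
        text = re.sub(r"[\d/.]+", " ", text)      # drop quantities like '2' or '1/2'
        text = re.sub(UNITS, " ", text)           # drop measurement abbreviations
        return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

    print(normalise("2 tbsp olive oil"))   # -> 'olive oil'
    print(normalise("200g plain flour"))   # -> 'plain flour'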

In the end, my analysis was not groundbreaking. Turns out butter and flour are the most common ingredients based on mentions and amounts, respectively. While the end result isn’t all that exciting, it was fun to nerd out on the steps that I had to take to get there. 

 

Alex Ziko, Customer Success Manager & Data Analyst

Topic: ✈️ Holiday Travel


The dataset & why I chose it:

Christmas is a time for seeing friends and family and spreading holiday cheer. However, not all holiday travel can be merry and bright. My analysis focused on the downside of the Christmas hustle and bustle: claims filed with the TSA. Starting with a dataset of fifteen years of TSA claims, I used Construct to prep the data, which contained variables like Airport Name, Airline, Claim Type, Incident Date, Claim Amount, and Claim Site. Using Predict, I was able to see what the chances are of the TSA ending up on your personal naughty-or-nice list.

Strategy & product features showcased:

I used a general approach to wrangling the data to set up my statistical analysis. The first step was to think of a few questions that I was interested in. Here’s what I came up with:

  1. Are there a lot of TSA claims filed during the Christmas holiday travel period? 
  2. Are there any airports that have higher than average claims filed by their patrons?

With my questions in hand, the next step was to decide on a cohort and visually inspect the data. With fifteen years’ worth of claims to work with, I decided that the previous three years would be simple and manageable. After removing all other records, I began visually inspecting the cohort.

Visually inspecting the cohort in Predict.

A few data issues jumped out at me:

  1. There were two “Airport Name” columns, and some records were missing values in one or both.
  2. “Item Category” (the items listed in the claim) was entered as a single cell, delimited by semicolons, and sometimes contained over 10 separate items.

To solve issue 1 (multiple airport name columns), I used a Transform operation, IF(A is null, B, A), where A and B were the two airport name columns. This selects the second column whenever the first is empty. After this was done, almost all the airports had a value associated with them. Issue 1 solved!
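
Outside of Construct, that coalescing step maps onto a one-liner in pandas. In this sketch, the filename and the two column names are placeholders for the actual TSA claims fields.

    # Sketch: coalesce two airport-name columns, keeping the second when the first is empty.
    import pandas as pd

    claims = pd.read_csv("tsa_claims.csv")   # placeholder filename

    # Equivalent of IF(A is null, B, A):
    claims["Airport Name"] = claims["Airport Name A"].combine_first(claims["Airport Name B"])

    print(claims["Airport Name"].isna().mean())  # share of rows still missing an airport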

To solve issue 2 (item category), I decided to use a CONTAINS function to keyword-select the items that visually appeared more often than others when skimming the dataset.

The CONTAINS function in Predict.

This is a useful function, and I hope more of our customers use it if they don’t already! It’s a little confusing to look at, but this is a function specific to text fields. The CONTAINS function is essentially a keyword search. Here, I created a flag whenever ‘Audio/Video’, ‘Cameras’, ‘Personal Electronics’, or ‘Computer & Accessories’ were identified as the lost item.

I repeated this and generated four new columns of binary flags for keywords relating to Electronics, Luggage and Clothing, Sports Equipment, and Jewelry and Watches. See the complete transform operations below.
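
As a rough analog of those flag columns (the actual work happened in Construct), the same keyword search can be expressed in pandas. Only the Electronics keywords come from the example above; the lists for the other three groups are illustrative guesses.

    # Sketch: CONTAINS-style keyword flags over the "Item Category" field.
    import re
    import pandas as pd

    claims = pd.read_csv("tsa_claims.csv")   # placeholder filename

    keyword_groups = {
        "Electronics": ["Audio/Video", "Cameras", "Personal Electronics", "Computer & Accessories"],
        "Luggage and Clothing": ["Luggage", "Clothing"],   # illustrative keywords
        "Sports Equipment": ["Sporting Equipment"],        # illustrative keywords
        "Jewelry and Watches": ["Jewelry", "Watches"],     # illustrative keywords
    }

    for flag, keywords in keyword_groups.items():
        pattern = "|".join(re.escape(k) for k in keywords)
        claims[flag] = (
            claims["Item Category"].str.contains(pattern, case=False, na=False).astype(int)
        )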


After addressing the two issues mentioned above, the final piece of prep work was to determine what constituted a “Christmas travel” day. I defined a Christmas travel day as Christmas plus the two days on either side of it: December 23rd, 24th, 25th, 26th, and 27th. I used a basic REMOVERIGHT function to take the claim incident date and strip off the year and its preceding forward slash, so, for example, 12/25/2017 became 12/25. Once that was complete, I used another CONTAINS function to flag truncated dates that fell between December 23rd and 27th. By doing this, I was able to easily compare multiple years.


The REMOVERIGHT() function in the Transform node of Construct.

The dataset with the newly added column.
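
For readers following along outside of Construct, here is a hedged pandas sketch of that same truncate-and-flag step. The "Incident Date" column name and the MM/DD/YYYY format are assumptions about the claims file.

    # Sketch: strip the year from each incident date, then flag Christmas travel days.
    import pandas as pd

    claims = pd.read_csv("tsa_claims.csv")   # placeholder filename

    # REMOVERIGHT equivalent: drop the trailing "/YYYY" so 12/25/2017 becomes 12/25.
    claims["Incident MM/DD"] = claims["Incident Date"].str[:-5]

    # CONTAINS equivalent: flag truncated dates falling on December 23rd-27th.
    christmas_days = ["12/23", "12/24", "12/25", "12/26", "12/27"]
    claims["Christmas Travel"] = claims["Incident MM/DD"].isin(christmas_days).astype(int)

    print(claims["Christmas Travel"].mean())  # share of claims in the Christmas window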

With all that being done, I was now ready to statistically measure the frequency and concentration of TSA claims during the Christmas travel season.

Right away, my initial question was answered. According to my data, less than 1% of TSA claim incidents happen between December 23rd and December 27th. Great news, although I expected the frequency to be much higher!

Christmas travel day records in Predict.

The next question I wanted to answer was: are there any airlines or airports that have higher-than-average claims filed by their patrons? The answer got a little tricky, since my analysis focused on a cohort that made up less than 1% of the total sample. With that in mind, though, I was able to find a few airlines and airports that had statistically more TSA claims filed. A few airports that fare worse than average are LaGuardia, Newark International, and Phoenix Sky Harbor International.

The biggest takeaway from this TSA claims analysis was that the difference between these airports and the others is not that large. So, although these locations do see higher rates of TSA claims, I wouldn’t say that the number of claims filed at them was outrageously high. 

I think it’s safe to say that air travel comes with some risks, and losing luggage or having an item damaged is the (rare) price you pay for flying. This year, maybe we can be thankful that less traveling means a smaller chance of filling out paperwork for the TSA. I just hope that Santa doesn’t lose any of my gifts this year.

Lily Brennan, Data Analyst

Topic: 🎈 Celebrations by Nation

The dataset & why I chose it:

The analysis I shared centered on holidays celebrated around the world. I chose it because, after the past year, I think everyone around the world could use an excuse to celebrate! I am always counting down the days until the next holiday (and yes, I am that person in the office who keeps the official Christmas countdown on the whiteboard). With this in mind, I started wondering what everyone around the world was celebrating and, more specifically, how often they were celebrating.

Strategy & product features showcased:

To answer these questions, I used two built-in functions of the Transform node.

The first was the Lag function, which allows a formula to use a value from the record above the active record. This lets you compare values across rows rather than across columns. In this case, I wanted to compare the date of the holiday in the record above to the date of the holiday in the active record, which gave me the number of days between those two holidays.

That leads me to the second Transform node function I used: DaysBetween. This function does just what it sounds like and outputs the number of days between two dates. From there, I could use the Aggregate node to find the maximum and minimum gaps between holidays and to count which country has the most holidays.
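
Here is what those Lag, DaysBetween, and Aggregate steps might look like as a pandas sketch; the column names and filename are assumptions about the holidays dataset.

    # Sketch: per-country gaps between consecutive holidays, plus summary stats.
    import pandas as pd

    holidays = pd.read_csv("world_holidays.csv", parse_dates=["Date"])  # placeholder

    holidays = holidays.sort_values(["Country", "Date"])

    # Lag equivalent: the date from the record above, within each country.
    holidays["Previous Date"] = holidays.groupby("Country")["Date"].shift(1)

    # DaysBetween equivalent: the gap between a holiday and the one before it.
    holidays["Days Between"] = (holidays["Date"] - holidays["Previous Date"]).dt.days

    # Aggregate equivalent: holiday count plus max/min gap per country.
    summary = holidays.groupby("Country").agg(
        Holidays=("Date", "count"),
        Longest_Gap=("Days Between", "max"),
        Shortest_Gap=("Days Between", "min"),
    )
    print(summary.sort_values("Holidays", ascending=False).head())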

In the end, Israel had the most holidays, with 23 per year, and the US landed somewhere in the middle with 10 recognized holidays. When it came to days between holidays, I’ll be staying away from Australia: at one point in the year, Australians wait 244 days between holidays, from Anzac Day on April 25 to Christmas Day on December 25! So, for now, I think I’ll stay in the US (and far from Australia) and celebrate!

Until 2021…

That’s a wrap! We hope this year’s Holiday Deep Dive webinar was as enjoyable for you to watch as it was for us to make.

Have any additional questions or comments to share? Drop them in the comments below!

 

 

 

 
