Q&A Followup: Support Staff Reading time: 7 minutes
During our recent Support Staff Office Hours session, Jon Macmillan and James Cousins from the Analyst Support Team fielded user questions and walked through various approaches to problem solving with Construct’s Transform Node.
We received lots of questions; too many to answer all of them during the webinar itself. In this post, Jon and James answer the questions we missed!
Q: Is the Pivot function available in Construct?
Jon: Yes, with the Transpose node and the Reverse Transpose.
Transpose is one of the more complicated nodes. I recommend experimenting with some smaller datasets when learning how to use it, as well as checking out our deep dives that were specifically focused on the transpose nodes:
Q: What types of variables in Construct are best for use in predictive models?
James: The variables selected for your specific model will vary, so we assembled several lists of suggested variables for a variety of applications:
Q: Does Construct offer an easy way to export all jobs in a workspace to a folder for coworkers to access?
Jon: Yes. Here are a couple of techniques:
- Highlight all jobs in the workspace and drag them to the folder that you would like to export them to. Then drag and drop these jobs from the folder back into Construct to import them.
- Use a Construct backup. If you go to File > Database > Backup, this will copy all jobs and connections into a backup file that can be restored on another copy of Construct.
It is a good practice to periodically backup your jobs, even if you don’t plan to move them.
Q: What is the Call Job node?
James: The “Call Job” node is a way for your current job to execute a separate workflow without needing to manually open and run it.
With job “A” open, you can use a Call Job node within job “A” to automatically execute job “B” before your report is created.
Mechanically, the Call Job eliminates the need for you to open the second job, run that job, and then return to the first job.
Notably, the “Call Job” node allows you to pass in parameters, meaning you can execute a job for the specific group or time period that your current job requires.
For a more conversational walkthrough, visit this page.
Q: I built a Construct job to sample students for COVID testing. However, we we want to exclude students sampled the previous week from the next week’s sample. I planned to flag those students in a merge and exclude them, but is there an easier way that you would do this?
Jon: I would probably do this the way you describe. Use a merge to filter students out, then use a sample node to take a random sample of the remaining population.
Another option is to use the Transform node’s RANDOM() function to assign a random value to students. You could assign this value and then set a filter based on whatever percentage of the population you need. The random function will return a value of 0-1. Since it should be relatively evenly distributed, if you were to set a filter for values <= 0.2, you should be left with 20% of your population, give or take.
One thing to keep in mind: the random value will be recalculated each time the job runs, so it would be best to create a separate file with static random values. That way they are assigned a value once and it never changes.
Q: Will a model improve its accuracy if we feed actual outcomes data back in? If so, what outcomes data helps improve the model?
James: As long as the information from the previous model is applied as a factor in a future model, this could be helpful.
A typical approach is “phased modeling”. Let’s use a student retention model as an example. The first “phase” of the model would use only application data. Later, with first semester data available, you could build a second phase model that would incorporate the newly available fields.
You could certainly explore including a first phase score as a factor for the second phase model. This might give the model the ability to identify students who did well despite predicted risk, which is an otherwise difficult-to-identify pattern!
Q: Can you walk through how to run a SQL script directly from Construct? Also, can we pass variables into the code so that it will be dynamic when it runs each day?
Jon: I suggest using the Query node. It allows you to copy and paste or create the query directly in the node. This means if you have existing SQL queries you can copy and paste them into the Construct to run right in your jobs.
To use variables in the query, use parameters. First, create a parameter in the job. Then, within the query node, select the parameter button to insert these values. This allows you to run dynamic queries that can be changed on the fly.
Q: Can you provide an overview of how to set up email notifications?
Jon: There are several types of email notifications that can be set up.
First, set up your SMTP configuration, which is available in Tools > Options > SMTP Information. Also, remember that if your login information changes, you’ll need to update it here. Use the Test SMTP option to ensure all information is up to date.
Once the SMTP information is entered, you can use the distribute report option, which is available in the report node. To distribute a report, check that option within the report node and go through the distribution setup.
Alternatively, you can enable job success or failure email notifications. Generally this is only used when users want to be notified immediately if any errors occurred when a scheduled job runs. In order to set up email notifications, go to Job > Properties > E-Mail Notifications and include any distribution lists you would like to receive job status notifications.
Q: How can I correctly use the “Contains” function(s) in the Transform node?
Jon: The most important thing to remember is that the Contains function’s syntax is the inverse of most other functions.
Most functions follow this pattern:
In contrast, with a Contains function, the input comes first and the variable comes second. Below is an example using sample data.
Let’s say that we want to identify all rows that have ‘BIO’ in this column:
The Contains function should be set up as follows:
This function would return:
- A value of 0 for the first row since it does not contain BIO
- A value of 1 for the 2nd and 3rd row since they both contain BIO.
Q: What is Fuzzy Matching node?
Jon: The Fuzzy Matching node can be downloaded for free from the collaborative cloud, and it allows you to compare two strings and identify non-exact matches.
Most commonly this comes into play when there is no unique identifier to match records, and instead, you may be matching on other fields like name, or address. When making these matches there are often abbreviations in one source and not in the other. The fuzzy matching node allows you to compare these types of values to get a score between 0 and 1, where a 1 is a perfect match and a 0 is a non-match.
In the example below I compared variations of my name against my actual name to see what came in with the highest score.
You can see that the exact match returns a value of 1. The next highest score is one letter difference, with an additional ‘i’ at the end of my last name. This scores higher than ‘John MacMillan’, because the beginning of the string is more important than the end of the string with this particular fuzzy matching algorithm.
It is a good idea to experiment with this node to get a feel for the types of scores you can anticipate, and what to consider a valid match.
Q: We will be adding proxy data for student engagement into our student data model from our servers, such as time when students log-into the system, what software they access, for how long, and when they sign-off the system. Will those ‘contextual’ data be useful in improving the accuracy of our retention model? Are there specific ‘contextual’ type data you recommend for improving the predictive accuracy?
James: It should improve the models, yes.
It likely always would have improved the models, but now it fills a gap that didn’t exist before. Prior to now, physical presence and mingling introduced possibilities for negative student experiences to find resolution through routine encounters. Without on-campus interaction, the need to understand a student’s level of engagement with course materials, classmates, or faculty is even more pronounced.
In the short term the key is to focus on the fields in your virtual environments that are routinely collected. Things like logons, assignment submissions, and portal interactions will all be tracked by the system, whereas some features like instructor comments might not be used by all faculty.
The best improvements to your model will come from the contextual data that is consistent across all students and courses.
Until Next Time…
We hope to host another Support Staff Office Hours session soon. If you need any assistance in the meantime, contact support!
Have any additional questions (or answers of your own to contribute)? Enter them in the comments below!