The Roads Not Taken: Exploring Lesser-Used FormulasReading time: 6 minutes
One of the most common support requests we get from users is assistance with manipulating fields to get desired results. The solution to these requests typically involves writing a formula. But there are often countless formulas that can be used to perform the same task.
While repeating a tried-and-true formula is the most comfortable approach, exploring alternative paths helps uncover efficiencies and lessons to apply in other situations.
This came to mind recently when one of our users contacted us for assistance with removing a certain element of text within a column of their higher ed dataset. I wanted to share this scenario in hopes that you find it helpful, and also to illustrate how many different paths someone could take to arrive at the same result.
The Support Request
First, let’s take a look at a sample of the data for this particular problem:
The user’s question was: how do we remove the text that comes after the second underscore (_) in every row of the dataset?
To do so in Construct, we can use the Transform node. The Transform node is flexible, and capable of processing a huge number of formulas. As a result, it offers many paths to the same result.
I’ll show you my first instinct at how to accomplish the task, along with several alternative methods. Along the way, we’ll come to better understand some of the lesser-used (but noteworthy) functions of the Transform node, and see how they might be applied in other situations.
First, let’s take a look at the actual solution I sent to the customer.
My first observation about the data was that it has a delimiter (a character that separates text). In this case the delimiter is the underscore.
When a delimiter is involved, the first function that comes to mind is SUBFIELD, which allows you to extract specific elements of text values based on delimiters.
As requested by the user, we want to remove everything that follows the second underscore. For the purposes of illustration, we will use just one value from the table above:
To visualize how this data is divided up by its delimiters, here is the value separated by columns instead of by underscores:
As you can see, the last section of text (D2) falls into the 3rd subfield.
To isolate the text in the 3rd subfield, we can use the function SUBFIELD([A], 3, ‘_’) where [A] refers to the first column of the dataset, 3 refers to the subfield number, and the ‘_’ refers to the delimiter. Using this function to isolate the desired text, the results look like this:
We’ve isolated the data we want to remove, but we need to take it a step further and tell Construct to remove the third subfield from the column. There are several ways to do this, but in this case, I suggested the REMOVERIGHT function.
REMOVERIGHT allows you to remove any number of characters from the end of a text string. In this example, the number of characters we need to remove differs in each row of the dataset (see table 1 for reference). The LEN function tells Construct how many characters to remove.
To break this down let’s work through the problem step by step using our datapoint ENG_001B_D2 as an example:
The Function in its entirety:
REMOVERIGHT([A], LEN(SUBFIELD([A], 3, ‘_’)))
First, Construct replaces [A] with a value from column A:
REMOVERIGHT(‘ENG_001B_D2’, LEN(SUBFIELD(‘ENG_001B_D2’, 3, ‘_’)))
Next, SUBFIELD isolates the text we want to remove from that value:
LEN tells the function how many characters to retain:
And finally, REMOVERIGHT deletes the value following the final delimiter, leaving us with our desired result:
But wait! What about that extra underscore?
To remove it, we need to add 1 to the length of the third subfield. This tells Construct to count the length of the final piece of text and add 1 more character to represent the underscore. So the final version of the formula looks like this:
REMOVERIGHT([A], LEN(SUBFIELD([A], 3, ‘_’))+1)
Leaving us with a nice clean end result of:
Construct will automatically repeat that set of steps for every row in the dataset, removing everything following the second delimiter and the second delimiter itself.
Working through the problem above got me thinking about how many possible solutions there are to this one question. So, as happens frequently in our office, I set up a challenge. Usually, the most efficient path to an answer wins the challenge. But as an experiment, we set out to find as many possible solutions we could, regardless of their efficiency. I will discuss some of them here.
The REPLACE function allows us to replace a set of characters with anything that we put between two single-quotes. For example, we could REPLACE all instances of the word ‘James’ with ‘Jon’.
Since we’re looking to delete information in this example, we need to use a replaced value of ‘’. If you can’t see anything between those single-quotes, that’s because there’s nothing there: we want to remove a portion of the text by replacing specific characters with nothing.
This solution can create problems; if the set of characters that we want to REPLACE exists within the value multiple times, ALL instances of those characters will be removed. For example, if we had a value like BIO_110_110, and we use the function REPLACE([A], ‘_110’, ‘’) the result would be BIO instead of the desired BIO_110, because both instances of _110 would be removed by the REPLACE function. For the purposes of this example, we are going to set that aside.
The REPLACE function works as follows:
REPLACE([A], '_'+SUBFIELD([A], 3, '_'), '')
In this case (as in our first example) we use the SUBFIELD function to identify the text that we want to replace. We also need to add the underscore (using ‘+’, as you can see in the function) in order to remove the delimiter from the final result (in the first example, we used +1 to accomplish this same task).
Another (simpler) solution is to build a formula that keeps only the desired text, as opposed to removing the text we don’t want. To do that we need to identify which elements of the text we want to keep (in this case, the first and second subfields) using SUBFIELD, then use the CONCATENATE function to stitch them back together.
The CONCATENATE function works as follows:
CONCATENATE(SUBFIELD([A], 1, ‘_’), ‘_’, SUBFIELD([A], 2, ‘_’))
Put simply, this function breaks out the desired text from the original value into two separate components. It then combines them and adds an underscore (‘_’) in between them.
This one gets complicated, so you will have to bear with me.
So far, in each of our functions, we used SUBFIELD to isolate the text that comes after the second underscore. But that’s not the only way to do it!
This time, we’re going to REVERSE the text, so that ENG_001B_D2 becomes 2D_B100_GNE.
This allows Construct to easily identify the first subfield, which is actually the last (told you this would get complicated) and more importantly, it allows the CHARINDEX function to detect the first instance of our delimiter (_). Using a SUBSTRING function, we can return only the text after the first underscore.
This REVERSE function works as follows:
REVERSE(SUBSTRING(REVERSE([A]), CHARINDEX('_', REVERSE([A]))))
First, Construct replaces [A] with the actual value from that column:
REVERSE(SUBSTRING(REVERSE(‘ENG_001B_D2’), CHARINDEX('_', REVERSE(‘ENG_001B_D2’))))
Then the REVERSE functions inside the parentheses flip the value:
REVERSE(SUBSTRING(2D_B100_GNE), CHARINDEX('_', 2D_B100_GNE))))
Once the SUBSTRING and CHARINDEX functions have completed their work:
And now, all that’s left is the final REVERSE to give us:
While this isn’t the most practical solution to this particular problem, it is a helpful function to know about; you may find good uses for it in other circumstances.
Multiple Paths Lead To The Same Result
I don’t want you to think that we left it at just four different solutions. A few were very similar to the ones we shared here. Others would simply take too long to walk through. However, for general interest here are a couple more; feel free to work out how these break down:
REPLACE(SUBFIELD(STUFF([A], 3, 1, '|'), 1, '_'), '|', '_')
I’ve worked with Construct for 8 years. Some of the functions we walked through here were new even to me. By exploring all of these different possibilities, our team learned more about the products and picked up a few new ways to accomplish tasks.
I hope you find some of these functions useful. If you come up with any formulas of your own to solve this problem, we would love to hear about them in the comments below.