Build a Better Instance Part Two: Intermediate Configurable Joins Skills and Use Cases
As more and more Slate users begin querying using Configurable Joins, subquery exports and filters have become an increasingly popular discussion topic. In hopes of answering some of the most frequently asked questions, on March 2 RHB Slate team members Abraham Noel, John Michael Cuccia, Domenick Rozzi, and Megan Miller hosted hundreds of Slate user community members for “Building Up: Intermediate Configurable Joins Skills and Use Cases”. As part two of the Build a Better Instance with RHB: Configurable Joins webinar series, the team focused specifically on how and why to use dependent subqueries as a means of better exploring the data living within our Slate instances.
Users came away from this event with a deeper understanding of this crucial Configurable Joins feature, including these three key takeaways.
Subqueries enable us to access and configure our data from one-to-many relationships.
In our first webinar, we discussed the concept of Slate as a relational database, where record data lives in various structures known as tables. In some situations, these tables may have a one-to-one relationship; for instance, an application always has just one round, or a device has one Person record it’s connected to. However, many times, one record can be connected to multiple records in another table—a person will frequently have multiple school records, relationships, applications or addresses, just to name a few examples.
While a top-level join will allow us to join a record to other tables in a one-to-one relationship, subquery exports and filters allow us to go further, examining the data connected to a record that exists on multiple table rows and returning it in ways that are useful for summarizing or interpreting this information. We can filter on a more granular level to find records, such as first-year applicants who have a relationship with an email address in the relationship type of “Mother”, or we can export more extensive data, such as returning a list of person records that includes the name and GPA for all postsecondary schools attended. Subqueries allow us to dig deeper and cast a wider net when it comes to accessing the information in our database, so long as we build our Configurable Joins queries correctly.
Subquery exports are used to summarize, aggregate and format our data.
Subquery exports are designed to display the specific data we want to see about related records, and we have numerous options for how that information is returned. By default, the data rows will concatenate, which combines the values together in the format we desire and with the row separators we define. We have other export options as well:
- Aggregate: gives us summary information about the data rows, such as the number of test scores received, sum of years of work experience, or average GPA across all schools.
- Dictionary: outputs multiple values for Liquid markup and is used in scenarios such as a Checklist reminder message in Deliver.
- Rank: produces a customized ranking that differs from the system ranking, useful in situations such as wanting to order test scores by date taken, regardless of verified status.
- Existence: returns a specified value based on whether another exists, such as exporting ‘Y’ if the student has an undergraduate degree and ‘N’ if they do not.
- Formula: allows for custom SQL and other formulas; for instance, coalescing device values by type for each relationship record.
- XML: outputs values as nested XML elements, which can be useful in use cases such as defining XML attributes.
- JSON: exports values in name/value JSON pairs that can be used in web services.
Subquery filters give us the power to precisely define our criteria.
When filtering using Local or Slate Template Library queries, we can oftentimes find ourselves frustrated by the limitations that arise, as we’re only able to define criteria for related objects to a limited extent. Subquery filters allow us to remove these barriers and become incredibly specific as we search for records, so if we only want to find students who attended an undergraduate institution in Florida within the past eight years, we can do so without relying on custom SQL. As with subquery exports, subquery filters give us numerous functions to work with:
- Exists/Not Exists: limits records based on a value existing or not existing, e.g., returning only those students with a verified TOEFL total score of 90 or higher.
- Comparison: compares two values of the same type, such as records whose undergraduate degree conferred date is greater than their application submitted date.
- Formula: enables custom SQL and other formulas, as would be the case for finding records who have a parent with an email address LIKE ‘@gmail.com’.
- Count: finds records who have an aggregate count of some data point that aligns with the defined parameters, such as students who have more than one writing sample in their materials.
- Average: pulls records based upon a calculated average in comparison to a defined value; for example, only including records whose average Recommender score is greater than or equal to 4.0.
- Median: limits search results to those whose calculated median for a data point meets the specified criteria, such as finding all high schools whose median applicant GPA is between 3.3 and 3.7.
- Minimum: finds records whose lowest value for a data point corresponds with a specific comparison operator, as in the case of pulling all applications whose lowest faculty review score in Reader was below 2.0.
- Maximum: identifies the records whose highest value for a data point compares as indicated to a defined number, e.g., all records who have not made any gifts exceeding $1,000.
- Sum: filters records to those whose data point sum compares appropriately to a specific value, such as any applicant with total work experience of 10+ years across all jobs.
- Standard Deviation: limits to results that have a standard deviation for a data point within the articulated parameters. For instance, you might pull only the Funds that have a gift amount standard deviation less than $50.
- Variance: pulls records with a data point that falls within a specific variance definition; an example might be identifying all events that have an overall survey rating with a variance of 1.5 or more.
Want to explore this in further detail? You can watch our full webinar, read the transcript, and download the slides to dig deeper. And of course, we hope you’ll register to join us on March 30 for the final installment in our series, “Expanding Into New Spaces: Configurable Joins Subqueries and Advanced Concepts”.
Bonus: Answering the questions from the session
During this webinar, the team also responded in real time to the questions asked using the event’s Q&A feature. Here’s what they shared with attendees.
Where can we find the Configurable Joins Base Explorer?
This is in the Database; you’ll find it in the Queries section.
When do you recommend using a join at the bottom versus creating a subquery and then adding the joins in the subquery?
Joins at the bottom of the query are always one-to-one. Joins at the bottom will only ever return one record on the other side, even if you join to something that has multiple records. Whenever you need to know something about “the many,” you have to join in a subquery.
If you’re filtering to exactly one version of the joined information (for instance, as defined by rank or timestamp order, or something like that), you can use a bottom join. Another reason to join at the bottom (remembering that that is a one-to-one join) is that then you can add multiple exports/filters from that single join. You want to avoid joining multiple times to the exact same tables across several subquery exports or filters, as this is inefficient, so if the join needs to be accessible to multiple exports or filters, that’s a good reason to consider doing it at the base level of the query.
However, if School were added as a top-level join (down at the bottom of the query), you would have to define which school you’re joining to, so the result count wouldn’t change but the output in exports would be hardcoded to only that specified school. In contrast, the subquery join to schools allows you to pull in every school in the record and leverage concatenation or some other aggregation to display in the way you want. If you’re looking for an existence of a value in a field or of a particular value, those would also be in subqueries, and if you want to filter on a full set of joined values, that would be in subqueries as well.
As a general rule, if you see the blue and pink squares icon, this indicates a one-to-many relationship that will be best served in a subquery to control your output.
So “one-to-many” always means “subquery”?
In most cases, yes. However, it is permissible to do a join at the bottom of the query that goes into a one-to-many relationship—but because it is only going to return one record, you have to further define the join to tell Slate exactly how to select the one record it will return from the many.
Can you give an example of when to use a subquery in filters v. exports?
Exports return data about your records; filters reduce your number of matching records. I’d use an export when I want School name. I’d use a filter when I want only a Rank 1 School.
When do you use a subquery within a subquery? I understand subquery filters, but don’t know why you would do a second subquery within the first one you opened.
It all depends on what you’re trying to get at. If the data that you are looking to filter on exists simply by a single join, then one layer is only necessary. However, if you’re filtering on information at an aggregate data point, as an example, you’d need a subquery for the filter and a subquery to establish the aggregate information.
Is it possible to pull multiple applications for one record (e.g., if they applied in two different rounds)?
Yes, absolutely! From the Person, you’d use a subquery export and join from Person to Applications, then select the exports you want, with “concatenate” for the output.
Why/when would we use the Output = Rank when using the Schools subquery export instead of just selecting one of the Schools by Rank filter as the export?
The “Rank” subquery export function allows you to define and customize how the schools are ranked on a record versus using the system’s default determined ranking. I would consider this an advanced use case, where “Rank 1 School” means something for your institution that differs from how it is defined in the Knowledge Base article.
What’s the difference between “concat” and “coalesce”?
Concatenate will give you all of the values in that sequence, while coalesce will give you the first non-null value in that sequence. Coalesce will only result in one output, but it can consider many variables, while concat will effectively mush the data together. While isnull() also behaves like coalesce, it only works for two variables.
When using the Dictionary subquery export, do you have to add the code for looping in the email, or is that built into the export?
You will need to add the Liquid Looping into your HTML, just as you would with other Liquid markup. Here’s a great Liquid markup resource.
I have trouble when trying to get applicant data to join on Form Responses, such as conversion of Inquiry to Applicant or, Admit to Deposit on an event/series of events. Should I be using subqueries for this? My report tends to error out when I try to add to the form response.
Form Responses are a bit of a special case. Hence the magic hat. With form responses, it’s important to know the scope of your forms. If Person-scoped, the form response table is associated to the Person record. If Application-scoped, it’s related to the specific application. If you’re trying to report on form registrations that span scopes, you really want to set up different joins for each. Slate works best when you can be specific about which form you’re trying to get responses from, such as using a folder to drill in to get to a particular form or using a template to drill in to get a specific set of events, etc.
That said, you do have the ability to join to all form responses, but in doing so you really need to consider adding filters to narrow the join options, or it will severely affect performance of your query.
Why do we use “Lookup Round”, and not just “Round”?
Round is a data point in the application, while Lookup Round is the data associated to the round (this is the configuration data you set up when building a round—associated period, export values, key, etc).
In more detail, an application is associated to a round, so round becomes a data point in the application, but the other data points connected to the round (like Key) live in a separate table. In order to access those data, we need to get to the other table; this is the lookup.round table.
When Abraham built the CJ Schools query, why didn’t the query results display one row per school, with Person populated (or not) with people associated with that school? The results looked more like an inner join with results returned if there was a match between Person and School.
The Schools table is only looking at schools that are associated to Person records, so Schools is a table that has that direct association to a Person record. In contrast, the Organizations table will list all of the organizations in the database. Schools and Organizations are related through their key (School Code and Organization Key), but they are fully separate data ‘rows’ in the system.
So am I to understand then that this idea or use of the word “dataset” in Slate refers to what amounts to something of a validation table?
“Dataset” is commonly referred to in a few different ways in Slate, as is the case with several Slate terms. In the case of Dataset as an object in Slate, it is a separate table of information that independently exists from the person/application tables and can serve multiple functions (e.g. as a validation table or as a completely different set of information).
Why does the Tag filter need to be a subquery filter instead of a base filter?
A record can have more than one tag, so when filtering for the non existence of a tag, you want to consider ALL of the tags on a record. This “all” concept requires a subquery.
Couldn’t we just use “Lookup Tag” instead of both “Tag” and “Lookup Tag”? Would would happen?
Person does not join directly to Lookup Tag, and so we wouldn’t be able to create that join. We have to join first to Tags, as Tags do relate to the record, then we can access the data about that tag. Remember that “Lookup ____” means that we’re wanting to get the metadata about an object itself, such as the object’s name, active status, export values, etc.
At our institution we have noticed that this exclusion of Tag In/Not In Test Record is not working. Has anyone else experienced this recently?
You really want to do a subquery looking for the non-existence of Tag IN Test Record. The issue with using NOT IN here is that this will omit all records with “nulls”— aka all records with no tags.
Is there a good glossary available in the Slate Community where one can find sample formulas and examples?
Your best bet for examples is leveraging your Clean Slate Showcase. However, the forums also have many examples — keyword searching can be very helpful.