Variable Values, Bound and Raw

Authors
  • Name
    Colman Humphrey
    Twitter
Published on

Slight's variables generally use safe bound values. But this doesn't always give enough flexibility. Slight allows you to span the tradeoffs of safety, convenience, and power, as your apps require them.

Slight apps are typically parameterized SQL queries. That means we can let users dynamically change the query when they run it by selecting what values should be used for the app's variables. Let's check out an example query.

SELECT customer_id,
       order_id
       order_amount
  FROM orders
 WHERE order_type >= {{ min_date }}

All parts of the query are fixed except the variable min_date: our users should pick a date when they run the resulting app. For more details on this, see our docs on writing queries, and creating variables.

The app runners should only be able to input a date, as anything else would produce an error. In cases like this, we pass the user's desired min_date value to the database alongside the query: we don't inject the user's date ourselves. This gives us:

  • Safety: the database will reject anything that isn't a valid date.
  • Ease of use: even if you're not worried about safety, you don't want your users unsure of how to provide the values for the app's variables. Slight makes it clear your users must provide a valid date in this example.

Not All Variables Can Be Bound

Variables that fit the above template — requiring simple values that the database itself can confirm are of the correct type — can cover a lot of cases. We say that these variables only take bound values.

There are many situations where this isn't flexible enough. Some examples:

  • Choosing which column to use in a query (or even an expression involving multiple columns).
  • Dynamically selecting which table to pull data from.
  • Letting users select which function to apply, and/or the inputs to those functions.
  • Similarly, letting users select which operators to use. That could be , , or = with numbers; it could be LIKE vs ILIKE with strings.
  • Choosing the ordering of the data, either the columns and/or ASC vs DESC.
  • Giving users the option to specify a value using a function (e.g. CURRENT_DATE()).
Let's look at a quick example of the second, choosing a table. We want our users to choose which geography's traffic incident data to pull from:
SELECT incident_id,
       incident_type,
       incident_timestamp
  FROM {{ geog_traffic_table }}
Databases could in theory expose safe ways to do all of these, but until they do, we have to directly edit the query before sending it to the database to run. That is, we have to inject the user's desired value for the above into the query.

Slight allows you to use variables that take raw values, but we also give you ways to avoid danger.

Example App: Choosing the Table and Column

NOAA maintains a public dataset they call the Global Surface Summary of the Day. But their publicly available data in BigQuery is split into one table per year.

Ideally if we have an app that uses this data, we'll let our users select which table to use: that is, which year to run. This app does exactly that: feel free to try running it, and read the body to see how the year variable is used. It lets users choose the year from a set of options (like a dropdown), and this sets the name of the table to run.

In business situations, there are many reasons data can be split into multiple tables. One reason is geographical splits, which are sometimes required by law. Others include table size and legacy requirements, but a very simple one is that the tables genuinely are quite different in general, but happen to both fit your current query.

Similarly, the app above lets you choose which column in the source data to compute statistics of. This is common, as you may need to use different columns depending on the use-case, for example revenue before or after some types of expenses are taken into account (or pitch deck vs IRS numbers), or say some combination of the responses in this mask-use dataset.

You could just return all analyses on all columns, but this can add up quickly, and can result in a lot of wasted data getting transferred out from the database.

Finally, it's always nice to include an image, this is for 2021:

Go to Graph
Graph showing temperature curves by day in 2021.
The mean and median almost overlap, the 10th percentile looks similar but shifted down
about 20-25ºF; 90th percentile and max flatten out a bit, but still go up in the
Northern hemisphere summer. The min looks odd: it actually goes way way down
in the summer compared to winter (but this is because of the Southern Hemisphere).

Example App: Choosing Which Function to Apply

This simple app counting library dependencies lets us choose which function to apply to the vector of dependency counts. This example isn't amazing, but the concept of choosing average case vs best or worst case is reasonable. The app makes a slightly odd choice to also name the resulting column: this is generally not recommended, but when needed, it can be quite useful.

The above app also lets us select the order we want to see the results. This can be useful if we're comparing different runs with different functions used.

We won't add the picture here, but follow this link for a nice graph.

When and How To Use Raw Values

If we're not careful, allowing raw input can allow malicious actors access to private data. Both the example apps linked above use our category variable, which avoids raw injection completely: we only allow runners to use values that the app authors added as selectable options. That's great, but sometimes we do need the full flexibility of raw input. We might wonder when we should feel safe doing so.

Speaking generally, we have three levels of safety required. Let's go through each.

App Runners Have Full Permissions on the Database

If all our users are also able to run any query they want in the database, we can feel safe allowing raw values. Similarly if the query writes to the database, we can feel OK with raw values if users have full write permission. We see this when:

  • Everyone working with data at a company has run permissions, or similarly the app is only for a small team within the company (or personal use).
  • You bring your own connection to run each app: here by default you can't "raw inject yourself".
  • The app is designed to be built upon by other apps, and those apps don't need the full flexibility.

You may still want to use say a category variable where possible for ease of use.

App Runners Are All Internal (Trusted)

Our trusted users may not have full run permission, but we don't fear malicious intent: they won't work hard to abuse the ability to use raw input.

However, we don't want to give them footguns: the app should still be easy to use! Using category variables, like the example apps linked above, gives a great experience when it can cover all needed bases. Or with the table example, we could have alternatively selected a raw number input, with a good default value and a clear description, so that users could quickly understand the intention is to input a year.

You can also set variables to allow either bound or raw input, but only grant the permission to run with raw values to certain users. This gives you maximum safety and flexibility within a single app.

App Runners Aren't Trusted

Typically this would be an externally facing app.

In Business Intelligence tools, externally facing reports aren't common. But with Slight, it's easy to generate a secure token for an external user and let them run the app as they need it. In this scenario, we must ensure that the external users can only run what we intended.

Typical bound values are fully safe, but indeed so are raw options: it's no more dangerous (but much more convenient) than creating a separate app per option.

Summary: Which Bind To Use

In order:

  1. If bound values solve the needs of the app, use those. They're safe, and Slight can provide an interface that makes it very clear what is expected from the user.
  2. If raw values are needed and there's a limited set of them, use a category variable, and add the needed values as raw options. Note that you can also use bound options at the same time if desired.
  3. If raw values are needed but they don't form a limited set, think about who your user is. If they're untrusted, you will probably have to rethink the app setup, or potentially make multiple apps. If they're trusted, make sure to provide good defaults/examples and a description for the variable. Stay tuned though, we'll be providing even more safe-guards on raw input in the future.

If balancing safety, convenience, and power is something your stored queries could benefit from, we'd be delighted to hear from you. Email us at team@slight.co to get in touch.

Related Articles