Thinking about performance
Performance as in speed, that is
I’ve got like three drafts on here that are all 2/3 finished. And I’m not happy with any of them, and I don’t want to post nothing. So I figured I’d take some time to talk about some of the things I’m working on professionally. Lately, that’s been performance.
At Runway and all of my previous jobs, there have always been a lot of considerations in both the front-end and back-end that affect performance. Ultimately, performance is about making trade-offs. I want to talk about how to think about performance, but first we need to talk about what performance is.
The performance that actually matters is perceived performance. In almost all cases, nobody cares about how much time some back-end system takes to do a job as long as they don’t notice it. If you have a system that generates a report, as long as that report is ready and waiting when your analysts open their email on Monday morning, it really doesn’t matter how long it took for that job to run.
When I worked at Box, I was on the Performance team. Our goal was to make the Files page (the one you see when you log in) interactive in under 1 second for the “median domestic user” (a phrase backed by a more precisely defined metric). We didn’t achieve that in my tenure (though I think the project that I worked on got them across the line within about a year after my departure!), but there were a lot of important lessons learned along the way.
One of the principles that we lived by came from Response time in man-computer conversational transactions, which states this:
[Control activation] is the indication of action given, ordinarily, by the movement of a key, switch or other control member that signals it has been physically activated. The click of the typewriter key, or the change in control force after moving a switch past a detent position are examples. They indicate responsiveness of the terminal as an object. This response should be immediate and perceived as a part of the mechanical action induced by the operator. Time delay: No more than 0.1 second.
Which is to say, if the user physically interacts with their device, it should respond fast enough that you can’t perceive the delay. Any “critical” function should be instantaneous. At minimum, this means things like typing (nobody likes keyboard lag). But in a modern application, it should also include the sort of actions that you perform many times per hour.
Assume a graphic or high-speed printer output at the display. The user has completed reading or skimming a section of text which overruns into another "frame." The user activated the "Next Page" control. Here, time delay should be no more than one second until (at least) the first several lines of text on the new page appears. You can test the annoyance of longer delays by becoming engrossed in some text and, when you are about to turn the page, be restrained from doing so to a slow count of four by an associate.
Delays of longer than one second will seem intrusive on the continuity of thought.
In 1968, there was no notion of “hypertext” or web applications. But we can extrapolate this point to refer to navigation as it happens on the web today. If navigation doesn’t load in a second, it breaks your flow.
In very complex problem solving, short-term memory is heavily filled. It is becoming clear in the psychological literature that the degree of complexity of problems that can be solved by a human is dependent on how much information (and in what form) he can hold in short-term memory. Human memory is never passive. Spontaneous noise from within the thinking system, as well as distractions from outside, can interfere with short-term memory contents, and of course these effects rapidly increase when the individual has an awareness of waiting. This awareness comes as soon as several seconds—two seconds still seem to be a good number here.
That is why the tasks which humans can and will perform with machine communications will seriously change their character if response delays are greater than two seconds, with some possible extension of another second or so. Thus, a system with response delays of a standard ten seconds will not permit the kind of thinking continuity essential to sustained problem solving, and especially where the kind of problem or stage of its solution contains a high degree of ambiguity. Such a system will have uses, but they will be different from those of a two-second system.
Emphasis mine. Systems which require you to wait for ten seconds or more for a response are bad tools for doing work. You can’t reasonably include such systems in a workflow that requires any sort of sustained concentration.
If we distill this down to rules for building UIs: 100ms is instant, 1s is acceptable but noticeable, and 10s is the absolute limit. This isn’t really what Robert Miller was trying to get across when he wrote the paper above, but it’s a reasonable rule of thumb. What you’ll notice is that this all matters because of how the end user feels. Performance, fundamentally, is about perception.
Being clever with performance is about first getting really crisp on what’s an acceptable end state, then thinking backwards from the end state to your starting point. Working backwards isn’t a novel way of solving problems, but unlike most other engineering problems, performance challenges’ start and end states are quite malleable.
In a lot of cases, the end state looks like “the Box files page is loaded and interactive.” But that’s not really the point, is it? There’s no business impact to the page loading in some arbitrary time; the business impact is in customers being able to use the page fast enough that it feels fast. And we can take that a step further: the notion of “done” means having accomplished just enough work to allow the customer to do what they intended.
“The Box files page is loaded and interactive” actually ends up translating to just this:
The list of files has loaded and is displayed
Clicking a file would navigate the page
That’s the overwhelming use case for the files list. Clicking on files. And so all of the stuff that doesn’t involve clicking on files can happen after the point where we consider our page “loaded”:
All of the columns in the file list are fully populated.
We can show a loading indicator (e.g., a skeleton indicator) for columns with nonessential information.
The preview or sharing sidebars are fully loaded.
The icons or thumbnails are fully loaded.
These are just plain old <img> tags. The browser will render these progressively as the data comes in.
All of the navigation UI is available.
If you’re loading a page five folders deep in the folder structure, the breadcrumbs don’t need to have loaded information about the middle three folders. We can show a loading indicator there.
Only details about the root folder, the current folder, and the ID of the parent folder (for navigation) are needed.
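To make the deferred-columns idea concrete, here’s a minimal sketch of rendering a file row where nonessential cells fall back to skeleton placeholders until their data arrives. The types and markup are hypothetical, not Box’s actual code.

```typescript
// Hypothetical row type: only the fields needed for clicking through are
// required; everything else can arrive after the page is considered "loaded."
type FileRow = {
  id: string;
  name: string;             // essential: needed to render and click a file
  lastModified?: string;    // nonessential: can load later
  sharedWith?: string;      // nonessential: can load later
};

// Render a cell, falling back to a skeleton placeholder until data arrives.
function cell(value: string | undefined): string {
  return value ?? '<span class="skeleton"></span>';
}

function renderRow(row: FileRow): string {
  return `
    <tr data-id="${row.id}">
      <td><a href="/file/${row.id}">${row.name}</a></td>
      <td>${cell(row.lastModified)}</td>
      <td>${cell(row.sharedWith)}</td>
    </tr>`;
}
```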
The approach here is acknowledging that being specific about your performance goal lets you tackle the practical outcome first and treat other outcomes as lower priority.
The source of the slowness is really the deciding factor for most performance considerations. In a web application, there are three primary sources of slowness:
The back-end: producing or mutating data is slow
The transport: sending the data from the back-end to the front-end (and vice versa) is slow
The front-end: rendering or interacting with the data once it’s loaded is slow
For each of these, we need to make different kinds of trade-offs.
Recently, I looked into a page that loaded records in a folder structure. The page loads every record for the logged-in user and then the client renders it as a sort of file system view. This makes a trade-off: the initial load is slower (querying every single one of the user’s records from the DB and transporting it to the UI is expensive), but the user is able to filter, sort, and navigate through the folders almost instantly.
If that initial load is under one second, this is a very acceptable trade-off. But what about power users who have tens or hundreds of thousands of records (or more)? In that case, it could be very expensive to load the data and transmit it to the client. The initial load could approach the ten second threshold.
The goals here are three-fold:
The records should be visible and interactive quickly on initial load
Sorting/filtering the records should feel almost instantaneous
Navigating between folders should feel almost instantaneous
Fixing #1 is actually easy: only load the records that you’re going to immediately display. This doesn’t solve the problem like you might think, though.
First, if we only load the records for the root folder, navigating to other folders (#3) won’t feel instantaneous.
Second, the user might not use folders at all, keeping everything in the root. We’d end up loading everything anyway. We can fix that with pagination (even very large page sizes, like 1000), but then you can’t sort/filter (#2) on the client, since you don’t have information about the records that aren’t loaded.
We once again come back to perception: the important thing is that the page is visible and ready to be interacted with. If we are strict about holding ourselves to that, we can cut some corners.
Instead of loading all of the records for the customer all at once and blocking the UI on the completion of that request, we can also load only the first few hundred records for the root folder. If the request for the records in the root folder loads first, we have enough information to render the page. And if you click on an item in the page, it’ll navigate like you expect. In the meantime, if the chonky request for all of the records completes, load that data in and use that as the source of truth.
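As a rough sketch (with hypothetical endpoints and a stand-in render function, not the actual code), the two-request approach might look like this:

```typescript
// Hypothetical record shape and endpoints for illustration.
type FileRecord = { id: string; folderId: string; name: string };

let records: FileRecord[] = [];
let haveFullDataset = false;

// Stand-in for the real UI; imagine this updates the file list on screen.
function render(rows: FileRecord[]) {
  console.log(`showing ${rows.length} records`);
}

async function fetchRootFolder(): Promise<FileRecord[]> {
  // Small and fast: just enough to draw the page and make it clickable.
  return (await fetch('/api/records?folder=root&limit=500')).json();
}

async function fetchAllRecords(): Promise<FileRecord[]> {
  // Potentially huge and slow for power users.
  return (await fetch('/api/records')).json();
}

function loadPage() {
  // Fire both requests immediately; don't chain them.
  fetchRootFolder().then((rows) => {
    // Render as soon as the cheap request lands, unless the expensive
    // request somehow beat it there.
    if (!haveFullDataset) {
      records = rows;
      render(records);
    }
  });

  fetchAllRecords().then((rows) => {
    // Once the full dataset arrives, it becomes the source of truth and
    // enables instant client-side sorting, filtering, and navigation.
    haveFullDataset = true;
    records = rows;
    render(records);
  });
}
```

The important property is that nothing the user sees is blocked on the expensive request; when it arrives, it only ever makes things better.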
We have to answer some questions, though: what happens if you navigate to a sub-folder? We still don’t have the data for that sub-folder yet. We have some options!
Make another request just for the records of the sub-folder, like we did for the root folder. It’ll be fast, but not instantaneous. This pessimistically assumes that the expensive request for all of the records will take a long time to complete.
Block on the expensive request for all of the records. It might be fast. This optimistically assumes that the expensive request is going to complete very soon.
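Continuing the hypothetical sketch above, the first (pessimistic) option might look something like this, while still taking advantage of the full dataset if it happens to have already arrived:

```typescript
async function navigateToFolder(folderId: string) {
  if (haveFullDataset) {
    // Everything is already on the client; filtering is effectively instant.
    render(records.filter((r) => r.folderId === folderId));
    return;
  }
  // Otherwise, fire a targeted request for just this folder: fast, but not
  // instantaneous, and it doesn't bet on the expensive request finishing.
  const response = await fetch(`/api/records?folder=${folderId}&limit=500`);
  const rows: FileRecord[] = await response.json();
  // If the expensive request happened to finish while we were waiting,
  // prefer it as the source of truth.
  render(
    haveFullDataset ? records.filter((r) => r.folderId === folderId) : rows,
  );
}
```

The optimistic option is even simpler: await the promise for the expensive request and render from its result.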
Another question is what to do about the truncated page size. If we only load, say, the first 500 records in the root folder (or any sub-folders), what do we do? The answers are actually really similar to how we handle navigation:
If you try to sort or filter, you can either block on the expensive request completing (with a little loading indicator), or fire off another one-off request for the sorted/filtered version.
We can bet that the user probably won’t scroll down much. If the user does scroll down, show a loading indicator where the list is truncated. A simple spinner is fine, but if you know how many records there will be, you can block them out with skeleton indicators as well. Block on the completion of the expensive request, or fire off ad-hoc requests for the next page (a la infinite scroll).
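For the “bet they won’t scroll” option, a sketch (again with hypothetical endpoints and UI hooks) might place a sentinel element at the truncation point and only fetch more records when it actually scrolls into view:

```typescript
// Stand-in for appending rows to the real table.
function appendRows(rows: FileRecord[]) {
  console.log(`appending ${rows.length} more records`);
}

function watchTruncationPoint(sentinel: HTMLElement, folderId: string) {
  let nextOffset = 500; // we loaded the first 500 records up front

  const observer = new IntersectionObserver(async (entries) => {
    if (!entries.some((entry) => entry.isIntersecting)) return;
    // The user actually scrolled to the truncation point: fetch the next
    // page (or, alternatively, block on the expensive full request here).
    const response = await fetch(
      `/api/records?folder=${folderId}&offset=${nextOffset}&limit=500`,
    );
    const rows: FileRecord[] = await response.json();
    nextOffset += rows.length;
    appendRows(rows);
    if (rows.length === 0) observer.disconnect(); // nothing left to load
  });

  observer.observe(sentinel);
}
```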
All of this assumes that the back-end will respond fairly quickly, even for users with a huge number of records. If we’re starting to cross into ten second territory, though, we probably need to reevaluate whether that’s the best strategy.
What’s useful here is that there’s a path to our goals once we’re really explicit about what work needs to happen when. If you start with a number and then decide to define what actually happens by that time, it’s like saying you’re going to finish a race in a certain amount of time and then figuring out where to draw the finish line. You instead need to consider what the user can do, and then decide (realistically) how quickly you can make that happen. And that, of course, means that you’re not just defining what needs to be done by when, but what work can be deprioritized and done asynchronously.
When I was at Stripe, we had an internal tool for querying our Trino database. You could get answers derived from almost any application data. However, to do that, you needed to find the data. Helpfully, the tool had a sidebar with a search function, letting you query the tables that were available.
Less helpfully, the search box did a fuzzy search over the full list of tables, which numbered in the tens of thousands. Typing meant significant keyboard lag: even when the letters you typed appeared onscreen, there was a multi-second delay in filtering down the list of tables. This bugged me enough to write a solution.
What I came up with was futzy. It’s a fuzzy search library that lets you search ordered chunks of character-delimited strings. It works by building an index on application startup that organizes all of the strings into an easily-queryable data structure. To put that into the terms we defined above, we go from an instantaneous startup time to a fast startup time, in exchange for having an instantaneous search instead of a fast search.
This is another invaluable technique for improving performance: doing work ahead of time. On the client, there aren’t many opportunities for this in practice. But on the server, we do this all the time: denormalizing fields in the database, creating database indexes, and cache warming all trade some resources (time, space) up-front for better performance properties later on.
When I worked at Uber, I was asked to join the Finance Engineering team after six months on my original team. Just before I’d transferred in, I’d heard an anecdote about a PM (recently departed) who demanded that the financial forecasting and planning tool be "faster than Excel." While the very premise of that statement is maddening, it's an interesting product requirement to dissect.
The tool in question used a back-end calculation system based on IBM TM1, which, for all intents and purposes, is glorified Microsoft Excel with N dimensions instead of two (or three, if you count sheets as a dimension). The UI we were building needed to respond quickly so that the executives and analysts using the tool could get updates as needed.
What’s wild about the product requirement is that a system with a back-end component fundamentally can’t be as fast as Excel, because there's network transit involved. Almost everything in Excel is instantaneous—with few exceptions—and you simply can't build a system like that where there is inherent latency. But there are things that we can do to hide it.
For one, the tool we were building had steps. The process of doing financial forecasting and planning was broken down into logical steps, so we collected more information progressively as you moved through the tool, showing you only what was relevant to what you needed to see or do at that point in the process.
But the process wasn’t strictly linear: in reality, there were often steps which could be completed in parallel. If we hid the steps that depended on data from a previous step while those numbers got crunched, and surfaced the steps with no outstanding dependencies in the meantime, you could proceed through the tool without ever noticing any latency. Or at least as long as the user didn’t rapidly click through the steps.
Convincing management that you’re right about performance
There are two battles you’ll fight with management about performance:
Things could be faster and they don’t see the benefit
It’s not fast enough for them and it needs to be faster
In the first case, the trick is to get really crisp on what the final experience could look like. Quantifying perceived slowness is the easiest way to go about this:
“The median user spends a minute and forty-seven seconds waiting during an hour of using this product”
“For our N customers on 3G internet, the Guinness World Record for the fastest time to drink two liters of soda is nearly two times faster than our web application becomes interactive.”
“Our users could work 20% faster, theoretically generating up to 20% more revenue through usage-based billing”
“It takes our users 30% longer to accomplish their goals with our product compared to the same outcomes on [competitor]’s product”
I could write a whole other post about how to convince leadership to care about performance, but getting creative and showing hard numbers is the big lesson. It’s also useful to point out efficiency improvements that might come along as a result of the performance improvement, if there are any: “if we make the product faster in this way, it’ll save us [some cost estimate] over the next year in {compute|bandwidth|storage} costs”. But this isn’t a post about efficiency.
For the second case, you have to set ground rules.
The speed of light can’t be changed. If you need to move data, it has to be moved in advance, compressed, or eliminated.
Performance goals need to be framed as a measurement of changes in what a user experiences, not as a strict numerical game. You aren’t going to algorithmically squeeze more fast juice from the ol’ performance stone because the OKRs said so.
Performance in a customer-facing application requires many different disciplines. This means bureaucracy.
If it’s a priority, it needs to be prioritized and resourced.
Covering multiple disciplines means covering multiple teams. Other teams need to let you make your changes.
Making an application which already runs efficiently faster almost always means additional operational costs. You can’t avoid that.
In my example about the folder structure, a single request for all of the records in all of the folders becomes two or more requests.
In my example about the schema search tool at Stripe, I introduced a new library that increased the build time and bundle size (though not by much).
In my example about the Box files page, new requests were introduced that made expensive queries. If the user can move faster, more of these will be made (even if they don’t all finish), increasing database load.
Once that’s out of the way, the next step is to get buy-in for the changes that you will make. Management shouldn’t be dictating how it should be faster. Management should be saying that it’s slow, and engineering and product management should work together to figure out—given the constraints of the systems and the speed of light—how slowness can be hidden or eliminated.
My last tip is to be as visual as possible with your proposals. Talking about performance is hard because unless the person you’re speaking to is intimately familiar with the ins-and-outs of the application and the underlying systems, it’s very hard to imagine what the benefit might be. That means their approval is dependent on them imagining the right thing.
Being visual—as with any demo—concretely shows how something is (or will be) and leaves no room for misinterpretation. Folks can get a “feel” for a change by seeing it, but not by being told about it.
I’ll include a big asterisk here and say that there is obviously the compute cost, which is real and tangible, but I’m specifically talking about performance here, not efficiency. Efficiency and performance are similar in terms of discipline and approach, but different in terms of constraints and solution spaces.
For the purposes of this post, I’m going to assume that everything is mostly-efficient and that there’s not any obvious waste.
A fun note is that if your back-end is returning HTML (it’s a big server-rendered template, like at Box, or you’re using an SSR framework), the version of the page without the JS running should just…do this. Perhaps more slowly as it’s a full page load, but it’ll still work.
I think it’s important to acknowledge that nonessential components should be measured and they should have performance goals, but after a certain point the returns are insignificant.
It also reduces load on the back-end. But this is a post about performance, not efficiency.
It works by creating tries of each of the tokens in each of the strings it’s fed. Each of the tokens in a query is looked up in the trie, and the possible result set is filtered down. This reduces the overall search space dramatically. Finally, the resulting set is filtered to make sure the tokens appear in the order they’re queried. And last, the results are scored based on how similar to the original query they are.
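For a sense of the shape of this (a rough sketch of the idea, not futzy’s actual implementation), the index might look like a trie of token prefixes where every node remembers which strings it can still match; order checking and scoring are simplified away here:

```typescript
// Each trie node maps the next character to a child node and remembers the
// IDs of every string containing a token that passes through it.
type TrieNode = { children: Map<string, TrieNode>; ids: Set<number> };

const newNode = (): TrieNode => ({ children: new Map(), ids: new Set() });

// Built once at startup: the up-front cost we're trading for instant search.
function buildIndex(strings: string[]): TrieNode {
  const root = newNode();
  strings.forEach((s, id) => {
    for (const token of s.toLowerCase().split(/[\s._\-\/]+/)) {
      let node = root;
      for (const ch of token) {
        if (!node.children.has(ch)) node.children.set(ch, newNode());
        node = node.children.get(ch)!;
        node.ids.add(id); // every prefix of this token can match string `id`
      }
    }
  });
  return root;
}

// Walk the trie for one query token; an empty set means nothing matches.
function candidatesFor(root: TrieNode, token: string): Set<number> {
  let node = root;
  for (const ch of token) {
    const next = node.children.get(ch);
    if (!next) return new Set();
    node = next;
  }
  return node.ids;
}

function search(root: TrieNode, strings: string[], query: string): string[] {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  // Intersect the candidate sets for each query token to shrink the space.
  let result: Set<number> | null = null;
  for (const token of tokens) {
    const ids = candidatesFor(root, token);
    result = result ? new Set([...result].filter((id) => ids.has(id))) : ids;
  }
  // A real implementation would also verify token order and score results by
  // similarity to the query; this sketch just returns the matches.
  return [...(result ?? new Set<number>())].map((id) => strings[id]);
}
```

The index-building cost is roughly linear in the total number of characters indexed, which is exactly the startup-time trade described above.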
If you’re a Stripe, you can read the full details in my notes from 2021-06-14. If you’re not, and you’re interested in learning more, let me know and I can write a post!
There’s a great story here about the unbelievable turnover in PMs that team had. It was nearly one every two months!
I’m going to hand-wave away Excel functionality that pulls from a back-end, because very few people know this is even a thing, let alone use it.
I have ADHD 😂
Perf 🚋