No sacred masterpieces
Or "that time I built Excel for Uber and they ditched it like a week after launch"
This past month I’ve been working on a project that I’m eager to write way too many words about. But for now, it’s not ready to talk about in public. Meanwhile, I either have way too many words about topics that I’m confident nobody wants to hear about or too few words about topics that folks tend to find interesting.
In lieu of publishing something too unappealing or too trite, I’ve decided to tell a (true!) story that’s been rattling around in the back of my head for more than a few years.
In 2016 I joined Uber. I’d followed a director from Box who had been offered the job of leading Business Intelligence at Uber. She told me about a team that she thought I’d be perfect for: it was called “Crystal Ball” and it was doing some of the most incredible work she’d come across. I’d put in two good years at Box and seen it through an IPO and was ready for something new, so I jumped.
Uber was weird from the get-go. The first week (dubbed “Engucation”) was a mix of learning how to do things that I’d never need to do in the role that I held and setting up benefits and taking compliance classes. Travis Kalanick joined us for a Q&A where he showed off pictures of the self-driving car that the ATG arm of the company was building (it just looked like an SUV with cameras) and the more visually impressive mapping car that was gathering data for the self-driving project (it looked like a weird Dalek on wheels).
When I met the Crystal Ball team, it had about four members (not including myself). Everyone was heavily biased towards back-end. I was given a brief tour of the fourth floor of 1455 Market St to understand the problem that the team was solving.
“You see all these desks?”
“This is where the data scientists sit. They build data science models in R. They run those models on data that they download from Vertica[1].”
“The problem is that the models are slow and take up a lot of resources. So the data scientists have multiple laptops that they download the data to, then run the models overnight. When the data scientists arrive in the morning, the laptops whose models didn’t crash have data that’s maybe usable that day.”
“What about the other laptops?”
“We don’t have the data we need and we lose money.”
This was a big problem for the business: they needed a way to take two kinds of inputs (data and code) and run the code to produce useful outputs. Or, hopefully useful. Testing a model meant running it, so the iteration cycle was very close to one iteration per day per laptop.
The team, when I joined, had the beginnings of a tool to automate this. It was called “R-Crusher” and it was essentially a system for scheduling work. They were able to make some API calls and code would be downloaded and executed, and an output file would appear in a directory eventually. As the first (self-professed) front-end engineer on the team, it was my job to build the tool to expose this to the rest of the company.
I was grateful that I didn’t need to write any “real” code. I lived in the world of React building UIs with Uber’s in-house front-end framework[2] (“Bedrock”). Any time I needed something, I could ask the back-end folks to update the R-Crusher API and I’d get some notes a few hours later to unblock me.
The first version of the front-end for R-Crusher (“Wesley”[3]) was ready in very little time: maybe a few weeks from the point I joined? It was a joy.
The next 6-7 months were a hectic rush. I was tasked with hiring more front-end engineers. I built a front-end team[4] of seven people. We added user-facing features to Wesley and R-Crusher (“Can we have a text box that only takes capital letters?” “Can we make this text box allow a number whose maximum value is this other text box?”) and debugging tools for the team (“Can we see the output log of the running jobs?”).
There were effectively only two things that people were working on at Uber in 2016:
The app rewrite/redesign (which launched in November 2016)
Uber China
All of my work and the team’s work, ultimately, was to support Uber China. R-Crusher was a tool to help get the data we needed to compete with Didi. Nobody really cared very much about the processes for the US and any other country; we had less to lose, and Lyft wasn’t seen as remarkable competition except in the handful of cities where they were operating at scale. China was a make-or-break opportunity for Uber, China was only going to succeed if we had the data for it[5], and the data was going to come (at least in part) from R-Crusher.
Over the summer of 2016, we came up against a new twist on the project. We had a model that ran overnight to generate data for anticipated ridership in China. That data wasn’t useful on its own, but if you fed it into a tab on a special Excel spreadsheet, you’d get a little interactive Excel tool for choosing driver incentives. Our job was to take that spreadsheet and make it available as the interface for this model’s data.
Now, this was no small feat on the back-end or front-end. First, the data needed to be run and moved to an appropriate location. Then, we had a challenging problem: we needed to take all the logic from this spreadsheet (with hundreds if not thousands of formulas across multiple sheets) and turn it into a UI that Uber China city teams could log into to use. The direction we got from the head of finance at Uber (who, for whatever reason, was seemingly responsible for the project) was “take this [the spreadsheet] and put it in on the website [Wesley].”
We asked how we could simplify the UI to meet the resource constraints (engineer time) we had. “The city teams only know how to use Excel, just make it like Excel.” We tried explaining why that was hard and what we could confidently deliver in the time allotted. “Every day that we don’t have this tool as specced, we’re losing millions of dollars.” There was no budging on the spec.
In 2015, I had built a prototype of a tool at Box. Box had a collaborative note-taking product called Box Notes (based on Hackpad). I had the idea to make a similar project for working with numbers: sometimes you didn’t need a full spreadsheet, you just needed a place to put together a handful of formulas, format it with some headings and text, and share it with other people. Sort of like an ipython notebook for spreadsheets. I called it Box Sums.
When I built this, I created a simple React-based spreadsheet UI[6] and a super basic spreadsheet formula engine. A few hundred lines of code. And if you dropped an XLS/XLSX file onto the page, I used a Node library to parse it and extract the contents.
I demoed Box Sums to the Box Notes team at some point, and they nitpicked the UI and implementation details (“What if two people type in the same cell at the same time? They’ll just overwrite each other.” 🙄). Nothing came of it, but I took the code and shoved it into my back pocket for a rainy day.
My idea was to take this code and spruce it up for Uber’s use case. Fill in all the missing features in the spreadsheet engine so that everything the spreadsheet needed to run was supported. The back-end could serve up a 2D array of data representing the ridership data input, and we'd feed that in. And the UI would simply make all but the cells that were meant to be interactive read-only instead.
I got to work polishing up the code. I parsed the XLS file and extracted all the formulas. I found all of the functions those formulas used, and implemented them in my spreadsheet engine. I then went through and implemented all the fun syntax that I hadn’t implemented for my demo at Box (like absolute cell references, where inserting $ characters into cell references makes them keep their column/row when you drag the corner of the cell, or referencing cells in other sheets).
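To make the `$` behavior concrete, here is a minimal sketch in Python (the original engine was JavaScript, and this is not its code). The names `shift_formula`, `col_to_num`, and `num_to_col` are hypothetical. It assumes single-sheet A1-style references, and it would mistake a function name such as LOG10 for a cell reference, which a real parser would need to handle.

```python
import re

# Matches A1-style references such as B2, $B2, B$2 and $B$2.
# Assumption: single-sheet references only; names like LOG10 are not excluded.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col: str) -> int:
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    """Convert a 1-based column index back to letters (27 -> AA)."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("A") + rem) + out
    return out


def shift_formula(formula: str, d_cols: int, d_rows: int) -> str:
    """Rewrite references as if the formula were copied d_cols right and d_rows down.

    A part of a reference that is prefixed with $ is absolute and stays fixed.
    """
    def shift(match: re.Match) -> str:
        col_abs, col, row_abs, row = match.groups()
        new_col = col if col_abs else num_to_col(col_to_num(col) + d_cols)
        new_row = row if row_abs else str(int(row) + d_rows)
        return f"{col_abs}{new_col}{row_abs}{new_row}"

    return REF.sub(shift, formula)


# Copying one cell right and one cell down: A1 moves, $B$2 stays, C$3 keeps its row.
print(shift_formula("=A1+$B$2*C$3", 1, 1))
```

Relative parts of a reference move with the copy, while anything pinned with `$` is passed through unchanged.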
I sat in the black mirrored wall “spaceship hallway” of 1455’s fifth floor with my headphones playing the same handful of songs on repeat. I spent the early days fixing crashes and debugging errant NaNs. Then I dug into big performance issues[7]. And finally, I spent time polishing the UI.
When everything was working, I started checking my work. I entered some values into the Excel version and my version, and compared the numbers.
Excel’s output: 3.03
My output: 3.01
Excel’s output: 1.002
My output: 1.000
The answers were all almost correct. After a week of work, I was very pleased to see the sheet working to the extent that it was, but having answers that are very, very close is objectively worse than having numbers that are wildly wrong: very wrong numbers usually mean a simple logic problem. Almost-correct numbers mean something more insidious.
I started stepping through the debugger as the calculation engine crawled the spreadsheet’s formula graph. I compared computed values to what they were in the Excel version. The sheer size of the spreadsheet made it almost impossible to trace through all of the formulas (there were simply too many), and I didn’t have another spreadsheet which exhibited this problem.
I wrote unit tests. They all passed.
I googled for esoteric knowledge about Excel, rounding, or anything related to non-integer numbers that I could find. It all led nowhere.
Just as I was about to resign myself to stepping through the thousands of formulas and recomputations, I decided to head down to the fourth floor to just ask one of the data scientists.
I approached their desks. They looked up and had a look of recognition.
“Hey guys. I’m working on the driver incentive spreadsheet. I’m trying to mimic the calculations that you have in Excel, but my numbers are all just a little bit off. I was hoping you might have some ideas about what’s going on.”
“Can I take a look?” I showed him my laptop and he played with a few numbers in the inputs. “Oh, that’s the circ.”
Another data scientist looked up, “The circular reference.”
“We use a circular reference in Excel to do linear regression.”
My mind was blown. I had thought, naively perhaps, that circular references in Excel simply created an error. But this data scientist showed me that Excel doesn’t error on circular references—if the computed value of the cell converges.
You see, when formulas create a circular reference, Excel will run that computation up to a number of times. If, in those computations, the magnitude of the difference between the most recent and previous computed values for the cell falls below some pre-defined epsilon value (usually a very small number, like 0.00001), Excel will stop recomputing the cell and pretend like it finished successfully.
I thanked the data scientists and returned to the spaceship hallway to think about what the fuck I was going to do next.
The changes I needed to make were pretty straightforward. First, it required knowing whether a downstream cell was already computed upstream (for whatever definitions of “downstream” and “upstream” you want to use; there’s not really a good notion of “up” and “down” in a spreadsheet or this graph). If you went to recompute a cell with a formula that referenced an already-recomputed cell, you’d simply keep track of the number of times you computed that cell. If the recomputed value was close enough to the previous value that it fell below the epsilon, you simply pretended like you didn’t recompute the cell and moved on. If it didn’t, you’d continue the process until the number of iterations that you’re keeping track of hit some arbitrary limit (for me, 1000), at which point you’d bail.
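The idea of recomputing until the values stop moving can be sketched in a few lines of Python. This is a simplified stand-in, not the engine described above: instead of tracking an iteration count per cell, it sweeps the whole (tiny) sheet repeatedly. The function name `evaluate`, the formulas-as-callables representation, and the starting value of 0 for every cell are all assumptions made for illustration. The epsilon and the limit of 1000 are the figures mentioned in the text.

```python
from typing import Callable, Dict

Values = Dict[str, float]
Formula = Callable[[Values], float]


def evaluate(
    formulas: Dict[str, Formula],
    epsilon: float = 0.00001,
    max_iterations: int = 1000,
) -> Values:
    """Recompute every cell until no value changes by epsilon or more.

    Circular references are allowed: a cell simply reads the most recent
    value of the cells it refers to, and the loop bails out if the sheet
    has not settled after max_iterations sweeps.
    """
    # Assumption: every cell starts at 0 before the first sweep.
    values: Values = {name: 0.0 for name in formulas}
    for _ in range(max_iterations):
        largest_change = 0.0
        for name, formula in formulas.items():
            new_value = formula(values)
            largest_change = max(largest_change, abs(new_value - values[name]))
            values[name] = new_value
        if largest_change < epsilon:
            return values
    raise RuntimeError("circular reference did not converge")


# A1 depends on B1 and B1 depends on A1. Solving A1 = 10 + 0.5 * A1 by hand
# gives 20, so both cells should settle very close to 20.
sheet = {
    "A1": lambda v: 10 + 0.5 * v["B1"],
    "B1": lambda v: v["A1"],
}
print(evaluate(sheet))
```

A pair of formulas that never settles (say, A1 = B1 + 1 and B1 = A1) hits the iteration limit and raises instead of looping forever, which is the "bail" case.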
The changes took a day and a half to make. And would you believe, it worked. The outputs were exactly what they should have been. I wrote tests, I integrated the damn thing into Wesley, and I brought it to the team. We delivered the project in the second week of July.
Two things happened. The first was of little consequence but I enjoy telling the story. Rakesh, the team lead working on the back-end, asked me where I got the Excel component.
“I made it.”
“But where did you get the Excel engine?”
“I made it.”
“But how are you running Excel in the browser?”
“Everything you see is built by me, from scratch.”
He simply couldn’t believe that I’d written a full spreadsheet engine that ran in the browser. All things considered, it was maybe only five thousand lines of code total. A gnarly five thousand lines, but (obviously) not intractable. His assumption about the sheer complexity of that option was that it wasn’t a reasonable project to take on.
I do think that if I had challenged Rakesh—under no time pressure—to build a spreadsheet engine, he’d get to a working solution as well. My recollection is that he was a very competent engineer. Despite that, I think his intuition about the complexity and scope were based on bad assumptions about what we were ultimately accomplishing, and it’s a good case study in estimating reasonable project outcomes. It goes to show that the sheer imagined complexity of a possible solution is enough to disqualify it in some folks’ minds, even if it's the best possible outcome.
The second thing that happened was we shipped. We got the Uber China city team members logging in and using the tool. They plugged away at it, and to my knowledge, the numbers it produced drove driver incentives.
That was the third week of July.
The last week of July, the head of finance rushed over to our desks.
“Why can you see the formulas?”
“When you click in the cells of the spreadsheet you can see the formulas. You shouldn’t be able to do that.”
“You said to make it just like Excel.”
“People working for Didi apply for intern jobs at Uber China and then exfiltrate our data. We can’t let them see the formulas or they’ll just copy what we do!”
Apparently that was a thing. I remember being only half-surprised at the time. I hadn’t considered that our threat model might include employees leaking the computations used to produce the numbers in question. Of course, short of moving the computations up to the server, we couldn't *really* protect the formulas, but that was beyond the scope of what we were being asked to do.
The fix was straightforward: I updated the UI to simply not show formulas when you clicked in cells. Easy enough, I guess.
The first week of August 2016, Uber China was sold to Didi. Most of us found out because our phones started dinging with news stories about it. We all stopped working and waited until an email arrived a couple hours later announcing the deal internally. If I remember correctly, I just left the office and headed home around lunch time because our team didn’t have anything to do that wasn’t Uber China-related (yet).
After Uber China evaporated, the tool was unceremoniously ripped out of Wesley. It was a bespoke UI for a data job that would never run again. We were never asked to build Excel in the browser again[8].
I feel no sense of loss or disappointment. I wasn’t disappointed at the time, either.
My first reaction was to publish the code on GitHub.
My second reaction was to move on. There was maybe a part of me—my younger self—that was disappointed that this major piece of code that I’d labored over had been so gently used before being retired. I wasn’t recognized for it in any material way. My manager didn’t even know what I’d built.
On the other hand, we as engineers need to be real with ourselves. Every piece of code you write as an engineer is legacy code. Maybe not right now, but it will be. Someone will take joy in ripping it out someday. Every masterpiece will be gleefully replaced, it’s just a matter of time. So why get precious about how long that period of time is?
I often hear fairly junior folks saying things to the effect of “I’m here to grow as an engineer.” Growing as an engineer is mutually exclusive with the longevity of your output as an engineer. “Growing as an engineer” means becoming a better engineer, and becoming a better engineer (directly or indirectly) means getting better at using your skills to create business value. Early in your career, the work you do will likely have far less longevity than the work you do later on, simply because you gain maturity over time and learn to build tools that tend to be useful for longer.
Sometimes the business value your work generates comes in the way of technical output. Sometimes it’s how you work with the people around you (collaborating, mentoring, etc.). Sometimes it’s about how you support the rest of the team. There are many ways that business value is created.
The end (demise?) of Uber China implicitly meant that there was no business value left to create with this project. Continuing to push on it wouldn’t have gotten me or the business anywhere, even if what I'd done was the best possible solution to the problem.
Sometimes that’s just how it is. The devops saying “Cattle, not pets” is apt here: code (and by proxy, the products built with that code) is cattle. It does a job for you, and when that job is no longer useful, the code is ready to be retired. If you treat the code like a pet for sentimental reasons, you’re working in direct opposition to the interests of the business.
As much as I’d love to work on Uber Excel (I'm ashamed to admit that I thought of “Uber Sheets” far too long after I left the company), I was hired to solve problems. Having Excel in the browser was a useful solution, but the problem wasn’t showing spreadsheets in the browser: the problem was getting a specific UI delivered to the right users quickly.
It’s easy to treat a particularly clever or elegant piece of code as a masterpiece. It might very well be a beautiful trinket! But we engineers are not in the business of beautiful trinkets, we’re in the business of outcomes. In the same way that a chef shouldn’t be disappointed that a beautiful plate of food is “destroyed” by a hungry customer eating it, we shouldn’t be disappointed that our beautiful git repos are marked as “Archived” and shuffled off the production kube cluster.
The attitudes that we have towards the things that we make are good indicators of maturity. It’s natural for us to want our work to have staying power and longevity. It’s extremely human to want the validation of our beautiful things being seen and used and recognized; it means we’ve done well. On the other hand, our work being discarded gives us an opportunity to understand what (if anything) we could have done better:
Did we build something that didn’t meet the project constraints?
Did we build what was requested, but what was requested wasn’t the right thing to ask for?
Was the core problem misunderstood?
Did the requested solution actually address the needs of the end user?
What questions didn’t we ask the stakeholders that could have better-aligned our output with the business need that triggered the request to engineering?
Were the expectations that we set around the project inaccurate or vague?
Did the project need to be as robust as what was delivered? Could a simpler or less clever solution have solved the need equally well?
Did we focus on the wrong success criteria?
Did we even have success criteria beyond “build what was requested?”
Who could have been consulted before or after delivery of the project to validate whether all of the actual project requirements were satisfied?
You won’t have the opportunity to take lessons away from the project if you see the sunsetting of the project as a failure: there’s often much to learn about what non-technical aspects of the project broke down. Perhaps there aren’t any, and maybe management is just a group of fools! But often that’s not the case; your delicately milled cog wasn’t ripped out of the machine because it was misunderstood, it was ripped out because it didn’t operate smoothly as a part of the larger system it was installed in.
[1] Vertica, for those unfamiliar, is an analytics database that’s designed for very fast queries over very large sets of mostly read-only data.
[2] To this day, I’ve never encountered an in-house application system as well-designed as Uber’s. You could go from start to Hello World running on a *.uberinternal.com subdomain in under 30 minutes with full CI/CD.
[3] I like to name my projects well, and this one is probably the best-named project in my career. It turns out that Wesley Crusher’s middle initial is “R”, and so that simply had to be the name.
[4] Initially we had no manager and instead reported directly to the director that I’d followed. I doubt she had much time to act as the hiring manager, and my suspicion is that she looked at my yes/no recommendation on the interview scorecard as her decision.
We eventually got an EM who acted as the hiring manager, but he was absolutely dogshit terrible at his job, and I suspect he also was not invested in the hiring process and just took my recommendation. Talking about this awful EM and the other particularly bad EM(s?) I’ve had is a topic for another post.
[5] For instance, knowing what to offer to drivers as incentives so they wouldn’t drive for Didi instead. The company had real issues where if we got driver incentives wrong, drivers would open the app to check incentives, switch to Didi to see their incentives, and simply not switch back.
[6] I cannot underscore enough how rudimentary this UI was, supporting only the most trivial of spreadsheet functionality. Selecting cells, editing their contents, navigating around with your keyboard, that sort of thing. I think the most advanced feature was the draggable corner in the bottom right to copy the contents of the cell left or right (adjusting cell references in formulas appropriately).
[7] I modeled the spreadsheets as a directed graph: each cell was a node, and each cell reference in a cell’s formula was an edge to another cell. Initially, a changed cell meant first computing its value by looking at the cached value of its dependencies (if you reference another cell in the one you just changed, the value of that other cell hasn’t changed, so just use the previously computed value). But then, you need to recompute every cell that depends on the one you just updated.
My initial algorithm did this naively, leading to lots of wasted computations. Rather than walking the inbound edges for a cell and computing their values, the algorithm was changed to walk the graph of dependent cells (that is, the cells that depend on the one that changed) in a breadth-first manner to produce a set of cells to recompute, preferring that order. You don’t need to get the order perfect; the goal is to avoid recomputing the same cell more than once.
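The breadth-first walk over dependent cells might look something like the following Python sketch. The function names (`build_dependents`, `cells_to_recompute`) and the dictionary-of-sets graph representation are hypothetical, chosen for illustration rather than taken from the original code.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def build_dependents(dependencies: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert 'cell -> cells it references' into 'cell -> cells that reference it'."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for cell, refs in dependencies.items():
        for ref in refs:
            dependents[ref].add(cell)
    return dependents


def cells_to_recompute(changed: str, dependents: Dict[str, Set[str]]) -> List[str]:
    """Walk outward from the changed cell, breadth-first, listing each dependent once."""
    order: List[str] = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        cell = queue.popleft()
        # sorted() only makes the output deterministic for this example.
        for dependent in sorted(dependents.get(cell, ())):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# B1 references A1, C1 references A1 and B1, D1 references C1.
dependencies = {"B1": {"A1"}, "C1": {"A1", "B1"}, "D1": {"C1"}}
print(cells_to_recompute("A1", build_dependents(dependencies)))
```

The `seen` set is what keeps a cell like C1, reachable through both A1 and B1, from being queued twice. Breadth-first order does not guarantee that every cell comes after all of its dependencies, so a stricter engine would probably use a topological sort here.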
[8] Big asterisk here: we were, but not for spreadsheet formulas. What I was tasked with building is fodder for another blog post another time.