⭐️ What We Talk About When We Talk About Performance

I was a manager before I joined Facebook, but not in the way I currently understand the role. I mostly thought of it as an opportunity to amplify my impact through other people. If I thought about career development for my reports, it wasn’t my main focus. So, when I switched to management at Facebook in 2011, I had a lot to learn. And learning management skills changed me completely, really beginning my process of honest self-inquiry as an adult. Much of what I learned, I learned through the performance process. Accordingly, I was a strong advocate for it for most of my management career.

I didn't start as a manager at Facebook, and I participated in a few peer review cycles as an individual contributor. I liked writing feedback because I could take pot shots at my management and praise my awesome colleagues. It was helpful to summarize my contribution and reflect on how it could have been better. We used this tool called Rypple. It had some nifty features for approving reviewers and submitting feedback.

A couple months after submitting feedback, I got an evaluation from my manager and a performance rating, but it was hard to pay that too much mind. We were building one of the biggest software systems the world had ever seen, and we were making up the rules as we went along. In a single day, you could change any code anywhere, push it to production, and have more experiment samples than you knew what to do with before dinner. It was obvious that we were changing the world.

But coding in the Facebook codebase felt slow and complicated, and I worried that I was too old to compete with the new grads. So I converted to management, and I realized that the feedback-writing was just an input to a review process that didn't involve the ICs at all. It turned out that there was another side of Rypple: the manager view, which had workflows for reading, comparing, and delivering feedback.

However, before feedback could be delivered, it had to be calibrated with other managers. I had never taken part in a process like this, so I went into my first calibration meeting with no idea what to expect. I remember it so clearly because I was almost overwhelmed by the feeling of solidarity and mutual understanding that it generated. This was the first forum I’d ever been in where I heard managers openly praise and criticize the ICs, and I realized that some of the things I valued in engineers and some things I wanted to change about Facebook, other managers did too.

In particular, I noted that the things that we recognized and discussed in calibration were unglamorous but super-important projects like migrations, observability, and infrastructure investments. I was happy to see that some of the employees who were most vocally critical of management were also some of the most valued. Hearing how much the managers appreciated the people who truly improved things was a refreshing reminder that we were all on the same team and wanted the same things.

And it was so helpful to have these discussions that it was easy to overlook the sometimes contentious arguments that would erupt over how to think about assessment. At this point, I think everyone agreed that the most important product of this process was the feedback from the manager to the report. Now, one outcome of this process was also to assign a formal performance rating and to make promotion decisions, but I swear that early on, that didn't seem like the point.

This is not to say that the process was always fun or that it was ever easy. Calibration is a mirror that helps managers see themselves more clearly. Seeing my mistakes and biases through a process of reflection was often painful. But I found these checkpoints to be transformative in terms of identifying my own weaknesses. I don't think it's an overstatement to say that I've learned more about myself through performance calibration at work than through any other mode of self-inquiry. If you approach this process sincerely, you have the opportunity to see your mistakes and biases in painfully clear light. For a long time, the Facebook calibration process was all about managers helping each other to see clearly in this way.

Over the years we built this shared narrative about individual performance in terms of team effort; stories about how the words on the poster translated to real action and real outcomes. But history is written by the victors, and inasmuch as this process served the managers, it also laid bare the realpolitik of the otherwise utopian modern corporation. Sure, good managers “support” and “serve” their teams, but at the end of the day, they decide how to distribute the rewards, they decide who should be recognized for doing a good job, and most importantly, they get to tell the story.

And like all stories, the story contains some elision and artistic flourishes that make the narrative stronger. The official history is just one of the possible histories, which emphasizes specific events and specific factors. A single bad manager who twists this narrative to self-serving ends can wield enough power to wreak havoc. This was precisely why calibration was so important.

Maybe because it felt like such a privilege just to be part of the team, no one seemed to worry too much about the result of the performance process, beyond getting fired. And the feedback process was helping us improve as engineers, as managers, as people. Facebook was generous with its compensation and we knew we would all be rewarded for helping the company grow.

Calibration seemed to help us clarify our management principles. Take a team that has some problems, where, say, one person plunged the toilet every day, and another worked on fixing the plumbing. I hope we agree that toilet plunging is the worst kind of engineering. But do we give more rewards to the person who undertook the comprehensive fix? How do we decide if they did it quickly enough? What was everyone else supposed to do in the meantime? Shit on the floor? These questions were parsed with the fineness of an appellate legal decision, and eventually a corpus of settled case law helped us move faster through these discussions.

In retrospect, though, I think all the hours we spent trying to apportion this kind of credit were a complete waste of time. Dividing the impact of these two hypothetical teammates is impossible because one enables the other; they are precisely working as a team. But the performance process we designed didn't really acknowledge that. And just being part of Facebook at that time felt like such a privilege that it didn't seem to matter so much. Sure, maybe you plunged the toilet for a couple of months and then got a disappointing rating, but it was easy enough to let that go and turn back to your work.

The performance process could be zero-sum and still even the losers felt like winners. When the management corps was small, we could have a consensus opinion about what constitutes “Impact,” even if we couldn't precisely define it. But the fun couldn't last, because scale ruins everything, and Facebook's astronomical growth eventually caught up to us.

It's stupid to pit teams against one another, but once most of the bugs were fixed and systems implemented, there were few pure optimizations left. Most improvements involved trade-offs, sometimes benefiting one part of the organization at the expense of another. It also meant that the most valuable changes routinely took much longer than the six-month review cadence. Projects would have to be evaluated mid-flight, and impact wasn't always a useful measure.

Furthermore, these bigger changes necessarily required coordinated effort from multiple people on different teams, which entangled organizational forces with individual assessment. Combined, these factors allowed for a good engineer to do good work on a good project, but still get a bad assessment. Once this happened, the wheels came off. Frustrated ICs started asking for more clarity about what exactly they had to do to get a good rating. This may seem like a reasonable question, but trying to answer it is a path that leads to hell.

Let me illustrate this with the example of Facebook’s evolving expectations around interviewing. During the golden era, most of the engineers at Facebook did interviews, but not all of them did. I think most of us kind of liked meeting new people, but some really didn't. So you could do however many interviews you wanted to, as long as you understood that we pretty much always needed to increase interview capacity. Then it came up in calibration that this one engineer did like eight zillion interviews and another did zero, and we started to try to formalize our accounting for interview contribution in the performance process.

After years of Great Debate™, this particular dimension of job contribution surfaced as a histogram in the calibration tool, which generated literally years of calibration case law regarding interviewing. What about the engineer who made stellar contributions but didn't contribute to interviewing at all? What about the engineer who did double the average number of interviews? These questions can be answered, but for goodness sake, who cares? Everyone needs to recognize the value of recruiting and interviewing, but people play different roles at different times.

This is how what started as a system for feedback turns into a system for expectations. Instead of asking each individual to make a smart decision about how they trade off interview time against other activities, we standardized the interview expectations. Instead of thinking about the outcome of the performance process as feedback with the benefit of hindsight, we started to think about it in terms of trying to manage the outcome.

It makes sense from the perspective of the individual contributor. "Pay for performance" makes it seem like my goal should be to get a high rating. If my manager is telling me that I will be assessed on factors that can’t be accurately described or predicted ahead of time, that seems like a cop out. So the managers and the leadership diligently set about trying to standardize the answers to questions that can't be formulated precisely. Eventually, the interview metrics moved out of the calibration tool and became a dashboard that everyone could use to track their own contribution.

But as soon as that became public, the engineers started to look for a formula. This is how you end up with the absurd field of study that is calibration math, which postulates that “exceeding expectations at level N is meeting expectations at level N+1” and provides detailed accounting rules for the expected impact of employees who are on leave. Almost no value is created by exploring this domain. Yet we are forced to, once everyone believes that the point of the process is to get a good rating.

On the other hand, how can a manager say, “don't worry so much about this process that garners so much attention; it’s more about us collectively than it is about you personally”? There is also a power imbalance here because the managers know what happens in calibration and, by and large, the ICs don’t.

This is all made much worse by the fact that these assessments are tied to compensation. As much as I like to emphasize the contemplative and narrative aspects of calibration, there is an undeniable outcome for each individual, and that outcome is quantified in dollars and shares. When it feels like you can directly contribute to the overall value of the company, it's easier to ignore these outcomes. But once individuals feel the organization is bigger than them, perhaps they naturally start to wonder about what they can do to improve their personal bottom line.

And the surprising thing about the seemingly worthy goal of “pay for performance” is that it is pointless, arduous, and destructive to try to assess individual performance. Value creation at a modern company is a team sport, and the best teams precisely don’t worry too much about credit or blame. It seems almost perverse that we would spend five months working as a team to create value and then almost a whole month of the performance cycle trying to figure out how to divvy up the credit.

Now, this is not to say that it's worthless to spend time reflecting. It’s critically important to understand the reasons and circumstances for our success and failure. It's just that attaching compensation outcomes to the discussion raises the stakes in an unhelpful way. It orients the whole process towards eliminating surprising outcomes. This makes the organization more risk-averse, short-sighted and dull.

Once the process becomes more about the ends than the means, it is irretrievably broken. If managers can engage with the process without asking themselves hard questions, they will learn nothing; they will only reinforce their existing weaknesses and blind spots. Once the emphasis shifts to outcomes, manager performance is all about that outcome, and they will reflexively try to defend it. And once a single manager decides to treat the calibration process as a game where they acquire and spend political capital, rather than as a group-based process of self-inquiry, they ruin it for everyone.

When I left Facebook in 2018, I timed it so that I would not have to go through another performance process—I was that sick of it. I knew that Robinhood had calibration, but I figured that since it was still a start-up, it would be more like the Facebook of the early days. To my surprise, it wasn’t. Robinhood had somehow imported the baroque-era Facebook calibration process, where everyone was already focused on individual outcomes. Compensation played a big role in how the ICs viewed the performance process at Robinhood, and there was a lot of backing-and-forthing about not just who was exceeding vs. meeting expectations, but how exactly the salaries were calculated. These concerns eventually arose at Facebook too, but at a much later stage and when there was a larger and more sophisticated people team that could turn convention into policy.

So, at Robinhood I learned that how the performance process unfolds is more a function of culture than of size, though those things are related. I think we made some progress in putting the emphasis of calibration at Robinhood where it belongs, but it's sad to me that the performance process that we exported from Facebook seems to generally be more of the late-stage variety. It makes me think that maybe the performance process as we practiced it during those early years at Facebook was the real anomaly. The manager corps had a chance to develop a consensus about what we were doing and how we were doing it without having to immediately explain it.

I did these performance cycles for almost a decade. I loved receiving, giving, and distilling feedback, but I hated the assessments. I thought maybe I would become inured to it, but instead the opposite happened. Each cycle, it was more and more painful for me to push ratings down (something that senior management has to do in just about every one of these systems) and increasingly distressing to watch so much engineering and management effort be channeled into comparative assessment, which I'm convinced is the least valuable part of this process.

I came away with the intention of never participating in such a process again, and am now a radicalized advocate of rethinking it entirely. I’m going to lay out some of my suggestions below, but I heard one objection to the very idea of experimentation in this domain that gave me pause. Should a tech company even bother to innovate in the area of performance management? I mean, sure, the process has problems, but at least it is established. Maybe whatever effort we expend on improving the performance process would be better channeled into work on the product or the systems. Is innovating in the area of performance management really a sensible differentiator?

I almost bought this argument, and maybe it’s a good reminder not to go too far afield with radical change to this system. But no, as managers, our whole purpose is to establish frameworks for process improvement. How could we claim to take our jobs seriously if we fail to develop and apply such a framework to ourselves? The fun and strain of working at a high-growth company is that we need to get better at things just to keep them running. In fact, I think the opposite holds: the work of skilled software engineers is so valuable that it will precisely be the premier engineering teams of this era who figure out how to optimize it and ultimately write the book on 21st century corporate management.

And I think the first chapter of that book will be about the need to eliminate individual assessment. It’s too hard to discern the difference between engineers who are meeting the high expectations of their role and engineers who are exceeding them. The median outcome of the assessment process is giving “Meets Expectations” to an engineer who is doing great work. It's demoralizing and pointless. “Pay for performance” sounds good in theory, but in practice it’s just too hard to define and measure individual performance in a team setting. It also pits team members against each other. Fostering teamwork is the most important function of a manager. Why make our own jobs more difficult with this onerous process?

Now, unlike assessment, we can't eliminate leveling and promotion. As long as other companies have a career ladder, let alone public titles, this is an area where individual companies can’t afford to innovate too much. As a hiring manager, you need to be able to tell a prospective employee how their current level maps to your career ladder. As a candidate, I need to be reassured that when I leave your company, my role and title will correspond to other positions in the industry.

Given this constraint, it's tempting to design a performance process that eliminates assessment and centers around level and promotion. I think that would be better than what we have now, but I have a stronger suggestion: eliminate discretionary promotions and make levels entirely tenure based.

Don't design a promotion process. Stop with the career ladders and the promo packets. End these pointless discussions about exactly what increases with seniority. We all know that sometimes a senior engineer will copy the same wrong answer from Stack Overflow as anyone else. We also know that as engineers gain experience, they become more unique. While it may make sense to give all the new grad engineers the same guidance about how to level up, it makes almost no sense to do that for engineers with lifetimes of unique experiences and strengths. Allow candidates to negotiate for their level when they accept the offer, but from there, make "promotions" entirely based on tenure.

I want team members to feel like their reward for performance is the opportunity to participate. I don't want them to ever believe that they can succeed if their team fails. I think this suggests that we need to shift the focus from rewarding top performers to managing out under-performers. In fact, maybe it's time to revise our ideas about the modern employment agreement. No one is going to work for the same company for twenty years and retire with an engraved gold watch and a pension. In this fast moving employment market, why do we still act like it's such a big deal to let someone go?

In a decade of working closely with other managers I've seen that there are almost always chronic under-performers that should be managed out. I believe that doing this quickly does more for team performance than any amount of rewards for top performers ever could. Sure, it destabilizes the team when someone leaves. Perhaps we can improve this by making attrition more transparent and natural. We tend to regard attrition targets and employee turnover as a "dystopian nightmare," but I don't see why it has to be. Given that a successful enterprise must precisely transcend the individual, one way to make the enterprise more antifragile is to make it less dependent on individuals. This is a radical shift away from the kinds of heroics and 10x impact that we have conventionally celebrated in Silicon Valley, but I believe it is time for that to change. A truly enlightened team knows that its time will end, and still accelerates towards that end, whatever mystery lies beyond.