WhatsApp Culture
I have been fortunate to work for WhatsApp and have always wanted to write about its distinctive culture and the differences between it and its parent company, Facebook/Meta. In another post, I shall discuss Facebook culture, more precisely, the Facebook of then and the Meta of now. As you have probably guessed, the six years I worked at WhatsApp were my happiest days during my tenure at Facebook/Meta.
So, what do I remember most about WhatsApp? Their mottos: “Keep it Simple” and “The F word is Focus.”
When I joined WhatsApp, it had recently moved to Facebook’s main campus in Menlo Park, building 10. Their recruiter sent a meeting invite with 10.2 2.10 as the location, which looked more like an IP address than a typical name for Facebook conference rooms, which consisted of random short phrases. As I later understood, the numeric naming schemes are organized sequentially, so if you are in room 2.6, you know room 2.10 is four rooms down the hall. No frills, simple and effective.
The first thing I noticed in that building was how quiet everything was, eerily so, you could say. The word QUIET was written in big, daunting letters on steely pillars. Alright, these are serious folks, I thought. The interviews themselves were casual. Three of the four people I met were some form of “Brian” (the founder, an early employee working on the server, and the recruiter), adding to the surreal experience. The other was Anton Lavrik, the hiring manager for their data infrastructure team. By team, I mean a team of two, one of them a new grad who, not long after I joined, quit to work on his Ph.D. in programming languages. So in reality, we are talking more of a team of one here. This was the norm rather than the exception in early-day WhatsApp. You are the sole owner of a few services, and with that comes great creative freedom and also responsibility, such as being on-call 24/7. This is a feature, not a bug! Later on, I realized that they had a technical stack organized to enable such a surprisingly effective organization. More on this later, but in short, Erlang, their programming language of choice, enables an operation-centric mentality.
I don’t remember the discussions with the three other people now, but talking with Anton was interesting. We discussed programming languages and their properties, strengths, and weaknesses. In hindsight, I think he was screening for language fundamentalists, as I’d call them. As it later turned out, first, there were already some programming language fundamentalists at WhatsApp, causing discord; second, you need a certain intellectual curiosity to embrace the rather peculiar WhatsApp technical stack, which is built on Erlang, a functional programming language with Prolog-inspired syntax, created in the 80s, that is organized around a novel programming model called Actors, which worked beautifully on multiprocessor machines, ironically before multi-processor/multi-core was a thing. When I joined WhatsApp, they ran the front-end servers on the beefiest machines they could find, with hundreds of gigabytes of memory (circa 2017), and Erlang scaled beautifully on such big hosts, handling 1 million+ connections on a single host.
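For readers who have not met the Actor model, here is a minimal sketch of the idea in Python rather than Erlang. It is only an illustration under my own assumptions: `CounterActor`, its message tuples, and `run_demo` are made-up names, and a Python thread with a `queue.Queue` is a much heavier stand-in for an Erlang process and its mailbox. The point it shows is that many senders can fire messages concurrently while the actor's state is only ever touched by one thread, one message at a time.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, one mailbox, one thread draining it."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _run(self):
        # Messages are handled strictly one at a time, so no locks are needed.
        while True:
            message = self._mailbox.get()
            if message[0] == "incr":
                self._count += message[1]
            elif message[0] == "get":
                message[1].put(self._count)  # reply on the caller's queue
            elif message[0] == "stop":
                return

    def stop(self):
        self.send(("stop",))
        self._thread.join()


def run_demo(senders=4, per_sender=1000):
    """Hammer one actor from several threads and return its final count."""
    actor = CounterActor()

    def worker():
        for _ in range(per_sender):
            actor.send(("incr", 1))

    threads = [threading.Thread(target=worker) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    reply = queue.Queue()
    actor.send(("get", reply))  # queued behind every "incr" sent above
    total = reply.get()
    actor.stop()
    return total
```

Calling `run_demo()` returns 4000: four senders race each other, yet no increment is lost, because the mailbox serializes everything before it reaches the state.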
The interview went well enough. I decided to do a Hack-a-Month program. In this program, you officially remain on your current team but work with a new team on some real projects for a month (give or take) to see if there is a mutual fit. Hack-a-Month, designed to facilitate internal mobility during the more carefree years of Facebook, felt normal in the culture of “Facebook Then”. Sadly, I can’t help but wonder if anyone would pursue it in the “Facebook Now”, famous for its muscular energy and focus on short-term impact.
My Hack-a-Month project was to create a small library that automatically throttles logging traffic while maintaining the ability to compute overall statistics correctly. Not long before, they had an incident where a benign server error cascaded because of the extra error logging it triggered, creating a nasty positive feedback loop that brought down some services. As a functional programming aficionado, I found coding in Erlang to be pure bliss. There are not many keywords to remember, just pattern matching and recursion. The actor model offers an extremely simple way to manage concurrency: everything is sequenced through mailboxes, after which the program becomes embarrassingly single-threaded. So I coded it up pretty quickly, probably in a week, including learning Erlang. Then I asked Anton how to deploy, to which he told me to just hotload™. ¯_(ツ)_/¯, right!
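To give a flavor of what "throttle but keep the statistics correct" can mean, here is a minimal Python sketch. It is not the library I wrote (that one was in Erlang), and every name in it (`ThrottledLogger`, `budget`, `sink`, `demo`) is hypothetical. The assumption is a simple scheme: within each time window, the first few events per key are logged individually with weight 1, the rest are only counted, and one summary record carrying the suppressed count is emitted when the window closes. Summing the weights then recovers the exact number of events.

```python
from collections import defaultdict


class ThrottledLogger:
    """Emit at most `budget` records per key per window, plus one summary."""

    def __init__(self, budget, window_seconds, sink):
        self.budget = budget
        self.window_seconds = window_seconds
        self.sink = sink  # callable(key, weight), e.g. a real log writer
        self._window = None
        self._emitted = defaultdict(int)
        self._suppressed = defaultdict(int)

    def log(self, key, now):
        """Record one event; `now` is a timestamp in seconds."""
        window = int(now // self.window_seconds)
        if window != self._window:
            self.flush()  # close the previous window before starting a new one
            self._window = window
        if self._emitted[key] < self.budget:
            self._emitted[key] += 1
            self.sink(key, 1)  # logged as-is, represents one event
        else:
            self._suppressed[key] += 1  # counted, not logged

    def flush(self):
        """Emit one weighted summary per key for everything suppressed."""
        for key, count in self._suppressed.items():
            if count:
                self.sink(key, count)
        self._emitted.clear()
        self._suppressed.clear()


def demo():
    """Simulate an error storm: 1000 identical errors in one window."""
    records = []
    logger = ThrottledLogger(
        budget=3,
        window_seconds=1.0,
        sink=lambda key, weight: records.append((key, weight)),
    )
    for _ in range(1000):
        logger.log("db_timeout", now=0.5)
    logger.flush()
    return records
```

In `demo()`, a storm of 1000 errors produces only four records (three with weight 1 and one with weight 997), so the logging pipeline sees a trickle while the totals still add up to 1000.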
That opened up many questions in my mind. I knew of Erlang’s capability to dynamically update code without the need to restart a service. Actually, in the early days of WhatsApp, services, along with base hosts, would run for months without being restarted. While fanciful, it felt so easy to mess up; I could envision millions of ways things could go wrong. My experience told me to ask for help, as I wasn’t comfortable that I understood either the mechanism of hotload or the potential blast radius if something went wrong. Anton agreed to sit down and explain the tools and mechanisms involved. And of course, as the story goes, something did go wrong! While I didn’t bring down WhatsApp, I did cause scary logs to spew in many services, followed by quick pings from people (the 24/7 operators!).
In hindsight, WhatsApp’s early employees expected people to read the user manuals (of Erlang and of the services) and be able to operate the services and make changes. Call that old-fashioned. Coming from Facebook-land (and the typical chaotic startup world), my default reaction was: what documentation? Documents in those places are often incomplete, only directionally correct, and surely outdated. It was refreshing to see an organization that invested in writing key documents and kept them updated. There was also really nothing fancy about it, just plain old Markdown files in a repository, one for each feature. Remember, keep things simple!
I must digress a bit on the ability to hotload new code without a service restart. It is extremely powerful for several reasons. It goes without saying that it makes deployment extremely fast. Counterintuitively, it also has the potential to make deployment safer, as it does not perturb system dynamics: there’s no cache to be warmed up and no connections to be reestablished afresh. Should any issue pop up, it is very easy to correlate it with the timing of a deployment. That is, if you are careful. The downside is that you need to organize your program to stay compatible across the hotload, when new code needs to work with both old and new state. If this sounds hard, it is not too hard in practice, as Erlang, being functional, only keeps in-memory state in particular places. The single-threaded Actor model offers a precise location for hooks that convert old state into new state. As a result, keeping backward compatibility often involves just additional function heads that pattern match both old and new states/data formats.
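Here is a minimal sketch of that idea, again in Python as a stand-in, so it only approximates what Erlang does with function heads. All names (`StateV1`, `StateV2`, `upgrade`, `Actor.hotload`) are hypothetical and not taken from any WhatsApp code. The assumption is that the new code adds a `last_key` field to the state, so its handler must accept state left behind by the old code as well as its own.

```python
from collections import namedtuple

# Two generations of the same actor's state. The new code adds `last_key`.
StateV1 = namedtuple("StateV1", ["count"])
StateV2 = namedtuple("StateV2", ["count", "last_key"])


def upgrade(state):
    """Convert whatever state the old code left behind into the newest shape."""
    if isinstance(state, StateV2):
        return state
    if isinstance(state, StateV1):
        return StateV2(count=state.count, last_key=None)
    raise TypeError(f"unknown state shape: {state!r}")


def handle_v1(state, message):
    """Old code: only counts messages."""
    return StateV1(count=state.count + 1)


def handle_v2(state, message):
    """New code: accepts both old and new state, always returns new state."""
    state = upgrade(state)
    return StateV2(count=state.count + 1, last_key=message)


class Actor:
    """Single-threaded message loop reduced to its essentials."""

    def __init__(self, handler, state):
        self.handler = handler
        self.state = state

    def receive(self, message):
        self.state = self.handler(self.state, message)

    def hotload(self, new_handler):
        """Swap the code only; the in-memory state is left untouched."""
        self.handler = new_handler


def demo():
    actor = Actor(handle_v1, StateV1(count=0))
    actor.receive("a")
    actor.receive("b")
    actor.hotload(handle_v2)  # no restart, state is still StateV1(count=2)
    actor.receive("c")        # first message under new code migrates the state
    return actor.state
```

`demo()` returns `StateV2(count=3, last_key='c')`: the count accumulated under the old code survives the swap, and the conversion happens in exactly one place, between two messages. In Erlang the two `isinstance` branches would typically be two function heads matching the old and new tuple shapes.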
Another headwind is that typical cloud infrastructure is not built to handle such a fast turnaround time. For example, just think of your typical observability stack, which, for Meta’s systems, operates at minute-level granularity, which is not sufficient for hot-loading-based rapid deployment. I believe it is pretty doable to use a hot-loading mechanism to create an extraordinarily safe and blazingly fast deployment system for Erlang-based systems. As a matter of fact, Instagram did build some form of hotloading in Python land, to save the capacity that a regular rolling-restart deployment consumes. I did try to make it happen when I led WhatsApp’s Core Infrastructure team, but I wasn’t able to for various reasons (priority, talent, etc.). By the time I left WhatsApp, we had largely regressed to the mean: just rolling-restart deployment. It works, but it’s boring, so to speak.
There is more to say about this Anton Lavrik person, my manager. Soon after joining the team, I realized how massive the infrastructure he had created on his own was. WhatsApp’s data infrastructure handles telemetry collection and analytics solutions for the collected data. Not only did he create a particularly simple data dialect to describe the data to be collected, along with the SDK and mechanisms for collecting it on all WhatsApp clients and servers, he also created a specialized database that allows SQL-ish queries on that data: SQL without join semantics, similar to Scuba. I was very impressed, but also puzzled. Why not use Hive or the other open-source tools? His answer? He didn’t want to be a Hive admin; he wanted to code! Plus, in the early 2010s, there weren’t any mature analytic databases supporting real-time analytical queries, which was the primary goal of his system: to help developers, less so data scientists.
That said, he was clear-eyed and practical. I was hired to eliminate this customized database and integrate it into Facebook’s big data infrastructure, and within a half-year, we repurposed its query engine to be a flexible multiplexer, redirecting and anonymizing different data to different Facebook analytic systems. This shift also coincided with the increase in the variety of data analytics tasks, which went well beyond the original design of a developer observability stack.
One of the significant differences between WhatsApp and Facebook, I noticed, was that there was no growth team at WhatsApp. The WhatsApp team was largely focused on creating the most reliable chat system, without worrying too much about user growth. It’s not that user growth was not important for the business, but rather that user growth took care of itself when WhatsApp focused on the basics of user experience: keep it simple, keep it reliable. That was a really lovely place to be, a strong testament to the product-market fit. On the other hand, Facebook was famous for its growth hacking, using various ways to construct a flywheel of user and engagement growth and to create and maintain a network effect. Later, I realized that this is more about the different product domains: messaging had been an established area for decades before WhatsApp, whereas Facebook’s News Feed had to be created anew. Before Facebook’s News Feed, there were no news feeds. They genuinely did not know what users might want, or how they might want it, without trying it out in the wild! I also came to appreciate that different companies require different ethos to succeed. One of the key successes of Meta as a company was that it largely left its key acquisitions alone, with no mandates for unification of infrastructure or much overt push for unification of products, particularly early on. After years, Instagram is still largely Instagram, built on top of Python; WhatsApp is still largely WhatsApp, happily coding in Erlang.
The strong organic growth allowed WhatsApp to focus on product and engineering fundamentals. They took their time to create very robust features, patiently fixing small broken windows and ironing out edge cases. Famously, the company spent 1-2 years doing nothing else but building truly end-to-end encrypted chat, to safeguard user privacy. I think they did that right around the Facebook acquisition, and after Snowden. When I say “truly”, I mean it works across all devices, including the more limited web browsers, which communicate with a primary device through some quite intriguing synchronization mechanisms. It means it works with all of WhatsApp’s features, be it chat, group chat, status, VoIP calls, or even group VoIP calls. While it is very complex under the hood, which is why it took 1+ years to do right, this creates a straightforward contract with its users: what’s communicated on WhatsApp stays private. There are no caveats, there are no carve-outs. WhatsApp didn’t pick what’s easy to work on; they chose to work on what’s right and simple for its users. The same emphasis on simplicity for the users showed up again later in the several years WhatsApp spent getting multi-device right, without centralized storage for full history, which I believe is the best implementation of any chat app, Signal and iMessage included.
If we look at WhatsApp’s competitors, both inside and outside Facebook, such as Messenger, or the myriad Google messaging products, the difference becomes almost comical. Messenger spent years chasing fanfare features, the most grandiose of them all being interoperability among all chat applications, with Messenger itself becoming a chat inbox app. Google kept creating and deprecating its various chat products so that no one could remember which was which. WhatsApp just kept its head down and focused on the basics. This really is a cautionary tale, reminding me of the invariant truth that startups often win because incumbents mess up. Or to quote Dalton Caldwell: “When a startup is competing against a large competitor … they are likely competing with some PM focused on internal politics/career progression” (on X). At the end of the day, it’s on the leaders to set up the right incentives, to keep things real.
As I embark on a new journey in the startup world, I wrote this down as a reminder to keep things real. Keep it Simple, and Stay Focused. Until next time, when I will talk about Facebook culture.