AI has gone through repeated cycles of euphoria and disappointment since its inception. AI itself has never been crisply defined; roughly, it encompasses machines performing tasks that could previously only be done by humans, regardless of technique. However, tasks once considered AI tend to lose the label once they become reliable and commoditized, e.g., OCR or classic search-based chess engines.

At a high level, AI includes two broad genres of approaches: one largely deterministic and algorithmic, the other stochastic. The first encompasses, for example, classic expert systems and search-based engines for board games like chess. The latter, at their core, revolve around building statistical models from data. One particularly notable aspect was that stochastic models were hugely profitable (e.g., modern ad serving) yet a far cry from AI’s original aspirations; after all, no human can meaningfully reason about such probabilities at scale. All of this changed in the recent AI wave, as ML-based stochastic models found firm ground across many real-world use cases, driven by advances in large neural networks. We are now in another phase of AI euphoria. What changed this time around?

When we think about machine learning1, it is often framed as approximating a complex function from input to output. A typical ML training setup requires well-labeled pairs of input and output data, since machines need examples to learn. This also unintentionally limits such systems, as it is often extremely expensive to curate large amounts of well-labeled data. From this angle, modern recommender systems like ad serving are among the few places where we have an abundance of labeled training data, allowing the construction of ever more sophisticated models.
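To make the "function approximation from labeled pairs" framing concrete, here is a minimal sketch: fitting a line to labeled (input, output) examples by gradient descent on squared error. The data, function names, and learning rate are all illustrative, not drawn from any particular system.

```python
# Supervised learning as function approximation: learn w and b such that
# w*x + b matches the labeled examples, via per-example gradient descent
# on squared error. (Toy data; names and hyperparameters are illustrative.)

def fit_linear(pairs, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, y in pairs:
            err = (w * x + b) - y   # prediction error on one labeled example
            w -= lr * err * x       # gradient step for the slope
            b -= lr * err           # gradient step for the intercept
    return w, b

# Labeled pairs sampled from the "true" function y = 2x + 1.
pairs = [(x, 2 * x + 1) for x in range(-5, 6)]
w, b = fit_linear(pairs)
```

The expensive part in practice is not the optimization loop but producing `pairs` at scale, which is exactly the labeled-data bottleneck described above.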

The recent AI renaissance is a perfect storm driven by the explosion in data volume, model size, and the availability of flexible frameworks like TensorFlow and PyTorch, along with vast amounts of computational power. Deep learning frameworks provided the environment needed to express more complex model architectures, which sped up the evolution of those models and improved the training stability of ever larger ones. Larger models in turn required ever more data to train on and ever more computational power, supplied by increasingly powerful accelerators like GPUs.

The availability of labeled data became a bottleneck. Objectives such as next-token prediction or diffusion-based training unlocked access to massive amounts of previously unlabeled data, including the text and images freely available on the web. These models are still statistical at their core, but something qualitatively changed in how they are trained and used at inference time: they became task-independent. In a limited but important sense, these foundation models are models of the world. And once we have a statistical model of the world, we can adapt it to new tasks. These world models also appear to exhibit a wide range of emergent capabilities, sometimes perceived as almost magical. This has given rise to zero-shot learning, prompt engineering, and a seemingly endless range of applications built on top of these models. However, it is important to remember that these systems retain a fundamentally stochastic core.
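The key property of next-token prediction is that raw text supplies its own labels: each token's "label" is simply the token that follows it. A toy sketch of that objective, with a bigram model standing in for a large neural network (the corpus and names are illustrative):

```python
# Next-token prediction in miniature: estimate P(next | current) from raw,
# unlabeled text by counting bigrams. No human labeling is needed; the
# text itself provides the (input, output) pairs.
from collections import Counter, defaultdict

def train_bigram(text):
    counts = defaultdict(Counter)
    tokens = text.split()
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    # Normalize counts into conditional probabilities P(next | current).
    return {cur: {t: n / sum(c.values()) for t, n in c.items()}
            for cur, c in counts.items()}

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)

def predict(model, token):
    # Greedy decoding: return the most probable next token.
    return max(model[token], key=model[token].get)
```

A real LLM replaces the count table with a neural network and conditions on a long context rather than one token, but the training signal, predicting what comes next in unlabeled text, is the same idea.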

I take no position on how close we are to AGI (Artificial General Intelligence), as that would require answering what human intelligence actually is, a question far beyond the scope of this discussion. The concrete theme of the next steps for such world models is grounding them within specific domains. This can take two forms: either creating a deterministic shell around a powerful stochastic machine, or applying it directly in domains that are themselves stochastic in nature.

The first category is predominantly coding, at least initially, as it is comparatively easy to ground. We are at the beginning of a seismic change in coding as a profession. To a large degree, this is very similar to past abstraction changes in software development: from machine code to assembly, to systems programming languages like C, to languages with automatic garbage collection, and to more declarative languages in data processing (SQL) and web technologies (HTML, CSS and JavaScript). The building blocks at the lower layers of the technical stack won’t disappear, but writing them by hand will become far less economically sustainable in the application domain. Many business applications will be built primarily using AI-assisted or AI-driven approaches; while these systems may be less efficient in execution, they will be significantly more economical to develop.

The second category covers much of today’s white-collar work, and splits into two subcategories. The first consists of domains that are inherently stochastic, often described as creative, and that tolerate loose error bounds. Beyond ads and recommendations, think recruiting, marketing, first-level customer service, design, and similar functions. In these domains, even exposing the stochastic core directly is likely to work well, making them among the first to be disrupted. The second consists of domains with more deterministic rules and consequences. Disrupting these will take longer and will likely require substantial grounding, in the form of a deterministic outer shell. As with most technological shifts, domains with higher profit margins will attract attention first.

What are the implications for today’s SaaS companies? They can apply the above framing and start investing in AI to mitigate the risk of disruption. Incumbents still retain the advantage of deep domain understanding, but if a domain is profitable enough, a competitor will eventually emerge that approaches the problem very differently in an AI-first fashion. The relevant question is not whether to adopt AI, but whether any durable moat remains once development costs collapse and profitable domains inevitably attract AI-first competitors.

And one more thing: just as the faster CPUs of the early 2000s were consumed by all the JavaScript and CSS of the World Wide Web, LLM tokens are going to chew up GPUs for breakfast! There are at least two kinds of token usage: using LLMs to write code, and using LLMs directly to solve business problems. The first is much more scalable than the second, and probably easier to ground.

  1. Here we skip the whole area of traditional unsupervised learning; I find those classic approaches less interesting. ↩︎