This post is a reflection on my Stanford LEAD class on financing innovation with Professor Peter DeMarzo.

There has been much discussion about the profitability of foundation model companies such as OpenAI and Anthropic. So far, the top-line revenue generated by inference services has not covered the capital invested in training large models. By the time a model's training costs are amortized and its inference revenue begins to show a return, labs have typically already started training the next, larger model using data generated by the previous one. How can foundation model companies break this cycle?

In economic terms, people pay for the utility and value a product or service brings to their daily lives. Applied to AI, while enterprises pay for token consumption, end users (the actual consumers) are not paying for raw tokens but for the utility built on top of them. People pay for products that generate software for them (e.g., Claude Code, Codex). They pay for products that provide tailored search results that can be explored further (e.g., ChatGPT, Gemini). They pay for products that streamline routine legal work (e.g., Harvey AI).

This demand is not new. We have always paid for these utilities; they were simply delivered by humans at a much slower pace. If that holds, how did pre-AI companies become profitable? The key difference is that those companies were not responsible for the cost of human education. In modern economies, individuals bear that cost themselves, often at significant expense for top institutions or specialized professions. Companies hire graduates with no obligation toward their education costs, while the hires remain responsible for repaying their student debt. The burden of education sits with the individual. Foundation model companies, by contrast, are effectively funding continuous cohorts of “students” (models) while also being responsible for deploying them to generate economic value. This resembles government-led vocational training programs, in which individuals are trained and later employed in strategic industries. Such programs are funded by taxpayers, and their ROI is often secondary. Foundation model companies have no such backstop, so the question becomes: how can they sustain an equivalent system for training models?

Finance POV: If we treat model training costs, which exceed $2.25B [1], as Net Working Capital (NWC), training can be framed as purchasing inventory (i.e., model weights). If that inventory is “sold” over a three-month period, a foundation model company would need to generate at least $750M in monthly inference revenue ($2.25B / 3) just to recover the working capital, and that ignores the cost of capital (risk-free rate plus risk premium). With Anthropic reporting an annual run rate of $30B, equivalent to roughly $2.5B in monthly revenue [2], it may appear that NWC is recovered, with surplus capital left over for the next training cycle. Why does this hypothesis break down? Because (1) the reported top-line revenue is generated by a portfolio of models, not a single model; (2) each successive model is orders of magnitude more expensive to train; and (3) newer models cannibalize the revenue of older ones. With roughly five active models at the time of writing, and assuming revenue is split evenly among them, each model would generate about $500M per month (roughly $1.5B over its three-month lifecycle), short of the $750M per month needed to recover its NWC.
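The back-of-the-envelope numbers above can be reproduced in a few lines of Python. This is a sketch under the post's own simplifying assumptions (a $2.25B training cost, a three-month commercial life, a $30B run rate split evenly across five models). The function names are illustrative, and the optional `monthly_rate` parameter is an added assumption: it uses the standard annuity formula to show how a nonzero cost of capital raises the required monthly revenue.

```python
# Back-of-the-envelope NWC recovery check using the figures cited in the post.
TRAINING_COST_USD = 2.25e9    # training cost of one frontier model [1]
LIFECYCLE_MONTHS = 3          # assumed period over which the "inventory" is sold
ANNUAL_RUN_RATE_USD = 30e9    # reported annual run rate [2]
ACTIVE_MODELS = 5             # approximate number of active models


def required_monthly_revenue(cost: float, months: int, monthly_rate: float = 0.0) -> float:
    """Level monthly revenue needed to recover `cost` over `months`.

    With monthly_rate == 0 this is simply cost / months (cost of capital ignored);
    otherwise it is the annuity payment that repays `cost` at `monthly_rate`.
    """
    if monthly_rate == 0.0:
        return cost / months
    return cost * monthly_rate / (1 - (1 + monthly_rate) ** -months)


def per_model_monthly_revenue(run_rate: float, n_models: int) -> float:
    """Monthly revenue per model, assuming revenue is split evenly across models."""
    return run_rate / 12 / n_models


required = required_monthly_revenue(TRAINING_COST_USD, LIFECYCLE_MONTHS)
per_model = per_model_monthly_revenue(ANNUAL_RUN_RATE_USD, ACTIVE_MODELS)
lifecycle_revenue = per_model * LIFECYCLE_MONTHS
shortfall = TRAINING_COST_USD - lifecycle_revenue

print(f"required per month:  ${required:,.0f}")           # $750,000,000
print(f"earned per month:    ${per_model:,.0f}")          # $500,000,000
print(f"earned over life:    ${lifecycle_revenue:,.0f}")  # $1,500,000,000
print(f"unrecovered NWC:     ${shortfall:,.0f}")          # $750,000,000
```

Passing, for example, `monthly_rate=0.01` (a hypothetical 1% monthly cost of capital) pushes the required figure above $750M, which only widens the gap.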

So how can foundation model companies fund a model “education system” with exponentially rising costs? I see three possible paths:

R&D: consider SpaceX, which reduced launch costs from roughly $10,000-$20,000/kg to roughly $2,500/kg through reusable rockets and autonomous recovery [3]. Foundation model companies need to find ways to reuse model blocks from one training run to the next, so that each run improves model intelligence without adding to training cost. This modularity requires deep technical research into making “incremental intelligence” possible. In other words, model training would resemble a K-12 education system in which intelligence is built up cumulatively, just at a much faster pace.
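As one concrete illustration of reusing model blocks across training runs, the sketch below implements the function-preserving widening operation (Net2WiderNet) from the Net2Net paper by Chen, Goodfellow, and Shlens. It is an existing research technique, not the author's proposal, and it is shown on a toy two-layer MLP in pure Python rather than a real model. A trained hidden layer is widened by copying existing units and splitting their outgoing weights, so the larger network starts out computing exactly the same function as the smaller one, and further training begins from there instead of from scratch. The deterministic round-robin unit mapping is a simplification of the paper's random mapping, and the weights are arbitrary.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def forward(w1: Matrix, b1: Vector, w2: Matrix, b2: Vector, x: Vector) -> Vector:
    """Toy two-layer MLP: y = W2 @ relu(W1 @ x + b1) + b2."""
    h = relu([z + b for z, b in zip(matvec(w1, x), b1)])
    return [z + b for z, b in zip(matvec(w2, h), b2)]


def net2wider(w1: Matrix, b1: Vector, w2: Matrix, new_width: int) -> Tuple[Matrix, Vector, Matrix]:
    """Widen the hidden layer to `new_width` units without changing the network's output."""
    old_width = len(w1)
    if new_width < old_width:
        raise ValueError("new_width must be >= current width")
    # New unit j copies existing unit mapping[j]; units 0..old_width-1 map to themselves.
    mapping = [j % old_width for j in range(new_width)]
    counts = [mapping.count(i) for i in range(old_width)]
    # Incoming weights and biases are copied, so each copy produces the same activation.
    new_w1 = [list(w1[src]) for src in mapping]
    new_b1 = [b1[src] for src in mapping]
    # Outgoing weights are divided by the number of copies, so contributions sum to the original.
    new_w2 = [[row[src] / counts[src] for src in mapping] for row in w2]
    return new_w1, new_b1, new_w2


# A "previous generation" model with 3 hidden units (weights are arbitrary here).
w1 = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
b1 = [0.1, -0.2, 0.3]
w2 = [[1.0, -2.0, 0.5]]
b2 = [0.05]
x = [1.0, 2.0]

wide_w1, wide_b1, wide_w2 = net2wider(w1, b1, w2, new_width=5)
y_small = forward(w1, b1, w2, b2, x)
y_wide = forward(wide_w1, wide_b1, wide_w2, b2, x)
print(y_small, y_wide)  # equal up to floating-point rounding
```

Because copied units start with identical incoming and outgoing weights, they would receive identical gradients; in practice something must break that symmetry (small random noise, for instance) so the copies can learn different features once training resumes.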

Specialized Intelligence: users pay for outcomes, not tokens. That supports a hierarchy of models matched to task complexity: for many tasks, a smaller model (e.g., a distilled version) should be able to deliver the same outcome at a much lower cost. For example, a coding agent could use a big model, akin to a university graduate, for planning, since planning may require general intelligence, and then switch automatically to a smaller model, akin to a high-school graduate, for implementation. This approach would require companies to analyze the activations of their big models on specific tasks and to specialize models based on what they observe. Architectures like Mixture of Experts (MoE) become candidates for refinement, since not all experts contribute equally across domains.
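The planner/implementer split described above can be sketched as a simple routing table. Everything here is hypothetical: the tier names, per-token prices, step types, and token counts are illustrative assumptions, not figures from any provider. The sketch only shows how sending the bulk of a session's tokens to a cheaper tier changes the cost of the same outcome. In practice, deciding which steps a smaller model can handle would come from task and activation analysis rather than a hand-written table.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price


# Hypothetical tiers: a large generalist ("university graduate") and a distilled model ("high-school graduate").
LARGE = ModelTier("large-generalist", 15.0)
SMALL = ModelTier("small-distilled", 1.0)

# Assumed policy: steps needing broad reasoning go to the large model, routine steps to the small one.
ROUTING_POLICY: Dict[str, ModelTier] = {
    "plan": LARGE,
    "review": LARGE,
    "implement": SMALL,
    "test": SMALL,
}


def route(step: str) -> ModelTier:
    """Pick a model tier for a step; unknown step types fall back to the large model."""
    return ROUTING_POLICY.get(step, LARGE)


def session_cost(steps: List[Tuple[str, int]], routed: bool = True) -> float:
    """Total USD cost of a session given (step type, tokens) pairs."""
    total = 0.0
    for step, tokens in steps:
        tier = route(step) if routed else LARGE
        total += tokens / 1_000_000 * tier.usd_per_million_tokens
    return total


# A hypothetical coding-agent session: most tokens are spent on implementation.
session = [("plan", 20_000), ("implement", 200_000), ("test", 60_000), ("review", 20_000)]
print(f"all-large: ${session_cost(session, routed=False):.2f}")  # all-large: $4.50
print(f"routed:    ${session_cost(session):.2f}")                # routed:    $0.86
```

Falling back to the large model for unknown step types trades some cost for safety: a misrouted hard step is likely more expensive, in lost outcome quality, than the tokens saved.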

External funding: training becomes analogous to financing education. Just as financial institutions underwrite student loans, capital providers could fund specific training runs in exchange for expected downstream returns, given certain guarantees about the resulting model's employability. This capital need not be purely monetary; infrastructure providers could extend in-kind financing via compute, storage, or networking credits. This paradigm requires foundation model companies to raise funds for specific model capabilities and to demonstrate a credible ROI. One mechanism is lifecycle-based value sharing, in which funders share in the value a model generates over its commercial life. But what guarantees could a foundation model company offer? One idea is to commit to open-sourcing each model once it passes a commercially defined end of life (EOL), e.g., 2-3 years. This funding model is not entirely new: DeepSeek is funded by the quantitative hedge fund High-Flyer and open-sources its models at release, effectively an EOL of zero years.
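To make lifecycle-based value sharing concrete, here is a minimal sketch of the cash flows a capital provider might see: it funds part of a training run up front (for example, as compute credits) and receives a fixed share of the model's monthly revenue until the model reaches its EOL and is open-sourced, after which the stream stops. The revenue share, the monthly decay (standing in for cannibalization by newer models), the discount rate, the dollar amounts, and the function names are all hypothetical assumptions chosen for illustration.

```python
from typing import List, Optional


def provider_cash_flows(funding: float, first_month_revenue: float, revenue_share: float,
                        monthly_decay: float, eol_months: int) -> List[float]:
    """Month-by-month cash flows for a funder: -funding at t=0, then a revenue share until EOL."""
    flows = [-funding]
    revenue = first_month_revenue
    for _ in range(eol_months):
        flows.append(revenue_share * revenue)
        revenue *= monthly_decay  # newer models erode this model's revenue over time
    # No flows after EOL: the model is open-sourced and stops earning for the funder.
    return flows


def npv(monthly_rate: float, flows: List[float]) -> float:
    """Net present value of monthly cash flows, with flows[0] at t=0."""
    return sum(cf / (1 + monthly_rate) ** t for t, cf in enumerate(flows))


def payback_month(flows: List[float]) -> Optional[int]:
    """First month in which cumulative (undiscounted) cash flow turns non-negative, or None."""
    cumulative = 0.0
    for t, cf in enumerate(flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None


# Hypothetical deal: $500M of compute credits for 30% of revenue until a 24-month EOL.
flows = provider_cash_flows(funding=500e6, first_month_revenue=500e6, revenue_share=0.30,
                            monthly_decay=0.8, eol_months=24)
print("payback month:", payback_month(flows))  # payback month: 5
print(f"NPV at 1%/month: ${npv(0.01, flows):,.0f}")
```

Shortening the EOL, lowering the share, or assuming faster cannibalization all shrink the NPV, which is what such a deal would be negotiated around. With an EOL of zero months, this model leaves only the up-front outflow, so any return would have to come from outside the revenue-share stream.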

In summary, revenue accrues to utility, not tokens. Sustainability depends on reducing marginal training cost through reuse, aligning model size to task requirements, and shifting part of the financing burden off the balance sheet into structured, return-linked funding.

[1] https://x.com/VladBastion/status/1795152160558035367

[2] https://finance.yahoo.com/news/anthropic-tops-30-billion-run-221045473.html

[3] https://ntrs.nasa.gov/api/citations/20200001093/downloads/20200001093.pdf