On Nadella's token capital
The clearest case for the AI-era firm, and the floor it stops one short of.
Satya Nadella has given the clearest executive version of the AI-firm thesis, and he gave it on Reid Hoffman’s podcast the week after Build. Every company, he says, now has to build what he calls a hill-climbing machine. You point it at an outcome you can score. You feed it your own data as the context it works in, you let people and agents do the work inside it, and you keep the traces of how the work got done, training the weights you own on them. The firm’s particular way of operating compounds into an asset he sets beside human capital and calls token capital, instead of leaking out the door. I think he is right about the machine. I also think the machine faces the wrong way, and the place it stops is not where a skeptic stops. It is one floor up, and the floor it stops short of is the only one I have ever cared about.
What the machine is doing has a description he gives in passing and then builds past. He says the tacit knowledge of a firm is the unique way it operates, passes judgment, and has taste, and that the model can draw that knowledge out of the trajectories of people working and encode it into weights. So the machine is a loop that captures a firm’s tacit knowledge by watching it get used, and the leak he most wants to stop, that same knowledge walking out through the training environments the model companies staff with your former employees, is the thing the loop is built to prevent. But capturing how a firm works by watching it work only ever catches the part that has already been worked. A trajectory is what the knowledge leaves behind. It is not the thing that produced it. The asset he tells every firm to compound is, for that reason, a record of judgment the firm has already exercised.
He earns the proposal, because the specifics are right in a way that is rare for a chief executive on this subject. The hill-climbing machine is a loop around a model and not a better model, which is the conclusion I reached coming at it from how a prompt fails. The eval is the new IP and the rest is mechanical, which is a sharper way of saying the frontier sits wherever the evaluation signal runs out. And the strongest defense of his whole picture is one I have made myself, that against a model that is commoditizing, the substrate around it captures the value, the deployment patterns, the evaluation frameworks, the feedback loops that turn a benchmark number into operational change. I argued exactly that in the substrate inversion, and I still believe it. So let me concede it at the front. The substrate captures the value. The question is whether what it captures is a durable edge, and the answer turns on what a captured trajectory contains, and on what the act of capturing does to the people it captures.
What the loop catches
A trajectory is a record of choices made, and most of what a firm knows how to do can be recovered from enough of them. The operational knowledge, the way this firm files a claim or stages a rollout or routes an exception, is articulable, it repeats, and it leaves a clean trace, and a loop trained on those traces captures it well. This is the part the substrate does book, and it is real. But inside what Nadella calls tacit knowledge there is a second thing that does not behave like the first. The worth of a hard judgment was never in the choice that got recorded, it was in the structure that made the right choice reachable in a space too large to search. I have argued that this structure, the arrangement of what you already know that puts a good move within a step when from outside it looked like a leap, is what taste actually is, the generative core beneath the judging. Record a trajectory and train on it and you get the choice, not the structure that made it choosable. You capture the operational knowledge. You do not capture the core that produced it.
It would be too quick to say the core leaves no trace at all, and the essay does not need the overclaim. An angle one analyst brings, if it is the kind that recurs across hundreds of cases, does leave a trace in the aggregate, and the loop learns the recurrence, and that is real value a firm should want to hold. The part that leaves nothing is narrower, and it is the part that matters. It is the move with no precedent, the judgment that is not a rare instance of a known pattern but the first instance of a new one, the read that holds because it reached somewhere the firm’s own history had not been. That move produces no trace until after it has fired, and one firing is not a pattern. The machine that beat the best players at a game studied for thousands of years reached a move human play rated at one in ten thousand,[1] but it reached it because the move was rare inside a space the rules already scored, not because it was first in a space no one had measured. The recurring-rare, the loop will eventually learn. The first of its kind it cannot, because there is nothing yet to learn from.
So the part it cannot catch is the part that was the source of the edge. What made a firm’s judgment worth more than its competitor’s was not the operational knowledge, which the competitor can build its own version of. It was the capacity to reach the judgment with no precedent, the architect who can feel that a design will not survive a load pattern three months out, before any trace of that failure exists to train on. I have written that a map of taste, however dense, can only tell you where taste already is, and that the thing we keep calling the frontier is the first person to be somewhere else. A firm’s token capital is that map, built from the firm’s own traces, with the same horizon. It is exquisite about where the firm’s judgment has been. It is silent about the first time that judgment is somewhere new.
A lagging encoder
Will the loop not eventually pass the people it watches, the way that machine passed the players? It can, and the condition is what decides it. Where a judge is cheap, where a position can be scored for free and at once, a loop can play itself past the human distribution. The generative core has no such judge. Its measure is contested, arrives late, and is settled mostly in retrospect, which is the description of most work that carries real weight. The work on self-improving systems is fairly clear that, without a cheap external judge, such a loop is imitation toward a target and inherits the target’s ceiling, bounded by what the people in it already know.[2]
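A toy sketch of the condition, under loud assumptions: moves are integers, the traces are a short list of recorded human moves, and the judge is a hypothetical one-line scoring function that costs nothing to call. None of these names come from Nadella's account; they only illustrate the difference between imitating traces and climbing against a judge.

```python
from collections import Counter

def imitate(traces):
    """No judge available: reproduce the move seen most often in the traces."""
    return Counter(traces).most_common(1)[0][0]

def climb(start, judge, steps=100):
    """Cheap judge available: move to a neighboring move whenever it scores higher."""
    current = start
    for _ in range(steps):
        best = max((current - 1, current + 1), key=judge)
        if judge(best) <= judge(current):
            break  # no neighbor improves the score, so stop
        current = best
    return current

def judge(move):
    # Hypothetical free, instant score with its peak at move 10.
    return -(move - 10) ** 2

traces = [3, 4, 4, 5, 4, 6]        # hypothetical recorded human moves
imitated = imitate(traces)          # stays inside what the traces contain
climbed = climb(imitated, judge)    # leaves the traces, but only along the judge's score
```

The imitator never leaves the recorded moves. The climber does, but only because the scoring function was supplied first; delete `judge` and `climb` has nothing to run on.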
That bound is contingent, not a law, and the contingent version is sharper than the slogan that machines cannot do taste. It rests on the judge being expensive, and verifying a result is often far cheaper than producing it, the way checking a proof is easier than finding it.[3] Wherever that gap holds you can build the judge, and the loop will climb it, and here is the part to concede plainly. The climb does expand what it reaches. Trained against a checkable reward it can reach moves it could not reach before.[4] But expanding what you can reach inside a measure is not the same as setting the measure, and the loop only ever does the first. Decompose the structure of a judgment into criteria a machine can check, which I have argued is the whole game, and the bound lifts in that domain, and the loop passes the people there too. The honest limit is that the gap is not guaranteed. What is cheap to verify and what is cheap to produce are set by different conditions that need not line up, and some domains have no adequate judge short of running the domain itself.
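A minimal sketch of the gap between checking and finding, assuming subset-sum as a stand-in problem. The function names and the sample numbers are illustrative, and the brute-force search is only the simplest possible producer, not a claim about how any real system searches.

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Check a proposed answer. Cost grows with the pool and the certificate,
    not with the number of possible subsets."""
    pool = list(numbers)
    for x in certificate:
        if x not in pool:
            return False      # uses a number that is not available
        pool.remove(x)        # each number may be used once
    return sum(certificate) == target

def search(numbers, target):
    """Find an answer by brute force: up to 2**len(numbers) candidates."""
    tried = 0
    for size in range(len(numbers) + 1):
        for combo in combinations(numbers, size):
            tried += 1
            if sum(combo) == target:
                return list(combo), tried
    return None, tried

numbers = [3, 34, 4, 12, 5, 2]   # illustrative data
target = 9
found, tried = search(numbers, target)
ok = verify(numbers, target, found)
```

Here `verify` plays the judge and `search` plays the producer. The judge only exists because "sums to the target" was fixed as the definition of a right answer before either function was written.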
So the bound lifts domain by domain, and what sets the pace is the thing to see. Each lift needs a measure made binding first, a definition of better that someone has to commit to before a judge can be built from it. Whether a person authoring a measure is doing something a machine cannot do in kind, or only relocating a measure handed to them by their culture a little faster than the loop can relocate its own, I do not know, and I have said before that I cannot settle it. The firm does not need it settled. It needs only the order of operations, and the order does not change. The measure becomes binding, then the judge gets built, then the loop climbs. The loop ratifies after the fact. It is always one step behind the commitment that made its target, and there is no rate at which a follower catches a thing defined by going first.
Novelty is a measure too
You can tell the loop to value novelty itself, and it is worth seeing why that does not escape the order. Novelty is not self-defining. Novel against which archive, interesting to whom, surprising under whose model of what was expected? The moment those are answered, novelty has become another supplied measure, and the loop climbs it like any other. The field that has tried hardest to build systems that strike out on their own found exactly this, that it had to hand them an observer of what counts as interesting, drawn from what people have already found interesting, and that the systems game the observer the moment it is fixed and have to be topped up from outside.[5] Rewarding novelty does not author a standard. It relocates a supplied one a rung up, and at the higher rung it was still handed in. The high taste Nadella says you, and only you, can define, the phrase he lets fall while describing how to write a good eval, is precisely the part that cannot be turned into a reward the loop climbs, because it is the thing the reward would have to be made from.
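A small sketch of why a novelty reward is a supplied measure, assuming the common novelty-search style score (mean distance to the k nearest items in an archive). The points, archives, and the alternative distance function are hypothetical and chosen only to make the dependence visible.

```python
import math

def novelty(candidate, archive, k=3, distance=math.dist):
    """Mean distance from the candidate to its k nearest neighbors in the archive.
    The archive, k, and the distance function are all handed in from outside."""
    if not archive:
        return float("inf")
    nearest = sorted(distance(candidate, seen) for seen in archive)[:k]
    return sum(nearest) / len(nearest)

def first_coordinate_only(a, b):
    # Hypothetical alternative notion of "different": only the first axis counts.
    return abs(a[0] - b[0])

point = (1.0, 1.0)
archive_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]   # the point is far from everything seen
archive_b = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.0)]   # the point has already been visited

novel_vs_a = novelty(point, archive_a)
novel_vs_b = novelty(point, archive_b)
novel_other_metric = novelty(point, archive_a, distance=first_coordinate_only)
```

The same point scores as novel against one archive and as familiar against another, and changing the distance function changes the score again. Every argument of `novelty` other than the candidate is an answer someone had to give first.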
The follower
The last defense is that the people keep working, so the loop keeps learning, so it stays current. It does, and the staying-current does not fail the way a loop trained only on its own output would, because real people feed it real traces every day, which is exactly what keeps that kind of decay off.[6] But staying current is tracking, and tracking is following. The loop learns the new judgment after a person has exercised it and left the trace. This is why the human in the loop is not a transitional figure on the way to being absorbed, the way the hope for steerability can make it sound. Steerability is the good name for closing the channels that can be closed, the part of a want you would say if asked and the part you reveal across your choices over time. It does not reach the part not yet formed, the one known only when it is seen, because there is nothing there to steer toward. A more steerable model is one that has encoded more of you into the part that lags. The human is the nonbankable source of the move that has not yet become a trace, and the machine cannot become that source, because the act that would make it one leaves nothing to learn from until it is already done.
The coupling eats the source
Here the proposal stops being merely incomplete and starts to work against itself, and it is the part he is most pleased by, the interplay, the two capitals compounding together. Follow the loop around. Human judgment makes the traces. Token capital encodes them. And token capital reshapes how the humans work next, through the canvas and the agents and the inbox of delegations, so the next traces are shaped by the token capital that came from the old ones. The interplay he celebrates is a loop with both capitals inside it, and a loop like that does not merely fail to capture the generative core. It can wear it down.
The wearing-down has a mechanism, and he named it without seeing what it does. Cognitive coverage, the quiz a colleague built so that after an agent does the work a person can form a deductive grasp of what it did, keeps the human’s evaluative judgment current. It does not keep the generative kind. The two are different, and I have argued the difference, that review trains the judgment to tell when something is wrong while building trains it to know what right looks like, and that the capacity to author is built by doing the work, the years of translating intent into artifacts badly and then well, which is the work the loop now does. The optimist’s reply is that offloading the rote frees people for more of the frontier work, not less, and inside a single career, for a while, that can be true. It is not true of the pipeline. The frontier judgment of the senior was built out of the rote work of the junior, and when the rote is gone the senior is still here but the next one is not being made. The firm gets better at encoding judgment while getting worse at producing it. The pipeline that turned junior repetition into senior judgment was already thinning before the loop arrived. The loop does not merely inherit the shortage. It deepens it, by making the daily work of the firm the thing that no longer builds the capacity the firm cannot buy.
What you can’t bank
This is where the moat resolves, against the intuition. Grant that an incumbent with a deeper history extracts more of its own past, that the firm with twenty years of traces builds a richer token capital than the firm with two. What it has extracted is the past. And the more of it you hold and the better you encode it, the stronger the pull to run the next decision off the record of the last one, to let the encoded past stand in for the judgment. A firm can resist that. It can downweight old traces, treat the token capital as a tool and not a master. But the resistance runs against the grain of the asset, because the whole point of the asset was to compound, and what compounds is the past. Deeper history is not a heavier anchor by law. It is a stronger temptation to drop one, the old competency trap moved down from products and business models to the level of judgment itself.
So the two things the substrate inversion ran together come apart. The substrate is where a firm captures value against a commoditizing model, which is true and worth a great deal. It is not the same as a durable edge, because what it compounds is a lagging encoding of a frontier the firm’s people set, and the edge was always the going-first, not the encoding of where the firm went. The going-first cannot be banked, because the instant it leaves a trace it is the past, and the trace is the only thing the loop can hold. A firm cannot bank its own edge. It walks out the door every evening and comes back in the morning, or it leaves for a competitor, or it retires, and the loop keeps a faithful record of everything it did up to the day it stopped being the source.
I should say where this is weakest, because the weakness is the boundary of the claim. Where a field moves slowly, where last year’s judgment is still most of this year’s right answer, the lagging encoder is good enough, and the token capital is a fine moat, and none of the above bites. The claim is strongest exactly where the frontier moves fast, where this year’s right answer is one last year’s standing could not have reached, and that is precisely where Nadella’s own urgency keeps putting us. He cannot have both. If the world is turning over as fast as he says, the asset he tells everyone to compound is depreciating as fast as it accrues.
Faces backward
He ends where the optimists end, on the long history and the dream, the virtuous cycle that built the modern world and the hope that the whole of it now compounds at ten percent a year. I find the dream moving, and I notice only that it is a climb toward a measure, growth, that nothing in it authored, and that the human he keeps at its center is not the figure on the way to being absorbed but the one the absorption depends on and cannot reach.
The machine he describes is real, and it will keep getting better, and that is the trap, because what it will keep getting better at is being the firm it has already been. A firm that compounds its token capital hardest builds the most faithful possible model of its own past judgment, and will feel that fidelity as progress, and mistake getting better at being its past self for getting better. The one thing it can never bank, the capacity to be somewhere its own record has never been, is the only thing that was ever the edge. Token capital faces backward. The frontier was never in the record. It was in whoever leaves it.
Footnotes
1. AlphaGo vs Lee Sedol, game two, move 37 (March 2016). The roughly one-in-ten-thousand figure is David Silver’s, stated in the documentary AlphaGo (2017) and on DeepMind’s site; it does not appear in the Nature paper (Silver et al., 2016), which describes the policy network but not this per-move estimate.
2. Representative, not a single result. Audrey Huang et al., “Self-Improvement in Language Models: The Sharpening Mechanism,” arXiv:2412.01951 (2024), formalizes self-improvement as concentrating probability mass the model can already generate, and proves it cannot create information not already in the base model. The same ceiling is implicit in Yue et al., arXiv:2504.13837.
3. The formal version is the verification-generation asymmetry behind P versus NP: a candidate solution’s certificate is checkable in polynomial time even when finding it is hard. See the Clay Mathematics Institute statement of the problem (Cook). This is a complexity-theory result, not an empirical claim about models.
4. A live, unresolved debate. Yue et al., “Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?”, arXiv:2504.13837 (2025), finds RLVR sharpens sampling within the base model’s reach. ProRL, arXiv:2505.24864 (2025), finds that with enough training it solves problems the base model cannot reach under any sampling. Cited as a debate, not a settled finding.
5. Edward Hughes et al., “Open-Endedness is Essential for Artificial Superhuman Intelligence,” arXiv:2406.04268 (2024): interestingness is the observer’s chosen loss function, learned from human data, and an open-ended system must keep bringing the observer along. The gameability of a fixed novelty signal is older, from Stanley and Lehman’s novelty search (Why Greatness Cannot Be Planned, 2015).
6. Ilia Shumailov et al., “The Curse of Recursion,” arXiv:2305.17493, published as “AI models collapse when trained on recursively generated data,” Nature 631 (2024): recursive training on generated data collapses the distribution. The counterweight is real data: Gerstgrasser et al., arXiv:2404.01413 (2024), find that accumulating real alongside synthetic data avoids collapse.