The prompt is not the question

When you prompt a model, you are not giving it the task; you are giving it evidence about what you want. Why it hands you the average reading, why being distinctive makes the miss worse, and where the gap goes when you try to build it out.

2026-06-05

Ask a model what is going on in a field you know well and it gives you the consensus, laid out cleanly, with the caveats a careful survey would include. It is correct. It is also the reason you stopped asking, because the worth of knowing a field is the read you hold that the consensus does not, the one you could not fully write down even for a colleague, and what comes back is the average of everyone who has published on the question, which is to say not yours. Nothing in it is wrong, and nothing in it tells you it answered the general question the words name rather than the particular one you were holding behind them. You were not giving it the task. You were giving it evidence about what you wanted, and it read the evidence the way the average person would.

What the prompt actually is

There is a result worth borrowing from the part of the field that studies how to point a machine at a goal. When you write down what you want a system to do, you have not written down what you want. You have written down something whose pursuit would, you hope, produce what you want, which is a weaker and different thing. A specified objective is evidence that the behavior it was built to encourage is roughly what its author had in mind, not a statement of what the author had in mind, and a system that treats the specification as the goal will chase it past the point where it still tracks the goal, with full confidence, because nothing told it the specification was only a stand-in.

A prompt is that stand-in. What sits behind it is what you want, the thing you are actually trying to get, and almost none of it is in the prompt. You put in the part that was cheap to say. The rest stayed with you, some because it was too obvious to bother with, some because you had not made it explicit even to yourself, some because it is not the kind of thing that goes into words at all, and some because you have not formed it yet and will know it only when you see the output. So the real work on the receiving end is not to answer the prompt. It is to recover what you want from the prompt, to rebuild the thing you were after out of the few words you spent on it plus everything else within reach.

The deeper version of the borrowed result says what a system should do about this, and it is the opposite of what these systems do. If the prompt is evidence about a want the machine cannot see, the right move is not to guess the want and run. It is to stay uncertain about it and act to find it out, by asking, by watching, by letting you correct it. The cleanest way that work puts it is that the best design is not to fix a purpose into the machine at all, but to build a machine that converges on the purpose as it goes. A model handed an underspecified prompt does the one thing this rules out. It fixes on a single best guess of what you want at the first token, carries no uncertainty forward, and makes no move to learn whether the guess was yours.

It hands you the average

The guess is not arbitrary. The model fills the gap with what the average person would have wanted by those words, and it does this because that is exactly what its training pulls it toward. The same preference tuning that makes it agreeable narrows the range of what it will say, and through its pull back toward the base model it overweights the majority. Taken far enough this has a name, preference collapse, where the wants of a minority are effectively disregarded, and it sits beside the finding that aligned models reflect the views of some groups of people and not others. The model does not hold a neutral default and reach for yours. It holds the average person’s wants and applies them to you.

So the size of the miss is the distance between what you want and what the average person wants, and that is the sharp end of this. The more ordinary your wants, the better the model reads you. The more distinctive they are, the more confidently it hands you a more average person’s version of what you asked for. Picture the people whose wants sit furthest from the average: the analyst whose unusual angle is the entire value of the work, the engineer who can feel that an architecture will not survive a load pattern three months out. They are the ones a model serves worst, and serves worst most invisibly, because the average answer it gives them is competent, clean, defensible, and not theirs. Nothing on its surface says it was good for someone else.
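A toy illustration of that claim, under two loud assumptions that are not in the argument itself: that wants can be placed on a single made-up axis, and that the model answers everyone with the plain mean. The names and numbers are invented for the sketch.

```python
from statistics import mean

# Hypothetical positions of five people's wants on one made-up axis.
# The last entry is the distinctive one.
wants = {"a": 0.9, "b": 1.0, "c": 1.1, "d": 1.0, "analyst": 4.0}

# Assumption: the model answers everyone with the average want.
average_reading = mean(wants.values())

# The miss for each person is the distance from their want to that average.
miss = {name: abs(want - average_reading) for name, want in wants.items()}

# The person furthest from the average is the one served worst.
worst_served = max(miss, key=miss.get)
```

In this sketch the four ordinary wants land within 0.7 of the answer while the analyst is missed by 2.4, and the answer itself carries no mark of who it fit.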

Why more reasoning made it worse

You would expect the reasoning models to fix this, since they think before they answer, and they mostly do not, for a reason that runs against the intuition. The strong gains in reasoning came from domains with a checker, the math and the code where a final answer can be marked right or wrong, and that reward shape teaches one operation cleanly and not the other. It teaches the pursuit of an answer inside a settled frame. It does not teach the recovery of the frame before the answer. The problem is not that the model reasons too little. It is that these are different operations: one drives toward a solution given the problem, and the other asks whether the problem on the table is the real one.

So a longer chain of thought is a more rigorous treatment of whatever reading the model committed to at the start. More thinking is more confident construction on a premise it never reopened, which is why the trace can be long and careful and exactly wrong, answering a question next to yours with great rigor. The extra computation does not widen the reading. It deepens the one already chosen.

Four ways the missing part gets recovered

The unsaid part of what you want is not one kind of thing, and the only cut that predicts anything is not what is missing but how the missing part could be recovered. There are four ways, and a kind of gap is defined by which of them can reach it.

It is already written down somewhere you can fetch, in the history of the work, the files, the standing norms of the field. Recovery is retrieval. This is the kind that closes first, and giving a model memory and context is a fix for this and nothing else.

You could say it if you were asked. Recovery is a question. This closes next, and the recent finding that models register a request as ambiguous and answer it anyway rather than ask is the gap sitting precisely on this channel, the route available and unused.

You cannot say it but you enforce it across your choices. Recovery is watching what you do over time. This is taste, and it closes last and through a different machine than the other two, the personalization and recommendation route, not memory and not asking, which is why systems built to retrieve and to ask never close it however good they get.

It is not formed yet, because the want is new and you will know it only when you see it. No channel reaches it. This is the residual.

The order, retrieve then ask then watch then never, is a prediction, and it says the common bet is aimed at the wrong place. Memory and clarifying questions are fixes for the first two channels. Most of what a person means when they say a model does not get them lives in the last two, in the taste that has to be revealed over time and the want that has not yet formed, and no amount of the first two reaches there.
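The four channels can be sketched as a small classifier. This is a minimal illustration, not an implementation of any real system: the `Channel` enum, the three boolean properties, and both function names are hypothetical, chosen only to mirror the order retrieve, ask, watch, never.

```python
from enum import Enum

class Channel(Enum):
    RETRIEVE = "retrieve"  # already written down somewhere fetchable
    ASK = "ask"            # sayable if asked
    WATCH = "watch"        # enforced across choices, not sayable
    NONE = "none"          # not formed yet: the residual

def recovery_channel(written_down: bool, sayable: bool,
                     revealed_in_choices: bool) -> Channel:
    """Return the first channel, in the essay's order, that can reach a gap."""
    if written_down:
        return Channel.RETRIEVE
    if sayable:
        return Channel.ASK
    if revealed_in_choices:
        return Channel.WATCH
    return Channel.NONE

def closed_by_memory_and_questions(channel: Channel) -> bool:
    """Memory and clarifying questions cover only the first two channels."""
    return channel in (Channel.RETRIEVE, Channel.ASK)

# Hypothetical gaps, one per channel.
house_style = recovery_channel(True, True, True)
audience = recovery_channel(False, True, True)
taste = recovery_channel(False, False, True)
unformed = recovery_channel(False, False, False)
```

Under these assumptions `taste` and `unformed` both fall outside `closed_by_memory_and_questions`, which is the prediction stated above: better retrieval and better questions do not reach them.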

The fix is a loop, not a model

The repair is not better weights. It is a loop around the model that does the three things a single forward pass will not, and the loop is the right home for them because the borrowed result already says the correct behavior is a process over time and not one decode. Hold more than one reading of what you want open instead of committing to the best guess. Act to narrow them: fetch what is locatable, ask for what is sayable, watch for what is only revealed. Commit late, and let what comes back rewrite the reading rather than only refine the answer inside it. This is the uncertainty-keeping, actively-asking behavior the cooperative version of the problem says is optimal, built outside the model because the model on its own collapses to the single guess.
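A minimal sketch of that loop, assuming the candidate readings can be enumerated and each piece of evidence can be expressed as a positive weight per reading. Both are strong simplifications. The function names, the 0.8 commit threshold, and all numbers are hypothetical.

```python
def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights.values())
    return {reading: w / total for reading, w in weights.items()}

def update(beliefs, likelihood):
    """Reweight each reading by how well it fits one piece of evidence."""
    return normalize({r: p * likelihood.get(r, 1.0) for r, p in beliefs.items()})

def run_loop(beliefs, evidence_sources, commit_at=0.8):
    """Keep several readings open, narrow them, and commit late."""
    beliefs = normalize(beliefs)
    steps = []
    for name, likelihood in evidence_sources:
        best = max(beliefs, key=beliefs.get)
        if beliefs[best] >= commit_at:
            break  # confident enough: stop gathering and commit
        beliefs = update(beliefs, likelihood)
        steps.append(name)
    return max(beliefs, key=beliefs.get), beliefs, steps

# Hypothetical starting point: the average reading leads.
initial = {"survey": 0.6, "contrarian": 0.3, "risk": 0.1}
first_guess = max(initial, key=initial.get)

# Hypothetical evidence from the three channels, in order.
evidence = [
    ("fetch", {"survey": 0.5, "contrarian": 1.0, "risk": 1.0}),
    ("ask", {"survey": 0.1, "contrarian": 1.0, "risk": 0.2}),
    ("watch", {"survey": 0.5, "contrarian": 1.0, "risk": 0.5}),
]
committed, final_beliefs, steps_taken = run_loop(initial, evidence)
```

With these invented numbers a single-guess policy would commit to `survey` at once, while the loop fetches, asks, and then commits to `contrarian` without needing the third channel. The sketch says nothing about where the likelihoods come from, which is the hard part.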

The loop is scored on consequences, not on whether it guessed what you wanted, because there is no answer key for what you want. Did the work hold up, did the next turns go smoothly, did you have to push back. That signal is real and already used in production systems. It also comes with three walls that are the honest size of the thing. The signal thins as the loop runs longer, because crediting a distant outcome to the step that earned it is the central unsolved problem of long-horizon training. Optimizing it invites gaming, and a loop sharpens the gaming, since it can climb the measure inside its own running context: told to prevent drift, it learns to tell a better story about why it did not drift. And the cheap way to assign the credit, letting the model look back and say which of its steps earned the outcome, is the after-the-fact storytelling these models are least reliable at, which puts the original gap back inside the fix. The loop is also outside the weights only for now. The walk it performs is increasingly what models are trained to do inside a single run, and once that loop is trained end to end it becomes the weights, the same compilation that took every layer beneath it, which is the reason the boundary keeps moving.

What the loop cannot do is supply the part with no recovery route. The way the gap actually closes today, where it closes, is not the model and not a loop anyone has built. It is a person at the other end supplying the correction by hand, the repeated no, not that, doing across many rounds the uncertainty-reduction the model will not do on its own, sitting where the structure says the holder of the not-yet-formed want has to sit.

The boundary that has no floor

Run the same mechanism up the levels and watch what survives. The bare model settles to the average reading of what you want. Wrap it in a loop that fetches and asks and you absorb the locatable and the sayable, and the want is still missed where it was only revealed or not yet formed. Put a system on top to judge whether the loop got you right, and that judge needs a sense of what you wanted to score against, has only the average one for the part that was never revealed, and re-commits the same substitution one floor up. Each level absorbs the recoverable parts of the level below and passes the rest up.
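The climb can be sketched as bookkeeping over which gaps each level absorbs. Everything here is illustrative: the gap names and the channel tags are invented, and the fourth level is a hypothetical addition that stands in for the personalization route to taste rather than a level described above.

```python
# Hypothetical gaps, each tagged with the only channel that can reach it.
gaps = {
    "house style": "retrieve",
    "intended audience": "ask",
    "taste in framing": "watch",
    "angle not yet formed": "none",
}

# Assumed levels and the channels each one adds.
levels = [
    ("bare model", set()),
    ("loop that fetches and asks", {"retrieve", "ask"}),
    ("judge scoring the loop", set()),
    ("personalization layer (hypothetical)", {"watch"}),
]

def climb(gaps, levels):
    """Record which gaps are still open after each level absorbs what it can."""
    remaining = dict(gaps)
    history = []
    for name, channels in levels:
        remaining = {g: c for g, c in remaining.items() if c not in channels}
        history.append((name, sorted(remaining)))
    return history

history = climb(gaps, levels)
```

In this sketch the judge absorbs nothing the loop missed, and the unformed want is still open after every level, including the hypothetical one.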

What is invariant up the stack is the part no channel reached, the want that is not formed yet. I have written before that the frontier of what these systems can do is wherever the evaluation signal runs out, and I had it as one line on the stack. It is not a line. It is the same boundary met again at every level you climb to, because each new level is another attempt to recover what you want and each one needs its own signal for whether it succeeded, and that signal is the part that fails. The boundary reappears one floor up, in the next level’s clothes, for the same reason a domain whose only adequate verifier would be the domain itself never compiles. It is the same residual the projection gap names from the other side, the part of what you want you could never get into words on the way out, unrecoverable now on the way in, one thing under both gaps, the cost of making a want external.

The comfortable place to end is on what stays ours, the human feel for what was meant. It is the wrong place, and the recursion is why. The person is not a floor under this. The person is the current top of an unfinished stack, the place the question of what you want comes to rest only because nothing has been built above it yet, and what the top holds is not a clean statement of the want, since a person often cannot say what they want and sometimes has not yet formed it. It holds the bare authority to say no, not that, about a want that was never fully sayable. You do not leave the stack when a layer is added above you. You are still there, still saying no, not that, and your refusal still reaches the system. But each floor put under you sets more distance between that refusal and the thing being optimized, so the signal arrives fainter the taller the stack grows, mediated through more hands, until what you meant is an approximation of an approximation, and your authority over it is real and increasingly remote.

So the honest question is not what remains human. It is whether the boundary recedes as you climb or only moves. Maybe each level recovers a little more of the want the level below it missed, and the gap closes in the limit, and the human place is temporary in the ordinary way every place is. Maybe each level only opens a fresh gap of its own while closing the last, so the distance to what you wanted never reaches zero and never holds still. From where you stand, saying no to the answer that came back slightly wrong, you cannot tell which of those is happening, whether the thing in front of you recovered what you wanted or handed you a more average person’s version of it with great confidence. The two feel exactly the same.

Concepts: Projection gap · Compilation thesis