Writing for the Next Model

The stronger the model, the more it needs the one who directs it

2026-06-11

One morning this June I reviewed the Korean translation of one of my essays. Six corrections before lunch, each one small, each one a no, not that. A word for deployment that read like troop movements. A pure Korean coinage where the industry uses the loan word. A literal rendering of an idiom no Korean engineer would ever say. None of it was wrong by any style guide, and none of it took a minute to fix. By evening the system I work in had done something I had not asked for. It proposed a rule. Industry terms stay in the industry’s register, even when a more native word exists. Six swatted flies, and it handed back the policy I had been applying for years across two languages and had never once stated.

It is a strange thing to read your own unwritten rule. Not because the machine was clever. Because you find out you had a rule.

Each new model is supposed to need me less. I work inside them every day, and each one needs me more. Not more of my hours. More of the part that can still be wrong. Three years ago the work was writing instructions, careful and exhaustive, because the machine did exactly what you said and nothing you meant. Now it drafts the translation, flags the two sentences where my English is genuinely ambiguous, and asks which meaning I want to carry. The first job it took from me. The second it handed back. The third did not exist until the machine was good enough to ask.

I keep waiting for the version of this technology that needs less of me, the one everyone keeps describing. What arrives instead, every time, is a version with better questions. The arguments about these tools are arguments about a level, what the machine can and cannot do. The thing worth arguing is the derivative. As of the middle of 2026 the models step up roughly every six months, and every step has asked more of me, not less, faster each time. That is the opposite of the story this era tells about itself.

Nothing I worked with in 2023 could have lifted a rule from six corrections. In 2023 I wrote paragraphs of instruction to buy a fraction of the steering one word buys now. The same six corrections would have produced six fixes and no rule. This year’s models lift the rule from six examples. I do not know what next year’s will need, only that every generation so far has needed fewer.

The verdicts keep shrinking too. Two days after the six corrections, reviewing the next essay, my longest answer was four words and most were one. By the end a single letter was enough, because the system was no longer reading the letter. It was reading the letter against everything I had already refused.

Watch what that does to the corrections themselves. A ruling is worth what the model reading it can extract, and the ruling does not change while the readers keep improving. The same entry is worth more every time a stronger model picks it up. The ledger is stock that appreciates. The verdicts feeding it are flow that cheapens. I do nothing. The reader of the record gets better. The record lets my judgment ride the machine’s curve instead of my own.

I do not make the rulings for the compounding. I make them because the work in front of me needs them. The compounding comes free.

Late in 2025, Tuhin Chakrabarty and his colleagues fine tuned a model on a novelist’s collected works for about $81, and expert readers preferred its excerpts to fresh writing from MFA-trained writers. The result was widely read as an ending. Look at the design. The researchers supplied the topic, the brief, and the sample passages to both sides, so the one thing the study never tested was who decides what to write. What $81 buys is a writer’s past. The ledger is the writer’s present. It moves because the person moves, and no snapshot stays current for longer than the person keeps deciding.

Every ruling is a call option on every future model.

Why a stronger model needs you more

That sentence is a bet about the future, so here is the machinery under it. Models improve on what populations can verify. Code that runs, proofs that check, answers millions would accept. The six corrections that morning are not in that set. No population could have made them; each had exactly one qualified judge. Where your work should go is verifiable by no population and it exists in no corpus until you supply one, which is why a stronger model gets better at reconstructing your direction from less and less signal and never begins to author it. The reach rides the curve. The choosing, on its own, has nothing to ride on.

So think of everything a model could produce for you as a space. Each generation the space gets larger, faster than the model gets better at knowing which point in it you wanted. A weak model compressed every instruction toward the same modal output. Every prompt in that era came back as the same beige paragraph, and that era is where your intuition about these tools was trained. The lesson everyone took was that the machine levels its users. It leveled them because it was weak. A strong model under direction does the opposite. Left undirected it still pulls everyone toward the same mode, and the measurements that exist measure exactly that. Directed, it amplifies the difference between its users instead of erasing it.

Hand a strong model to two people now, one writing the instruction anyone would write, one who knows where the work should go and recognizes the right draft on sight for reasons she could not have specified in advance, and the outcomes no longer sit near each other. The same essay of mine, translated raw and translated under a few dozen recorded rulings, used to differ in flavor. The two versions now differ in kind. Nobody has measured this spread for writing, because writing has no benchmark with a ground truth, which is itself part of the story. The only evidence I can offer is my own.

The floor is rising too, and a good model now interviews some direction out of anyone, the way mine flagged the two ambiguous sentences. But what the interview recovers is the direction anyone could state, the modal want behind those words, while the set of drafts a direction must choose among keeps growing with each generation. So the distance between the draft anyone gets and the draft you wanted widens even as both get better. Variance sets the prize. Recognition collects it, and the recognition a model can learn from your record is yesterday’s. The ruling not yet in the record is the one act it cannot perform for you, because the test of rightness lives in you and has not happened yet.

There is no comfort in this. The territory you must see across in order to choose grows at the machine’s pace, not at yours. The working skill becomes bandwidth, how much you can take in, fold into what you already hold, and decide against. People have said for years that the scarce thing in this era is knowing what you want. Usually it is said as a mood. This is what sits underneath it. The premium and the pressure are the same fact. I can feel which side of it I am on, most mornings. The load grows at the machine’s pace. I do not. What carries the difference is the ledger.

Every part of the work with a public verifier is moving toward the machine. Fluency went first, years ago. Structure is going as I write this, in the middle of 2026. Style on a supplied brief turned out to be learnable from a corpus like anything else with a pattern in it. What cannot move is the part with no population, the choosing, and it is being priced accordingly. Call it the direction premium.

What writing becomes

Painters ran this experiment four hundred years ago. Rubens priced his canvases by how much was di sua mano, by his own hand, and Antwerp paid the premium not for the brushwork, which the studio supplied, but for the directing intelligence that knew which brushwork to keep. The hand was the studio’s. The painting was his.

Writing is heading the same way, faster, because its studio improves on a schedule. The center of gravity moves from composing to directing. The sentences will increasingly be the machine’s, the way the underpainting was the studio’s. What stays is the part Rubens kept. Which sentence survives. What the work is for.

If choosing across a widening space is the job, the inputs of choosing become the craft. Reading stops being preparation for writing and becomes most of it. What you have read and still hold is what the choosing runs on. Refusing precisely is the rarer half of directing.

Direction used to be learned as a side effect. Years of lower work forced small decisions, and the decisions, repeated, became judgment. The machine absorbs the lower work first, so the practice is dying exactly while demand for what it produced rises. That is why a premium forms instead of a skill everyone picks up. What survives of the apprenticeship is the stream. Not because recording is practice, but because deciding is, and a ledger forces the decision the lower work used to force implicitly, one explicit no at a time. A young writer’s best move this year is not to polish outputs. It is to start deciding on the record.

It already works this way in my own small shop. I built the room expecting a faster pair of hands. What I got was an instrument that reads me. My essays ship in three languages. The editor, the glossary keeper, and the quality gate for the translations are not people. They are a ledger of rulings, most of them weeks old, applied on every build. And the record reads back. I had three reading agents walk everything I have published and report what I find worth writing. I expected a summary. They came back with the claim my essays kept circling and never once stated, that the stronger the models get, the more valuable the one who directs them becomes. I had needed the machine to tell me what I had been writing about.

Where the premium lands

Swap the writer for anyone who directs these systems and the shape holds, unevenly. You can locate where it lands in your own work with one cut. Mark every part of your job where success can be verified by something other than a person’s judgment. Those parts close. Chess fell to exactly this cut, and the famous human-plus-engine window lasted only as long as it took engines to span a game that has a verifier. Every part of writing that admits a verifier will close the same way. Style on a supplied brief already fell, and the $81 paper was the receipt. The premium concentrates in the remainder, the places where the only verifier is a person.

The sharpest fight ahead is over who owns the stream. If your record of decisions accumulates inside one vendor’s product, the premium accrues to the vendor, and the capture will not look like a contract. It will look like a memory feature with no export. Practitioners will carry their ledgers between jobs the way they carry reputations now, employment contracts will notice, and the fight over ledger portability will look like the fight over the social graph, except this time the asset appreciates. Memory, as the products of mid-2026 build it, mostly means remembering facts about you. The memory worth paying for is the record of your refusals.

Expect the counterfeit. Prompting more is not directing. Volume of instruction is what direction looks like to someone who does not have any. The homework that matters now is judging fifty outputs rather than producing one, and almost nothing in education is built to grade a judgment.

Most direction is modal. The average sense of where to go is the average, which is what the machine already has, and a ledger of modal rulings teaches it nothing it did not hold. The premium is steep because the supply is thin, and it does not arrive by default.

Here is my own claim on it, written down. The rulings in my ledger are dated June 2026. They should be worth more under next year’s models than under this year’s, without my touching a single one. Worth more, meaning the same record settles more decisions on its own and draws fewer corrections from me, both of which the room already counts. If they are not, the premium was a mood.

A long enough record might make even the stream fittable. Maybe. A fit can predict a refusal. It cannot be the one who was wrong, and writing has always been signed by the one who could be.

The machine holds yesterday’s version of everyone. That is what a corpus is, what a fine tune is. Tomorrow’s refusal does not exist yet, for anyone, including you. Somewhere in your last month are six rulings you made and never wrote down. The next model will read them better than this one can.

Concepts Compilation thesis · Externalized taste · Spec gap