Compiled Away

A Map of What Gets Automated Next

Link to slide

In 2023, I built a system to write a novel nobody asked for. A multi-agent orchestration project, pointed at fiction as a stress test. The premise was recursive. A programmer uses AI to write a novel about a programmer who uses AI to write a novel. Nine layers deep. Each protagonist discovers, eventually, that they’re a character in someone else’s story, authored by the layer above them, authoring the layer below. Each one believes, for a while, that they’re the one holding the pen.

I published it. I thought the interesting part was the engineering, the challenge of making agents coordinate across nested narrative frames.

I was wrong about what I’d built.

It took me two years to understand. Every protagonist in that novel believed they were the origin of their story. Every one of them turned out to be the middle of someone else’s.

I thought I was the top layer. The one who writes the spec. The source. It took me two years to understand I might be somewhere in the middle.

I build AI systems for a living. I have for a decade. Machine learning, language models, the full stack from research to production. And what I’m seeing from inside the machine room is a structural overhaul of knowledge work that most people outside can’t see yet. Not because they’re not smart enough, but because they don’t have their hands on the mechanism.

This is what I can see from where I’m standing.

What I Saw in Ten Hours

I sat down one morning to build something I’d been thinking about for weeks. A service with a database, a few external API integrations, and a processing pipeline. The kind of thing that is never hard because of the algorithm and always hard because of the edges. I estimated it would take a month. Maybe longer.

I opened my editor, connected a coding agent, and started describing what I wanted.

By hour two, I wasn’t writing code. I was describing intent in natural language, in corrections to what the agent had already produced. The agent would generate a component. I’d catch the places where it had inferred something I hadn’t specified, and I’d re-specify. Not rewrite. Re-specify.

There were two distinct kinds of failure. The first was when the agent guessed wrong about something I could have told it. It wired the external calls as synchronous request-response where I wanted fire-and-forget with a retry queue, because I knew from experience that the downstream service would flake under load and blocking on it would cascade. That was a specification failure. I hadn’t said what I meant. Easy to fix once I caught it.

The second kind was stranger. I had a mental model of how the system’s components should relate to each other, where the boundaries needed breathing room for features I could already feel coming. I could state requirements for each module. What I could not transmit was the set of invariants that made the whole composition stable. Each correction improved one joint but disturbed the gestalt. The interface was the bottleneck, not my ability to specify and not the agent’s ability to execute.

One was a specification failure. The other was a projection failure. I didn’t have those words yet.

By hour five, I’d settled into a rhythm I didn’t recognize from a decade of engineering work. Specify, review, catch, re-specify. I was evaluating architectural decisions, not implementing them. The agent was doing the translation work, the part that would have occupied my hands and my screen for weeks. My job had become something else entirely. Making sure the translation was faithful.

By hour eight, I had a working system. Functional, tested, deployed. What came out of those ten hours would have been weeks of work for a skilled team. Not because the code was better. Because the translation from what I meant to what existed had happened without me performing it manually.

There’s a precise structural term for what the agent had been doing. It was compiling. Translating a representation at a higher level of abstraction into executable form at a lower level. Not metaphorically. Structurally. A compiler doesn’t just compress. It commits to choices, and once committed, alternatives are gone. My intent was rich. What came out was running code. The things that were lost (my reasons, my taste, my tacit knowledge of what would break in six months) were lost not because the agent was bad, but because the target representation couldn’t hold them.

Not translation, which implies equal altitude. Not compression, which implies reversibility. Not distillation, which implies the lost material was noise. Compilation. Directional, executable, irreversibly lossy. This is a claim about a specific class of work. Translating intent into artifacts that have to run in the world. Generation, creating something genuinely new, is a different operation.

The layer beneath me, the one where I’d spent years translating my own intent into code with my own hands, had vanished in a morning. And the fear of losing my layer was exactly what was propelling me into the layer above it.

I’ve been living in this state for months now. Every day I work with systems that are learning to do what I do, and the awareness pushes me to think at a level I’ve never operated at before. Not because I chose to grow. Because the ground moved. And what I’ve found up there is that I can see patterns I couldn’t see before, make connections I wouldn’t have made, produce work at a speed and level of abstraction that would have been impossible a year ago.

Every knowledge work role I can think of is a compilation step. Strategy into plans. Plans into tasks. Tasks into code. Judgment into decisions. Taste into specifications. AI is where the compilation is being automated. The structure was always there. The automation is new.

And AI is learning to perform each of these translations, starting where the Spec Gap is smallest.

Why It Starts at the Bottom

There’s a reason AI mastered code before it mastered strategy. And the reason tells you exactly what gets hit next.

When I direct an AI agent to build a feature, my intent is only partially specified. I know what I want, but I haven’t said all of it. The agent doesn’t stop and ask. It guesses. And here’s the thing that most people building with AI haven’t fully absorbed. Every AI output is a mixture of what you wanted and what the machine inferred you wanted, and the two are interleaved and unmarked. If you can’t label which parts came from your intent and which came from the model’s inference, you can’t reliably debug the output. You can only react to it.

I call this the Spec Gap, the distance between what you mean and what you’ve managed to specify. It’s the single variable that predicts where AI disrupts and where it doesn’t. Yet.

The Spec Gap has two layers. Some of what I failed to specify during that ten-hour session was tacit knowledge I could have articulated if forced. I knew what error handling pattern I preferred. I just hadn’t bothered to say it. This is the articulable Spec Gap, and it shrinks every time you’re forced to make your intent explicit. The other part is harder. Some of what I meant, I couldn’t have specified even with unlimited time. The felt sense that a particular architecture would age well. The instinct, built from a thousand past failures, that a certain approach would break under conditions I couldn’t name. This is the inarticulable Spec Gap, the portion of intent that resists externalization not because you’re lazy but because it was never the kind of thing that could be fully written down.

At the code level, the Spec Gap is small. AI compiles well enough here to erase the bottleneck for many teams. This layer is falling now. At the analysis level, the gap is moderate. You can mostly articulate what you want. AI handles it with light supervision. At the coordination level, it widens. Intent is distributed across people, half of it unspoken. At the strategic level, it’s enormous. Organizational intent is emergent, contradictory, shaped by forces no one has ever written down. This layer resists longest.

This isn’t a ranking of difficulty or skill. Writing robust code is hard. But it’s hard in a way that’s already formally specified, which is why AI compiles it first.

The gradient is a tendency, not a law. What actually predicts the sequence isn’t abstraction level per se. It’s three things. How much intent is already externalized into artifacts. How cheaply you can verify the output. And how expensive errors are. These drivers correlate with abstraction level, which is why the gradient roughly tracks bottom-to-top. But the drivers are what’s doing the real work.

Don’t build your plans around what the model can’t do today. Every limitation you see is a snapshot of a moving frontier. And the frontier moves faster than anyone outside can see, because solving the layer below reveals something disorienting. Much of what felt irreducibly human at the layer above was actually specifiable all along. We just hadn’t been forced to specify it. When code generation automated the bottom of the stack, suddenly I had to articulate architectural intent I’d previously left tacit. And it turned out I could articulate it, which meant the agent could compile it, which meant that layer was next.

The ceiling keeps turning out to be lower than it looked from underneath.

But the stack doesn’t stop at me. My organization compiles its strategy through people like me into executable work. Markets compile competitive pressure through organizations into strategic responses. I’ve always been an intermediate layer in a compilation chain that extends above me. I just couldn’t see the layers above as compilation because they felt like the world, like reality rather than another representation being translated.

I am not the source. I am the middle. And middle layers are what compilers optimize away.

The Failure Surfaces That Travel With the Frontier

Two specific failure surfaces are causing most of the mess right now. Better models will fix them at this level. Then the same failures will reappear at the next level up, and in every new vertical where AI arrives. Because these aren’t limitations of current models. They’re structural features of compilation itself.

The first is the Spec Gap. The second I felt during that ten-hour session before I could name it. The moments where the failure wasn’t that I hadn’t specified my intent, but that the channel between us couldn’t carry it.

Even when you specify exactly what you want, the output still can’t hold all of it. Consider taste in music. Your taste is conditional, shaped by memory, context, mood, associations no one else shares. The AI needs a vector: tempo, key, instrumentation, energy level. The dimensions of your experience that don’t map onto those parameters are lost. Through today’s interfaces, your 10/10 song tends to become everyone’s 7/10. Not because the AI failed, but because the channel between your internal state and the system’s input couldn’t carry what made the song yours.

I call this the Projection Gap. The bottleneck isn’t that the model is low-dimensional internally. Modern models are extraordinarily high-dimensional. The bottleneck is the interface. Even if the model could represent everything you mean, you still communicate through a narrow pipe. Text prompts, parameter sliders, reference examples, iterative feedback. The channel has fewer degrees of freedom than your cognitive state. The specify-review-correct rhythm from the coding session is the human trying to close both gaps at once, narrowing the spec and widening the channel through iteration. It helps. It doesn’t eliminate either bottleneck.

The industry treats both failures as one problem (”the AI didn’t do what I wanted”) and the conflation is quietly breaking evaluation everywhere it matters. You can only check whether the AI lost something if you know what the original was. But at high abstraction, the original was never fully specified. That’s the Spec Gap. And even a perfectly specified original wouldn’t survive the interface intact. That’s the Projection Gap. Put them together and you’re measuring distance from a point that doesn’t exist.

Now here’s the part that makes this a permanent structural condition rather than a growing pain.

Better models narrow both gaps at the coding level. Then the frontier moves up to system architecture, and the Spec Gap reappears wider. Solve that, and it moves to product strategy. Each time, the same two failure surfaces emerge at a higher level of abstraction, with wider gaps. And every time AI enters a new vertical, the same pattern appears fresh. The Spec Gap is whatever practitioners know but can’t articulate to a machine. The Projection Gap is whatever matters but can’t pass through the interface the system accepts.

The specific problems rotate. The structure is permanent.

Which means the most important unsolved problem in AI might not be better models alone. It might also be better specs, richer and higher-dimensional ways to represent what humans actually want. Models are getting better fast. But the bottleneck is also upstream, in the source. And it will stay upstream, because every time the compiler improves, the frontier moves to a level where the source is less specified than before.

What Breaks When the Middle Falls Out

I learned my judgment by doing the work that AI now handles. A decade of writing code that didn’t work. Debugging it. Learning what breaks under load. Years of translating other people’s intent into working systems, badly at first, then well enough that people trusted me with larger, more ambiguous problems. The path from execution to direction to judgment is how I became capable of the work I do now.

That path is being compiled away.

Consider what’s already happening in engineering organizations. A junior developer used to own a bounded feature end-to-end. Write it, debug it, ship it, learn from what broke. Now the agent generates the feature. The junior becomes a reviewer of output they lack the context to evaluate. They can’t distinguish the agent’s inference from the original intent, because they never built the muscle to generate the intent themselves. That’s the unmarked-mixture problem from the Spec Gap, now operating as a training failure. The organization sees faster output. What it doesn’t see is that nobody is developing the judgment the senior people relied on.

This is not unique to engineering. Every industry has a version of the same pathway. Junior analyst becomes senior analyst becomes director. Associate attorney becomes partner. Resident becomes attending physician. The pipeline works because the middle layers, the years spent translating strategy into tasks and judgment into execution, are where you develop the judgment for the layers above. You don’t learn to direct by studying direction. You learn it by doing the translation work, encountering the failures, building the taste for what survives compilation and what doesn’t.

When AI compiles away those middle layers, it doesn’t just displace the people working there. It breaks the pipeline that produces the people qualified for the top.

The obvious counterargument: maybe working alongside AI compilers is itself a training pipeline, just compressed. A junior who spends two years specifying intent to an agent, catching its failures, learning to distinguish Spec Gap from Projection Gap in real time, might develop judgment faster than one who spent five years writing code by hand. Maybe the apprenticeship model adapts. I don’t think this is wrong exactly. But the old pipeline forced you to live inside the translation, to feel in your hands why certain abstractions break under load. Reviewing an agent’s output is a different cognitive operation than producing it yourself. Review trains evaluative judgment. Building trains generative judgment. If the new pipeline only develops the first kind, you end up with people who can tell when something is wrong but can’t articulate what right looks like. We don’t know yet. The experiment is running.

Either way, the risk is cognitive Ghost GDP, borrowing and twisting a term from Citrini Research. Capability that never developed. The surface looks like intelligence. Underneath, the pipeline that produced genuine human judgment is drying up.

There’s also a structural gap in the AI systems themselves. Everyone is building domain-specific compilers. Almost no one is building the system that reconciles their outputs. In compiler engineering, that’s called a linker. This is the Projection Gap operating between AI systems rather than between human and AI. Each compiler drops different dimensions, and the missing dimensions are exactly the ones needed for cross-domain consistency. The backend assumes a data model the frontend never specified. The roadmap promises a timeline the architecture can’t support. Each output is locally valid. Together they disagree. Linking is constraint propagation across separately compiled artifacts. It may be the most consequential missing piece in AI infrastructure.

If you can see where you sit on this gradient, if you can tell which of your skills are compilation steps waiting to be automated and which are the irreducible specification work above them, you have a map that most people don’t. The framework won’t tell you whether your layer is safe. But it will tell you which direction to move.

What I Think

The recursive novel I built three years ago turned out to be a working model of the compilation stack. The same failure surfaces recurring at every layer. Nine times. I didn’t know what I was looking at until I had the vocabulary.

The question is whether there’s a ceiling. The articulable Spec Gap shrinks every time you’re forced to externalize your intent, and AI is very good at forcing that. But the inarticulable kind, the felt sense, the instinct built from lived failure, the knowledge that was never propositional in the first place. If that kind of intent is real and persistent, then there’s a layer where the Spec Gap is permanent. Not because the technology isn’t there yet, but because the source and the target are structurally incommensurable. Or the diagnostic just keeps recurring. Each ceiling becoming a floor.

I think the inarticulable kind is real. I think there is intent that resists externalization as a structural feature of what it means to have lived a particular life in a particular body. But I think it’s narrower than most people hope. Most of what we call irreducibly human judgment is, I suspect, the articulable kind we simply haven’t been forced to specify yet.

The ceiling may be real. But it’s lower than it looks from here.

Link to slide

Concepts Compilation thesis · Spec gap · Projection gap · The linker · Ghost GDP · Evaluative vs generative judgment