Compilation thesis

Compilation is the structural operation that turns higher-level specification into lower-level artifact. The translation is directional, lossy, and not reversible. Where the domain supplies an independent verifier that can check whether the artifact answers the specification, compilation can be scaled by a training loop that compounds with compute. Where no such verifier exists, compilation still happens through human work but does not compound, and AI capability gains stall regardless of model scale.

The mechanism

Compilation is a specific operation. A higher-level specification, whether intent or claim or design, is translated into a lower-level artifact that runs in the world. The translation is directional and lossy. It happens in code, in legal drafting, in product strategy, anywhere a person turns a less-specified thing into a more-specified one. Spec-gap and projection-gap name what gets dropped in the translation. The compilation thesis is the further claim about which compilations compound under compute scaling and which do not.

The mechanism that distinguishes the two is the verifier. A verifier is an independent check that decides whether the artifact answers the specification it was given. The verifier does not write the specification; it does not produce the artifact; it only decides whether the artifact ran. Where the verifier exists and runs cheaply relative to the work of producing a candidate, the cost of generating training signal collapses. Candidates are generated, the verifier filters, surviving pairs accumulate as training data. The dataset grows with the compute that produces candidates, not with the labor of humans labeling them. The compounding curve follows compute because the verifier in the inner training loop is mechanical rather than human. Code has pytest and type systems. Math has Lean. Both produce verified training data faster than humans can label, and the gap between compute-scaling and human-effort-scaling becomes structurally unbridgeable over any meaningful horizon.

Where the verifier collapses into the work of producing the candidate, the asymmetry vanishes. The mechanism is structural to the domain, not a property of any model. The term carries two levels: compilation names the general translation operation; the compilation thesis names the narrower prediction about which translations compound under compute. The rest of this page works at the narrower level.

What the asymmetry requires

Three properties have to hold together for the verifier to do its work.

Input completeness. The verifier must be checking against a specification that is itself fully present. A function signature plus a test suite is complete: the inputs are fixed, the expected behavior is given, the unknown is the implementation. A client’s legal situation is not complete. Facts arrive during discovery and depositions; the opposing side’s filings change what counts as the case; the specification keeps moving. Without input completeness, the verifier has nothing well-defined to check against.

Consensus verifiability. There must be a unique right answer, or close enough to one that qualified reviewers would agree. A proof either type-checks or it does not. A test either passes or it fails. Legal reasoning is in-principle consensus-verifiable at the answer level even when the verification itself is expensive in practice. Literary judgment is not consensus-verifiable at any level. Different readers have different taste, different traditions hold different standards, and the disagreement is not converging toward a ground truth that infinite reviewer time would discover.

Step-terminating verifiability. The verifier must return a signal fast enough to inform the next step of the loop. Pytest returns in milliseconds; Lean in seconds; the signal arrives while the training process is still running. A trial outcome returns in years, confounded by everything that happened between the decision and the verdict. Without step-terminating signal, you can train on historical data but you cannot close the interactive loop that accelerates compilation. DeepMind’s AlphaProof reached IMO silver-medal level in 2024 by training against the Lean verifier on self-generated proof candidates; the kind of compounding that step-terminating verifiability makes possible.

Code and math have all three. Most domains that compile slowly fail at least one. The pattern predicts which capability gaps close with model scale.

What this is structurally different from

Three nearby framings are sometimes treated as equivalent. Each misses the structural claim.

The capability-gap framing. Models will eventually become smart enough and all domains will compile. This treats the translator as the binding constraint. The compilation thesis treats the verifier as the binding constraint. A perfect translator into a verifier-less domain still cannot close the training loop, because there is no signal that distinguishes good output from confident-sounding output.

The data-gap framing. The slow domains lack training data. The compilation thesis says data is downstream of the verifier. Where the verifier exists, data scales with compute. Where the verifier does not exist, data is capped by what humans can produce by hand, and the cap is structurally low relative to compute-driven curves. The gap is not how much data exists today; it is how the supply scales tomorrow.

The alignment framing. The limit is model behavior, whether the model can be made to do what we want. The compilation thesis says the limit is upstream of behavior. Even a perfectly aligned model in a verifier-less domain produces outputs that cannot be checked against anything except more model output. The classic alignment framing centers translator behavior; compilation is about whether a domain admits the asymmetry that lets translator behavior compound.

The thesis is a claim about domain structure. The translator is replaceable as scale and methods improve; the verifier either exists in the domain or it does not, and the existence is not something model progress fixes.

  • Spec gap. The distance between intent and stated specification. Compilation requires a specification; spec-gap names what gets lost when the specification is produced.
  • Projection gap. The loss when high-dimensional thought projects into serial language. Operates one layer below compilation: even where the verifier exists, the specification fed to it has bandwidth limits.
  • Synthesizability boundary. The line between artifacts a verifier can certify from a specification and artifacts that resist any such certification. Compilation thesis identifies that the line is the binding constraint; synthesizability boundary names the line itself.
  • The linker. Cross-system constraint propagation between separately compiled artifacts. Compiled Away names it as plausibly the most consequential missing piece in current AI infrastructure.
  • Externalized taste. The institutional corpus of critique encodes evaluative judgment in a shape solicited preference data and model-generated rationales cannot supply. Whether and how it can substitute for a programmatic verifier in training loops is the open question compilation thesis raises in verifier-poor domains.

Status

Living. Certainty tentative. The mechanism is the framing being asserted; the underlying observation (verifier-driven domains compound faster than verifier-poor domains) has empirical support across code and math, with the contrasting plateau in legal, clinical, and creative domains as the structural prediction. Importance 8.

Referenced in