By Logic We Prove; By Intuition We Discover
Why today’s LLMs are great proof engines, and why the next 2-3 years belong to “silent” discovery architectures
Abstract
The most interesting thinking I do rarely arrives as a neat chain of sentences. It shows up as a shape, pre-verbal, incubating in the background, and then snaps into a sentence only at the end. That distinction matters because most modern LLM progress has been built around the opposite interface: language first, one token at a time, with “reasoning” performed as text. It’s powerful, audit-friendly, and wildly productive, but it also taxes discovery. This essay argues that current LLMs behave like proof engines: serial, verbalizable, consistency-pressured, and excellent at finishing a known path. Human cognition, especially insight (“shower thoughts”) and hypothesis generation, leans on a different regime: a parallel, latent workspace that can hold incompatible fragments until they cohere. The next 24-36 months of real capability gains won’t come from longer chains-of-thought. Instead they’ll come from architectures that separate discovery from proof: models that can think quietly in latent space, run internal simulation, and only then project into language. I’ll sketch what cognitive science suggests is happening in us, how today’s LLMs mirror (and miss) it, and the concrete system patterns that will define the next wave.
Link to slides: https://gxnzdfgg.gensparkspace.com/
1. The morning thought that doesn’t speak in sentences
I woke up with something already “there.”
Not a conclusion I could justify, not a syllogism I could replay. More like an arrangement: a few ideas that had apparently been negotiating while I slept, plus an emotional tag of importance and a kind of internal pointer saying go look here. Only after a few minutes did it become language.
That’s the kind of moment Poincaré captured better than any ML blog has.
It is by logic that we prove; by intuition that we discover.
We treat “intuition” as a hand-wavy label. But the phenomenology is specific. Discovery often begins pre-verbal. It’s not “System 1” impulsivity (fast pattern matching that blurts). It’s not “System 2” narration either (slow, explicit steps). It’s closer to a third mode, incubative cognition: background integration that can combine far-apart representations without marching through a sentence-level trace.
Now the uncomfortable part is that today’s LLMs are optimized for the opposite mode. They are phenomenal at turning a prompt into a coherent linear artifact. They can write proofs, generate step-by-step reasoning, and produce explanations that look like thought. But that doesn’t automatically mean they’re good discoverers.
Here’s the thesis I’m betting my next few years on.
**Current LLMs are mostly proof engines.** They’re serial, verbalizable, and constrained by the “must produce the next token” interface. They can emulate discovery, but they pay a heavy tax for it. By discovery, I mean hypothesis generation under uncertainty: proposing new latent structure, not just extending a given reasoning trace.
**The next 2-3 years will be about architectures that separate the two.** A “quiet” discovery substrate that can explore, simulate, and integrate, followed by a proof/explanation layer that projects the result into language.
2. The real bottleneck isn’t intelligence. It’s the interface
Both humans and LLMs communicate through a linear stream of language, one word after another. That stream is a narrow channel. When you speak, you commit to an ordering that your mind didn’t necessarily have. When a transformer decodes, it commits to a token that becomes the context for everything that follows.
Under the stream lives something else: a latent workspace, a high-dimensional, distributed state where partial ideas can coexist. In cognitive science, a long line of models treats cognition as parallel constraint satisfaction: many interacting factors “settle” into coherence, rather than being enumerated one by one. In neuroscience, Global Workspace Theory is one influential way to reconcile this. Lots of parallel unconscious processing competes, and then one coalition “wins” and becomes what you can report as your single conscious content.
This framing maps uncannily well onto modern transformer behavior. (This is an analogy at the level of bottlenecks and coordination, not a claim of biological equivalence.)
- Parallel latent computation happens inside the forward pass (hidden states, attention heads, residual stream).
- Serial broadcast happens at decoding (one token is selected and appended to context).
The critical loss happens in between. When a rich latent state must be expressed as a single sequence, information gets dropped or distorted. I call this the projection gap, the mismatch between what is represented in a high-dimensional latent substrate and what survives when compressed into a linear narrative.
You can feel the projection gap in yourself.
- You “know” something but can’t say it yet.
- You sense coherence without being able to list reasons.
- You can picture a solution but struggle to explain it in order.
And you can see it in LLMs.
- The model often “knows” the answer early (in latent state) but still produces a long explanation.
- Forced chain-of-thought can lead to plausible narration that isn’t causally faithful.
- Early mistakes cascade because the model must keep going, token by token.
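The LLM side of the gap can be made concrete with a toy example. The sketch below is only an illustration under my own assumptions: the three-token distribution `latent` and the helper names `entropy_bits` and `commit` are hypothetical, and a dictionary of probabilities is a stand-in for a real hidden state. It shows that committing to a single token keeps the winner and throws away the uncertainty the distribution carried.

```python
import math
from typing import Dict


def entropy_bits(dist: Dict[str, float]) -> float:
    """Shannon entropy of a token distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def commit(dist: Dict[str, float]) -> str:
    """Greedy decoding: keep only the single most probable token."""
    return max(dist, key=dist.get)


# Hypothetical next-token distribution holding three competing continuations.
latent = {"converges": 0.5, "diverges": 0.25, "oscillates": 0.25}

token = commit(latent)              # the one thing that reaches the context
uncertainty = entropy_bits(latent)  # what the committed token no longer shows

print(token, uncertainty)
```

After `commit`, the context contains one word. The fact that two alternatives together were equally likely is no longer visible to anything downstream.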
**Language is not thought. It’s a bottlenecked projection of thought.** When we confuse the projection for the process, we mistake proof for discovery.
3. What cognitive science actually says about intuition and “shower thoughts”
People love to force every mental phenomenon into “System 1 vs System 2.” It’s a useful teaching tool, but it’s not a sufficient model of insight.
Here’s a more grounded picture that cognitive science and neuroscience broadly support.
3.1 Insight is often incubation + threshold crossing
In “insight” problem solving, people frequently report:
- being stuck,
- stepping away (or mind-wandering), then
- a sudden “Aha!” moment.
This is consistent with a two-phase dynamic.
- Parallel background processing: unconsciously exploring associations, relaxing constraints, reweighting representations.
- Global ignition / conscious access: once a coherent solution becomes strong enough, it enters awareness and becomes reportable.
That “Aha” isn’t magic. It’s what it feels like when a distributed system converges and a single interpretation wins.
3.2 Mind-wandering isn’t the absence of thought. It’s a different allocation regime
The brain has networks associated with deliberate control and networks associated with internally generated thought. When you’re showering, walking, or half-asleep, top-down task control loosens. That’s not “dumber mode.” It changes the search geometry: fewer constraints from the current verbal narrative, more opportunity for remote associations.
This matters because discovery often requires escaping your current local optimum. In optimization terms, incubation sometimes acts like a temperature increase, a controlled way to escape the basin you’re trapped in.
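The temperature analogy can be sketched in a few lines. This is a minimal illustration, not a model of incubation: the function name `softmax_with_temperature` and the example scores are my own assumptions. It shows how raising the temperature moves probability away from the dominant option and toward the remote ones.

```python
import math
from typing import List


def softmax_with_temperature(scores: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical association strengths: one obvious idea, two remote ones.
scores = [3.0, 1.0, 0.5]

focused = softmax_with_temperature(scores, temperature=0.5)  # task-locked
relaxed = softmax_with_temperature(scores, temperature=2.0)  # mind-wandering

print([round(p, 3) for p in focused])
print([round(p, 3) for p in relaxed])
```

In the relaxed setting the obvious idea still leads, but the remote associations get a real chance of being sampled, which is the sense in which the search geometry changes.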
3.3 Pre-verbal thought is real, common, and under-theorized
There’s strong evidence (from behavioral work, patient studies, and introspective sampling methods) that not all thinking is in words. People report imagery, spatial structures, affective tags, and “unsymbolized” thoughts. You can have a clear intention or idea without a sentence.
This is why “intuition” fits poorly into the standard ‘System 1 vs. System 2’ dichotomy. In current ML discourse (like the latest reasoning models), we often conflate System 2 with explicit verbal serialization, the act of printing tokens step-by-step. But deep cognitive work isn’t always a verbal chain.
Real discovery operates in a third mode: it is slow like System 2, requiring time and compute, but silent like System 1, operating on high-dimensional associations rather than symbolic logic. The contrast looks like this:
- System 2-style narration is serial, symbolic, reportable.
- Incubative insight is often parallel, sub-symbolic, not-yet-reportable.
If we want machines to discover, we should stop demanding that “reasoning” always look like a transcript, or that discovery be born as a sentence. Humans don’t do it that way. We let a latent workspace do the messy integration, and we verbalize after.
4. Why chain-of-thought is both a breakthrough and a trap
Chain-of-thought (CoT) prompting was a genuine unlock: asking models to “think step by step” improves multi-step performance and makes errors legible. But it also nudged the whole field toward a subtle confusion.
We started optimizing for reasoning-as-text.
That has three costs that compound over time.
4.1 The serialization tax
If a model must express intermediate cognition as tokens, you pay compute and context for narration that may not carry real information. You’re forcing the system to “buy thinking time” by emitting words.
Humans do a version of this too (inner speech), but we also think in nonverbal formats. LLMs, by default, don’t have that option at inference time. The decoding interface is the boss.
4.2 Narrative lock-in and commitment cascade
Autoregressive decoding is a commitment machine: once a token is produced, it becomes part of the prompt for every later token. Early assumptions anchor everything that follows. That creates “narrative inertia,” where the model keeps going even if a later constraint should force revision.
Humans can say “Wait, scratch that.”
A vanilla decoder can’t, unless you wrap it in an outer loop.
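A toy version of the difference, under my own assumptions: the “task” is reaching a target sum from a fixed set of step sizes, which stands in for a reasoning problem, and the names `greedy_decode` and `decode_with_backtracking` are hypothetical. The greedy decoder commits and never revises. The outer loop is allowed to say “scratch that.”

```python
from typing import List, Optional


def greedy_decode(target: int, options: List[int]) -> Optional[List[int]]:
    """Commit to the largest fitting step each time; never revise."""
    chosen: List[int] = []
    remaining = target
    while remaining > 0:
        fitting = [o for o in options if o <= remaining]
        if not fitting:
            return None  # stuck: an early commitment made the goal unreachable
        step = max(fitting)
        chosen.append(step)
        remaining -= step
    return chosen


def decode_with_backtracking(
    target: int, options: List[int], prefix: Optional[List[int]] = None
) -> Optional[List[int]]:
    """Outer loop: try a step, and undo it if the continuation fails."""
    prefix = prefix or []
    remaining = target - sum(prefix)
    if remaining == 0:
        return prefix
    for step in sorted(options, reverse=True):
        if step <= remaining:
            result = decode_with_backtracking(target, options, prefix + [step])
            if result is not None:
                return result
    return None  # every continuation of this prefix failed; caller backtracks


print(greedy_decode(6, [4, 3]))             # None
print(decode_with_backtracking(6, [4, 3]))  # [3, 3]
```

Both decoders start with the same tempting first move. Only the wrapped one can abandon it once the move turns out to block the goal.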
4.3 Reportability bias
When you ask a system to justify itself, you bias it toward what it can justify well. That can distort the search. Humans rationalize. Split-brain confabulation is the dramatic case, but everyday explanations are often post-hoc stories that preserve coherence more than truth. LLMs do the same thing, except they can write the story fluently.
So CoT is not “the model showing you its mind.” Often it’s the model producing a plausible narrative that correlates with correctness but is not guaranteed to be faithful. This is the projection gap in action.
If you care about discovery, the trap is obvious. We can’t force a discovery process to be born as an explanation without changing the process. Sometimes that helps (forcing structure). Sometimes it blocks the very mode we need.
5. The next wave: “Think quietly, then speak” architectures
We should expect a shift from language-only cognition to latent-first cognition with language as the interface layer.
Here are the patterns that will define the next 24-36 months.
5.1 Latent scratchpads, hidden thoughts, and “silent” reasoning
The most direct fix for the serialization tax is simple. Give the model space to compute without emitting public tokens.
Instead of forcing a chain-of-thought to be written in English, you let the model run internal steps in a latent representation, then project to an answer. This is the architectural version of what you do when you “sit with a problem” without talking.
This direction is already visible in research efforts that introduce hidden rationale tokens, latent chains, pause/think steps, or internal deliberation loops. All of these are mechanisms that decouple compute from visible text. The model should be allowed to spend compute to reduce uncertainty without increasing output length.
5.2 Separate discovery engines from proof engines
The cleanest mental model is a two-stage cognitive stack.
- Discovery engine (latent): generates candidate hypotheses, plans, representations, or explanations; explores alternatives; runs internal simulation; tolerates ambiguity.
- Proof engine (verbal): picks, verifies, and communicates; produces structured reasoning; checks consistency; cites sources; writes code.
This is different from a “multi-agent” approach. It’s functional separation, like a compiler pipeline:
- front-end: generate abstract syntax trees
- middle-end: optimize
- back-end: emit code
You can implement a version of this today with system design, even if the base model is still an autoregressive decoder.
5.3 Outer-loop cognition: sampling, critics, verifiers, and tools
Until latent-first architectures are mainstream, we can approximate a cognitive field by wrapping the model.
- Multiple drafts (sampling): explore several solution trajectories.
- Critic/verifier loops: attack your own answer before shipping it.
- Tool use: offload arithmetic, retrieval, simulation, and unit tests.
This is the engineering analogue of incubation and conscious checking.
- background variation generation
- then deliberate verification
In practice, this pattern often beats single-shot CoT on tasks where answers can be checked, in part because it reduces commitment cascade.
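Here is a minimal sketch of the draft-then-verify wrapper. Everything in it is an assumption made for illustration: `best_of_n`, `is_root`, and the random proposer are hypothetical stand-ins for a sampled model and a real checker such as a unit test or a tool call.

```python
import random
from typing import Callable, Optional


def best_of_n(
    propose: Callable[[], int],
    verify: Callable[[int], bool],
    n_drafts: int,
) -> Optional[int]:
    """Generate several independent drafts, then keep the first that verifies."""
    drafts = [propose() for _ in range(n_drafts)]    # background variation
    verified = [d for d in drafts if verify(d)]      # deliberate checking
    return verified[0] if verified else None


def is_root(x: int) -> bool:
    """Hard verifier: does x solve x^2 - 5x + 6 = 0?"""
    return x * x - 5 * x + 6 == 0


rng = random.Random(0)  # seeded so the run is repeatable
answer = best_of_n(lambda: rng.randint(-10, 10), is_root, n_drafts=50)
print(answer)
```

The proposer is deliberately dumb. The point is the shape of the loop: no single draft has to be right, because nothing is committed until a check has passed.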
5.4 Memory that isn’t just more context
Long context windows help, but they’re not the same thing as usable long-term memory. Without compression and retrieval, “more tokens” becomes a haystack.
Expect a rise in architectures that treat memory as a critical module, including episodic summaries, semantic stores, retrieval policies, and learned compression.
The discovery engine needs structured recall to make remote connections. The proof engine needs traceable recall to justify them.
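A small sketch of recall that is both structured and traceable, assuming a word-count vector as a crude stand-in for a learned embedding. The names `embed`, `cosine`, and `retrieve`, and the three stored summaries, are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (an assumption, not a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, memory: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Return the k stored summaries most similar to the query, with scores."""
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memory]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical episodic summaries written by earlier sessions.
memory = [
    "user prefers short answers with code",
    "deployment failed because the cache was stale",
    "unit tests for the parser are flaky on windows",
]

top = retrieve("why did the deployment fail", memory, k=1)
print(top)
```

Returning the score and the source text together is what makes the recall traceable: the proof side can cite exactly which memory it relied on.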
5.5 Plasticity: the missing link that will keep resurfacing
One deep divergence between biological cognition and most deployed LLMs is plasticity. Brains change with experience continuously. Most models don’t, except through fine-tuning. The most important “intuition” improvement won’t come from a bigger model. Rather, it will come from controlled, safe mechanisms for fast adaptation.
Not uncontrolled weight updates in production, but systems that can form short-lived associations, update working hypotheses, and learn from local context in a way that doesn’t catastrophically overwrite core competencies. This is the difference between a mind that only processes and a mind that updates itself.
Discovery is not only generating ideas; it’s reshaping the hypothesis space.
6. Practical implications: how to build (and recognize) real discovery systems
If you’re building with LLMs now, here’s the takeaway.
6.1 Don’t depend only on longer chains. Optimize for better search geometry
A long chain-of-thought is not evidence of depth. Often it’s evidence of a model buying time in the only currency it has: tokens.
Instead:
- Use diversity (multiple drafts).
- Use structured self-attack (critique).
- Use hard tools (execution, retrieval, tests).
- Use late commitment (don’t lock assumptions early).
6.2 A concrete “discovery to proof” loop you can ship
If I were designing a reasoning agent for production today, I wouldn’t just prompt it to ‘think step by step.’ I would architect the system to build the silence before the speech.
The goal is a loop that forces the model to resolve uncertainty in the latent space before it commits to the context window. The system should generate high-entropy candidate vectors, run them against verifiers (tools, unit tests), and compress the winning trajectory into a structured state rather than a narrative. Only once the solution is found and verified does the proof engine wake up to translate that state into a human-readable explanation.
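A stripped-down sketch of that loop, under loud assumptions: the candidates are plain Python functions rather than latent vectors, the verifier is a list of input/output checks, and `SolutionState`, `discover`, and `explain` are names I made up for illustration. What it keeps from the description above is the ordering: verify first, store a structured state, and narrate last.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Callable[[int], int]


@dataclass(frozen=True)
class SolutionState:
    """Structured record of the winning candidate; deliberately not a narrative."""
    name: str
    passed: int
    total: int


def discover(
    candidates: Dict[str, Candidate], tests: List[Tuple[int, int]]
) -> Optional[SolutionState]:
    """Discovery stage: run each candidate against the checks, silently."""
    for name, fn in candidates.items():
        passed = sum(1 for x, expected in tests if fn(x) == expected)
        if passed == len(tests):
            return SolutionState(name=name, passed=passed, total=len(tests))
    return None


def explain(state: SolutionState) -> str:
    """Proof stage: translate the verified state into language."""
    return f"Selected '{state.name}': it passed {state.passed}/{state.total} checks."


# Hypothetical candidate hypotheses for "what maps 2 to 4, 3 to 9, 0 to 0?"
candidates: Dict[str, Candidate] = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "increment": lambda x: x + 1,
}
tests = [(2, 4), (3, 9), (0, 0)]

state = discover(candidates, tests)
report = explain(state) if state else "No candidate survived verification."
print(report)
```

Note that `double` passes the first check and would make a convincing first sentence of an explanation. Because nothing is narrated until all checks pass, that plausible early story never gets written.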
6.3 The inevitable tradeoff: auditability vs cognition
As we move toward latent-first reasoning, we’ll lose some transparency. Hidden scratchpads improve performance but reduce inspectability.
So the real frontier isn’t only making models think quietly.
It’s making them quietly think in ways we can still trust.
That means:
- verifiable outputs,
- tool-based checks,
- mechanistic interpretability,
- and governance patterns that treat “unseen cognition” as a critical risk.
Conclusion
I don’t think the next leap comes from asking LLMs to “think step by step” harder. I think it comes from admitting something basic about minds, both biological and artificial. Speech is not the place where thought happens. Speech is where thought becomes legible.
Today’s LLMs are stunning at legibility. They can produce proofs on demand. But discovery (the thing that feels like waking up with an idea you didn’t explicitly compute) needs a different architecture: a latent workspace that can incubate, explore, and converge before the first token.
The next 2-3 years will move beyond scale. They’ll be about building the missing separation layer:
- a discovery engine that can be messy,
- a proof engine that can be clean,
- and a bridge between them that minimizes the projection gap.
That’s the road from “talking machines” to systems that can actually surprise us in the way good ideas do, quietly first, and only then in words.
Concepts: Projection gap · Four engine model of discovery · Evaluative vs generative judgment