The Spec Is the Design
If the engineer’s job is moving up the abstraction stack, then the spec is where they land.
Usual note on how this was written. Claude Opus as writing partner, me directing the argument and reviewing every concept. See Post 1 for why I think this transparency matters.
In the first three posts of this series, I’ve argued that the engineer’s role is shifting from how to what — from writing implementation syntax to owning design intent. I’ve shown that the model you choose matters enormously, and that even frontier models can be dangerously permissive when it comes to hardware safety rules.
All of that raises a question: if the engineer’s job is increasingly about intent rather than implementation, where does that intent actually live?
The answer, I believe, is the spec. And I mean something very specific by that — not the 200-page PDF that sits in a SharePoint folder and nobody reads after kickoff.
How specs actually work (and don’t)
Let’s be honest about what happens in most chip design organizations.
The project starts with a spec — an ERS, a microarchitecture document, sometimes a full chip-level specification. The team writes it carefully. It describes the major blocks, the interfaces, the register map, the performance targets. Often there’s a numbered feature list: FR-001 through FR-247, each describing a capability the chip must have. The document gets reviewed, signed off, maybe even blessed by the system architect.
Then reality sets in.
The design starts before the spec is complete — it has to, because the schedule demands it. Engineers begin writing RTL for the blocks they understand well enough, while the spec for the trickier parts is still being debated. As the RTL takes shape, edge cases surface that the spec didn’t anticipate. The engineer makes a judgment call, writes the code, and moves on. Maybe they update the spec. More often they don’t — they’re under pressure to hit a milestone, and updating the document feels like overhead.
Week by week, the gap widens. The RTL evolves, the spec doesn’t. By mid-project, the spec describes a chip that no longer exists. By tape-out, the RTL is the spec — the document is an artifact of the past, and everyone knows it.
Now consider what happens on the verification side.
The DV team starts from the spec too — they build testbenches, write checkers, define coverage models based on the numbered feature list. But as simulation runs uncover mismatches, a subtle drift begins. When the RTL doesn’t match the testbench, the team has to decide: is the RTL wrong, or is the testbench wrong? In theory, you go back to the spec and check. In practice — especially when the spec is stale — the team uses their engineering judgment to determine which side is correct and fixes the other. The testbench gets adjusted to match the RTL, or the RTL gets adjusted to match the testbench. The engineers are making the right call based on their understanding of the design. But the spec is no longer the arbiter — and nobody updates it with the decision.
The result is a design with 100% code coverage, 100% toggle coverage, comprehensive functional coverage. The verification team has demonstrated, exhaustively, that they have a perfectly working coffee machine.
Except the spec called for a dishwasher.
This isn’t a failure of engineering talent. It’s a failure of methodology. The spec was supposed to be the source of truth, but it became a historical document the moment the first engineer made an undocumented judgment call. Everything downstream — RTL, verification, constraints, signoff — lost its anchor.
This has been tolerable for decades because the engineer who wrote the spec also wrote the RTL. The intent lived in their head, even when it didn’t live in the document. The spec was communication — a way to tell other people what you were building. If it was ambiguous, the same brain disambiguated it in real time.
AI breaks this model.
Why ambiguity becomes bugs
When an AI generates RTL from a spec, the ambiguity that a human engineer would resolve through experience and domain knowledge becomes a design decision made by the model. And the model will make a decision — it won’t stop and ask. It will pick the interpretation that seems most likely based on its training data, produce syntactically correct code, and move on.
Sometimes that interpretation will be right. Sometimes it won’t. And the failure mode is the worst kind: the output looks correct. It compiles, it simulates, it passes the tests you thought to write. The ambiguity in the spec became a silent assumption in the RTL, and unless someone catches it in review, it ships.
This is not a theoretical concern. Here’s one I’ve seen play out: a spec says “the register shall support both software write and hardware write.” It doesn’t say who wins when both write on the same cycle. The designer — human or AI — makes a choice. Maybe hardware wins, because that’s what their experience suggests. The RTL is clean, the register works, the block passes verification.
Six months later, the firmware team discovers that their writes to this register occasionally don’t stick. They file a bug. The verification team can’t reproduce it in their testbench, because their tests never happened to hit the simultaneous-write corner case.
And this isn’t a failure of verification effort. Simultaneous write scenarios of this kind are notoriously difficult to hit in simulation — even with constrained-random. Without an explicit constraint targeting that exact cycle-level overlap between the SW write and the HW write, a random stimulus generator may never produce it within any practical simulation budget. But nobody wrote that constraint, because nobody knew there was a decision to test. The spec said “shall support both” — which a coverage model dutifully marks as satisfied the first time either path fires. The corner case was invisible to verification not because the testbench was weak, but because the spec never defined it as a corner case at all.
The firmware team spends days chasing what looks like a timing issue. Eventually someone reads the RTL line by line and discovers: hardware has precedence. The firmware needs to read-back after write to confirm. Nobody knew this, because the decision was made by one engineer (or one AI) and never written down. The spec said “shall support both.” It didn’t say what happens when they collide.
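To make the two readings of "shall support both" concrete, here is a minimal behavioral sketch in Python. The register model, its signal names, and the `hw_wins` policy flag are all illustrative inventions, not from any real spec; the point is that both interpretations pass every single-path test and diverge only on the collision cycle.

```python
# Hypothetical behavioral model of the ambiguous register. The hw_wins
# flag is the undocumented design decision the spec never made.

class ConfigReg:
    def __init__(self, hw_wins: bool):
        self.hw_wins = hw_wins
        self.value = 0

    def cycle(self, sw_write=None, hw_write=None):
        """Apply at most one SW and one HW write in the same cycle."""
        if sw_write is not None and hw_write is not None:
            # The collision case the spec never defined.
            self.value = hw_write if self.hw_wins else sw_write
        elif sw_write is not None:
            self.value = sw_write
        elif hw_write is not None:
            self.value = hw_write

# Both interpretations pass any test that exercises one path at a time...
a, b = ConfigReg(hw_wins=True), ConfigReg(hw_wins=False)
for reg in (a, b):
    reg.cycle(sw_write=0x5A); assert reg.value == 0x5A
    reg.cycle(hw_write=0x3C); assert reg.value == 0x3C

# ...and diverge only on the simultaneous-write cycle nobody targeted.
a.cycle(sw_write=0x5A, hw_write=0x3C)
b.cycle(sw_write=0x5A, hw_write=0x3C)
print(a.value, b.value)  # 60 90 — on a, the firmware's write "didn't stick"
```

Two models, both "correct," both green in verification — and observably different to firmware on exactly one cycle.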
In a human-only flow, the designer who chose hardware-wins would probably remember the decision and mention it in a review. In an AI-assisted flow, the model made the choice silently. There’s no memory of the reasoning, no hallway conversation where the firmware lead overhears “oh by the way, HW wins on that register.” The review burden shifts: you’re not checking whether the code matches your intuition, you’re checking whether the code matches intent that was never formally expressed.
And here’s where the old methodology and the new one converge: the spec drift problem existed before AI. AI just makes it lethal. When a human engineer made undocumented judgment calls, at least the intent stayed in someone’s head. When an AI makes those calls, the intent is gone. There’s no brain to query. There’s just code that looks right.
The fix, at least when AI is generating the RTL, is to make silent choices structurally impossible. A well-designed AI flow doesn’t just translate spec to code — it monitors for underspecification. When the model encounters a behavioral case the spec doesn’t address, it shouldn’t pick an interpretation. It should stop, flag the gap against the specific spec section, and refuse to proceed until a human reviews the ambiguity, makes a deliberate call, and updates the spec. The choice gets made — but it gets made at the spec level, explicitly, with a human in the loop.
The harder case is AI-assisted review of human-generated RTL. A human engineer writing code makes dozens of undocumented judgment calls — small decisions scattered across thousands of lines, each individually reasonable, collectively forming a shadow spec that exists only in the code. Getting an AI to read that RTL and reconstruct every implicit decision — then trace each one back to what the spec says (or doesn’t say) — is a far more demanding task than flagging gaps during generation. There’s no clean exception to raise. The AI has to infer intent from implementation, identify the decisions that were made, and surface the ones the spec never authorized. That three-way relationship — between the spec, the implementation, and the artifacts that verify them against each other — is what the next post sets out to formalize.
The spec must become the source of truth
The coffee machine problem and the AI ambiguity problem have the same root cause: the spec stopped being the thing you measured against.
In an AI-assisted design flow, the spec can’t be a communication document that drifts. It has to be the source of truth — the artifact that everything else is generated from, verified against, and traced back to. When the RTL doesn’t match the testbench, the answer is never a blind fix to either side. The testbench may have been written by someone who misread the spec; the RTL may have drifted from it. Both artifacts get measured against the spec. The answer is either “fix the RTL,” “fix the testbench,” or “update the spec” — and the last one is a deliberate, reviewed decision, not an expedient hack at midnight before a milestone.
That means the spec needs to be:
Precise enough to generate from. Every behavioral requirement needs to be unambiguous. Not “the register shall support SW write and HW write” but “the register supports concurrent SW and HW write; on collision, HW write takes precedence and the SW write is silently dropped; SW must read-back to confirm.” That level of precision feels pedantic when a human is reading it. When an AI is generating from it, it’s the difference between correct and plausible. And when the firmware team reads it, they know exactly what to expect.
Structured enough to verify against. If the spec says the response latency is at most 5 cycles, that’s a requirement that can become an SVA assertion. If the spec says “the block should be fast,” nobody — human or AI — can write a meaningful check against that. Every requirement in the spec must be translatable into something verifiable: an assertion, a coverage point, a formal property, a test case.
This points to something important about what the spec needs to be. In the methodology we’re building toward, the spec is not a prose document that happens to contain requirements — it’s a structured artifact: YAML, a requirements database, or another machine-readable format in which every feature entry is defined precisely enough to be testable. Not “testable in principle,” but concretely mapped: each requirement either generates a formal property, an SVA assertion, a simulation test case, or at minimum a defined check. The hierarchy matters — formal beats simulation beats manual — but anything that produces a verifiable artifact is a valid expression of a requirement. We’ll expand on what that structure looks like in future posts. The principle belongs here: if a requirement can’t be expressed in measurable terms, it isn’t a requirement — it’s a wish.
Maintained as a living artifact. The spec isn’t done when coding starts. It evolves alongside the implementation, and any change to the RTL that contradicts the spec is either a spec update (the intent changed) or a bug (the implementation diverged). This requires discipline — and here’s the hypothesis at the center of this whole approach: AI is what finally makes that discipline enforceable. The reason it has historically broken down is human fatigue. Re-reading a 200-page spec on the hundredth RTL change, cross-referencing every new line of code against every relevant section — nobody does that consistently under schedule pressure. AI does. It doesn’t tire of reading the same document, doesn’t skip the cross-check because it’s Friday afternoon, doesn’t decide a section is “probably fine.” The discipline that was always theoretically correct but practically unsustainable may finally be achievable — because the entity enforcing it doesn’t get tired.
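One machine-readable requirement entry might look like the sketch below. The field names are my assumptions, not a defined schema from this series; a YAML entry would carry the same fields, and a Python dict stands in for it here. The embedded SVA string is the standard SystemVerilog idiom for a bounded-latency check.

```python
# Illustrative shape for one structured requirement entry, with its
# verification mapping attached. Field names are assumptions.

REQ = {
    "id": "FR-112",
    "spec_section": "7.3",
    "text": "Response latency from req assertion to ack is at most 5 cycles.",
    "verify": [  # every requirement maps to at least one concrete artifact
        {"kind": "sva",
         "artifact": "assert property (@(posedge clk) req |-> ##[1:5] ack);"},
        {"kind": "coverage", "artifact": "cover latency bins 1..5"},
    ],
}

def is_requirement(entry):
    """If it can't be expressed in measurable terms, it's a wish."""
    return bool(entry.get("verify")) and bool(entry.get("spec_section"))

assert is_requirement(REQ)
assert not is_requirement({"id": "FR-999", "text": "the block should be fast"})
```

“The block should be fast” fails the check by construction — there is nothing in the entry a tool could turn into an assertion, a coverage point, or a test.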
What this looks like in practice
I’ve been developing a methodology around this, and while the details could fill a book, the core idea is a pipeline:
Phase 1 — Structural decomposition. Load the spec into the AI and extract the design hierarchy: blocks, interfaces, dependencies, parameters. The output is a structured representation that becomes the map for everything that follows — the exact format is still an open question I’m working through, and it matters more than it might seem. What’s already clear is that this phase, as currently implemented, is not a single AI call. It requires a team of agents, each with their own specialization, reviewing and cross-checking each other’s outputs before the result is trusted.
Phase 2 — Coherence checking. With the full spec in context, have the AI cross-reference every section against every other section. Interface mismatches, undefined behaviors, contradictions, missing specifications. A 200-page spec written by three different engineers over six months will have contradictions. Finding them before RTL generation starts saves weeks.
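A toy version of that cross-section pass, for one class of contradiction: each section declares the interface signals it defines or consumes, and a sweep over all of them flags width mismatches. The section numbers and signal names below are invented for illustration.

```python
# Toy coherence check: flag the same signal declared with different
# widths in different spec sections.

def coherence_check(sections):
    declared = {}  # signal -> (width, first section that declared it)
    issues = []
    for sec in sections:
        for sig, width in sec["signals"].items():
            if sig in declared and declared[sig][0] != width:
                issues.append(
                    f"{sig}: {declared[sig][0]} bits in §{declared[sig][1]} "
                    f"vs {width} bits in §{sec['id']}")
            else:
                declared.setdefault(sig, (width, sec["id"]))
    return issues

sections = [
    {"id": "3.1", "signals": {"axi_awaddr": 32, "irq": 1}},
    {"id": "9.4", "signals": {"axi_awaddr": 40}},  # written months later
]
print(coherence_check(sections))
# one contradiction found before any RTL is generated
```

The real check is of course semantic, not just structural — but even this trivial mechanical version catches the kind of drift that three authors over six months reliably produce.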
Phase 3 — Requirements extraction. For each block, distill the spec into numbered, verifiable requirements. Each requirement traces back to a specific spec section. This is the layer that turns prose into contracts. At this point, the original prose spec becomes a human-readable reference, not the source of truth. The structured feature list is. The prose should be fully reconstructable from it — if it isn’t, the extraction wasn’t complete.
Phase 4 — Design collateral generation. This is where hardware diverges sharply from software, and it’s worth being explicit about it. In software, the source code is the artifact — everything else is derived from it. In hardware, the RTL is fundamental but, on its own, utterly useless. Without SDC (timing constraints), CDC verification intent, UPF (power intent), scan and DFT constraints, and a growing list of other collaterals, the RTL cannot be implemented, verified, or manufactured. Each of these must be generated from the same requirements, and each must be completely aligned with the RTL and with each other. A timing constraint that doesn’t reflect the RTL’s clock structure is a silent bug. A power domain definition that doesn’t match the RTL’s isolation logic is a functional failure waiting for silicon. The spec-centric approach applies equally to all of them — RTL generation is one output of this phase, not the only one.
Phase 5 — Verification. Generate assertions, coverage points, and test plans directly from the requirements. Every requirement has at least one verification artifact. The traceability matrix connects spec → requirement → collaterals → test — where collaterals means all of them: RTL, SDC, UPF, CDC intent, DFT constraints. Verification is not complete until every requirement is covered across every artifact it touches.
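The completeness rule at the end of Phase 5 reduces to a simple set computation. The record shapes below are assumptions: each requirement lists the collaterals it touches, each verification artifact cites the requirement it covers, and anything left uncovered blocks signoff.

```python
# Minimal traceability check: report, per requirement, the collaterals
# that still lack a verification artifact.

def uncovered(requirements, artifacts):
    covered = {}  # req_id -> set of collaterals with a check
    for art in artifacts:
        covered.setdefault(art["req"], set()).add(art["collateral"])
    return {r["id"]: sorted(set(r["touches"]) - covered.get(r["id"], set()))
            for r in requirements
            if set(r["touches"]) - covered.get(r["id"], set())}

reqs = [
    {"id": "FR-112", "touches": ["rtl", "sdc"]},
    {"id": "FR-113", "touches": ["rtl", "upf"]},
]
arts = [
    {"req": "FR-112", "collateral": "rtl"},
    {"req": "FR-112", "collateral": "sdc"},
    {"req": "FR-113", "collateral": "rtl"},  # the UPF check was never written
]
print(uncovered(reqs, arts))  # {'FR-113': ['upf']}
```

An empty result is the signoff condition; a non-empty one names exactly which requirement is unverified in which collateral, traced straight back to the spec.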
The critical insight is that this pipeline doesn’t work without a spec that’s precise enough to drive it. Garbage in, garbage out applies with ruthless efficiency when the “garbage” is an ambiguous spec and the “out” is AI-generated silicon.
The organizational challenge
I’ll be direct about this: the hardest part of spec-centric design isn’t technical. It’s cultural.
Engineers want to write code. That’s what they were hired to do, that’s what they’re good at, and that’s what feels productive. Writing a precise spec — one that’s unambiguous enough for an AI to generate from — feels slow. It feels like bureaucracy. The temptation to skip the spec and go straight to RTL is enormous, especially under schedule pressure.
But here’s the reframe: writing the spec is the engineering work now. The precision that used to go into hand-crafting RTL now goes into hand-crafting the spec. The intellectual challenge hasn’t decreased — it’s moved upward. Defining exactly what a block should do at every boundary, under every condition, is harder than coding it. The code is the easy part. The thinking is the hard part. And the thinking lives in the spec.
There is a harder truth underneath this that the industry is not yet ready to say plainly: engineers should no longer be writing RTL. The writing should be done by AI. The engineer’s role is to instruct, review, and approve — not to type the code.
We should be precise about what that means in practice, because it’s easy to misread. It does not mean throwing a vague sentence at a model and expecting a correct memory controller to emerge. In the near term, generating anything beyond simple blocks will require intensive back-and-forth: clarifying intent, correcting misinterpretations, tightening the microarch spec until the model has enough precision to proceed. There will be cases where writing the RTL directly is still faster than describing the behavior in sufficient detail for the AI to get it right. That is where the technology is today, and it’s a legitimate engineering judgment call.
But the direction is not ambiguous. Writing RTL — the physical act of authoring Verilog or SystemVerilog syntax — is a skill that will matter less with each generation of tooling. What will matter is the ability to specify with precision, to evaluate generated output, and to recognize what the model got wrong. For now, understanding the code well enough to review it remains essential. The threshold for what can be trusted without close inspection will keep rising. The goal is to move that threshold — not to defend the one we’re standing at today.
“I’ll never ship code I don’t understand” — it sounds like a principle. It’s actually a myth engineers tell themselves. In any real project, you have coworkers. You review their PRs at the architectural level, you read the commit message, you spot-check the tricky parts. You do not read every single line of every module that ends up in the chip. You never did. What you actually do is calibrate trust: you decide how much autonomy to extend to a given person, on a given block, given what you know about them. A senior engineer you’ve worked with for five years gets a different level of scrutiny than a contractor you hired last month.
AI has the same calibration problem. The question isn’t “do I understand every line it wrote?” — the question is “have I worked with this tool long enough, on this class of problem, to know where it’s reliable and where it needs supervision?” That’s how you treat a skilled colleague. Not with blind trust, and not with the paranoid assumption that every output needs to be verified from first principles. You learn the failure modes, you build intuition about where to look closely, and you extend autonomy accordingly.
Hardware has one structural advantage here that software doesn’t: a deep bench of deterministic automated checkers. Lint catches style and coding rule violations before simulation. CDC and RDC tools find structural clock and reset domain crossings that no human reviewer reliably spots at scale. SDC checkers verify that timing constraints are consistent with the design. Formal verification and simulation catch functional divergence. Power extraction validates that UPF intent matches RTL behavior. These tools don’t care whether the code was written by a human or an AI — they apply the same rules either way. That changes the trust calibration: extending autonomy to an AI-generated collateral isn’t a leap of faith when a CDC tool is going to run over it regardless. The toolchain is part of the review.
What comes next
This post has been deliberately general — a framing of why the spec matters and what it needs to become. In the next post, I’ll get much more specific. Design by Contract is a concept from software engineering that maps surprisingly well onto hardware interfaces: clock domains, resets, protocols. I’ll show what it looks like to define formal contracts at block boundaries — assume/guarantee pairs that are precise enough to drive assertion generation and compositional verification.
The spec is the design. The contract is how you make it enforceable.
Marco Brambilla is a semiconductor industry veteran with 25 years in chip design, most recently as Senior Technical Director at Meta Reality Labs. He writes about AI, chip design, and the future of hardware engineering at Above the RTL.

