Published 2026-05-04

6 min read

Why architecture-first delivery controls AI behavior

Part 3 of 3. The model itself is rarely where the real risk sits. The risk usually comes from how loosely the surrounding codebase is put together. When the structure is already written into templates and generators, what an AI can produce gets narrowed long before anyone opens a pull request.

TL;DR

  • The riskiest setup we see is open-ended prompting against a codebase with no enforced structure. The safest is a structured codebase where the AI is given a tightly scoped job.
  • Generators and scaffolds carry observability, compliance boundaries, and naming conventions inside every file they create, so new work inherits the structure of the system automatically.
  • At Corsair, internal code generation handles the bulk of the work. Vendor models come in only for focused, easily reviewed changes where the trade-off between exposing the codebase and the payoff is clearly worth taking.

Corsair Media Group

Most of the risk is in the surrounding system

Most conversations about AI risk in software focus on the review stage: who reads the pull request, how large the change is, and whether the engineer who clicks merge can explain what is heading into production. All of that matters. The part that gets far less attention is what the model is writing against in the first place.

Drop an AI into a codebase that has no enforced structure, and the model effectively becomes the architect. It picks patterns. It introduces abstractions. It wires up integrations based on whatever its training data pointed it toward and whatever the prompt happened to hint at. Each of those decisions can look perfectly sensible on its own. Read them together six months later, and you find a system nobody on the team agreed to build.

Architectural drift is harder to spot than a missing null check

That puts reviewers in an awkward spot. They are now expected to catch both ordinary bugs and slow architectural drift in the same reading, and the two ask for very different attention. A missing null check is something a careful engineer flags in seconds. A subtle shift in how a file talks to the rest of the system is much harder to see, because the file in front of you reads perfectly fine on its own.

The texture of this kind of drift is rarely dramatic. A new route handler comes back with its own ad-hoc validation logic instead of pulling from the shared schema module. A service grows a second HTTP client because the model reached for whatever its training data favored, even though there is already a wrapper in the repository for the same job. Error handling on one endpoint suddenly looks nothing like error handling on the endpoint next to it. None of these individually feels like an emergency. Each one quietly adds a second way to do something the team had already settled. You can usually spot the files the model started fresh on, because those are the ones written with the most confidence and the least context.
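The second-HTTP-client case is the kind of drift a small automated check can catch before a reviewer has to. The sketch below is one possible shape for such a check, not a description of Corsair's tooling. The wrapper path `app/http_client.py` and the list of libraries are assumptions chosen for illustration.

```python
import ast

# Hypothetical policy: only this module may import an HTTP library directly.
ALLOWED_WRAPPER = "app/http_client.py"
# Illustrative list; a real policy would name whatever the team has standardized on.
HTTP_LIBRARIES = {"requests", "httpx", "urllib3", "aiohttp"}


def direct_http_imports(source: str) -> set[str]:
    """Return the HTTP libraries that a piece of source code imports directly."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in HTTP_LIBRARIES:
                found.add(root)
    return found


def check_file(path: str, source: str) -> list[str]:
    """Return one human-readable violation per disallowed import."""
    if path == ALLOWED_WRAPPER:
        return []
    return [
        f"{path}: imports {lib} directly; use the shared wrapper instead"
        for lib in sorted(direct_http_imports(source))
    ]
```

Run in CI over changed files, a check like this turns "a service grew a second HTTP client" from something a reviewer has to notice into something the build reports.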

Drift compounds. By the time anyone catches it in production, the new pattern is already merged, the tests are passing against it, and rolling the system back costs far more than catching the issue when the code was written. Cleanup like that has a way of expanding to fill whatever quarter you give it.

What architecture-first delivery actually constrains

The remedy is not complicated. Write the architecture down before any AI is pointed at the work. Templates, scaffolding, and code generators carry the rules inside the files they produce, so observability hooks, compliance boundaries, naming conventions, and integration points are sitting in the output before a single prompt has been written. An AI dropped into that environment has far fewer decisions left to make. It is filling in pre-shaped sections. The overall design has already been decided.
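As a rough illustration of what "rules inside the files" can mean, here is a minimal scaffold sketch. Everything named in it is hypothetical: the `app.schemas` and `app.telemetry` modules, the `traced` decorator, and the snake_case rule stand in for whatever a team has actually agreed on.

```python
import re
from string import Template

# Hypothetical house rule: handler names are snake_case.
HANDLER_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# The template wires in validation and tracing before any logic is written.
HANDLER_TEMPLATE = Template('''\
"""Handler for $name. Generated by scaffold; edit only inside the marked region."""
from app.schemas import ${schema}      # shared schema module (assumed to exist)
from app.telemetry import traced       # observability wrapper (assumed to exist)


@traced("$name")
def handle_$name(payload: dict) -> dict:
    data = ${schema}.validate(payload)
    # --- begin local logic ---
    raise NotImplementedError
    # --- end local logic ---
''')


def scaffold_handler(name: str, schema: str) -> str:
    """Render a handler file with tracing and validation already wired in."""
    if not HANDLER_NAME.match(name):
        raise ValueError(f"handler name must be snake_case: {name!r}")
    return HANDLER_TEMPLATE.substitute(name=name, schema=schema)
```

Calling `scaffold_handler("create_order", "OrderSchema")` returns a file body in which the only open decision is the marked region. Whoever fills it in, human or model, starts with the validation path and the tracing hook already chosen.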

Review still matters. Generators do not replace someone reading the code, and they should not be sold that way. What they change is what the reviewer has to read for. Rather than asking whether a diff is quietly inventing a new way of doing something across the codebase, the question becomes whether the local logic is correct, which is something a careful engineer can answer in the time review usually gets.

How Corsair works: generators first, models for narrow tasks

Corsair invests heavily in internal code generation. Our templates, scaffolds, and generators handle most of the structural work, so the architecture is baked into each new file rather than relitigated every time someone opens a chat window. Once a generator already encodes a pattern, a general-purpose model rarely adds much on top of what the generator produces on its own.

That does not mean vendor models never earn their keep. Two situations where they reliably do are early prototypes and large refactors, especially when the resulting diff is small enough to read carefully line by line. Our usual sequence is to scaffold the structure internally, write the patch by hand, and only then, if it makes sense, route two to ten lines per file through a vendor model. By that point the question is no longer technical. It is economic. The engineer pushing the change is the one making that call. They know what the codebase is exposing. They know what the patch is gaining. Some patches do not justify even a small exposure. Most do, on the kinds of changes we use vendor models for.

One place where models really do pull their weight is end-to-end data restructuring: renaming a field, threading a new structure through the types and serializers, updating the obvious call sites, and then leaning on the compiler and the test suite to surface whatever was missed. The work is repetitive, the right answer is mostly mechanical, and the tooling fails loudly if a single call site got skipped. That makes it a good match for a tool that produces a lot of nearly-right output in a hurry, because the build will tell you exactly where "nearly" fell short. Even so, we treat it as an aggressive refactor. Someone reads it. The tests run. A human signs off.
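A small sketch of why this kind of change fails loudly. The `Customer` type, its fields, and the rename from `name` to `display_name` are invented for illustration. In Python a type checker or the test suite plays the role the compiler plays in a statically typed language.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Customer:
    # Field renamed from `name` to `display_name` (hypothetical example).
    display_name: str
    email: str


def to_wire(customer: Customer) -> dict:
    """Serializer that picks up the new field name automatically."""
    return asdict(customer)


def greeting(customer: Customer) -> str:
    """A call site the refactor missed: it still reads the old field."""
    return f"Hello, {customer.name}"
```

Any test that exercises `greeting` raises `AttributeError`, and any caller still constructing `Customer(name=...)` raises `TypeError`. Neither failure is silent, which is what makes nearly-right output tolerable for this class of change.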

Is exposing the codebase to a vendor model worth polishing the last handful of lines when local generators already handled the boilerplate?

In practice, when we do reach for a vendor model, we are usually layering two handwritten lines on top of code that a generator already produced. Our default starting point is the internal generator, and open-ended prompting sits much further down the toolchain. Fewer one-off integration points end up in the repository over time. Observability instrumentation like OpenTelemetry, our compliance boundaries, and our internal naming conventions are all in place before any meaningful amount of context leaves the building.

That last point is the one we hold the firmest line on. A vendor model only ever sees a focused change that a human can fully evaluate. Nothing larger.
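One way to make a rule like this mechanical is to measure a patch before it goes anywhere. The sketch below is an assumption about how such a gate could look, not Corsair's actual tooling. It expects `git diff` output and uses a ceiling of ten changed lines per file, taken from the "two to ten lines" range mentioned earlier.

```python
def changed_lines_per_file(diff_text: str) -> dict[str, int]:
    """Count added plus removed lines per file in `git diff` output.

    Sketch only: deleted files are counted under "/dev/null".
    """
    counts: dict[str, int] = {}
    current = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False  # a new file section begins with header lines
        elif not in_hunk and line.startswith("+++ "):
            current = line[4:].removeprefix("b/")
            counts.setdefault(current, 0)
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and current is not None and line[:1] in {"+", "-"}:
            counts[current] += 1
    return counts


def fits_vendor_budget(diff_text: str, max_lines_per_file: int = 10) -> bool:
    """True when every file in the diff stays within the per-file ceiling."""
    counts = changed_lines_per_file(diff_text)
    return all(n <= max_lines_per_file for n in counts.values())
```

A gate like this does not decide whether the exposure is worth it. That stays with the engineer. It only refuses patches that are already too large for a human to evaluate fully.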

Do you have to use AI?

Do you have to use AI? Yes and no. Deterministic generators already handle most of the routine delivery work a healthy engineering team takes on. A reasonable target, then, is to get your team to a place where it can deliver without depending on any third-party AI tooling, and to bring models in only where there is observable evidence that they improve on what your existing generators already produce.

That sequence is what keeps the failure modes covered in Part 2 contained, and what keeps the workflow patterns covered in Part 1 inside boundaries the team can defend in practice.

With rules in place, AI speeds up delivery; without them, it makes the codebase less consistent

The worst setup we have run into is open-ended prompting against a codebase with no enforced structure. The same failure shows up when a structure exists but the engineer does not recognize that the AI output breaks it, which is why senior review still matters. The best setup is the opposite: controlled output and proper senior oversight. People spend a lot of energy arguing about which model is currently best. That question matters far less than what the codebase looked like the day the AI was first asked to write into it.

Inside a system with clearly written rules, AI can speed delivery up. Without those rules, generated work tends to pull the codebase in several directions at once. The code ends up with as many voices as there are engineers, and getting it back into one piece is slow, expensive work.

If any of this looks familiar in your own work, what would the right first step look like for your team? If you would like to talk it through, you can start that conversation through our contact page.

If you want delivery that puts the architecture and generators in front of open-ended prompting, then talk with Corsair about your next build.

Contact Corsair