
AI-Generated Code Has a Shelf Life | by Yonatan Sason | Feb, 2026

We build production platforms with AI every day, and we work with teams doing the same with their own stack: Cursor, Claude Code, Copilot. The difference shows up fast. By day two, some codebases are already harder to change than they were yesterday. Others keep getting easier. The difference is never the model. It’s what the code lands in.

The teams we work with that hit a wall? It’s always the same story. Someone generated a module (maybe 200 lines, maybe 600. Doesn’t matter). It worked. Passed tests. Shipped. A few days later someone needs to change it and realizes the only thing that understood this code was the context window that wrote it. Multiply that across a codebase over a few months and you’re looking at a full rewrite.

That module is now a black box. The code isn’t bad. There’s just no structure around it that lets a human or a different AI session touch it safely.

What we mean by “black box”

Not code you can’t read. Code where reading it doesn’t give you enough to change it.
We see the same failure modes over and over:

  • No boundaries. A notification system that handles email, SMS, push, and webhooks in one module. Everything touches everything. You want to swap the email provider? Good luck — it shares state with SMS logic, and that relationship isn’t declared anywhere.
  • Implicit dependencies. The module imports a user service, a template engine, a queue. How do they connect? The only documentation is the runtime behavior. So to understand it, you either run it or read all of it.
  • Missing contracts. What does `sendNotification()` accept? What does it return? What happens on failure? AI-generated code often has clean implementations with no explicit interface. The “contract” is just whatever the current code happens to do.
  • Docs that explain nothing. Generated JSDoc: `@param message the message to send`. Yeah, I can see that. What I need to know is why this function exists, what calls it, and what breaks if I change it.
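The “missing contracts” problem has a concrete shape. As a sketch (the names `sendNotification`, `Notification`, and `NotificationResult` are illustrative, not from any real codebase), here is what an explicit contract looks like: the inputs, outputs, and failure modes live in the types, so you don’t have to read the implementation to know how it behaves.

```typescript
// Hypothetical sketch: these names are illustrative, not from the article's codebase.

type Channel = "email" | "sms" | "push" | "webhook";

interface Notification {
  channel: Channel;
  recipient: string;
  body: string;
}

// Failure is part of the contract, not just whatever the code happens to do.
type NotificationResult =
  | { ok: true; id: string }
  | { ok: false; error: "invalid_recipient" | "provider_unavailable" };

function sendNotification(n: Notification): NotificationResult {
  if (n.recipient.trim() === "") {
    return { ok: false, error: "invalid_recipient" };
  }
  // Real dispatch logic would go here; we stub a success for illustration.
  return { ok: true, id: `${n.channel}-1` };
}
```

The point isn’t the ten lines of types. It’s that a caller (human or AI session) can now answer “what does this accept, return, and do on failure?” without running or reading the implementation.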

Every AI coding tool we’ve worked with produces at least two of these by default. Most produce all four.

Generation is solved. Day 2 isn’t.

The raw ability to produce working code from a description? We’re past that. What’s not solved is what happens when you need to change, extend, or hand off what was generated.

The metric that matters isn’t time to generate. It’s time to understand.

If AI saves you 10 hours writing a notification system but the next developer spends 40 hours understanding it before they can safely add a webhook provider, you didn’t save anything. You shifted the cost from the person who wrote it to the person who has to live with it. And that second person is usually you, a week later, having forgotten everything.


One black box module is annoying. A whole codebase of them after a few months of AI-first development… that’s a rewrite.

What actually makes code survive

We’ve spent years on this, not academically, but as the core challenge behind Bit. What makes code maintainable? Not readable. Maintainable. Changeable by someone who didn’t write it.


It comes down to structure:

  • Explicit boundaries. Every component declares what it is, what it exposes, where it ends. Not by convention — by enforced structure.
  • Declared dependencies. If A uses B, that relationship is visible, versioned, trackable. Not buried across 12 import statements.
  • Typed contracts. You know what goes in, what comes out, how it fails. Without reading the implementation.
  • Documentation that answers “why.” Not what the function does — why it exists, what problem it solves, what depends on it. The stuff that makes a new developer productive in minutes instead of days.
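The first and last points reinforce each other. As a sketch (the module, `EmailProvider` interface, and caller names are all illustrative assumptions), an explicit boundary is a module that exports one typed entry point, keeps provider wiring private, and documents why it exists:

```typescript
// Hypothetical sketch of an "explicit boundary". All names are illustrative.

// Private: provider details never leak past this module's exports.
interface EmailProvider {
  deliver(to: string, subject: string): boolean;
}

const defaultProvider: EmailProvider = {
  deliver: () => true, // stub; a real provider call would go here
};

/**
 * Why this exists: gives callers one stable entry point for email, so the
 * provider can be swapped without touching SMS or push logic.
 * Called by: signup and billing flows (illustrative).
 */
export function sendEmail(
  to: string,
  subject: string,
  provider: EmailProvider = defaultProvider,
): boolean {
  return provider.deliver(to, subject);
}
```

Swapping the email provider now means implementing one interface and passing it in. Nothing shared with SMS, nothing implicit, and the doc comment tells the next developer what breaks if they change it.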

Code without these gets more expensive to touch every single day. It starts on day two and it compounds.

What this looks like in practice

When Hope AI generates a platform, it doesn’t generate into a void. It generates into Bit.

So every component gets structural feedback immediately. Does this compile independently (not the whole app, just this piece)? Are the dependencies declared? Do the tests pass in isolation? Is the public API typed? Does the documentation explain the why?
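Those checks can be sketched as predicates over a component descriptor. This is an illustration of the idea only: the `ComponentDescriptor` fields and `structuralFeedback` function are hypothetical, not Bit’s actual model or API.

```typescript
// Hypothetical sketch of the structural checks above. Field names are
// illustrative, not Bit's actual component model.

interface ComponentDescriptor {
  compilesInIsolation: boolean;
  declaredDependencies: string[];
  importedModules: string[];
  testsPassInIsolation: boolean;
  publicApiTyped: boolean;
  docsExplainWhy: boolean;
}

// Returns a list of structural issues; an empty list means the component
// passes every check.
function structuralFeedback(c: ComponentDescriptor): string[] {
  const issues: string[] = [];
  if (!c.compilesInIsolation) issues.push("does not compile on its own");
  const undeclared = c.importedModules.filter(
    (m) => !c.declaredDependencies.includes(m),
  );
  if (undeclared.length > 0) {
    issues.push(`undeclared dependencies: ${undeclared.join(", ")}`);
  }
  if (!c.testsPassInIsolation) issues.push("tests fail in isolation");
  if (!c.publicApiTyped) issues.push("public API is untyped");
  if (!c.docsExplainWhy) issues.push("docs do not explain why");
  return issues;
}
```

The value is the feedback loop: every generated component is measured against the same structural bar immediately, not discovered to be a black box weeks later.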
