
Your Codebase Has Amnesia

Dan M · 18 March 2026 · 12 min read

Every AI agent session starts from zero. The architectural decisions from six months ago, the module boundaries drawn for specific reasons, the bug fixed three times in three different ways. We examined how knowledge loss compounds across agent sessions and what it costs.

The session boundary problem

Every AI agent session starts with a clean slate. The context window fills with whatever files the agent retrieves or the developer provides. Then the work happens, the code is generated, the PR merges, and the session ends. Everything the agent learned during that session, the false starts, the edge cases considered and rejected, the reasons behind specific choices, vanishes.

The next session starts from zero again.

This isn’t a bug. It’s how these systems work. But it has consequences that most engineering teams haven’t grasped yet, because the cost doesn’t show up immediately. It shows up in the fourth month. The seventh. The tenth. It shows up as a codebase that can’t remember its own history.

We examined eight codebases across six organisations, all actively using AI coding agents, to understand what knowledge loss looks like in practice and how it compounds over time.

What institutional knowledge actually means in code

Before we get into findings, we need to define the term. “Institutional knowledge” sounds abstract. In a codebase, it’s concrete. It’s the answer to questions like:

Why does the authentication module handle tokens this way instead of the simpler approach? (Because the simpler approach broke under load during the November 2024 incident.)

Why is there a 50ms delay before the retry logic kicks in? (Because the downstream payment provider’s API returns false negatives for about 30ms after a timeout, and retrying immediately causes duplicate charges.)

Why does the user service have its own cache layer instead of using the shared cache? (Because the shared cache had a consistency bug with the user data model that was never fixed at the infrastructure level, so the team worked around it.)

These aren’t documented in most codebases. They live in commit messages (sometimes), in Slack threads (more often), in the heads of engineers who were there when the decision was made (most reliably). When a human developer encounters one of these patterns, they ask. They check the git log. They ping the person whose name shows up in git blame. The knowledge transfers, imperfectly but persistently.

An AI agent does none of this. It sees the current state of the code. It makes inferences based on patterns in its training data and whatever context it’s been given. If the reason for a design choice isn’t visible in the files the agent can access, that reason doesn’t exist for the purpose of that session.
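The retry-delay example is typical of how this knowledge hides in code: the delay is a one-line constant, and unless someone wrote the reason down next to it, a future session sees only a magic number. A hypothetical sketch (the constant, function, and comments are invented for illustration):

```python
import time

# The constant is visible to any reader; the rationale survives only if
# someone recorded it. Delete the comment below and the "why" is gone.
RETRY_DELAY_SECONDS = 0.05  # downstream provider returns false negatives for
                            # ~30ms after a timeout; retrying sooner risks
                            # duplicate charges

def charge_with_retry(charge_fn, attempts=3):
    """Call charge_fn, waiting RETRY_DELAY_SECONDS before each retry."""
    for attempt in range(attempts):
        try:
            return charge_fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            time.sleep(RETRY_DELAY_SECONDS)
```

An agent reading this file without the comment would have no reason not to remove the delay as an optimisation.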

The eight codebases

We studied eight codebases ranging from 60,000 to 400,000 lines of code. All were actively maintained by teams of 6 to 25 engineers. All had adopted AI coding agents between 4 and 11 months before our analysis. We conducted structured interviews with 34 engineers across these teams and performed detailed code archaeology on the repositories themselves.

What we found was consistent enough to be concerning.

Finding 1: The same bug, fixed three different ways

In six of eight codebases, we found instances where the same underlying bug was fixed multiple times in different ways. Not the same surface-level bug, but the same root cause producing symptoms in different locations, with each agent session treating it as a novel problem.

The most striking example was in a logistics platform. A race condition in their order processing pipeline could cause duplicate webhook deliveries under specific timing conditions. Over five months, three separate agent sessions encountered symptoms of this bug:

Session one: the agent added a deduplication check at the webhook delivery layer. Correct locally. Addressed the symptom.

Session two, six weeks later: the agent added a database uniqueness constraint on webhook delivery IDs in a different module that also received duplicate events. Also correct locally. Did not reference or build on the first fix.

Session three, two months after that: the agent added idempotency keys to the API layer. A third correct, local solution to the same root problem.

The root cause, a missing lock in the order processing pipeline, was never addressed. Three engineers, working with three agent sessions at different times, each solved a symptom. None of them knew about the other fixes. The agent certainly didn’t.

The result: three layers of defensive code protecting against a bug that could have been fixed with a four-line change in the pipeline itself. The combined maintenance cost of those three workarounds was significantly higher than the actual fix would have been. And the root cause remained.
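To make the contrast concrete, the root-cause fix amounts to serialising the check-and-mark step so two workers cannot both treat the same order as new. This is a hypothetical sketch, not the platform's actual code; the function names and in-memory set stand in for their pipeline and delivery store:

```python
import threading

# Sketch of the root-cause fix: a lock around the check-and-mark step.
# Without it, two concurrent workers can both see the order as new and
# each dispatch the webhook -- the original race condition.
_pipeline_lock = threading.Lock()
_dispatched = set()

def process_order(order_id, dispatch_webhook):
    """Dispatch the webhook for order_id exactly once across workers."""
    with _pipeline_lock:
        if order_id in _dispatched:
            return False  # another worker already claimed this order
        _dispatched.add(order_id)
    dispatch_webhook(order_id)
    return True
```

Each of the three observed fixes (dedup check, uniqueness constraint, idempotency keys) guards a downstream symptom of the same race; this guards the source.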

Finding 2: Contradictory architectural patterns

In every codebase we examined, we found modules built weeks or months apart that took fundamentally different architectural approaches to similar problems. This goes beyond normal stylistic variation. These were cases where the approach in module A was specifically incompatible with the approach in module B.

One organisation’s API layer contained two distinct patterns for request validation. Modules built before August 2025 used middleware-based validation with a shared schema registry. Modules built after August used inline validation with per-endpoint schemas. There was no architectural decision record for the change. No team discussion. No deliberate migration.

What happened was simpler than that. The senior engineer who had championed the middleware approach went on parental leave. The agents working on new modules didn’t have sufficient context about the existing pattern. The middleware approach required understanding a custom schema registry that wasn’t well-documented. The inline approach was more straightforward and matched common patterns in the agent’s training data.

By the time the senior engineer returned, four new modules used the inline pattern. The codebase now had two parallel validation architectures. Migrating either direction would take weeks. Neither approach was wrong. They just couldn’t coexist cleanly, and the integration points between old and new modules became a persistent source of bugs.
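The incompatibility is easier to see side by side. A Python analogue of the two styles (all names are invented; the organisation's actual stack was an API layer, not this toy registry):

```python
# Pattern A: middleware-style validation against a shared schema registry.
# Endpoints declare a schema name; the registry owns the definitions.
SCHEMA_REGISTRY = {"create_user": {"email": str, "name": str}}

def validated(schema_name):
    def wrap(handler):
        def inner(payload):
            schema = SCHEMA_REGISTRY[schema_name]
            for field, expected_type in schema.items():
                if not isinstance(payload.get(field), expected_type):
                    raise ValueError(f"bad field: {field}")
            return handler(payload)
        return inner
    return wrap

@validated("create_user")
def create_user(payload):
    return {"created": payload["email"]}

# Pattern B: inline, per-endpoint validation. Simpler in isolation, and
# closer to common training-data patterns -- but it bypasses the registry,
# so the two styles cannot share schema definitions.
def create_team(payload):
    if not isinstance(payload.get("team_name"), str):
        raise ValueError("bad field: team_name")
    return {"created": payload["team_name"]}
```

Both endpoints validate correctly. The problem only appears at integration points: tooling that reads the registry sees nothing for Pattern B endpoints, and schema changes must be made in two places.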

Finding 3: Naming conventions that drift

This one seems minor. It isn’t.

We tracked naming conventions for similar concepts across modules built at different times. In one codebase, the concept of a user’s subscription state was referred to as subscriptionStatus, subState, userPlanStatus, planState, and subscription_tier across different modules. All built within a seven-month window. All referring to the same underlying concept.

Five names for one thing. Every new module that needed to interact with subscription data had to figure out which name was canonical. The agents generating new code would pick whichever name appeared in the files they had context on, which depended on which modules the developer included in the prompt.

The cost isn’t just confusion. It’s query failures, mapping errors, and integration bugs that happen because someone passed subState where the function expected subscriptionStatus. We counted 23 bugs in this codebase that traced directly to naming inconsistency. Twenty-three.
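One mitigation teams reached for was normalising at module boundaries. A minimal sketch, using the five names from this codebase (the alias table and function are ours, not theirs):

```python
# Map every historical alias for the subscription-state concept onto one
# canonical key, so downstream code only ever sees the canonical name.
CANONICAL = "subscriptionStatus"
ALIASES = {
    "subState": CANONICAL,
    "userPlanStatus": CANONICAL,
    "planState": CANONICAL,
    "subscription_tier": CANONICAL,
}

def normalise_record(record):
    """Rewrite any known alias key in a dict to the canonical name."""
    return {ALIASES.get(key, key): value for key, value in record.items()}
```

This contains the damage but doesn't cure it: the alias table is itself institutional knowledge that the next agent session won't know exists.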

Finding 4: Documentation doesn’t solve it

The obvious response to knowledge loss is documentation. Write it down. Maintain an architecture wiki. Keep decision records.

We examined the documentation practices of all eight teams. Four had active architecture decision records (ADRs). Two had wiki pages describing key patterns. Six had README files in major modules. All of them were incomplete, and the gap between documentation and actual codebase state widened after agent adoption.

The reasons are mechanical. Documentation is written by humans at a point in time. It reflects the codebase as it existed when someone sat down to write it. Agent sessions change the codebase rapidly, often in ways that don’t trigger a documentation update because the change seems minor. A new error handling pattern in one module doesn’t feel like it warrants an ADR update. But after ten such changes across ten modules, the ADR no longer describes the system.

We found that ADR accuracy (how well the documented patterns matched actual codebase patterns) dropped from 78% to 41% in organisations that had been using agents for more than six months. The documentation existed. It just described a different codebase than the one that was running in production.

README files fared worse. In two codebases, module-level READMEs contained setup instructions that no longer worked because agent sessions had changed configuration approaches without updating the documentation.

The problem with documentation as a solution to knowledge loss is that documentation is itself a static artefact. It captures knowledge at a point in time. It doesn’t update itself when the codebase evolves. And the faster the codebase evolves (which is exactly what agents enable), the faster documentation goes stale.

The compounding cost

Knowledge loss in a codebase is not linear. It compounds.

In month one of agent adoption, the knowledge gaps are small. A missed convention here, an undocumented decision there. The codebase still mostly coheres because human developers still hold most of the context and can course-correct.

By month four, the gaps are larger. Multiple patterns have emerged for the same problems. New engineers joining the team can’t tell which pattern is canonical by reading the code, because there isn’t a canonical pattern anymore. They ask the agent, and the agent picks whichever one it finds first.

By month eight, the codebase has developed what one of our interviewees called “geological layers.” You can date when a section of code was written by which patterns it follows. The database access code from Q1 looks fundamentally different from Q3 code. Both work. Integrating them requires understanding both approaches and knowing which one a given module expects.

We estimated the cost of knowledge loss compounding across our eight codebases. The median team was spending 22% of their engineering capacity on work that was directly attributable to knowledge fragmentation: diagnosing bugs caused by inconsistency, reconciling contradictory patterns, onboarding engineers to a codebase that no longer had a single coherent style.

That 22% doesn’t show up in any dashboard. It shows up as sprints that take longer than estimated, as bugs that are hard to reproduce, as new team members taking twice as long to become productive.

What would a solution look like?

We’re not going to pretend we have a complete answer. But our research points to a specific gap that existing tools don’t fill.

What’s missing is a persistent, queryable knowledge layer that sits between the codebase and the agent. Not documentation (which is static and goes stale). Not the code itself (which shows what but not why). Something that captures architectural intent, records decisions and their rationale, tracks patterns and their boundaries, and updates as the codebase evolves.

This layer would need to do several things:

It would need to be accessible to agents during sessions, providing context about why the codebase is structured the way it is. When an agent is about to introduce a new error handling pattern, the knowledge layer should surface: “This codebase uses exponential backoff with circuit breakers for retry logic. Here’s why. Here’s where.”

It would need to update based on what actually happens in the codebase, not just what someone writes in a wiki. If a new pattern is introduced, the knowledge layer should flag the divergence, not silently let it accumulate.

It would need to preserve the reasoning behind decisions, not just the decisions themselves. “We use pattern X” is less useful than “We use pattern X because pattern Y caused a production incident on this date, and the cost of pattern Z’s complexity wasn’t justified for our scale.”
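As a minimal sketch of the data model such a layer implies (this is our illustration, not an existing tool: the class and field names are invented), a decision record needs the pattern, its scope, its rationale, and whether it has been superseded:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    pattern: str                         # e.g. "exponential backoff with circuit breakers"
    scope: str                           # where the pattern applies
    rationale: str                       # why it was chosen, not just that it was
    superseded_by: Optional[str] = None  # set when a later decision replaces it

class KnowledgeLayer:
    """In-memory stand-in for a persistent, queryable decision store."""

    def __init__(self):
        self._decisions = []

    def record(self, decision):
        self._decisions.append(decision)

    def query(self, topic):
        """Return active decisions whose pattern or scope mentions topic."""
        return [
            d for d in self._decisions
            if d.superseded_by is None
            and (topic in d.pattern or topic in d.scope)
        ]
```

The hard parts are everything this sketch omits: populating it from actual codebase changes, detecting divergence automatically, and surfacing the right record at the right moment in an agent session.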

This is harder to build than a wiki or an ADR template. It requires understanding code structure, tracking evolution over time, and making knowledge available in the right format at the right moment. But without it, every agent session starts from zero. And the cost of starting from zero, repeated hundreds of times across a growing codebase, is a system that can’t remember what it knows.

The question isn’t whether your codebase will lose knowledge. It’s how fast, and whether you’ll notice before the compound interest becomes unmanageable.

Where this leads

The stale documentation problem is old. Engineers have been complaining about outdated wikis since wikis existed. What’s different now is the rate. AI agents accelerate the rate at which code changes and the rate at which knowledge fragments. A problem that used to take years to become painful now takes months.

Engineering organisations that recognise this early have a real advantage. The tooling to fully solve it doesn't exist yet, but the first step, measurement, is available today. Know what patterns your codebase follows. Track whether those patterns are holding. Watch for the signs of fragmentation: duplicate implementations, contradictory approaches, naming drift, bugs that exist because different parts of the system were built with different assumptions.

The codebase has amnesia. The velocity metrics say everything is fine. Both of these things are true simultaneously, and the gap between them is where the real cost lives.