Agents Need Memory, Not Just More Context
We keep giving agents more context when what they need is memory.
Not memory inside the model. Not a little personality file that remembers your preferred indentation and your deep fondness for complaining about Jira. I mean project context structured so it behaves like memory: history with time, provenance, ownership, and enough shape for an agent to understand why something mattered.
Agents are already in the software workflow. They write code, review PRs, answer system questions, summarise incidents, generate docs, and help new engineers find their bearings.
But most of the time, we give them a snapshot.
A senior engineer does not just know what the code says today. They remember how the system got there.
That is the history agents usually lack.
The Snapshot Problem
Most documentation explains the current system. That is useful. It is also compressed reality.
It tells you the conclusion and loses the path.
A doc might say that a feature interacts with billing in a particular way. It may not say that the interaction exists because an earlier approach caused a reconciliation incident, or that the current design only made sense under constraints that were true at the time.
The snapshot is not wrong. It has thrown away the reasoning.
This is why bigger context windows only get you so far. They let an agent see more text, but they do not automatically tell it what mattered, what changed, what was superseded, or which old decision is still carrying pain through the system.
More context can still be the wrong shape.
The Work Is The Memory
Most important engineering knowledge is not born as documentation. It appears in the work: PRs, review comments, Slack threads, incidents, design discussions, debugging sessions, abandoned branches.
That is where assumptions get challenged, bad approaches get ruled out, ownership becomes visible, and the difference between “we chose this” and “we got stuck with this” becomes clear.
This material is messy.
So is your memory.
Real engineering judgement rarely arrives as a tidy document. It usually shows up halfway through the work, when someone notices that an apparently reasonable approach has a nasty edge case.
Those moments are easy to lose. Then new people rediscover them. Agents do too.
But storing traces is not enough. A Slack export, PR archive, or embeddings database is not automatically memory. It is stored material.
Semantic search helps, but similarity is not memory.
An agent also needs the links around the trace: what else was happening, which parts of the system were involved, who had the context, which assumption was corrected, and what later replaced or invalidated the old understanding.
Otherwise it can retrieve facts while missing the judgement behind them.
That is how you get an agent quoting the right sentence and still making the wrong call.
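To make "the links around the trace" concrete, here is a minimal sketch of what a stored trace might carry beyond its text. Everything in it is an assumption for illustration: the `Trace` fields, the `with_context` helper, and the sample billing data are hypothetical, not a description of any particular tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Trace:
    """One piece of stored work (hypothetical shape)."""
    id: str
    when: date
    source: str                      # e.g. "pr", "incident", "slack"
    text: str
    components: list[str] = field(default_factory=list)
    people: list[str] = field(default_factory=list)
    corrects: Optional[str] = None   # id of the trace whose assumption this one corrected


def with_context(hit: Trace, store: dict[str, Trace]) -> dict:
    """Expand a single search hit with the links around it."""
    others = [t for t in store.values() if t.id != hit.id]
    return {
        "trace": hit.id,
        "people": hit.people,
        # Other traces that touched the same parts of the system.
        "same_components": sorted(
            t.id for t in others if set(t.components) & set(hit.components)
        ),
        # Later traces that corrected the assumption in this one.
        "corrected_by": sorted(t.id for t in others if t.corrects == hit.id),
    }


# Hypothetical sample data, invented for this sketch.
store = {
    "pr-1": Trace("pr-1", date(2023, 3, 1), "pr",
                  "Write to billing synchronously on upgrade.",
                  components=["billing", "upgrade"], people=["alice"]),
    "inc-1": Trace("inc-1", date(2023, 4, 12), "incident",
                   "Synchronous writes caused a reconciliation mismatch.",
                   components=["billing"], people=["bob"], corrects="pr-1"),
}

print(with_context(store["pr-1"], store))
```

A similarity search could still return `pr-1` on its own. The difference is that the agent now also sees that an incident later corrected it, and who was involved.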
Memory Preserves State
This matters inside a single debugging session too.
A useful agent should not only know the codebase. It should know the investigation state.
What failed? What changed? Which command was retried? Which hypothesis got ruled out? Which fix made things worse?
Without that state, agents loop. They rediscover the same gotchas, suggest the same plausible fixes, and lose the thread between attempts.
This is also why handoffs are painful. Between one engineer and another, one sprint and the next, one agent session and the next, the current snapshot is rarely enough.
The handoff needs to preserve what was live, what was tried, what was learned, and what should not be trusted anymore.
Handoffs are where missing memory becomes visible.
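As a rough sketch of what preserving investigation state could look like, the snippet below keeps a running log of attempts and turns it into a handoff note. The `Attempt` and `Investigation` names, the outcome labels, and the note format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    hypothesis: str
    action: str
    outcome: str      # assumed labels: "fixed", "no_change", "worse", "ruled_out"
    note: str = ""


@dataclass
class Investigation:
    """Running state for one debugging session (hypothetical shape)."""
    problem: str
    attempts: list[Attempt] = field(default_factory=list)

    def record(self, hypothesis: str, action: str, outcome: str, note: str = "") -> None:
        self.attempts.append(Attempt(hypothesis, action, outcome, note))

    def already_tried(self, action: str) -> bool:
        # Lets an agent check before suggesting the same fix again.
        return any(a.action == action for a in self.attempts)

    def handoff(self) -> str:
        """Summarise what was tried, what was learned, and what not to repeat."""
        lines = [f"Problem: {self.problem}"]
        for a in self.attempts:
            suffix = f" ({a.note})" if a.note else ""
            lines.append(f"- [{a.outcome}] {a.hypothesis}: {a.action}{suffix}")
        worse = [a.action for a in self.attempts if a.outcome == "worse"]
        if worse:
            lines.append("Do not retry: " + "; ".join(worse))
        return "\n".join(lines)


# Hypothetical session, invented for this sketch.
inv = Investigation("Nightly reconciliation job times out")
inv.record("Lock contention", "raise the lock timeout", "worse", "queue backed up")
inv.record("Slow query", "add an index on the ledger table", "no_change")
print(inv.handoff())
```

The useful part is not the format. It is that the next engineer or agent session starts from the attempts and their outcomes rather than from a blank snapshot.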
Learning In Layers
I have started changing my own workflow around this.
When I work through a problem with an agent, I add to the project memory when we reach a new level of understanding: a sharper frame, a corrected assumption, a better abstraction, or a decision I know I will want to recover later.
That is when I say: “Take a checkpoint here.”
It does not replace what came before. It adds another layer.
Over time, these checkpoints become a trail of how understanding developed: what I believed, what corrected it, which constraints appeared, and which decisions became obsolete.
In a small way, this embeds my own learning into what the agent can retrieve later. Not just the conclusion, but the mistakes that taught me how to reach it.
Software teams need the same thing at project scale.
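One simple way to get that layering is an append-only log, where each checkpoint is added and nothing is rewritten. The sketch below assumes a JSON Lines file. The `take_checkpoint` and `read_trail` helpers and the field names are hypothetical, not how any particular agent stores its memory.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def take_checkpoint(log: Path, kind: str, summary: str, corrects: str = "") -> dict:
    """Append one layer of understanding; earlier layers are never edited."""
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "kind": kind,            # assumed labels, e.g. "frame", "correction", "decision"
        "summary": summary,
        "corrects": corrects,    # free-text pointer to the belief this replaces, if any
    }
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def read_trail(log: Path) -> list[dict]:
    """Return the checkpoints in the order they were taken."""
    with log.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical usage, writing to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "checkpoints.jsonl"
        take_checkpoint(log, "frame", "The timeout comes from the queue, not the database.")
        take_checkpoint(log, "correction", "The queue is fine; the retry policy floods it.",
                        corrects="The timeout comes from the queue")
        for entry in read_trail(log):
            print(entry["kind"], "-", entry["summary"])
```

Because the file only ever grows, the earlier belief stays on the trail next to the checkpoint that corrected it.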
Where It Pays Off
The point is not better search over old messages. The point is making past engineering judgement reusable where teams actually need it.
Imagine an incident touches three workstreams. The answer you want later is not just one incident summary. You want a slice through the project at that point in time: what was happening, which decisions were live, who was involved, what later changed, and what the team learned.
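A point-in-time slice like this can be sketched as a filter over decisions that record when they were live. The `Decision` fields, the `slice_at` helper, and the sample workstreams below are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Decision:
    """A decision with the period during which it was in force (hypothetical shape)."""
    workstream: str
    summary: str
    owner: str
    live_from: date
    live_until: Optional[date] = None   # None means it is still live


def slice_at(decisions: list[Decision], when: date, workstreams: set[str]) -> list[Decision]:
    """Return the decisions that were live on a given day in the given workstreams."""
    return [
        d for d in decisions
        if d.workstream in workstreams
        and d.live_from <= when
        and (d.live_until is None or when < d.live_until)
    ]


# Hypothetical sample data, invented for this sketch.
decisions = [
    Decision("billing", "Sync writes on upgrade", "alice", date(2023, 3, 1), date(2023, 5, 1)),
    Decision("billing", "Queue writes and reconcile nightly", "bob", date(2023, 5, 1)),
    Decision("search", "Reindex on deploy", "carol", date(2023, 1, 10)),
    Decision("onboarding", "Trial accounts skip billing", "dave", date(2023, 2, 1)),
]

incident_day = date(2023, 4, 12)
for d in slice_at(decisions, incident_day, {"billing", "search"}):
    print(d.workstream, "-", d.summary, "-", d.owner)
```

Asking the same question for a later date returns the replacement billing decision instead, which is the "what later changed" half of the answer.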
Once you have that foundation, the same substrate supports several workflows.
Documentation gets better because an agent can explain a feature from the history that produced it, not just from the final state of the code.
Code review gets sharper because the agent can recognise when a change touches a painful area and who has the context to review it.
Onboarding gets more grounded because a new teammate can ask what matters before touching a feature and get the answer a senior engineer would normally have to explain.
You do not need separate memory for docs, review, onboarding, and incidents. You need the team’s experience captured in a way the agent can retrieve and interpret for the task at hand.
Engineering Memory Infrastructure
This is not mainly about making agents more autonomous.
Autonomy is the flashy bit. Reliability is the useful bit.
A memory-aware agent should be more grounded and easier to challenge. It should be able to show what it is relying on: the trace, the decision, the incident, the owner, the newer understanding that superseded the older one.
Making an agent sound confident is not the hard part.
The hard part is giving it the right evidence to be confident about.
Treating engineering memory seriously means treating it as infrastructure: source ingestion, time-aware indexing, provenance, ownership, freshness, permissions, and a way to tell when one piece of understanding has been superseded by another.
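Of the pieces in that list, supersession and freshness are perhaps the easiest to sketch. The example below assumes each record points at whatever replaced it. The `Understanding` fields, the `resolve` and `is_stale` helpers, and the one-year threshold are hypothetical choices, not a recommendation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Understanding:
    """One recorded piece of understanding with provenance (hypothetical shape)."""
    id: str
    claim: str
    source: str                          # where it came from, e.g. "pr", "incident"
    owner: str
    recorded: date
    superseded_by: Optional[str] = None  # id of the record that replaced this one


def resolve(start_id: str, records: dict[str, Understanding]) -> list[Understanding]:
    """Follow superseded_by links and return the chain, oldest first."""
    chain, seen = [], set()
    current: Optional[str] = start_id
    while current is not None and current not in seen:   # guard against cycles
        seen.add(current)
        rec = records[current]
        chain.append(rec)
        current = rec.superseded_by
    return chain


def is_stale(rec: Understanding, today: date, max_age_days: int = 365) -> bool:
    """Flag records older than an assumed freshness threshold."""
    return (today - rec.recorded).days > max_age_days


# Hypothetical sample data, invented for this sketch.
records = {
    "u1": Understanding("u1", "Billing writes must be synchronous", "pr", "alice",
                        date(2023, 3, 1), superseded_by="u2"),
    "u2": Understanding("u2", "Billing writes are queued and reconciled nightly",
                        "incident", "bob", date(2024, 1, 15)),
}

chain = resolve("u1", records)
print("current:", chain[-1].claim)
print("evidence:", [(r.id, r.source, r.owner) for r in chain])
```

Returning the whole chain rather than only the latest record lets the agent show what it is relying on and what that replaced.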
The exact shape will differ by team. A small team might start with disciplined markdown traces and decision records. A larger organisation might need deeper indexing across PRs, incidents, docs, chat, and code ownership.
But the requirement is the same.
Without structure, the agent can retrieve text, but it cannot tell what the text means in the life of the project.
The snapshot tells you what the system does.
The accumulated history tells you why.
Engineering memory infrastructure is how we give agents access to both.