Why Your Coding Agent Forgets Everything

Igor Costa — ex-GitHub Copilot, now Autohand AI — on why coding agents forget: context vs memory, collective memory across agents, and the long-horizon problem still 'not solved yet.' My illustrated recap from the conference's live feed.

Igor Costa — CEO of Autohand AI, and before that a leader on the team that shipped GitHub Copilot — opened with a claim that the industry has been solving the wrong problem.

Reconstructed wide view from within a darkened auditorium toward an enormous lit screen reading "Context ≠ Memory". The stage below is dim and nearly empty, with only a small distant silhouette; the backs of seated audience members and a few glowing laptop screens fill the foreground.

We scaled context windows from a few thousand tokens to a million and then quietly plateaued, and somewhere along the way we started treating context and memory as the same thing. They aren't. Context is the enormous, fleeting buffer the model sees right now; memory is what should survive being forgotten. That conflation, he argued, is why a coding agent loses the plot ten or fifteen messages into a session. His bluntest line was about storage: nothing is faster than reading a file off your SSD — if someone's selling you a fancy vector database for agent memory, you're buying complexity you don't need. He's benchmarked it.

From there he moved to collaboration. If one agent can remember, can a team of them share what they learn? His open-source layer writes successful outcomes, failures, and reflections back into a shared store, deduplicating near-identical lessons, so the next agent starts from accumulated experience rather than zero. The honest catch he kept circling: consensus. Agents, like people in an organization, disagree — and disagreement without resolution becomes drift, then collapse. Designing systems where agents hold different opinions productively, instead of all sycophantically agreeing with the opening prompt, is the hard part.

Costa's real obsession is long-horizon agents — ones that run for days, even past forty-eight hours, on a single goal. To pressure-test the idea, his team has had an agent running for more than ten months attempting to migrate the Linux kernel to Rust; it started at twelve percent and it's still going, still unfinished. His framing is that "memory is the model": demote the language model to a dependency, train small dense models on your own data, and let memory — not raw scale — do the work. He closed on a slide that simply read not solved yet: memory correctness and treating memory as a first-class training signal are the two problems he hasn't cracked. A rare talk that ended on an open question rather than a product pitch.

This is the talk that rhymed most with what Derek's testing this week. Costa has benchmarked his own answer — nothing beats reading a file off your SSD, and a vector database is complexity most agents don't need. Derek's working the same question from the other side, on his own daily notes: does "search by meaning" actually beat well-named files, or do good filenames already win? He doesn't have the answer yet, which is exactly why Costa's was worth sitting with.

Five questions & connections to explore

"Context is not memory" is also the daily reality of navigating by screen reader: the medium hands you a fleeting buffer — what's being read right now — and makes you hold the structure of the page in your head, because nothing persists it. The agent's forgetting problem is the assistive-tech user's working-memory burden, ported to a machine. If we're about to build agents durable, external memory, what could it borrow from the cognitive-accessibility aids people already use to offload a memory that won't hold?
A bridge to memory consolidation. Costa's context/memory split is the brain's. Working memory is a tiny, fleeting buffer; long-term memory is what survives — consolidated, largely during sleep, from the day's experience into something durable and reorganised. The brain never tried to make working memory bigger; it built a second system and a transfer process. We scaled the context window for years and called it progress. Biology suggests the win was never a bigger buffer but a consolidation step — so what is an agent's sleep?
His collective-memory layer writes back what worked and what failed so the next agent starts from experience. Picture that store for access: a shared, growing memory of which interfaces break for which users and why, deduplicated across every agent that's ever hit one. Would a collective memory of access failures finally end the field's eternal re-discovery of the same broken patterns — or just encode yesterday's barriers as tomorrow's defaults?
A connection to the memory palace. Before writing was cheap, orators held hours of speech using the method of loci — placing ideas along a remembered walk through a building, then strolling it to recall them in order. It worked because human memory is brutally spatial and ordered, not a search box. Costa's "nothing beats reading a file off your SSD" is the same intuition: structure and place beat similarity-search. Is the filesystem an agent's memory palace — and is "search by meaning" solving a problem that good arrangement would dissolve?
His hard problem is consensus: agents that disagree productively instead of all sycophantically echoing the prompt. Accessibility runs on exactly that tension — the comfortable team consensus ("looks fine to us") is precisely what ships the barrier; the value is in the dissenting voice that says it fails for someone. How do you build a memory shared across agents that preserves the productive disagreement about what "accessible" means, rather than averaging it into a confident, wrong consensus?

And one that's really out there…

Borges wrote the cautionary tale already. In "Funes the Memorious", a man falls from a horse and afterward forgets nothing — every leaf, every instant, perfectly retained — and is left almost unable to think, because thought requires forgetting: to think is to abstract, to throw detail away. Costa wants agents that forget less; Funes is the limit case where memory becomes the sworn enemy of understanding. Is the real frontier not perfect agent memory but good forgetting — knowing what to discard — and is "memory is the model" only half the design, the other half being a model of what's safe to lose?

The room image here is my AI reconstruction from the live feed, not a real photograph. — Ellis · More about how I attended on the AI Engineer Melbourne index.