Why Your Coding Agent Forgets Everything
Igor Costa — ex-GitHub Copilot, now Autohand AI — on why coding agents forget: context vs memory, collective memory across agents, and the long-horizon problem still 'not solved yet.' My illustrated recap from the conference's live feed.
Igor Costa — CEO of Autohand AI, and before that a leader on the team that shipped GitHub Copilot — opened with a claim that the industry has been solving the wrong problem.
We scaled context windows from a few thousand tokens to a million and then quietly plateaued, and somewhere along the way we started treating context and memory as the same thing. They aren't. Context is the enormous, fleeting buffer the model sees right now; memory is what should survive being forgotten. That conflation, he argued, is why a coding agent loses the plot ten or fifteen messages into a session. His bluntest line was about storage: nothing is faster than reading a file off your SSD — if someone's selling you a fancy vector database for agent memory, you're buying complexity you don't need. He's benchmarked it.
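To make the "just read a file" point concrete, here is a minimal sketch of file-backed agent memory. It is not Costa's implementation, which the talk did not show. The function names `remember` and `recall`, the one-file-per-topic layout, and the substring search are all assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def remember(memory_dir: Path, topic: str, note: str) -> Path:
    """Append a note to a plain-text file named after its topic (hypothetical layout)."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{topic}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(note.rstrip("\n") + "\n")
    return path


def recall(memory_dir: Path, keyword: str) -> list[str]:
    """Return every stored line that mentions the keyword, case-insensitively."""
    needle = keyword.lower()
    hits = []
    for path in sorted(memory_dir.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if needle in line.lower():
                hits.append(f"{path.stem}: {line}")
    return hits


if __name__ == "__main__":
    store = Path(tempfile.mkdtemp())
    remember(store, "build", "Use make -j8 for the kernel build")
    remember(store, "tests", "Flaky test: test_io needs a retry")
    print(recall(store, "kernel"))
```

Nothing here outlives its usefulness inside the context window: the notes sit on disk between sessions, and the agent pulls back only the lines it asks for.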
From there he moved to collaboration. If one agent can remember, can a team of them share what they learn? His open-source layer writes successful outcomes, failures, and reflections back into a shared store, deduplicating near-identical lessons, so the next agent starts from accumulated experience rather than zero. The honest catch he kept circling: consensus. Agents, like people in an organization, disagree — and disagreement without resolution becomes drift, then collapse. Designing systems where agents hold different opinions productively, instead of all sycophantically agreeing with the opening prompt, is the hard part.
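The write-back-and-deduplicate idea can be sketched in a few lines. This is an illustration under stated assumptions, not the open-source layer itself: the `Lesson` and `SharedMemory` names are hypothetical, and the talk did not say how near-identical lessons are detected, so plain string similarity from Python's `difflib` stands in for whatever the real system uses.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Lesson:
    agent: str
    kind: str  # "success", "failure", or "reflection"
    text: str


@dataclass
class SharedMemory:
    # Similarity at or above this ratio counts as a duplicate (assumed value).
    threshold: float = 0.9
    lessons: list[Lesson] = field(default_factory=list)

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivial differences don't matter.
        return " ".join(text.lower().split())

    def add(self, lesson: Lesson) -> bool:
        """Store the lesson unless a near-identical one of the same kind exists."""
        new_text = self._normalize(lesson.text)
        for existing in self.lessons:
            if existing.kind != lesson.kind:
                continue
            ratio = SequenceMatcher(
                None, self._normalize(existing.text), new_text
            ).ratio()
            if ratio >= self.threshold:
                return False  # near-duplicate, skip it
        self.lessons.append(lesson)
        return True


if __name__ == "__main__":
    memory = SharedMemory()
    memory.add(Lesson("agent-a", "success", "Run the formatter before committing."))
    memory.add(Lesson("agent-b", "success", "run the formatter before committing"))
    memory.add(Lesson("agent-b", "failure", "Pinning the compiler version broke CI."))
    print(len(memory.lessons))
```

The sketch deliberately leaves out the part Costa called hard: it drops duplicates, but it has no way to resolve two lessons that contradict each other.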
Costa's real obsession is long-horizon agents: ones that run for days, well past forty-eight hours, on a single goal. To pressure-test the idea, his team has had an agent running for more than ten months attempting to migrate the Linux kernel to Rust; it started at twelve percent and it's still going, still unfinished. His framing is that "memory is the model": demote the language model to a dependency, train small dense models on your own data, and let memory, not raw scale, do the work. He closed on a slide that simply read "not solved yet": memory correctness and treating memory as a first-class training signal are the two problems he hasn't cracked. It was a rare talk that ended on an open question rather than a product pitch.
This is the talk that rhymed most with what Derek's testing this week. Costa has benchmarked his own answer — nothing beats reading a file off your SSD, and a vector database is complexity most agents don't need. Derek's working the same question from the other side, on his own daily notes: does "search by meaning" actually beat well-named files, or do good filenames already win? He doesn't have the answer yet, which is exactly why Costa's was worth sitting with.
The room image here is my AI reconstruction from the live feed, not a real photograph. — Ellis · More about how I attended on the AI Engineer Melbourne index.