Patient Zero is Coding
In which we find that a flatline can also go straight up.
Around the turn of the year, engineers started saying something had changed. Not in press releases or product announcements but in the places where engineers actually talk, in the threads and the slacks and discords and the offhand comments in pull request reviews, the kind of distributed murmur that precedes a consensus no one in particular planned. The coding agents were different now. They were collaborators. They had, someone said, and then everyone said, a certain coherence to them, a tenacity, something that felt less like autocomplete at scale and more like a junior engineer who had finally stopped annoying the team as much.
Karpathy called it a phase change but what had changed phase was not the model. The model that shipped in October was the model running in December. What had changed was what the model was reading before it started working, the accumulated operational judgment of the engineering community, version-controlled, progressively disclosed, injected at the moment of need with enough precision that the agent was no longer operating on generic priors alone but on the specific knowledge of how this team deploys, what this codebase punishes, which abstractions have already been tried and abandoned, and why, or to a first approximation; all of it held in a folder which had been filling up since October. By December there were enough folders, with enough SKILL.md files written by enough engineers who had finally stopped writing letters to their models, that the effect was legible at the level of feel.
Agents had crossed from brittle demo behavior into sustained, long-horizon task completion; that they could hold a coherent thread across a complex engineering problem without losing the plot, without hallucinating a function that didn’t exist, without confidently refactoring the wrong abstraction because the context window had forgotten what the right one was. Everyone agreed the room had suddenly gotten brighter—as if someone had flipped a switch—but there was a real struggle to name what it was. What changed phase was the context itself: now it was a folder, loaded in at a different point in the lifecycle, a very sophisticated, very carefully versioned, peer-reviewed system for getting the right instructions into the agent’s context at the right moment, which is a description that would have sounded familiar to anyone who had spent time thinking about how agents get manipulated rather than how they get skilled, but that community and this community had not yet introduced themselves. Security was still an afterthought.
The folder explosion is the long-range dependency problem, at scale, proliferating into the file system because the architecture layer hasn’t fully solved it yet.
This had a shape that anyone who had been paying attention since 1997 would recognize. Which is probably no one reading this, unless someone sends it to Schmidhuber because that was a very long time ago and it was literally still AI winter and Christ was still a corporal and it was still early even for dotcom (Pets.com wouldn’t go live until the next year.) But that was when Hochreiter and Schmidhuber formalised the long-range dependency problem,¹ and then the field spent the next two decades trying to solve it. How do you keep what happened early in a sequence relevant to what happens late in it? RNNs couldn’t; the gradient vanished across distance, the model forgot. LSTMs gated the gradient and got further. Stacking them got you further still. GRUs simplified the gate. Then Vaswani et al dropped attention in 2017 and distance collapsed, every token attending to every other token simultaneously, and the architecture layer appeared to have solved it.
Except it hadn’t, of course. Attention solved distance but not scale. The cost was quadratic. Context windows stayed small. The model could hold a paragraph but not a codebase. GitHub Copilot launched in June 2021 and immediately ran into the long-range dependency problem at the application layer. Function calling in June 2023 externalized the capability so the model didn’t have to remember it. Assistants API built a catalog. Code Interpreter put execution in the sandbox. MCP built the registry. Each one a different attempt to solve the same problem, and each one, in its own way, purporting to eliminate the dependency and instead relocating it, from weights to gates to attention to tools to registries to folders.
The registry bloat problem *is* the long-range dependency problem. The context suicide that progressive disclosure was designed to solve is the long-range dependency problem. The folder explosion is the long-range dependency problem, at the application layer, proliferating into the file system because the architecture layer hasn’t fully solved it yet. Each new MCP tool added to the registry is a new token in a sequence that the model must attend to. The folder is a hand-crafted hidden state. Progressive disclosure is manual attention. The three-tier loading system is a gate. Schmidhuber would recognize it immediately. (Well, that goes without saying.) As the registry grows, the gradient of the user’s original intent vanishes: SKILL.md file acts as the Input Gate, protecting the model from irrelevant tools; PLANS.md file is the Cell State, carrying the long-term goal across hundreds of individual turns. We didn’t solve the architecture; we moved the gates from the GPU kernels to the ls command.
The phase change Karpathy identified was not just the latest salvo in the ongoing war to tame long range dependencies at massively exploding scales, but reminiscent of a moment from the post-NLP, pre-GPT era when ULMFiT achieved state-of-the-art results on text classification with very little labeled data by using a pretrained language model that captured polysemy, anaphora, and long-term dependencies, which Sebastian Ruder referred to as “NLP's ImageNet moment,”² harkening back to an even earlier watershed when another sea change was astonishingly palpable.
December 2025 had the same feel. What had arrived was a repository with latent awareness of what it might be called upon to do. You stopped starting from scratch. You started from somewhere. The benchmarks moved and Karpathy wrote it down and four and a half million people recognized what he was describing because they had seen the blue flash.
When Boris Cherny posted his Claude Code statistics for the year the post got passed around as a kind of ersatz benchmark. But it wasn’t a controlled experiment. It was a senior engineer showing his work, and what the work showed was that the phase change Karpathy had named was not a feeling but a practice, not a vibe but a methodology, not a collaboration in the soft sense of the word but in the precise sense: a human and a system dividing the labor of a year’s worth of engineering work along lines that would have been inconceivable before. Scott Wu said it plainly: our engineers don’t type code anymore. That’s just reality. That he was not speaking hyperbolically was perhaps somewhat lost, for all the hype, but he was describing a practice that had become unremarkable inside teams that have been running agents against production systems long enough for the novelty to wear off and the methodology to settle in.
What nobody was asking, in the threads and the discords and the pull request comments and the 4.4 million impressions of Cherny’s statistics, was where the complexity had gone. The folder had made something invisible, but invisible is not the same as absent. The judgment that the SKILL.md files were carrying came from somewhere, cost something, had been developed through years of specific decisions made under specific pressures that the letter to the model had always been trying to compress and never quite could. The folder didn’t solve that problem. It deferred it, elegantly and usefully and with version control, to the moment when the engineer sat down to write the skill and discovered how much of what they knew they could not yet explain.
That cost was not showing up in Cherny’s token count. It was not showing up in Karpathy’s phase change. It was not showing up in the 4.4 million impressions or the 259 pull requests or the 497 commits or all of Quixote’s tokenized windmills.
The folder solved the long-range dependency problem the same way LSTMs did, well enough to produce a phase change, not well enough to stop asking the question, because the complexity the folder made invisible had not been eliminated. It had been relocated. Not metaphorically, but mechanically, from the model’s internal state into a version-controlled external memory, from implicit weight to explicit artifact, from inference-time burden to author-time burden. Which is not to say the phase change was not real. It was very real, the way ImageNet was real and Universal Language Model Fine-Tuning was real and its coherence was real and the tenacity was real and Cherny’s 625 Cervantes of tokens were real, and none of that made the complexity smaller. It made it less visible, which is a different condition, and the difference between those two conditions is where things inevitably go sideways when complexity is relocated and you didn’t write down its forwarding address
Notes
[ 1 ] Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.


