The capability of large language models has grown dramatically, even over the past couple of months, to the point where raw model ability, context window size, and parameter count are no longer the bottleneck to progress.
Models will keep improving and will keep unlocking new capabilities, but nothing will fundamentally change until the new bottleneck is solved: the ability to hold context and retrieve it efficiently. In other words, persistent, versatile, expansive memory.
The AI world has started to converge on this truth: frontier labs are pouring millions into memory R&D, and independent researchers are working on the problem as well.
But the way memory has generally been approached is wrong. AI memory and context should not be treated as a purely computational problem; it should be approached through the lens of the human mind itself.
Issue 1: The Compulsion to Remember Everything
The primary framework for solving AI memory and context is sequential storage at all costs: consume as many individual artifacts, pieces of information, and bits of context as possible, keep them in some storage layer, and have the AI traverse that store, whether through RAG or other retrieval mechanisms, to find a given piece of information. This foundational assumption has seeped into everything, even into our benchmarks, the way we measure context and memory.
We ask: how good is your agent at finding a specific piece of information buried inside a massive context window?
But it's optimizing for a broken system. A system that we intuitively know is broken.
Research from those same frontier labs has shown that models recall context best at the very beginning and very end of a window; it's in the middle that recall breaks down. This follows from the sequential way we feed context to the agent: in the middle of a session, the agent is too far from the opening anchor and too far from any closing recap.
This approach is not how we ourselves work.
We don't remember everything, but we don't forget instantly either. Our memory is a continuous process of strengthening, of computing relevance: sharpest for the task at hand, a little looser for our current chapter in life, and broader still at the level of who we are entirely.
When everything is remembered equally, nothing is remembered meaningfully.
Indiscriminate storage creates only bloat, which in turn feeds the idea that more compute and more context are needed to compensate. It is a wasteful, inefficient way to advance LLMs.
In fact, Harvard's D3 Institute tested it empirically and found that indiscriminate memory storage actually performs worse than no memory at all.
The framework we should actually be operating under is, once again, a mapping of human thought: programmatically helping the agent forget irrelevant nodes. This will require a lot of care and tact, because human memory does not forget things all at once. We slowly compress, and then fade, information that holds less relevance to our current state of being than other nodes do.
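To make that compress-then-fade idea concrete, here is a minimal sketch. Every name, constant, and threshold in it is an illustrative assumption, not a description of any real system: a node strengthens when accessed, fades a little each tick, passes through a compressed stage where only a summary survives, and is only then forgotten entirely.

```python
import math

# Illustrative sketch of gradual forgetting. All names and
# thresholds are assumptions chosen for the example.

class MemoryNode:
    def __init__(self, content):
        self.content = content
        self.relevance = 1.0      # strengthened on every access
        self.compressed = False   # summarized, but not yet forgotten

    def access(self):
        """Recalling a memory strengthens it and restores full detail."""
        self.relevance = min(1.0, self.relevance + 0.3)
        self.compressed = False

    def decay(self, rate=0.1):
        """One tick of forgetting: fade, then compress, then drop."""
        self.relevance *= math.exp(-rate)
        if self.relevance < 0.5:
            self.compressed = True       # keep only a summary
        return self.relevance >= 0.1     # below this, forget entirely

node = MemoryNode("project uses Postgres 15")
for _ in range(5):
    node.decay()
node.access()  # a later mention restores the node to near-full relevance
```

The point of the two-stage shape is that forgetting is not deletion: a compressed node still exists and can be fully revived by a single relevant access, mirroring how a faded human memory snaps back when something re-triggers it.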
Issue 2: Allowing AI to Think Sequentially
We are allowing AI to think sequentially. And I mean this pointedly. Instead of building harnesses to shape their behavior, we have been optimizing and reinforcing a broken system.
The agent can look forward and backward in the sequence but never sideways, never across the room.
This is how I'm thinking about the memory problem right now: a timeline where new information and context come in, pushing existing information back into the zone mentioned earlier, that middle of the window where context goes to die.
Why?
Without getting into the hard science, in an imaginative sense: why are we allowing agents to participate in this destructive framework instead of a more spherical one, where context is relative and amorphous? A queue is the wrong data structure for cognition, for memory, and for eventual identity and specialization.
This made sense when context windows were small and tasks were simple. But not for what we are asking them to do now. Not for what we imagine they can complete today.
Markdown Files as Neurons
The community has started to converge on the value of markdown files. I personally believe they are critical to unlocking context and memory within the limited framework of LLMs.
They are built from artifacts that LLMs understand natively: files, headings, metadata, lists, links, and conventions.
Yet markdown files today are treated as static documents: directions to read, artifacts to retrieve. RAG pulls them individually based on keyword or vector similarity. Each file is weighted equally and scored in isolation, with no awareness of what surrounds it.
I posit that markdown files should not just exist in isolation as text artifacts or directions. They should instead exist in relation to others as nodes in a growing system, as neurons in a growing brain.
This change in approach paves the way to the relevance I mentioned earlier. Each node wouldn't just have value in and of itself; its value would come from its relation to other nodes in the system.
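As a minimal sketch of what "nodes in relation" could look like: treat each markdown file as a node and its wiki-style `[[links]]` as edges. The file contents, link syntax, and in-degree scoring below are all assumptions made for illustration; a real system would read files from disk and weight edges far more carefully.

```python
import re

# Sketch: markdown files as graph nodes, [[wiki links]] as edges.
# Contents are inlined here purely for the example.

files = {
    "auth.md":     "# Auth\nWe use [[sessions]] backed by [[postgres]].",
    "sessions.md": "# Sessions\nStored in [[postgres]], expired nightly.",
    "postgres.md": "# Postgres\nPrimary datastore.",
}

LINK = re.compile(r"\[\[([^\]]+)\]\]")

def build_graph(files):
    """Map each node to the set of nodes it links to."""
    return {
        name: {f"{target}.md" for target in LINK.findall(text)}
        for name, text in files.items()
    }

graph = build_graph(files)

# A node's value is partly relational: count how many neighbors touch it.
in_degree = {n: sum(n in out for out in graph.values()) for n in files}
```

Here `postgres.md` says almost nothing on its own, yet ends up the most referenced node in the graph; its importance is a property of its relations, not its text.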
This is the right direction for research and implementation to follow, as it maps the natural process of the human brain.
A single neuron firing means nothing. Neurons gain meaning through relative connections, through the other neurons they activate.
Through this direction we could even simulate neuron activation patterns. When you're working on something, consuming information, thinking — the relevant clusters of neurons in your brain light up. Not everything. Not a random sample. The pieces relevant to the task at hand.
Agents should work the same way. A problem activates relevant nodes. Those nodes carry edges to related nodes. The agent follows the edges that matter and ignores the rest.
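That activation pattern can be sketched as simple spreading activation over a node graph: seed nodes light up fully, activation flows along edges and weakens with each hop, and anything below a threshold stays dark. The graph, damping factor, and threshold here are all illustrative assumptions.

```python
from collections import deque

# Sketch of spreading activation: follow the edges that matter,
# ignore the rest. Damping and threshold values are assumptions.

def activate(graph, seeds, damping=0.5, threshold=0.2):
    """Return activation levels for nodes reachable above threshold."""
    activation = {s: 1.0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        spread = activation[node] * damping
        if spread < threshold:
            continue  # too faint: the rest of the graph stays dark
        for neighbor in graph.get(node, ()):
            if spread > activation.get(neighbor, 0.0):
                activation[neighbor] = spread
                queue.append(neighbor)
    return activation

graph = {
    "bug-report": ["sessions", "logging"],
    "sessions":   ["postgres"],
    "postgres":   ["backups"],
}
lit = activate(graph, ["bug-report"])
```

Note what never happens: `backups` is reachable from the seed, but activation has faded below the threshold by the time it gets there, so the agent never opens it. That is the "not everything, not a random sample" behavior in miniature.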
What this gives you practically
Retrieval by traversal instead of retrieval by search. You follow edges to relevant context instead of hoping a similarity score surfaces the right chunk.
Cross-domain discovery. A note tagged to crypto and a note tagged to CourtShare can share an edge, and that edge is a connection no keyword search would ever find.
Compound value. Every new node you add doesn't just add one unit of knowledge; it adds every edge it creates. The graph becomes disproportionately more useful as it grows, both more efficient to retrieve from and more effective.
And for the earlier idea of systematic, gradual forgetting: old nodes can computationally lose relevance as edges stop forming to them, yet reactivate the moment something new connects.
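A minimal sketch of that edge-driven forgetting, under two stated assumptions: a node's relevance halves after a fixed number of ticks without a new edge, and a single new connection resets it completely. The half-life value and the `Node` shape are hypothetical.

```python
# Sketch: relevance tied to edge activity rather than raw age.
# The half-life math and class shape are illustrative assumptions.

def relevance(last_edge_tick, now, half_life=10):
    """Halve relevance for every `half_life` ticks without a new edge."""
    return 0.5 ** ((now - last_edge_tick) / half_life)

class Node:
    def __init__(self):
        self.last_edge_tick = 0

    def connect(self, now):
        """A new edge pointing at this node reactivates it."""
        self.last_edge_tick = now

old = Node()                                      # last connected at tick 0
faded = relevance(old.last_edge_tick, now=30)     # three half-lives later
old.connect(now=30)                               # something new links to it
revived = relevance(old.last_edge_tick, now=30)   # instantly back to full
```

The design choice worth noticing: decay is a function of edge activity, not deletion, so "forgotten" nodes cost nothing to keep and everything stays one connection away from being relevant again.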
Proof It Works
A working example: 250 connected markdown files that teach an agent how to build and traverse its own knowledge graph, its own mind map, where everything exists in a system rather than in isolation.
The agent doesn't read all 250 files. It reads an index, follows the links that matter for the current problem, and makes most decisions before ever opening a single full file. Retrieval by relevance, not brute force RAG.
As the agent navigates, it leaves breadcrumbs for future sessions about what it learned during traversal, making the graph progressively easier to navigate over time.
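The traversal pattern above might be sketched like this. The index layout, the keyword-matching rule, and the breadcrumb format are all hypothetical stand-ins for whatever a real harness would use:

```python
# Sketch of index-first traversal: read one index, open only the
# files relevant to the task, record a breadcrumb for next time.
# Everything here is an illustrative assumption.

index = {
    "deploy":  "deploy.md",
    "auth":    "auth.md",
    "billing": "billing.md",
}
notes = {
    "deploy.md":  "Deploys run through GitHub Actions.",
    "auth.md":    "Auth uses short-lived JWTs.",
    "billing.md": "Billing reconciles nightly.",
}
breadcrumbs = []  # would be persisted across sessions in a real system

def traverse(task):
    """Open only the files the index marks as relevant to the task."""
    opened = [path for key, path in index.items() if key in task]
    for path in opened:
        breadcrumbs.append(f"{task}: {path} -> {notes[path]}")
    return opened

opened = traverse("fix the deploy pipeline")
```

One index read, one file opened, one breadcrumb written; the other files are never touched, and the next session starts with a record of where this one went.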
Conclusion
LLMs, even as imperfect technology, may not achieve AGI. But they have the potential to hold goals and context across sessions, to adjust their behavior based on accumulated experience, and to develop something that looks a lot like specialization — if we give them the right memory architecture.
The path isn't more storage. It isn't bigger context windows. It isn't faster vector search.
It's building memory that works the way memory actually works: relative, weighted, graph-structured, and brave enough to forget what no longer matters.