Andrej Karpathy's LLM-powered personal knowledge base workflow using markdown wikis and Obsidian

Andrej Karpathy’s LLM Knowledge Base Workflow Is the Personal Computing Shift Nobody’s Talking About

Most people use LLMs like a search engine with better grammar. Ask a question, get an answer, close the tab. Nothing carries forward. Every conversation starts cold. Karpathy just published a workflow that flips that pattern completely, and I think it points toward something much bigger than a productivity trick.

On April 2nd, Karpathy posted a detailed breakdown of how he’s been using LLMs to build personal knowledge bases. Not chatting with them. Not generating boilerplate. Writing structured, persistent, queryable wikis, and doing it at a scale that actually changes how you think about a research topic.

The Setup

The architecture is deceptively simple. Raw sources go into a raw/ directory. Articles, papers, repos, datasets, images, whatever you’re tracking. An LLM then compiles that material into a collection of markdown files. It writes summaries, creates backlinks, categorizes concepts, and produces articles linking everything together. Obsidian is the reading and viewing layer. The LLM writes. You read.
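A minimal sketch of that compile step, assuming a simple layout where each raw source becomes one wiki note. The `compile_source` stub stands in for the LLM pass (it just takes the first line as a placeholder summary); the directory names and the note schema are illustrative, not Karpathy's actual scripts:

```python
from pathlib import Path

def compile_source(raw_text: str, title: str, tags: list[str]) -> str:
    """Render one raw source into a wiki-style markdown note.
    In the real workflow an LLM writes the summary and backlinks;
    this stub uses the first line as a placeholder summary."""
    summary = raw_text.strip().splitlines()[0] if raw_text.strip() else ""
    links = " ".join(f"[[{t}]]" for t in tags)
    return f"# {title}\n\n## Summary\n\n{summary}\n\n## Related\n\n{links}\n"

def build_wiki(raw_dir: Path, wiki_dir: Path) -> list[Path]:
    """Compile every .txt source in raw/ into a markdown note."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(raw_dir.glob("*.txt")):
        note = compile_source(src.read_text(), src.stem, tags=["inbox"])
        out = wiki_dir / f"{src.stem}.md"
        out.write_text(note)
        written.append(out)
    return written
```

The point of the shape, not the stub: sources stay immutable in `raw/`, and everything in the wiki is a derived, regenerable artifact.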

Karpathy uses the Obsidian Web Clipper extension to convert web articles to markdown files, and a hotkey to pull related images to local storage so the model can reference them directly. That last part matters more than it sounds. Multimodal input into a locally organized structure means the model isn’t working off half the picture.
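One mechanical piece of that image-localization step can be sketched without any network code: rewrite remote image embeds in a clipped markdown file to local asset paths, and collect the URLs that still need fetching. This is a hypothetical helper, not Karpathy's actual hotkey:

```python
import re

def localize_image_links(markdown: str, assets_dir: str = "assets") -> tuple[str, list[str]]:
    """Rewrite remote markdown image embeds to local paths so the model
    can read the files directly; return the rewritten text plus the
    list of URLs that still need to be downloaded."""
    urls: list[str] = []

    def repl(m: re.Match) -> str:
        alt, url = m.group(1), m.group(2)
        name = url.rstrip("/").split("/")[-1] or "image"
        urls.append(url)
        return f"![{alt}]({assets_dir}/{name})"

    out = re.sub(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)", repl, markdown)
    return out, urls
```

A separate step (curl, a clipper plugin, whatever) actually fetches the collected URLs into `assets/`.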

Why This Is Different From RAG

Once the wiki gets large enough, Karpathy runs agents against it for complex Q&A. His example: roughly 100 articles and 400,000 words on a recent research topic. His note here is worth sitting with. He says he expected to need fancy RAG pipelines, but the LLM turned out to be good at auto-maintaining index files and brief document summaries, and can read all relevant material “fairly easily at this small scale.”

That “small scale” caveat is honest. 400K words fits comfortably in a large context window today. This workflow doesn’t work at a million documents. But for a focused research domain? You probably don’t need a million documents. You need the right 100.
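The index-file approach he describes can be sketched in two parts: maintain an index of one-line summaries, then select whole documents by scanning the index rather than embedding chunks. The summary extraction and term matching below are naive stand-ins for what the LLM does; the schema is my assumption, not his actual format:

```python
from pathlib import Path

def build_index(wiki_dir: Path) -> str:
    """Auto-maintained index: one line per note. The LLM writes the
    brief summaries in the real workflow; here we take the first
    non-heading line as a placeholder."""
    lines = ["# Index", ""]
    for note in sorted(wiki_dir.glob("*.md")):
        body = [l for l in note.read_text().splitlines()
                if l and not l.startswith("#")]
        summary = body[0] if body else "(empty)"
        lines.append(f"- [[{note.stem}]]: {summary}")
    return "\n".join(lines) + "\n"

def select_context(index_md: str, wiki_dir: Path, query_terms: set[str]) -> str:
    """Naive stand-in for the agent: pull the FULL text of every note
    whose index line mentions a query term, and hand that as context.
    No embeddings, no chunking; whole documents or nothing."""
    chosen = []
    for line in index_md.splitlines():
        if line.startswith("- [[") and any(t.lower() in line.lower() for t in query_terms):
            stem = line.split("[[")[1].split("]]")[0]
            chosen.append((wiki_dir / f"{stem}.md").read_text())
    return "\n\n---\n\n".join(chosen)
```

The design choice worth noting: because whole documents go into context, there is no chunk-boundary problem and no retrieval-quality tuning. That only works because the corpus is small and curated.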

The Filing Loop

The part I keep thinking about is what he calls “filing” outputs back into the wiki. When he queries the agent and gets a useful answer, that answer gets written back as a new wiki entry. His explorations accumulate. The knowledge base grows from use.

This is the opposite of how most knowledge management systems work. Most tools require you to manually capture what you learn. This one captures it automatically, as a byproduct of the questions you ask. The cost of building the knowledge base is nearly zero once the pipeline is running.
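The filing step itself is almost trivially small, which is part of why the loop works. A sketch, assuming a hypothetical note schema with backlinks to the source notes the answer drew on (`file_answer` and its fields are my invention for illustration):

```python
import re
from datetime import date
from pathlib import Path

def file_answer(wiki_dir: Path, question: str, answer: str, sources: list[str]) -> Path:
    """Write an agent answer back into the wiki as a new note,
    with backlinks to the notes it drew on."""
    slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")[:60]
    note = wiki_dir / f"qa-{slug}.md"
    links = "\n".join(f"- [[{s}]]" for s in sources)
    note.write_text(
        f"# Q: {question}\n\n{answer}\n\n"
        f"## Drawn from\n\n{links}\n\n_Filed: {date.today().isoformat()}_\n"
    )
    return note
```

Because the new note carries backlinks, the next index rebuild picks it up automatically, and future queries can draw on past answers.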

He also runs what he calls “health checks”: LLM passes over the wiki that find inconsistencies, impute missing data via web search, and surface candidates for new articles. The model isn’t just storing information. It’s suggesting what questions to ask next.
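One slice of a health check is purely mechanical and needs no LLM at all: find `[[backlinks]]` that point at notes which don’t exist yet, which are natural candidates for new articles. A sketch of that slice (the consistency and gap-filling passes he describes would be LLM prompts on top):

```python
import re
from pathlib import Path

def health_check(wiki_dir: Path) -> dict[str, list[str]]:
    """Find [[wikilinks]] pointing at notes that don't exist yet.
    Returns {missing_target: [notes that reference it]}, i.e. a
    candidate list for new articles."""
    existing = {p.stem for p in wiki_dir.glob("*.md")}
    missing: dict[str, list[str]] = {}
    for note in wiki_dir.glob("*.md"):
        # [[target]] or [[target|alias]]; capture up to ']' or '|'
        for target in re.findall(r"\[\[([^\]|]+)", note.read_text()):
            if target not in existing:
                missing.setdefault(target, []).append(note.stem)
    return missing
```

Ranking the results by reference count would give a crude priority queue for what to write next.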

The Product Gap

Karpathy ends with a line that I think is the real signal here. He says “I think there is room here for an incredible new product instead of a hacky collection of scripts.”

He’s right. What he’s describing right now is a set of CLI tools, a custom search engine he vibe-coded, some Obsidian plugins, and a lot of prompt engineering. It works, clearly. But it requires you to be Andrej Karpathy to set it up. The person who would benefit most from a workflow like this is probably not someone who enjoys configuring markdown pipelines.

He also hints at the next frontier: synthetic data generation and fine-tuning so the LLM eventually “knows” the data in its weights rather than just pulling it from context. That’s a different order of capability. A model that has internalized your research domain rather than searching it on every query. We’re not there yet for personal use cases, but the path is visible.

I’ve been building toward something similar in pieces for about a year. Obsidian, various LLM integrations, markdown-everything. But I hadn’t connected the filing loop cleanly, and I hadn’t thought about health checks as a first-class operation. Reading Karpathy’s breakdown made me realize I was treating the wiki as a destination rather than a living process.

The real shift here is from LLMs as answer machines to LLMs as knowledge infrastructure. That’s not a small change in how you use these tools. It’s a completely different mental model for what they’re for.

#AI #MachineLearning #PKM #Obsidian #LLM #KnowledgeManagement #AIEngineering
