LN: Karpathy (2026) — LLM Wiki: Personal Knowledge Base Pattern

Bibliographic Reference

Citation: Karpathy, A. (2026). LLM Wiki: Personal knowledge base pattern. GitHub Gist. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Related video: Karpathy Just Replaced RAG With Obsidian + Claude Code (VID-AGT-001-Karpathy-Just-Replaced-RAG-With-Obsidian—Cl)


Pass 1 — Bird’s Eye View (5 Cs)

C | Assessment
Category | Pattern description / practitioner framework
Context | Andrej Karpathy (co-founder of OpenAI, creator of nanoGPT, micrograd, llm.c) publishes a design pattern for LLM-maintained persistent knowledge bases as an alternative to standard RAG
Correctness | Practitioner-authored; no formal peer review and no empirical evaluation. Credibility rests on Karpathy’s deep LLM engineering expertise and on uptake among practitioners, not on measured results
Contributions | (1) Names and formalises the “LLM Wiki” pattern; (2) proposes a three-layer architecture (sources → wiki → schema); (3) defines three operations (ingest, query, lint); (4) frames LLMs as knowledge-base maintainers rather than just retrievers
Clarity | Excellent. Intentionally abstract: it describes the pattern, not the implementation, and readers adapt it to their domain.

Relevance: ⭐⭐⭐⭐⭐

The LLM Wiki pattern is the conceptual basis of the PUMA Obsidian vault itself: this vault is a human-curated, LLM-assisted wiki where Claude Code reads, synthesises, and updates interconnected markdown files — exactly as Karpathy describes.


Pass 2 — Content

The Core Idea

Central Claim

Rather than re-synthesising raw documents on every query (RAG), the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files. The wiki is a compiled artefact that compounds over time.

The key insight:

“The tedious part of maintaining a knowledge base is not the reading or the thinking — it’s the bookkeeping.”

LLMs excel at the maintenance work humans abandon: updating cross-references, maintaining consistency across dozens of interconnected pages, surfacing contradictions, filing new insights in the right places.


Three-Layer Architecture

┌─────────────────────────────────────────────────┐
│  Layer 3: Schema (CLAUDE.md / config document)  │
│  Tells the LLM HOW to maintain the wiki         │
├─────────────────────────────────────────────────┤
│  Layer 2: The Wiki (LLM-generated .md files)    │
│  Interconnected pages, maintained by LLM        │
├─────────────────────────────────────────────────┤
│  Layer 1: Raw Sources (immutable documents)     │
│  Articles, papers, images, data files           │
└─────────────────────────────────────────────────┘
Layer | Description | Mutability
Raw sources | Curated documents the user adds: articles, papers, images, data files, web clips | Immutable: the LLM reads sources but never edits them
The wiki | LLM-generated and LLM-maintained markdown files: entity pages, concept pages, synthesis pages | Mutable: the LLM updates, rewrites, and cross-links pages
The schema | Configuration document (e.g., CLAUDE.md) defining wiki structure, page types, naming conventions, and operational workflows | Human-maintained
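A minimal Python sketch of how a maintainer agent might enforce the mutability column before writing any file. The folder names raw/ and wiki/, and the helpers layer_of and llm_may_write, are hypothetical; the pattern does not prescribe a directory layout.

```python
from pathlib import PurePosixPath

# Hypothetical vault layout, assumed for illustration only.
RAW_DIR = "raw"            # Layer 1: immutable sources
WIKI_DIR = "wiki"          # Layer 2: LLM-maintained pages
SCHEMA_FILE = "CLAUDE.md"  # Layer 3: human-maintained schema


def layer_of(path: str) -> str:
    """Classify a vault-relative path into one of the three layers."""
    parts = PurePosixPath(path).parts
    if not parts or ".." in parts:
        return "other"  # refuse paths that could escape their layer
    if parts == (SCHEMA_FILE,):
        return "schema"
    if parts[0] == RAW_DIR:
        return "raw"
    if parts[0] == WIKI_DIR:
        return "wiki"
    return "other"


def llm_may_write(path: str) -> bool:
    """The LLM may write only to the wiki layer.

    Raw sources are immutable and the schema is edited by the human,
    so writes to either (or to anything unclassified) are refused.
    """
    return layer_of(path) == "wiki"
```

Refusing paths that contain .. keeps a wiki-relative path from escaping into the raw sources layer.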

The Three Operations

1. Ingest

When a new source is added to the raw sources layer:

  1. LLM reads and digests the new source
  2. Identifies which existing wiki pages are affected (typically 10–15)
  3. Writes a short summary page for the source itself
  4. Updates entity/concept pages with new information
  5. Updates the index.md catalog entry
  6. Appends a new entry to log.md
  7. Flags any contradictions with existing content
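The bookkeeping half of an ingest (steps 3 to 6) can be sketched in Python against an in-memory wiki. This assumes the LLM has already produced the source summary and the per-entity notes (steps 1 and 2); the Wiki class, the slug naming convention, and the sources/ and entities/ folders are hypothetical, and contradiction flagging (step 7) is left out because it needs the model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Wiki:
    """In-memory stand-in for the wiki layer (hypothetical structure)."""
    pages: dict[str, str] = field(default_factory=dict)  # path -> markdown
    index: dict[str, str] = field(default_factory=dict)  # path -> one-liner
    log: list[str] = field(default_factory=list)         # log.md entries


def slug(title: str) -> str:
    """Hypothetical naming convention: lowercase, hyphen-separated."""
    return "-".join(title.lower().split())


def ingest(wiki: Wiki, title: str, summary: str,
           entity_notes: dict[str, str], day: date) -> list[str]:
    """Apply steps 3-6 for one source and return the pages touched.

    `summary` and `entity_notes` stand in for the LLM's reading of the
    source; nothing here calls a model.
    """
    touched = []
    # Step 3: a short summary page for the source itself.
    src_page = f"sources/{slug(title)}.md"
    wiki.pages[src_page] = f"# {title}\n\n{summary}\n"
    wiki.index[src_page] = summary.split(".")[0]  # first sentence
    touched.append(src_page)
    # Step 4: update (or create) each affected entity/concept page,
    # linking back to the source summary.
    for entity, note in entity_notes.items():
        page = f"entities/{slug(entity)}.md"
        body = wiki.pages.get(page, f"# {entity}\n")
        wiki.pages[page] = body + f"\n- {note} ([[{slug(title)}]])"
        wiki.index.setdefault(page, entity)  # step 5: catalog entry
        touched.append(page)
    # Step 6: append one parseable entry to the log.
    wiki.log.append(f"## [{day.isoformat()}] ingest | {title}")
    return touched
```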

Key property

Each source is processed once. The wiki accumulates value over time, unlike RAG, which re-processes sources on every query.
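One way to guarantee that each source is processed only once is to keep a ledger of content hashes. This Python sketch assumes such a ledger (seen) is persisted between runs; the pattern itself does not specify how already-ingested sources are tracked, and sources_to_ingest is a hypothetical helper.

```python
import hashlib


def source_fingerprint(content: bytes) -> str:
    """Content hash used to recognise a source that was already ingested."""
    return hashlib.sha256(content).hexdigest()


def sources_to_ingest(sources: dict[str, bytes], seen: set[str]) -> list[str]:
    """Return the names of sources whose content is not yet in the ledger.

    `seen` is a hypothetical persisted set of fingerprints; it is not
    modified here. Identical content under two names is ingested once.
    """
    pending = []
    batch = set()
    for name, content in sorted(sources.items()):
        fp = source_fingerprint(content)
        if fp not in seen and fp not in batch:
            batch.add(fp)
            pending.append(name)
    return pending
```

The caller should add a fingerprint to the ledger only after the ingest for that source succeeds, so a failed run is retried next time.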

2. Query

When the user asks a question:

  1. LLM searches relevant wiki pages (not raw sources)
  2. Synthesises an answer with citations pointing to wiki pages
  3. Identifies gaps — facts that are not yet in the wiki
  4. Optionally files valuable query responses back as new wiki pages (compounding)

The query operation has a side effect: good answers become new wiki entries, enriching the knowledge base for future queries.
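The query loop can be approximated in Python with a deliberately crude retriever: plain term overlap over wiki pages (not raw sources), standing in for the LLM's own search or a tool such as qmd. The function names, the stopword list, and the synthesis/ folder for filed-back answers are assumptions; writing the answer itself is still the LLM's job.

```python
import re

# Tiny, assumed stopword list so function words are not reported as gaps.
STOPWORDS = {"a", "an", "and", "are", "does", "how", "in", "is", "of",
             "the", "to", "what", "why"}


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def query(pages: dict[str, str], question: str, k: int = 3):
    """Return (citations, gaps) for a question.

    citations: up to k wiki pages ranked by term overlap (ties by name).
    gaps: question terms that no wiki page mentions at all.
    """
    q = terms(question)
    scored = sorted(
        (-len(q & terms(text)), name)
        for name, text in pages.items()
        if q & terms(text)
    )
    citations = [name for _, name in scored[:k]]
    covered = set().union(*(terms(text) for text in pages.values()))
    gaps = sorted(q - covered)
    return citations, gaps


def file_back(pages: dict[str, str], title: str, answer: str,
              citations: list[str]) -> str:
    """Compounding step: store a good answer as a new wiki page."""
    name = "synthesis/" + "-".join(title.lower().split()) + ".md"
    links = "\n".join(f"- [[{c.removesuffix('.md')}]]" for c in citations)
    pages[name] = f"# {title}\n\n{answer}\n\nSources:\n{links}\n"
    return name
```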

3. Lint

Periodic health-check of the wiki:

  • Detect contradictions between pages
  • Flag stale claims that may have been superseded
  • Identify orphan pages with no incoming links
  • Surface missing cross-references between related concepts
  • Report data gaps where the wiki lacks coverage
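The structural lint checks need no model at all. Below is a Python sketch over a mapping from page name to markdown text, assuming Obsidian-style [[wikilinks]] and a hypothetical root page called index that is exempt from the orphan check. Links to pages that do not exist are reported as dangling, used here as a rough proxy for coverage gaps; contradiction and staleness checks are not covered.

```python
import re

# Captures the target of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def links_of(text: str) -> set[str]:
    return {m.strip() for m in WIKILINK.findall(text)}


def lint(pages: dict[str, str], root: str = "index") -> dict[str, list[str]]:
    """Report orphans, missing cross-references, and dangling links."""
    outgoing = {name: links_of(text) for name, text in pages.items()}
    incoming: dict[str, set[str]] = {name: set() for name in pages}
    dangling = set()
    for src, targets in outgoing.items():
        for t in targets:
            if t not in incoming:
                dangling.add(t)       # link to a page that does not exist
            elif t != src:
                incoming[t].add(src)  # self-links do not count
    # Orphans: no other page links to them (the root index is exempt).
    orphans = sorted(n for n, srcs in incoming.items()
                     if not srcs and n != root)
    # Missing cross-references: a page names another page in plain text
    # without linking to it (naive case-insensitive substring match).
    missing = sorted(
        f"{src} -> {other}"
        for src, text in pages.items()
        for other in pages
        if other != src and other not in outgoing[src]
        and other.lower() in text.lower()
    )
    return {"orphans": orphans, "missing_links": missing,
            "dangling": sorted(dangling)}
```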

Supporting Infrastructure Files

File | Purpose
index.md | Content-oriented catalog of all wiki pages, organised by category, with links and one-line summaries
log.md | Append-only chronological record of all operations; each entry starts with a consistent, parseable prefix, e.g. "## [2026-04-02] ingest | Article Title"
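A Python sketch of regenerating index.md from page metadata. The input triples (category, page name, one-line summary) and the heading and bullet layout are assumptions; the pattern only asks for a catalog organised by category with links and summaries. render_index is a hypothetical helper.

```python
from collections import defaultdict


def render_index(pages: list[tuple[str, str, str]]) -> str:
    """Render index.md from (category, page name, summary) triples.

    Categories and the pages within them are sorted alphabetically so the
    file is stable between regenerations.
    """
    by_category: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for category, name, summary in pages:
        by_category[category].append((name, summary))
    lines = ["# Index"]
    for category in sorted(by_category):
        lines.append(f"\n## {category}")
        for name, summary in sorted(by_category[category]):
            lines.append(f"- [[{name}]]: {summary}")
    return "\n".join(lines) + "\n"
```

Sorting keeps the regenerated file deterministic, so diffs of index.md show only real catalog changes.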

The log enables auditability: any change to the wiki can be traced to the ingest, query, or lint event that produced it.
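Because every log entry starts with the same prefix, the log can be parsed with a single regular expression. A Python sketch, assuming entries follow the "## [YYYY-MM-DD] operation | Title" form from the table above exactly; parse_log, trace, and operation_counts are hypothetical helpers.

```python
import re
from collections import Counter

# Matches "## [YYYY-MM-DD] operation | Title" at the start of a line.
LOG_ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def parse_log(text: str) -> list[tuple[str, str, str]]:
    """Return (date, operation, title) for every entry line; skip the rest."""
    entries = []
    for line in text.splitlines():
        m = LOG_ENTRY.match(line)
        if m:
            entries.append(m.groups())
    return entries


def trace(text: str, title: str) -> list[tuple[str, str]]:
    """Audit helper: every (date, operation) recorded for one title."""
    return [(d, op) for d, op, t in parse_log(text) if t == title]


def operation_counts(text: str) -> Counter:
    """How many ingest/query/lint events the log records."""
    return Counter(op for _, op, _ in parse_log(text))
```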


Optional Tooling Mentioned

Tool | Role
qmd | Local markdown search engine with hybrid BM25/vector search and LLM re-ranking; enables semantic search over the wiki at scale
Obsidian Web Clipper | Converts web articles to markdown for inclusion in the raw sources layer
Obsidian graph view | Visualises the connections and topology of wiki pages
Marp | Markdown-based slide deck format; wiki pages can be compiled into presentations
Dataview | Obsidian plugin that queries page metadata (YAML frontmatter and inline fields); enables structured queries over wiki metadata

Use Cases

Domain | Application
Personal | Goals, health, psychology, self-improvement tracking
Research | Deep topic investigation over weeks or months; exactly the PUMA use case
Reading | Chapter-by-chapter filing with character, theme, and plot cross-references
Business/team | Internal wikis fed by Slack transcripts, meeting notes, and documents
Competitive analysis | Tracking competitor moves, product changes, market signals
Due diligence | Building structured knowledge during investment or hiring evaluations
Trip planning / hobby deep-dives | Domain-specific structured research

Historical Lineage: The Memex Connection

Karpathy connects the LLM Wiki to Vannevar Bush’s 1945 Memex concept:

“A personal, curated knowledge store with associative trails between documents.”

Bush’s vision was unrealisable in 1945 — he could imagine associative trails between documents but had no mechanism to maintain them. The LLM Wiki solves exactly the maintenance problem: LLMs handle the bookkeeping that humans abandon.


Why LLM Wiki Outperforms Standard RAG

Dimension | Standard RAG | LLM Wiki
Query processing | Retrieve and synthesise raw docs on every query | Retrieve pre-synthesised wiki pages
Knowledge accumulation | Stateless: no compounding | Compounding: each ingest enriches the base
Cross-references | None: documents are independent | Explicit: the LLM maintains links between pages
Contradiction handling | Silent: conflicting docs are merged at answer time | Active: the lint operation flags contradictions
Query latency | Typically higher: raw chunks are synthesised at query time | Typically lower: synthesis already happened at ingest time
Maintenance burden | Human must curate source quality | LLM handles consistency; human curates sources
Auditability | Hard: which docs influenced which answer? | Full: log.md traces every change

When RAG still wins

RAG remains better when: (1) sources change frequently (news, live feeds); (2) exact provenance to raw text is legally required; (3) the knowledge base is too large to maintain page-by-page.


PUMA Integration

The PUMA Vault IS an LLM Wiki

The PUMA Obsidian vault implements the LLM Wiki pattern with Claude Code as the LLM maintainer:

Karpathy’s component | PUMA equivalent
Raw sources | PDF papers, arXiv preprints, Zotero library, YouTube transcripts
The wiki | 20 - Literature/, 30 - Permanent/, 40 - Projects/ markdown files
Schema | CLAUDE.md + .claude/ skills + puma-core / puma-orchestrator skills
index.md | 00 - Home.md + 80 - MOC/ navigation layer
log.md | 50 - Areas/51 Research/AI-Use-Log.md (PRISMA-trAIce)
Ingest | Literature note creation (Keshav Three-Pass)
Query | Research synthesis sessions with Claude Code
Lint | Vault formatting sessions: duplicate detection, callout repair, orphan link cleanup
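The duplicate-detection part of a PUMA lint session could start from a simple title-collision check. This Python sketch is hypothetical: it compares note filenames only, and the normalisation rule (lowercase, runs of non-alphanumeric characters collapsed to one space) is an assumption, so near-duplicates with different wording still need the LLM or a human.

```python
import re
from collections import defaultdict


def normalise_title(filename: str) -> str:
    """Reduce a note filename to a comparison key (assumed rule)."""
    stem = filename[:-3] if filename.endswith(".md") else filename
    return re.sub(r"[^a-z0-9]+", " ", stem.lower()).strip()


def duplicate_groups(filenames: list[str]) -> list[list[str]]:
    """Group notes whose normalised titles collide (candidate duplicates)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        groups[normalise_title(name)].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```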

PUMA Enhancements Over the Base Pattern

PUMA extends the LLM Wiki with additional structure:

  1. PARA + Johnny Decimal: Hierarchical folder organisation (00–90) instead of flat wiki
  2. Keshav Three-Pass: Structured ingest protocol for academic papers (5 Cs, content, virtual reconstruction)
  3. Zettelkasten permanent notes: Atomic concept pages in 30 - Permanent/ — exactly Karpathy’s “entity/concept pages”
  4. MOCs: Maps of Content as high-level index pages — exactly Karpathy’s index.md
  5. Marco Veritas: Audit protocol for all LLM-assisted updates — extends Karpathy’s log.md with academic integrity requirements
  6. PRISMA-trAIce: Formal logging of AI-assisted operations — extends log.md with research compliance

SmartPMO Application (Stage 5)

The LLM Wiki pattern directly informs the PUMA SmartPMO persistent agent design:

  • Per-project wiki: Each software project gets a wiki of issue patterns, team velocity data, sprint retrospectives
  • Ingest: Each new Jira issue update triggers wiki page updates (team notes, recurring pattern pages)
  • Query: PM asks “What are the recurring authentication issues in this project?” → wiki answers from accumulated history
  • Lint: Weekly health-check — contradictions between sprint goals and actual deliverables flagged automatically
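The ingest and query bullets can be sketched in Python for a single project. The IssueUpdate shape below is a hypothetical simplification, not Jira's webhook payload, and using issue labels as the pattern signal is an assumption; in the design described above the LLM would decide which pattern pages an update belongs to.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IssueUpdate:
    """Hypothetical, simplified issue event; NOT Jira's webhook payload."""
    key: str           # e.g. "PROJ-101"
    summary: str
    labels: list[str]  # assumed stand-in for an LLM-assigned pattern


class ProjectWiki:
    """Per-project wiki: one pattern page per label, fed by issue updates."""

    def __init__(self) -> None:
        self.pattern_pages: dict[str, list[str]] = defaultdict(list)

    def ingest(self, update: IssueUpdate) -> None:
        # File the update on the pattern page for each of its labels;
        # a repeated update for the same issue key is not double-counted.
        for label in update.labels:
            page = self.pattern_pages[label]
            if not any(e.startswith(update.key + ":") for e in page):
                page.append(f"{update.key}: {update.summary}")

    def recurring(self, label: str, threshold: int = 2) -> list[str]:
        """Query: the issues on a pattern page, if the pattern recurs."""
        page = self.pattern_pages.get(label, [])
        return list(page) if len(page) >= threshold else []
```

A PM question such as "What are the recurring authentication issues?" then maps to recurring("authentication"), answered from accumulated history rather than a fresh search of the tracker.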

MOCs