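The Ultimate Knowledge Management System: Why Your Note-Taking App Is Never Enough

🧠 Sound familiar? You've installed thirty Obsidian plugins, opened five Notion workspaces, and saved hundreds of quick notes in Apple Notes, yet every time you go looking for an insight from three months ago, you can't find it.

The problem isn't that you're not working hard enough, and it isn't that the tools aren't good enough. The problem is that your knowledge management system is missing a layer: schema.

This post covers three things: why Karpathy's LLM Wiki is the most sensible knowledge management architecture right now, why file-based memory beats vector search, and how a daily digest turns "I have lots of notes" into "I know which note is relevant right now".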

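The Real Problem With Your Note-Taking App

Most people's knowledge management workflow looks like this:

See something useful → copy and paste → drop it into some folder → never open it again

The fatal flaw in this workflow isn't "no organization", it's no retrieval strategy. You've saved a thousand notes, but you have only two retrieval methods: remembering a keyword to search for, or happening to scroll past the note.

Obsidian's backlinks try to solve this, but backlinks require you to create the links by hand, and at the moment you save something you usually don't know what it relates to. Notion databases try to solve it with property filters, but you have to define the property schema up front, and most people give up maintaining that schema by day three.

The core tension: at the moment you save something, you don't know what it will be used for later, but if you don't save it, you'll forget it.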

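Karpathy's LLM Wiki: Let AI Do the Mechanical Work

The LLM Wiki concept Karpathy proposed goes straight at this tension. The architecture is simple:

Raw sources (URL, PDF, transcript)
       ↓
LLM Ingest (mechanical synthesis)
       ↓
Structured Markdown Wiki
       ↓
Schema Document (human-defined)

Three operations:

Ingest: you hand a URL to the LLM. After reading it, the LLM doesn't just copy the content; it synthesizes it into the existing wiki pages. If the wiki already has a page on "AI agent architecture", the content from the new URL gets merged into it: facts are updated, cross-references are added, and conflicts are flagged.

Query: what you search isn't raw notes, it's wiki pages the LLM has already structured. It's like searching a continuously updated encyclopedia rather than a pile of scattered Post-its.

Lint: a monthly audit that looks for contradictions, orphan pages, missing cross-refs, and stale links. The LLM can run the scan automatically, but the final review is done by a human.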
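
The mechanical half of the lint pass can be sketched in a few lines of Python. This is a minimal sketch, not the actual tooling: it assumes cross-references are written as `[[Page Name]]` and that every page lives at `<wiki_root>/<Page Name>.md`, neither of which is specified above, and `lint_wiki` is a hypothetical name. It covers orphan pages and stale links only; spotting contradictions still needs an LLM or a human.

```python
import re
from pathlib import Path

# Assumption: cross-references look like [[Page Name]] and each page is
# stored as <wiki_root>/<Page Name>.md. Adjust to your own convention.
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def lint_wiki(wiki_root: Path) -> dict[str, list[str]]:
    """Report orphan pages (no inbound links) and stale links (missing targets)."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki_root.glob("*.md")}
    inbound = {name: 0 for name in pages}
    stale: list[str] = []
    for name, text in pages.items():
        for target in LINK_RE.findall(text):
            target = target.strip()
            if target not in pages:
                stale.append(f"{name} -> {target}")
            elif target != name:  # a page linking to itself does not count
                inbound[target] += 1
    orphans = sorted(name for name, count in inbound.items() if count == 0)
    return {"orphans": orphans, "stale_links": sorted(stale)}
```

The report is meant as input to the human review, not as something to act on automatically.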

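Why Not RAG?

The standard RAG (Retrieval-Augmented Generation) approach is: chunk documents → embed → vector search → feed to LLM.

This pipeline has three structural problems:

Chunking loses context: cut a 5,000-word article into 500-word chunks and every chunk loses its surrounding text. You search for "AI agent memory management", and the chunk that comes back may be missing the context for why memory management is needed in the first place
Cosine similarity is flat: in embedding space, "AI agent" and "AI assistant" score as highly similar, yet they can mean entirely different things. Three of your top-5 results may be noise
No time decay: an entry from a year ago and an entry from today have no ordering in vector space. Outdated information gets mixed in with the latest information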
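
The first problem is easy to reproduce without any embedding model. The snippet below is a toy illustration only: `chunk_fixed` is a hypothetical helper, the sample text is made up, and the chunk size is in characters rather than words to keep the example short.

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Naive fixed-size chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Made-up sample: one motivating sentence followed by mechanism details.
article = (
    "Why agents need memory management: context windows are finite. "
    "Therefore old turns must be summarized or evicted. "
    "The eviction policy decides which facts survive."
)

chunks = chunk_fixed(article, 64)

# Check which chunks still contain the motivating sentence. A chunk that
# describes the mechanism can be retrieved with no trace of why it exists.
has_motivation = ["Why agents need" in chunk for chunk in chunks]
```

Only the first chunk carries the "why"; every later chunk is retrieved on its own without it.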

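The LLM Wiki approach is completely different: your knowledge isn't a pile of chunks, it's a wiki that the LLM continuously curates. Every ingest is a merge operation, not an append.

File-Based Memory: An Audit Trail With Zero Infrastructure

Our knowledge base uses the most primitive structure possible: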

knowledge/
├── tech/          # technical articles
├── business/      # business strategy
├── tools/         # tool reviews
├── design/        # design resources
├── learning/      # learning materials
└── reference/     # references

Each entry is a markdown file:

---
title: "Article title"
url: "https://..."
type: article
tags: [tag1, tag2]
saved: 2026-05-16
summary: "One-line summary"
---

(LLM-synthesized content)

No database, no vector store, no Elasticsearch. Search is grep, categorization is directories, versioning is git.
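
When grep isn't quite enough, for example when you want to filter by tag from a script, the same files can be read with a few lines of Python. This is a minimal sketch under stated assumptions: it handles only the flat, single-line fields shown in the entry above (it is not a general YAML parser), and `read_front_matter`, `parse_tags`, and `find_by_tag` are hypothetical helper names.

```python
from pathlib import Path


def read_front_matter(path: Path) -> dict[str, str]:
    """Parse the flat `key: value` block between the leading '---' lines.

    Deliberately minimal: single-line fields only, not general YAML.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so URLs keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta


def parse_tags(raw: str) -> list[str]:
    """Turn '[tag1, tag2]' into ['tag1', 'tag2']."""
    return [tag.strip() for tag in raw.strip("[]").split(",") if tag.strip()]


def find_by_tag(root: Path, tag: str) -> list[Path]:
    """Return entries under `root` whose front matter lists `tag`."""
    return sorted(
        path for path in root.rglob("*.md")
        if tag in parse_tags(read_front_matter(path).get("tags", ""))
    )
```

Something like `find_by_tag(Path("knowledge"), "tag1")` would then list the matching entries across all category folders.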

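The benefits of this design:

Zero infrastructure cost: no Docker, no cloud service, no monthly fee
Git = audit trail: every change has a commit, a diff, and an author. You can git blame any line
Human readable: you can browse it without any special tool; cat is enough
Portable: the whole knowledge base is one git repo, so moving it is a clone

Three Layers of Memory

The knowledge base handles "information about the outside world"; memory handles "my own experience":

memory/daily/{YYYY-MM-DD}.md    → refined log for the day
memory/weekly/                  → weekly summaries
memory/long-term/               → accumulated patterns + lessons

The daily log isn't a running diary. Each entry answers three questions: what did I do, what did I learn, and what will I do next time. The weekly digest is produced by a cron job that automatically compresses a week of dailies; long-term holds manually curated architectural patterns.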

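The compression ratio of this three-layer structure is roughly:

Daily: 100 entries/month
Weekly: 4-5 digests/month (20:1 compression)
Long-term: 2-3 new patterns/month (50:1 compression)

The more compressed the layer, the higher its information density and the higher its retrieval value.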
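
The first step of the weekly roll-up, deciding which daily files belong to which week, is purely mechanical. Here is a minimal sketch, assuming daily logs are named `YYYY-MM-DD.md` as in the layout above and that weeks follow the ISO calendar (an assumption; the setup described here does not say how a week is defined). `group_dailies_by_week` is a hypothetical name, and the summarization itself is left to the LLM.

```python
from collections import defaultdict
from datetime import date
from pathlib import Path


def group_dailies_by_week(daily_dir: Path) -> dict[str, list[Path]]:
    """Group YYYY-MM-DD.md files by ISO week label, e.g. '2026-W20'."""
    weeks: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(daily_dir.glob("*.md")):
        try:
            day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # skip files that are not named after a date
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        weeks[f"{iso[0]}-W{iso[1]:02d}"].append(path)
    return dict(weeks)
```

Each group can then be concatenated and handed to the LLM with a "compress this week" prompt.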

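Daily Digest: From "I Have Notes" to "I Know What's Relevant"

The hardest part of knowledge management isn't saving, it's resurfacing: bringing the right information back in front of you at the right time.

Our daily digest flow:

Cron triggers every morning at 07:00
       ↓
Scan new knowledge entries from the last 24 hr
       ↓
LLM generates a summary + items needing attention
       ↓
Send digest + action suggestions by DM
       ↓
Vincent replies with digits → corresponding agents spawn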
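
The scan step can be sketched as follows. This is one possible implementation, not the actual one: it assumes "new" means "file modified in the last 24 hours", and `recent_entries` is a hypothetical name. The `saved` field in the front matter carries only a date, so it cannot express a 24-hour window by itself.

```python
import time
from pathlib import Path
from typing import Optional


def recent_entries(root: Path, hours: float = 24,
                   now: Optional[float] = None) -> list[Path]:
    """Return markdown entries under `root` modified within the last `hours`."""
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    # Assumption: file modification time is a good enough proxy for "new".
    return sorted(p for p in root.rglob("*.md") if p.stat().st_mtime >= cutoff)
```

One caveat with this design: a fresh git clone resets modification times, so on a newly cloned repo every entry would look new. Filtering on the `saved` date or on git history avoids that.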

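Key design decisions in this flow:

Push, not pull. You don't have to remember to check the knowledge base; the knowledge base comes to you. One DM each morning lists yesterday's highlights and suggested actions.

Actionable response. If Vincent replies "1 3 5", three agents spawn to handle items 1, 3, and 5. No app to open, nothing to navigate, no UI to remember. A single text reply is the trigger.

Selective attention. Not every entry is urgent. The digest is ordered by priority, most important first. You can reply "none" to skip everything, guilt-free.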
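
Turning a reply into a list of items to act on takes very little code. The sketch below is hypothetical: `parse_reply` is a made-up name, and accepting commas as separators and dropping duplicates are assumptions added for robustness. How agents are actually spawned is out of scope here.

```python
def parse_reply(reply: str, item_count: int) -> list[int]:
    """Map a reply like '1 3 5' to item numbers; 'none' or empty selects nothing."""
    text = reply.strip().lower()
    if text in ("", "none"):
        return []
    selected: list[int] = []
    # Assumption: commas are tolerated as separators alongside spaces.
    for token in text.replace(",", " ").split():
        if not token.isdecimal():
            raise ValueError(f"not an item number: {token!r}")
        number = int(token)
        if not 1 <= number <= item_count:
            raise ValueError(f"item {number} is not in the digest")
        if number not in selected:  # ignore repeated numbers
            selected.append(number)
    return selected
```

Rejecting out-of-range numbers loudly, rather than ignoring them, keeps a typo from silently triggering nothing.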

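Avoiding the RAG Death Spiral

Why do so many knowledge management systems get abandoned after a few months? Because they fall into the "RAG death spiral":

You save more and more data
Vector search returns noisier and noisier results
You start to distrust the search results
You stop using it to search
You stop saving data
The system is abandoned

The root cause is that naive RAG has no quality gate. Every chunk enters the vector store with the same weight: no decay, no priority, no curation.

How the LLM Wiki addresses this:

Schema-first: a human defines the wiki's structure (categories, page templates, cross-ref conventions), and the LLM operates inside that schema
Quality gate: at ingest time the LLM already makes a relevance judgment, so not everything gets saved
Manual curation: in the monthly lint pass a human does the final review, deleting noise, merging duplicates, and updating stale entries
Progressive disclosure: search results aren't dumped in full; they come as a 4-layer hierarchy (title → summary → key points → full content), and you control how deep you go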
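
Progressive disclosure maps naturally onto a small data structure. This is a minimal sketch: the four layer names come from the hierarchy above, while the `Entry` class and the `disclose` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

# The four layers, shallowest first.
LEVELS = ("title", "summary", "key_points", "full_content")


@dataclass
class Entry:
    title: str
    summary: str
    key_points: list[str] = field(default_factory=list)
    full_content: str = ""


def disclose(entry: Entry, depth: int) -> dict[str, object]:
    """Return the first `depth` layers of an entry (1 = title only, 4 = everything)."""
    if not 1 <= depth <= len(LEVELS):
        raise ValueError(f"depth must be between 1 and {len(LEVELS)}")
    return {name: getattr(entry, name) for name in LEVELS[:depth]}
```

A search UI, or an agent working with limited context, could start every result at depth 1 or 2 and request deeper layers only for the entries that look promising.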

Comparison: DIY vs Off-the-Shelf Tools

Feature	File-based (AgentOS)	Obsidian + Plugins	Notion	Mem0
Cost	$0	$0-50/yr (plugins)	$0-10/mo	$20+/mo
Infrastructure	Zero	Local app	Cloud	Cloud
Search	grep + LLM	Full-text + backlink	Database filter	Vector search
Schema control	Full (markdown)	Partial (templates)	Full (properties)	Limited
Portability	Git clone	Vault copy	Export (lossy)	API only
AI integration	Native (LLM ingest)	Plugin-dependent	AI block (limited)	Native
Audit trail	Git history	File history	Version history	API logs
Privacy	100% local	100% local	Cloud	Cloud

There is no perfect option. If you prioritize privacy + portability + zero cost, go file-based. If you prioritize visual organization + team collaboration, go with Notion. If you prioritize backlink graph exploration, go with Obsidian.

But the core issue isn't the tool, it's whether you have a schema. Obsidian without a schema is just a fancy file manager. Plain text files with a schema can serve as a production-grade knowledge base.

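Getting Started

To adopt this approach, you need to do three things:

Week one: define the schema. Decide how many categories your knowledge falls into (4-6 is a good range; more than that becomes unmanageable). Define one markdown template per category. It doesn't need to be perfect; run with it for two weeks, then iterate.

Week two: run ingest. Take your 20 most important existing notes and bookmarks and have the LLM ingest them into the wiki. This will force you to refine the schema: some notes won't obviously belong to any category, and those are the gaps in your schema.

Week three onward: daily habit. Spend 5 minutes a day reviewing the digest and 10 minutes dropping new URLs into the inbox. That's all it takes. Compounding depends on consistency, not intensity.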
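
A schema only helps if entries actually follow it, so it can be worth making the week-one schema checkable. The sketch below is an illustration under assumptions: the category names and field names are taken from the directory layout and the entry template earlier in this post, but treating every field as required is my assumption, and `schema_gaps` is a hypothetical name.

```python
# Categories and fields as they appear earlier in this post.
# Assumption: every field is treated as required.
ALLOWED_CATEGORIES = {"tech", "business", "tools", "design", "learning", "reference"}
REQUIRED_FIELDS = ("title", "url", "type", "tags", "saved", "summary")


def schema_gaps(category: str, meta: dict[str, str]) -> list[str]:
    """List the ways an entry deviates from the schema (empty list = conforms)."""
    problems: list[str] = []
    if category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {category}")
    for name in REQUIRED_FIELDS:
        if not meta.get(name):  # missing or empty counts as a gap
            problems.append(f"missing field: {name}")
    return problems
```

Running a check like this during the week-two ingest is one way to surface schema gaps early instead of discovering them months later.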

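After three months you'll have a structured knowledge base of 300+ entries, a battle-tested schema, and a daily digest habit. Together, those three things are your competitive moat, because most people never even take the first step.

Stance

The fundamental problem of knowledge management isn't "can I save it", it's "can I find it again". And finding it again depends not on a better search algorithm but on a better schema. Schema is human work; AI can't do it.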