Can Claude Code Do Math Research on Its Own? Unpacking What AI Code Agents Can Really Do Through the Mythos Case 🔥💡

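In late 2025, an open-source project called Mythos caught my attention. Not because it produced some groundbreaking mathematical proof, but because it exposed a fact most people still haven't faced seriously: the real battleground for AI code agents was never writing your API routes or generating CRUD. Mainstream discourse is still stuck on the binary debate of "will AI replace engineers," but cases like Mythos tell us that is the wrong question. The real question is: when a developer can have a research-engineer partner who works 24/7, never sleeps, never complains, and can read all of arXiv, how does your productivity curve change? And more importantly, where is the inflection point on that curve?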

It's not "AI doing math," it's "AI doing research infrastructure"

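Mythos's approach is worth a close look. They didn't ask Claude Code to prove an open problem directly. If anyone actually did that, the result would just be the model hallucinating a pile of derivations that look plausible but are actually rubbish. The Mythos team chose a much smarter route: they use Claude Code as an autonomous research agent that helps human researchers do literature search, formulate conjectures, generate counterexamples, and even write draft formal proofs in Lean.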
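To make "generate counterexamples" concrete, here is a minimal sketch of the kind of brute-force search an agent might be asked to write. The conjecture used (that Euler's polynomial n² + n + 41 is always prime) and the function names are illustrative choices of mine, not anything taken from Mythos.

```python
from typing import Callable, Optional


def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for small search ranges."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def find_counterexample(claim: Callable[[int], bool], limit: int) -> Optional[int]:
    """Return the smallest n in [0, limit) where the claim fails, else None."""
    for n in range(limit):
        if not claim(n):
            return n
    return None


# Illustrative conjecture (assumption): n^2 + n + 41 is prime for every n >= 0.
def euler_claim(n: int) -> bool:
    return is_prime(n * n + n + 41)


if __name__ == "__main__":
    # Prints 40, because 40^2 + 40 + 41 = 1681 = 41^2.
    print(find_counterexample(euler_claim, 1000))
```

The point is the division of labour: the search is mechanical and cheap to delegate, while deciding which conjecture is worth testing stays with the researcher.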

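This distinction is crucial. Mythos's success isn't about how smart the AI is; it's about making the problem decomposable: taking mathematical research, an extremely messy and non-linear process, and breaking it into subtasks that can be parallelised and automated. Literature review? AI can do it. Hypothesis generation? AI can help. Code implementation of mathematical objects? AI is faster than you. But the final insight, the judgement, and the choice of direction are still led by the human researcher.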

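This division of labour is nothing new to those of us who build software. Build systems, CI/CD, code review automation: we have always been breaking engineering workflows apart and automating them. It's just that the decomposition has now reached the level of cognitive tasks. The Mythos case tells us research can be optimised with the same mindset.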

Claude Code's technical edge isn't coding, it's "context management"

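At this point, many people will ask: why Claude Code? Why not GitHub Copilot? Why not Cursor? Why not some other agent framework? If you look at the details of Mythos's implementation, you'll find that Claude Code's core advantage isn't how accurately it generates code, but its context window management and the design of its tool-use loop.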

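Mathematical research involves a huge amount of context: the statement of a theorem, earlier lemmas, related papers, the construction of counterexamples, dependencies between formal definitions, and so on. An ordinary code completion tool simply can't handle that complexity. Claude Code's agent loop (think → act → observe → repeat) maps neatly onto a researcher's natural workflow. It can plan the next step on its own, execute it, look at the result, and then adjust the next step accordingly. The loop looks simple, but once you actually use it you'll find it is the closest thing so far to "having a junior researcher brainstorm with you."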
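The think → act → observe → repeat loop can be sketched in a few lines. This is a toy illustration of the general pattern only, not Claude Code's actual implementation; `AgentState`, `run_agent`, the planner, and the two tools are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# An action is a (tool_name, argument) pair; None means "stop".
Action = Optional[Tuple[str, str]]


@dataclass
class AgentState:
    goal: str
    observations: List[str] = field(default_factory=list)


def run_agent(
    state: AgentState,
    think: Callable[[AgentState], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> AgentState:
    """Repeat think -> act -> observe until the planner stops or the budget runs out."""
    for _ in range(max_steps):
        action = think(state)              # think: choose the next step
        if action is None:
            break
        name, arg = action
        result = tools[name](arg)          # act: call the chosen tool
        state.observations.append(result)  # observe: record the result
    return state


# Toy planner (stand-in for a model): search once, summarise once, then stop.
def toy_think(state: AgentState) -> Action:
    if not state.observations:
        return ("search", state.goal)
    if len(state.observations) == 1:
        return ("summarise", state.observations[0])
    return None


# Toy tools (stand-ins for real literature search and summarisation).
toy_tools: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"3 papers found for '{query}'",
    "summarise": lambda text: f"summary of: {text}",
}

if __name__ == "__main__":
    final = run_agent(AgentState(goal="Ramsey numbers"), toy_think, toy_tools)
    for line in final.observations:
        print(line)
```

The `max_steps` budget matters more than it looks: it is the simplest guard against a planner that never decides to stop.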

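Another frequently underrated point is Claude Code's file editing mode. The Mythos team has mentioned that they can ask Claude Code to directly edit a very large Lean project file (changing types, filling in proofs, refactoring definitions) without wrecking the whole file. That kind of surgical precision is a must for a large research codebase: you don't want the AI to prove one lemma for you and break ten others.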
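One way to picture "surgical" editing is a targeted replacement that refuses to run unless the text to change matches exactly one place in the file. The sketch below illustrates that idea under my own assumptions; `apply_edit` is a hypothetical helper, and I am not claiming this is how Claude Code edits files internally.

```python
def apply_edit(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once in `text`."""
    count = text.count(old)
    if count != 1:
        # Zero matches means the anchor is stale; several means it is ambiguous.
        raise ValueError(f"expected exactly one match for the edit anchor, found {count}")
    return text.replace(old, new, 1)


if __name__ == "__main__":
    # A tiny stand-in for a large Lean source file.
    source = (
        "theorem a : 1 + 1 = 2 := by sorry\n"
        "theorem b : 2 + 2 = 4 := rfl\n"
    )
    patched = apply_edit(source, "1 + 1 = 2 := by sorry", "1 + 1 = 2 := rfl")
    print(patched)
```

Because the edit either lands on exactly one spot or fails loudly, every other line in the file is left untouched, which is the property that keeps the other ten lemmas intact.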

The autonomy "sweet spot": just short of fully automatic

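The most valuable lesson from the Mythos case is how they calibrate agent autonomy. Too many people hear "autonomous agent" and think set and forget: hand the AI a task, let it run for a few hours, then come back for the result. That fantasy isn't just unrealistic, it's dangerous. AI agents still hallucinate, get stuck in loops, and make decisions that look reasonable but are actually nonsensical.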

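Mythos's approach is "supervised autonomy": Claude Code runs a task and outputs an intermediate result at every step, so the human researcher can step in at key points, correct the direction, or edit the code directly. This partnership model is far more effective than full automation. You hand Claude Code the exploratory tasks (generating a batch of candidate conjectures, searching related literature, building a counterexample search space), and then the human researcher uses domain knowledge to filter and prioritise.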
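A rough sketch of what a review checkpoint after every step could look like is below. The decision labels, `run_supervised`, and the scripted reviewer are hypothetical and exist only to show the control flow; a real setup would put a person where the `review` callback is.

```python
from typing import Callable, Dict, Iterable, List

APPROVE, DISCARD, STOP = "approve", "discard", "stop"


def run_supervised(
    steps: Iterable[Callable[[], str]],
    review: Callable[[str], str],
) -> List[str]:
    """Run steps one at a time, asking the reviewer after each intermediate result."""
    accepted: List[str] = []
    for step in steps:
        result = step()            # the agent produces an intermediate result
        decision = review(result)  # human checkpoint
        if decision == STOP:
            break                  # reviewer halts the run; later steps never execute
        if decision == APPROVE:
            accepted.append(result)
        elif decision != DISCARD:
            raise ValueError(f"unknown review decision: {decision!r}")
    return accepted


if __name__ == "__main__":
    # Scripted reviewer standing in for a human (assumption for the demo).
    verdicts: Dict[str, str] = {
        "conjecture A": APPROVE,
        "conjecture B": DISCARD,
        "conjecture C": STOP,
    }
    steps = [lambda name=name: name for name in verdicts]
    print(run_supervised(steps, lambda result: verdicts[result]))
```

The agent does the divergent work of producing candidates, while the reviewer's three choices (keep, drop, halt) are where domain knowledge enters.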

This "AI diverges, human converges" pattern applies to plenty of software development scenarios. Code review? AI flags potential issues, the human makes the judgement call. Refactoring? AI proposes multiple refactoring strategies, the human picks the best fit. Bug hunting? AI generates a list of hypotheses, the human pinpoints the most likely suspect.

Practical takeaways for Hong Kong developers

Enough theory; let's bring it down to earth. If you're an indie developer or startup founder in Hong Kong, how can you apply the lessons from Mythos?

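First, don't treat an AI code agent as an upgrade to ChatGPT. It's not for asking questions; it's for delegating tasks. You shouldn't ask it to "write me a sorting algorithm." Ask it this instead: "This repo has a performance bottleneck. Profile it, then suggest three optimisation strategies, and write a benchmark test for each one to prove it works."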
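For a sense of what "a benchmark test to prove it" might mean in practice, here is a small sketch: check that the optimised version returns the same answer as the baseline, then time both. The duplicate-detection functions are a made-up example, not a bottleneck from any real repo.

```python
import timeit
from typing import Dict, List


def has_duplicates_slow(items: List[int]) -> bool:
    """Baseline: O(n^2) pairwise comparison."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_fast(items: List[int]) -> bool:
    """Candidate optimisation: O(n) expected time using a set."""
    return len(set(items)) != len(items)


def benchmark(data: List[int], repeats: int = 5) -> Dict[str, float]:
    """Verify both versions agree on `data`, then return the best time of each in seconds."""
    if has_duplicates_slow(data) != has_duplicates_fast(data):
        raise RuntimeError("optimised version disagrees with baseline")
    slow = min(timeit.repeat(lambda: has_duplicates_slow(data), number=1, repeat=repeats))
    fast = min(timeit.repeat(lambda: has_duplicates_fast(data), number=1, repeat=repeats))
    return {"slow_s": slow, "fast_s": fast}


if __name__ == "__main__":
    # Worst case for the baseline: no duplicates, so every pair is compared.
    print(benchmark(list(range(2000))))
```

The correctness check comes first on purpose: a speed-up that changes the answer is not an optimisation, and that is exactly the kind of claim you want the agent to back with evidence.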

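Second, start small, but start now. The Mythos team didn't jump straight to autonomous research with Claude Code. They started small, first asking the AI to help write a formal proof for a single lemma, then gradually widening the scope. And you? Today you can already ask Claude Code to write test cases, refactor a module, or investigate a bug.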
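For readers who haven't seen Lean, "a formal proof for a single lemma" can be as small as the snippet below. It is a deliberately trivial Lean 4 illustration of my own, not a lemma from Mythos; the name `my_add_comm` is made up, while `Nat.add_comm` is the existing core library theorem it relies on.

```lean
-- A deliberately tiny first task: commutativity of addition on natural numbers.
-- `my_add_comm` is a hypothetical name; the proof reuses the core lemma `Nat.add_comm`.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The appeal of starting here is that the Lean checker either accepts the proof or rejects it, so the agent's output is verified mechanically.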

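Third, learn to read AI's output critically. In the Mythos case, the human researcher's core skill isn't writing code, but judging whether the AI's output is meaningful. This skill set will only become more important. Your value doesn't lie in whether you can write those few lines of code, but in whether you can judge if that code should exist at all.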

Conclusion

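The Mythos case tells us that AI code agents won't replace developers, but they will fundamentally change how developers work. That "24/7 research engineer" isn't science fiction; it's a reality sitting in your terminal right now. The question is, are you ready to use it?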

To sum up with a line from Mythos: “Claude Code doesn’t replace the mathematician — it removes the friction between thinking and formalizing.” The same goes for developers. It won't replace you, but it will remove the layer of friction between you and production code. Your next step is to learn to delegate.