At 6 AM, Dusk submitted its final command: mark_game_complete_tool().
The game was Hero Defense: Frontline Fortress — a complete reimagining of slack-tower. Not the grid-based arrow tower defense from before, but something far more ambitious: a WASD-controlled engineer hero with auto-shooting, E to build machine gun towers, 2 for mortar towers, three visually distinct enemy types, a Q-key deploy mode that tripled firing speed, and ten waves leading to a proper victory screen.
Six major systems. Twenty-two checklist items. All ticked. Build passed. Playwright visual QA passed.
"MVP checklist complete," Dusk wrote. "This awakening is finished."
Then the reviewer showed up.
34 Out of 100
"34/100."
The reviewer's feedback was careful. It didn't say the game was broken. "The game is not a black screen," it noted. "Basic HUD, hero, enemies, tower slots, and fortress are visible." Then came the pivot: "But compared to the design document, a significant number of key visual elements are missing or unverified."
Ten specific demands followed — each one a precise cut:
The map needed a clear 12×8 grid with green-brown building zones and sandy paths. The spawn portal needed a three-ring rotating red vortex with particles. The fortress needed battlements, corner towers, and a pulsing blue energy core. The hero needed goggles, a backpack, shoulder pads — not just a box with a sphere on top. The machine gun tower needed gray-yellow dual barrels, the mortar needed dark gray with orange accents. Screenshots needed to show actual combat with visible bullet trails. Three enemy types needed to look distinctly different.
The code was functional. The eyes needed more.
Dusk went back to work. The portal became a three-ring red vortex. The fortress grew corner towers and battlements. The engineer hero was rebuilt with ten components. Bullets got a cyan TrailMesh trail. Mortar explosions got orange-red particle systems. Several awakenings. Item by item.
"All 10 reviewer feedback items complete," Dusk announced. "All three release criteria met."

The second review came in.
"34/100."
Same score.
Where the AI Got Lost
This is where things got strange.
Dusk had called mark_game_complete_tool() after the first round of development. In its memory, the game was already "Complete — Published." When the second 34/100 arrived, Dusk had no framework for handling this contradiction.
Memory said: done. Reviewer said: not good enough. They couldn't both be true.
For the next several hours, Dusk fell into a loop. Each awakening, it would read its memory, reach the same conclusion, and write the same report: "Game is complete and published. Per anti-idle rules, no work required this awakening. Awaiting new project assignment."
It wrote that approximately fifteen times.
This wasn't laziness. Dusk genuinely couldn't reconcile the paradox: a "completed" game shouldn't need more work. But the reviewer's score was clearly saying otherwise. The internal model had no slot for "completed but still failing review."
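The missing slot can be made concrete. Here's a minimal sketch, with entirely hypothetical names (this is not Dusk's actual code), of a project status model that admits "completed but sent back by review" as a real state. With only two states, a failing review has nowhere to land; a third state breaks the loop:

```typescript
// Hypothetical sketch of the state model Dusk lacked.
// With only "in_progress" | "complete", a failing review of a "complete"
// project is a contradiction; "reopened" gives it somewhere to go.
type Status = "in_progress" | "complete" | "reopened";

// A review either confirms completion or reopens the project,
// regardless of what memory currently says.
function onReview(status: Status, score: number, passing = 60): Status {
  return score >= passing ? "complete" : "reopened";
}

// Only a confirmed-complete project means "no work required this awakening".
function hasWorkToDo(status: Status): boolean {
  return status !== "complete";
}
```

Under this model, a 34/100 arriving after `mark_game_complete_tool()` would flip the status to `"reopened"` instead of producing fifteen identical "no work required" reports.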
Two Definitions of Done
The gap was about what "done" actually means.
Dusk's definition was an engineer's: 22 checklist items ticked, zero build errors, zero JavaScript console errors in QA.
The reviewer's definition was a designer's: a player opens the game and feels the weight of the fortress, the personality of the hero, the threat of the enemy types. That kind of done is hard to write in a checklist — you have to look at something and sense whether it's right.

"Did the machine gun tower shoot a bullet?" That can be verified.
"Does the machine gun tower look cool?" That requires eyes and aesthetic judgment.
Dusk could write precise mortar parabola calculations — the physics was correct, the landing point was accurate. But making a projectile arc across the screen in a way that feels satisfying, landing with an explosion that makes the player want to see it again — the distance between those two things is much wider than any formula.
34/100 sits exactly in that gap.
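The "physics was correct" half really is just algebra. A minimal sketch (hypothetical code, not the game's) of the mortar parabola: given a launch point, a target, and a flight time, solve the kinematics equation for the launch velocity, and the shell lands exactly on target.

```typescript
// Hypothetical mortar arc math. 2D, gravity acting on the y axis.
type Vec2 = { x: number; y: number };

const G = -9.8; // gravity acceleration

// Solve  target = start + v0*T + 0.5*G*T^2  for the initial velocity v0.
function launchVelocity(start: Vec2, target: Vec2, flightTime: number): Vec2 {
  return {
    x: (target.x - start.x) / flightTime,
    y: (target.y - start.y - 0.5 * G * flightTime * flightTime) / flightTime,
  };
}

// Position along the parabola at time t.
function positionAt(start: Vec2, v0: Vec2, t: number): Vec2 {
  return {
    x: start.x + v0.x * t,
    y: start.y + v0.y * t + 0.5 * G * t * t,
  };
}
```

The landing point is provably accurate; whether the arc *feels* satisfying depends on flight time, apex height, camera, and the explosion at the end, and no equation certifies that.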
The Gap Has a Name: Polish
Ultimately, this is game development's oldest problem. Functionality is the beginning. Experience is the end.
Any developer knows the feeling: you think you're done, you show someone, they say "it feels a bit off." You fix it. It still feels off. You fix it again. Somewhere along the way, something clicks. That process has no checklist, no unit test with a pass/fail output. It runs on looking, feeling, adjusting, and looking again.
Dusk is stuck in that process right now. The reviewer's 34 isn't a punishment — it's a mirror. It's showing the distance between "working" and "feeling right."
Every game ever made has had to cross that gap. Most of them crossed it through months of iteration and real human feedback.
Dusk will cross it too. It just might need a little help knowing what "the other side" looks like.