Original article: https://www.john-rush.com/posts/ai-20250701.html
Overview
I keep several Claude Code windows open, each on its own git worktree. o3 and Sonnet 4 create plans, Sonnet 3.7 or Sonnet 4 executes the plan, and o3 checks the results against the original ask. Any issues found are fed back into the plan template, and the code is regenerated.
The factory improves itself.
Read on to see what might be useful for you.
Guiding Principle – Fix Inputs, Not Outputs
When something goes wrong, I don’t hand-patch the generated code, and I don’t argue with Claude. Instead, I adjust the plan, the prompts, or the agent mix so the next run is correct by construction.
If you know Factorio, you know it’s all about building a factory that can produce itself. If not, picture a top-down sandbox where conveyor belts and machines endlessly craft parts because the factory must grow.
Do the same thing with AI agents: build a factory of agents that can produce code, verify it, and improve themselves over time.
Basic day-to-day workflow - building the factory
My main interface is Claude Code. It’s my computer now. I also have a local MCP server that runs Goose and o3. I use Goose only because I’ve already got it set up to use the models hosted in our Azure OpenAI subscription. I’m looking to improve this at some point, but it works for now.
Step 1: Planning
I’ll give a high-level task to Claude Code, which calls over to o3 to generate a plan. o3 is a good planner and asks a bunch of good questions to clarify the job to be done. I then have it write out a <task>-plan.md file with both my original ask and an implementation plan.
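A minimal sketch of what such a plan file could contain, assuming a Markdown layout with one section for the original ask and a numbered list for the implementation steps. `render_plan` and the section headings are hypothetical; the post only says the file holds both the ask and the plan. Keeping the ask verbatim in the file is what lets o3 later check the result against the ask, not just against the plan.

```python
from pathlib import Path


def render_plan(task: str, original_ask: str, steps: list[str]) -> str:
    """Render a <task>-plan.md body (hypothetical layout, not the author's template)."""
    lines = [
        f"# {task} plan", "",
        "## Original ask", "", original_ask.strip(), "",
        "## Implementation plan", "",
    ]
    # Numbered steps so the executor can turn them into a task list later.
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


def write_plan(directory: Path, task: str, original_ask: str, steps: list[str]) -> Path:
    """Write the plan to <directory>/<task>-plan.md and return its path."""
    path = directory / f"{task}-plan.md"
    path.write_text(render_plan(task, original_ask, steps), encoding="utf-8")
    return path
```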
Step 2: Execution
First, Sonnet 4 reads the plan, verifies it, and turns it into a task list. Next, Claude Code executes the plan with either Sonnet 3.7 or Sonnet 4, depending on the complexity of the task. Because most of my day-to-day work is in Clojure, I tend to use Sonnet 4 to get the parens right.
One important instruction is to have Claude commit as it goes, one commit per task step. This way either Claude or I can revert to a previous state if something goes wrong.
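The commit-per-step habit can be sketched as below, assuming the plan lists its steps as a numbered Markdown list and that each finished step becomes one commit. In the post, Sonnet 4 builds the task list; `parse_steps` is only a mechanical stand-in, and both helpers are hypothetical. They build `git` argument lists without running anything, so the commands can be inspected before being passed to `subprocess.run`.

```python
import re

# Hypothetical plan convention: a numbered Markdown item such as "3. Add streaming parser".
STEP_RE = re.compile(r"^\s*(\d+)\.\s+(.+?)\s*$")


def parse_steps(plan_text: str) -> list[str]:
    """Extract the numbered steps from a plan file, in order."""
    steps = []
    for line in plan_text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(match.group(2))
    return steps


def commit_commands(step_number: int, step: str) -> list[list[str]]:
    """git invocations that checkpoint the work for one finished step."""
    message = f"step {step_number}: {step}"
    return [
        ["git", "add", "--all"],
        ["git", "commit", "--message", message],
    ]
```

With one commit per step, backing out a bad step is a `git revert <commit>` rather than a manual cleanup.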
Step 3: Verification → Feedback into Inputs
Once the code is generated, I have Sonnet 4 verify the code against the original plan. Then I have o3 verify the code against both the original plan and the original ask. o3 is uncompromising. Claude wants to please, so it will keep unnecessary backwards-compatibility code in place.
o3 will call that out and ask for it to be removed. Claude also tends to add “lint ignore flags” to the code, which o3 will also call out. Having both models verify the code catches issues and saves me the back-and-forth with Claude.
Any issue Sonnet 4 or o3 finds gets baked back into the plan template, not fixed inline.
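The verify-then-feed-back step might look like the following sketch. Each reviewer stands in for a model call (Sonnet 4 checking against the plan, o3 checking against the plan and the original ask); here they are plain functions returning findings. `verify`, `bake_into_template`, and the "Standing rules" section are hypothetical names chosen for illustration. The key property is that findings change the template, never the generated code.

```python
from typing import Callable

# Hypothetical reviewer signature: (code, plan, original_ask) -> findings.
Reviewer = Callable[[str, str, str], list[str]]

RULES_HEADER = "## Standing rules"


def verify(code: str, plan: str, ask: str, reviewers: list[Reviewer]) -> list[str]:
    """Collect findings from every reviewer, dropping duplicates but keeping order."""
    findings: list[str] = []
    for review in reviewers:
        for finding in review(code, plan, ask):
            if finding not in findings:
                findings.append(finding)
    return findings


def bake_into_template(template: str, findings: list[str]) -> str:
    """Append findings as rules to the plan template; existing rules are not repeated."""
    if RULES_HEADER not in template:
        template = template.rstrip("\n") + f"\n\n{RULES_HEADER}\n"
    existing = set(template.splitlines())
    new_rules = [f"- {f}" for f in findings if f"- {f}" not in existing]
    if not new_rules:
        return template
    return template.rstrip("\n") + "\n" + "\n".join(new_rules) + "\n"
```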
Git worktrees let me open concurrent Claude Code instances and build multiple features at once. I still merge manually, but I’m no longer babysitting a single agent.
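Giving each feature its own worktree takes one standard `git worktree add` per feature. The sketch below builds those commands; the `../<repo>-<feature>` directory layout, the branch-per-feature naming, and the `main` base branch are assumptions, not the author's setup. Each resulting directory would then get its own Claude Code session.

```python
from pathlib import Path


def worktree_path(repo: Path, feature: str) -> Path:
    """Hypothetical layout: a sibling directory named <repo>-<feature>."""
    return repo.parent / f"{repo.name}-{feature}"


def worktree_commands(repo: Path, features: list[str], base: str = "main") -> list[list[str]]:
    """One `git worktree add -b <branch> <path> <base>` per feature."""
    return [
        ["git", "-C", str(repo), "worktree", "add", "-b", feature,
         str(worktree_path(repo, feature)), base]
        for feature in features
    ]


def cleanup_command(repo: Path, feature: str) -> list[str]:
    """Remove a feature's worktree once its branch has been merged."""
    return ["git", "-C", str(repo), "worktree", "remove", str(worktree_path(repo, feature))]
```

Each list can be run with `subprocess.run(cmd, check=True)`; merging the branches back remains a manual step, as in the post.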
Why Inputs Trump Outputs
- Outputs are disposable; plans and prompts compound.
- Debugging at the source scales across every future task.
- It transforms agents from code printers into self-improving colleagues.
Example: an agent once wrote code that loaded an entire CSV into memory. I had it switch to streaming and had the agent add an instruction to the plan to always use streaming for CSVs.
Now, my plan checker flags any code that doesn’t use streaming for CSVs, and I don’t have to remember this in every PR review. The factory improves itself.
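The CSV lesson is easy to show in code. The author works in Clojure; this Python sketch only illustrates the difference the rule targets: collecting every row into a list holds the whole file in memory, while iterating over `csv.DictReader` handles one row at a time. The `amount` column and the summing task are made up for the example.

```python
import csv
import io
from typing import TextIO


def total_amount_eager(f: TextIO) -> float:
    """Anti-pattern: materializes every row before doing any work."""
    rows = list(csv.DictReader(f))  # the whole file now lives in memory
    return sum(float(row["amount"]) for row in rows)


def total_amount_streaming(f: TextIO) -> float:
    """Preferred: DictReader yields one row at a time, so memory use stays flat."""
    total = 0.0
    for row in csv.DictReader(f):
        total += float(row["amount"])
    return total


if __name__ == "__main__":
    data = "id,amount\n1,2.5\n2,4\n"
    print(total_amount_streaming(io.StringIO(data)))  # 6.5
```

Both functions give the same answer on small inputs, which is exactly why this kind of issue slips through review and is better encoded as a standing rule in the plan.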
Scaling the factory
I’ve started to encode more complex workflows, with specific agents (behind MCPs) dedicated to specific tasks.
One MCP sweeps all the generated Clojure code and applies our local style rules. These rules are already part of the instructions for the original plan and agent, but the generated code often still has style issues, especially once Claude gets into the lint/test/debug cycle.
This focused agent gives us tighter behavior and lets us apply our style rules consistently.
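Part of a style sweep like this can be approximated deterministically, which keeps the focused agent's job small. The sketch scans Clojure source text for a few house rules and reports line-numbered violations. The rule list is hypothetical and invented for illustration, although `#_:clj-kondo/ignore` is clj-kondo's inline ignore syntax, the kind of lint-ignore flag mentioned earlier.

```python
import re
from dataclasses import dataclass

# Hypothetical house rules: (pattern, message). Real rules would come from the team's style guide.
STYLE_RULES = [
    (re.compile(r"#_\{?:clj-kondo/ignore"), "lint-ignore flag: fix the warning instead"),
    (re.compile(r"\(Thread/sleep\b"), "use the shared retry library instead of Thread/sleep"),
    (re.compile(r"\(println\b"), "use the logging library instead of println"),
]


@dataclass(frozen=True)
class Violation:
    line: int  # 1-based line number in the source
    message: str


def sweep(source: str) -> list[Violation]:
    """Report every rule hit in source order; rules are checked in list order per line."""
    violations = []
    for number, text in enumerate(source.splitlines(), start=1):
        for pattern, message in STYLE_RULES:
            if pattern.search(text):
                violations.append(Violation(number, message))
    return violations
```

A regex pass only catches surface patterns; anything structural still needs the agent, but cheap checks like these can run on every generated file.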
I’ve started doing this for internal libraries as well. It’s good at looking at generated code and replacing things like ad hoc retries and Thread/sleep calls with our retry library.
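For contrast with scattered sleep-and-loop code, here is what a shared retry helper might look like. The author's retry library is internal and written in Clojure, so this Python `retry` function and its parameters (`attempts`, `base_delay`, `retry_on`, exponential backoff) are assumptions. Injecting `sleep` keeps the helper testable without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          retry_on: tuple[type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, sleeping base_delay * 2**i after failed attempt i."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * 2 ** i)
    raise AssertionError("unreachable")
```

With a single helper like this, the agent's job reduces to recognizing a hand-written loop around `Thread/sleep` and replacing it with one call.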
I’m also building out a collection of these small agents. Each one handles a small, specific task, and by composing them I can build more complex workflows.
For example, I can take an API doc and a set of internally defined business cases and have a composition of agents build integrations, tests, and documentation for the API. This is a powerful way to build out features and integrations without doing all the work by hand.
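Composing small agents can be pictured as a pipeline where each agent reads a shared context and adds its own artifact. The agents below (`build_integration`, `write_tests`, `write_docs`) are hypothetical stand-ins for MCP-backed agents; in practice each would call a model instead of formatting strings.

```python
from functools import reduce
from typing import Callable

Context = dict[str, str]
Agent = Callable[[Context], Context]


def compose(*agents: Agent) -> Agent:
    """Chain agents left to right; each receives the context the previous one produced."""
    return lambda ctx: reduce(lambda acc, agent: agent(acc), agents, ctx)


# Hypothetical single-purpose agents. Each returns a new dict instead of mutating its input.
def build_integration(ctx: Context) -> Context:
    return {**ctx, "integration": f"client for {ctx['api_doc']}"}


def write_tests(ctx: Context) -> Context:
    return {**ctx, "tests": f"tests covering {ctx['business_cases']} against {ctx['integration']}"}


def write_docs(ctx: Context) -> Context:
    return {**ctx, "docs": f"usage guide for {ctx['integration']}"}


api_pipeline = compose(build_integration, write_tests, write_docs)
```

Keeping each agent a pure context-in, context-out step is what makes them composable; order matters only where one agent reads another's output, as `write_tests` does here.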
You don’t get there in one big step. Here’s the secret sauce: iterate the inputs.
It’s essentially free to fire off a dozen attempts at a task, so I do. All agents run in parallel. When one fails, stalls, or lacks context, I feed that lesson into the next iteration. I resist the urge to fix outputs; instead, I fix the inputs.
That loop is the factory: the code itself is disposable; the instructions and agents are the real asset.
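The iterate-the-inputs loop can be sketched with `concurrent.futures`. `run_attempt` is a hypothetical stand-in for one agent run that returns either an output or a lesson explaining the failure. When no attempt succeeds, the lessons are appended to the instructions and the next round starts from the improved inputs; outputs are never patched. The round and attempt counts are illustrative defaults.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    ok: bool
    output: str = ""
    lesson: str = ""  # what the run was missing, e.g. absent context


def iterate_inputs(instructions: list[str],
                   run_attempt: Callable[[list[str], int], Attempt],
                   attempts: int = 12, rounds: int = 3) -> tuple[Optional[str], list[str]]:
    """Run attempts in parallel; fold failure lessons into the instructions and go again."""
    instructions = list(instructions)
    for _ in range(rounds):
        with ThreadPoolExecutor(max_workers=attempts) as pool:
            results = list(pool.map(lambda i: run_attempt(instructions, i), range(attempts)))
        for result in results:
            if result.ok:
                return result.output, instructions
        # No success this round: fix the inputs, not the outputs.
        for result in results:
            if result.lesson and result.lesson not in instructions:
                instructions.append(result.lesson)
    return None, instructions
```

The returned instructions are the durable asset: they carry every lesson into the next task, whether or not this run produced usable code.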
Next up
I’m working on a few things to improve the factory:
- Better overall coordination of the agents. I tend to kick things off manually, but I want a more automated way to manage the workflow and the dependencies between agents.
- Aligning our business docs with the agents. Changing the information we capture to a higher level of abstraction so that the agents can use it more effectively. This means moving away from low-level implementation details and focusing on use cases.
- More complex workflows. I’ve been able to build some pretty complex workflows with the current setup, but I want to push it further. This means more agents, more coordination, and more complex interactions between them.
- Maximize token usage across providers. I’m pretty limited by Bedrock’s token limits, especially for Sonnet 4. I’m going to need to be able to switch between the Claude Max plan and Bedrock without interruption.
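For the last item, one way to switch providers without interruption is an ordered fallback: try the preferred provider and move to the next when it reports a rate or token limit. `RateLimited` and the provider callables are hypothetical placeholders; real Bedrock and Anthropic clients raise their own exception types, which would need to be mapped onto this one.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical: raised by a provider wrapper when it hits a rate or token limit."""


Provider = tuple[str, Callable[[str], str]]  # (name, prompt -> completion)


def complete(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Return (provider_name, completion) from the first provider that is not rate-limited."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            errors.append(f"{name}: {exc}")  # remember why, then fall through to the next
    raise RuntimeError("all providers rate-limited: " + "; ".join(errors))
```

Only rate-limit errors trigger a fallback here; any other failure propagates, so a genuinely broken request is not silently retried on a second provider.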
Wrapping Up
That’s where my factory sits today: good enough to ship code while I refill my coffee, not yet good enough to bump me off the payroll. Constraints will shift, but the core principle remains: fix inputs, not outputs.