Four weeks ago I typed a prompt into Claude.ai and got back a project structure. Folder layout, Docker Compose config, a FastAPI skeleton, a Next.js shell. I didn't write any of it by hand. I read it, understood it well enough to say yes, and started building from there.
For the next two weeks I worked on FairwayPlan in the gaps between everything else: a few hours on a weeknight, a Sunday afternoon. Using Claude Code inside the terminal, one feature at a time, one conversation at a time. I'd describe what I wanted, review the output, push back on anything that felt wrong, and move on. It was still sequential. Still one thing at a time. But the pace was different from building alone.
Today I did something I hadn't tried before: I ran three agents in parallel, each on its own feature branch, and a fourth agent to merge everything together. That's what this post is about. It's not just the workflow, but what the four-week progression to get here actually looked like.
What I wanted to build
Three things had been sitting on my list for FairwayPlan:
- A share page — every generated itinerary gets a public URL with an 8-character code, a Leaflet map of the route, and Open Graph metadata so it previews properly when shared on social.
- A homepage demo — a sample itinerary rendered on the homepage so visitors can see what the output looks like before they commit to filling in the 5-step wizard.
- A test suite — proper Playwright E2E tests covering the wizard flow, plus pytest integration tests hitting the backend directly.
None of these features touched each other. The share page was frontend + a backend query for fetching an itinerary by code. The demo was a static fixture and a new API endpoint. The tests were purely additive. Three entirely parallel workstreams.
The setup: three agents, three branches, three folders
I gave each feature to a separate Claude Code agent, each working in its own copy of the repository on its own branch:
- fairwayplan-share → feature/share-page
- fairwayplan-demo → feature/demo-itinerary
- fairwayplan-tests → feature/test-suite
Each agent got a brief describing exactly what to build: the components, the API shape, the file locations, any constraints that mattered. Then I let them run.
While all three were working, I wasn't waiting. I was reviewing one agent's output, answering a clarifying question from another, and occasionally watching the git log tick forward on all three branches at the same time. It felt strange at first, almost like trying to follow three conversations simultaneously. But each agent had a narrow enough scope that the context-switching cost was low. I was a reviewer, not a builder.
What each agent actually produced
Share page agent
The share page agent built the whole thing end-to-end: the Next.js server component at /itinerary/[code] that fetches itinerary data and emits Open Graph tags, a dynamic OG image route using next/og that generates a 1200×630 card per itinerary, and a SharePage.tsx component with a hero section, day cards colour-coded by course type, and a CTA. It also added ShareMap.tsx, a Leaflet map loaded as a dynamic client component with ssr: false, so the map bundle stays out of the main wizard chunk. The agent correctly flagged that Leaflet needed to be added to package.json and that the lock file needed to be committed.
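That ssr: false detail is the part worth lingering on. Here's a minimal sketch of how that kind of dynamic import looks in a client component; the component shape and prop names are mine, not the agent's actual code:

```tsx
// SharePage.tsx (illustrative sketch; the real component and prop names may differ)
"use client";

import dynamic from "next/dynamic";

// Leaflet touches `window` at import time, so the map must never render on the
// server. Loading it via next/dynamic with ssr: false also keeps the Leaflet
// bundle out of the main wizard chunk.
const ShareMap = dynamic(() => import("./ShareMap"), {
  ssr: false,
  loading: () => <div className="h-64 animate-pulse rounded bg-slate-100" />,
});

type Stop = { name: string; lat: number; lng: number };

export default function SharePage({ title, stops }: { title: string; stops: Stop[] }) {
  return (
    <article>
      <h1>{title}</h1>
      <ShareMap stops={stops} />
      {/* day cards colour-coded by course type, CTA, ... */}
    </article>
  );
}
```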
Demo itinerary agent
The demo agent built a /api/demo/itinerary endpoint in FastAPI that returns a fixed fixture without touching the solver, registered it in backend/main.py, and wired up a SampleItinerary.tsx component on the homepage that pulls from it. The fixture data lives in frontend/src/lib/demoFixture.ts — a deliberate choice to keep it out of the backend and make it easy to update without a full redeploy.
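A rough sketch of what that fixture module might look like; every field name and value here is invented to illustrate the shape, not copied from the project:

```ts
// frontend/src/lib/demoFixture.ts (sketch; field names and values are placeholders)
export type DemoDay = {
  day: number;
  course: string;
  courseType: "links" | "parkland" | "heathland";
  teeTime: string;
};

export type DemoItinerary = {
  title: string;
  days: DemoDay[];
};

// A fixed sample itinerary the homepage can render without running the solver.
// Because it lives in the frontend, tweaking the demo means editing this object,
// not cutting a backend release.
export const demoItinerary: DemoItinerary = {
  title: "Three days on the coast",
  days: [
    { day: 1, course: "Sample Links", courseType: "links", teeTime: "08:40" },
    { day: 2, course: "Sample Parkland", courseType: "parkland", teeTime: "09:15" },
    { day: 3, course: "Sample Heathland", courseType: "heathland", teeTime: "10:00" },
  ],
};
```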
Test suite agent
The test agent added Playwright specs for both new features, demo-itinerary.spec.ts and share-page.spec.ts, alongside the existing wizard scenario tests. It also added a pytest suite in backend/tests/ with integration tests that hit the running Docker stack at localhost:8000, and a requirements-test.txt so the dependencies are explicit.
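For flavour, a hedged sketch of what the share-page spec could look like; the share code, selectors, and assertions are assumptions based on the feature description, not the spec the agent actually wrote:

```ts
// frontend/tests/share-page.spec.ts (sketch; assumes baseURL is set in the Playwright config)
import { test, expect } from "@playwright/test";

// An 8-character share code seeded for the test run; in practice this would come
// from a fixture or from an itinerary generated earlier in the suite.
const SHARE_CODE = "abc12345";

test.describe("share page", () => {
  test("renders the itinerary and its Open Graph metadata", async ({ page }) => {
    await page.goto(`/itinerary/${SHARE_CODE}`);

    // Hero section: the itinerary title should be the page's main heading.
    await expect(page.getByRole("heading", { level: 1 })).toBeVisible();

    // The OG tags are what make the link preview work when the URL is shared.
    await expect(page.locator('meta[property="og:title"]')).toHaveAttribute("content", /.+/);

    // The Leaflet map is a client-only dynamic import, so wait for its container.
    await expect(page.locator(".leaflet-container")).toBeVisible();
  });
});
```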
The fourth agent: integration
When all three features were done, I brought in a fourth agent with a different brief. Not "build something", but "merge everything cleanly".
The integration agent's job was to take the three feature branches and land them on main in order, resolving any conflicts. I gave it an explicit conflict resolution policy before it started:
- Accumulate, don't replace. If two branches both add a target to the same Makefile .PHONY line, keep both. Never pick one side over the other when the additions are compatible.
- Preserve all new files. If a branch adds a file, it stays.
- Flag true contradictions. If two branches make genuinely incompatible changes to the same line — not additive, actually contradictory — leave a comment and flag it for human review.
The first merge — feature/share-page into main — was clean. No conflicts. The second merge — feature/demo-itinerary — hit three conflict hunks, all in the Makefile. The .PHONY line, the deploy section comments, and the deploy target itself: deploy-share vs deploy-demo. Each branch had added its own target to the same location. The integration agent resolved all three by keeping both — exactly what the policy required. The third merge — feature/test-suite — had one conflict in .PHONY: the test branch added tests-backend tests-backend-fast tests-all to the line that main had already extended with deploy-share deploy-demo. Same resolution: keep everything.
No flagged contradictions. Every conflict was additive.
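To make "accumulate, don't replace" concrete, the final conflict looked roughly like this. It's reconstructed from the merge description above, not copied from the repo, and the dev and build targets are placeholders for whatever else the line already carried:

```makefile
# Reconstructed .PHONY conflict from the feature/test-suite merge (not the actual hunk)
<<<<<<< HEAD
.PHONY: dev build deploy-share deploy-demo
=======
.PHONY: dev build tests-backend tests-backend-fast tests-all
>>>>>>> feature/test-suite

# Resolution under "accumulate, don't replace": keep both sides' additions.
.PHONY: dev build deploy-share deploy-demo tests-backend tests-backend-fast tests-all
```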
What this felt like compared to doing it sequentially
If I'd built these three features one after another, each one would have taken a focused session. The share page alone, with the OG image route, the Leaflet setup, and the backend query changes, is a few hours of careful work. The test suite would take even longer, because writing good tests requires understanding the full application surface. Sequentially, this is a week of evenings.
The parallel approach compressed the calendar time significantly. But the more interesting change was cognitive. Each agent had a clear scope and stayed inside it. The share page agent didn't need to know about tests. The test agent didn't need to know about the demo feature. When I reviewed each one I was reading a focused, bounded piece of work and not trying to hold the whole system in my head while also making decisions about component structure.
The bottleneck stopped being "how fast can I write code" and became "how clearly can I specify what I want." That's a trade I'll take every time.
Where this breaks down
It's worth being honest about the constraints. This approach works when the features are genuinely independent. If the share page had needed to modify the same database schema that the demo feature was also changing, the conflict resolution would have been a real problem, not a mechanical .PHONY line merge.
The integration step also requires a clear policy before it starts. An agent resolving conflicts without explicit rules will make guesses. Sometimes those guesses are fine. Sometimes they quietly drop one branch's work to resolve a conflict, and you won't notice until much later. Writing the conflict resolution policy first (accumulate, preserve, flag) is what made the integration reliable rather than just fast.
And agents working in isolation can't see each other's work. If the share page agent had made a design decision that the test agent needed to account for, the test agent wouldn't know. For this project, the features were independent enough that this didn't matter. For a more tightly coupled set of changes, you'd need explicit handoffs between agents, or a shared spec document they all read before starting.
I'm still new to this and others are doing it at a different scale
To put this in perspective: four weeks ago I didn't have a project. Three weeks ago I was figuring out how to get Docker Compose to resolve service names correctly. Two weeks ago I was learning how asyncpg handles JSONB columns and why you can't just call json.dumps() on top of a codec that already handles serialisation. Today I ran parallel agents.
That's a fast ramp, but it's still a short ramp. Three agents on a toy project is a modest starting point. I want to be honest about where this sits relative to what other people are actually doing.
Zach Wills ran 20 parallel agents for a week and shipped a production analytics platform: roughly 800 commits, 100+ PRs, auth, background jobs, CI/CD, and ~800 unit tests — all in one week, at a cost of around $6,000 in credits. He describes entering a "multitasking flow state" where you're doing high-level situational awareness across 20 parallel streams simultaneously, and finding it mentally exhausting after about three hours. His eight rules post went viral on Hacker News. I've read it twice. I'm not there yet.
The engineering team at incident.io runs 4–7 concurrent agents via git worktrees as a standard part of their workflow. They measured a JavaScript editor feature taking 10 minutes of agent time against an estimated 2 hours of human work, and an API build tooling improvement delivering an 18% performance gain for $8 in usage. These aren't experiments; they're how the team ships. Rory Bain described it as "a distributed team of junior developers, each working to my guidance."
At the extreme end: Anthropic published a case study where 16 parallel Claude Code agents — coordinated by a lead agent — wrote a 100,000-line C compiler in Rust, capable of compiling the Linux kernel across x86, ARM, and RISC-V targets. That project ran for nearly 2,000 Claude Code sessions and cost around $20,000. It's a research exercise more than a workflow template, but it sets a ceiling for what the coordination model can actually hold together.
Elian at DoltHub wrote about running 3–4 agents for correctness testing work, hitting a practical memory ceiling at 32GB RAM when trying to push to 6. His setup uses Docker isolation per agent to prevent file and port conflicts — something I haven't needed at 3 agents but will probably have to think about if I try to scale up.
Addy Osmani framed the shift well: the developer role moves from "conductor" (pointing each instrument at each task) to "orchestrator" (setting the strategy, handing off execution, validating the output). At 3 agents on a toy project, I'm closer to conductor. The orchestrator role — where the review backlog, not the agent count, is the real bottleneck — seems to kick in somewhere around 5–7 agents, based on what I've read.
The community consensus seems to be that 3–4 agents is the natural starting point for an individual developer, and review capacity is what limits you before compute does.
What I got right, even at this scale: isolated branches, explicit conflict resolution policy, a dedicated integration agent with a different brief from the feature agents, and not letting agents touch the same files. The primitives are the same whether you're running 3 agents or 20. I'm just using fewer of them, on a smaller problem, with less at stake if something goes wrong.
I'm hoping to get there. The gap between 3 agents and 20 isn't really a technical gap — it's a workflow discipline gap. You have to get good at writing specs, at killing unproductive agents quickly, at knowing when to step in and when to let them run. That's learnable. And if the last four weeks are any indication (a single prompt became a project scaffold, then daily Claude Code sessions, then parallel agents with an integration orchestrator), the pace of that learning is faster than I expected.
The resulting state of main
After the integration: a share page with a working map and OG metadata, a homepage demo pulling from a live API endpoint, Playwright tests covering both features, pytest integration tests for the backend, and a Makefile with targets for all of it. Three features shipped in parallel, merged cleanly, with a full test suite covering the new surface area.
The part that still surprises me: the integration agent produced a detailed summary of every conflict it resolved and how, without being asked. Useful for the commit history, useful for me to verify nothing was lost.
I'm not sure "agentic workflow" is the right framing for this. It's closer to running a small team where everyone has a very specific job and you've written down the rules before anyone starts. The rules are the hard part. The agents are just fast.