I doubled the number of golf courses on FairwayPlan this week. A data enrichment pipeline pulled in 370 courses across all 16 regions, up from the original 169 I had hand-curated. The new data included coordinates, descriptions, contact details, and six new database columns. Locally, everything worked. The course catalogue page showed all 370. The solver picked from the full set. I committed, pushed, and deployed.
Production showed 169 courses.
The database did not get the memo
The PostgreSQL Docker image only runs init.sql when the data volume is first created. My production volume was months old. It still had the original schema with the original 169 courses. The updated init file with 370 courses and six new columns was sitting right there in the container, completely ignored.
This is one of those facts you know intellectually and forget operationally. Docker volumes persist across rebuilds. That is the entire point of volumes. But when your schema lives in init.sql and your deploy process is git pull && make deploy, there is nothing in that workflow that touches the database. The containers rebuild. The application code updates. The data stays exactly where it was.
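The pattern, reduced to a compose file (the service and volume names here are stand-ins, not my actual config):

```yaml
# Hypothetical compose excerpt, not the real one.
services:
  db:
    image: postgres:16
    volumes:
      # The named volume persists across container rebuilds.
      - pgdata:/var/lib/postgresql/data
      # Scripts mounted here run only when the data directory is
      # empty, i.e. the first time the volume is created.
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

volumes:
  pgdata:
```

Nothing in that deploy loop ever empties pgdata, so the init script never fires again.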
The fix was a migration script: add the missing columns, stage all 370 courses into a temp table, insert only the ones that did not already exist, and update the existing 169 with improved coordinates from the new dataset. One SQL file, one psql command, no data loss.
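Condensed to its shape, with every table and column name a stand-in (including the assumption that course name is the matching key), the migration looked roughly like this:

```sql
-- Sketch of the migration's structure; names are stand-ins.
ALTER TABLE courses ADD COLUMN IF NOT EXISTS website TEXT;
ALTER TABLE courses ADD COLUMN IF NOT EXISTS phone   TEXT;
-- ...plus the other four new columns...

-- Stage the full dataset without touching the live table.
CREATE TEMP TABLE staged_courses (LIKE courses);
INSERT INTO staged_courses (name, region, latitude, longitude, description)
VALUES
  ('Example Golf Club', 'Example Region', -36.8, 174.7, 'Stand-in row');
  -- ...the real file carries all 370 rows here...

-- Insert only the courses that do not already exist.
INSERT INTO courses (name, region, latitude, longitude, description)
SELECT s.name, s.region, s.latitude, s.longitude, s.description
FROM staged_courses s
WHERE NOT EXISTS (SELECT 1 FROM courses c WHERE c.name = s.name);

-- Refresh coordinates on the existing rows from the new dataset.
UPDATE courses c
SET latitude = s.latitude, longitude = s.longitude
FROM staged_courses s
WHERE c.name = s.name;
```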
The migration that bit back
The first version of the migration script failed on production with a syntax error pointing at a course called Summerhill. The cause was not Summerhill itself but a course two rows above it: Springfield Golf Club, whose description contained a multi-line string. The script generator extracted value rows by looking for lines starting with (, silently dropping continuation lines. The description got truncated mid-sentence, leaving an unclosed string literal. PostgreSQL saw the next row as garbage.
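Reconstructed in miniature, with invented descriptions, the failure looks like this:

```sql
-- What the source file contained (descriptions invented):
INSERT INTO staged_courses (name, description) VALUES
('Springfield Golf Club', 'A parkland layout where the back nine
climbs into the hills'),
('Summerhill', 'A links-style course');

-- What the extractor emitted after keeping only lines starting
-- with '(' : the continuation line is gone, the first string never
-- closes, and the error surfaces at the next row PostgreSQL can see.
INSERT INTO staged_courses (name, description) VALUES
('Springfield Golf Club', 'A parkland layout where the back nine
('Summerhill', 'A links-style course');
```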
The lesson was not “test your SQL.” I did test it locally. It passed because the local database already had all 370 courses, so the staging insert was a no-op. The migration was structurally broken in a way that only mattered when it actually had to insert data. Which was only on production. Which was the entire point.
The second version copied the VALUES block verbatim from the source file instead of parsing it line by line. It worked.
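The verbatim copy can be as simple as an awk range, sketched here with placeholder file names and the assumption that the statement ends with a semicolon on its own line:

```sh
# Copy the whole INSERT, header to terminating semicolon,
# instead of filtering individual lines.
awk '/^INSERT INTO courses/,/^;[[:space:]]*$/' seed_courses.sql >> migration.sql
```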
Then nginx refused to start
With the courses migrated and the backend rebuilt, I reloaded nginx to pick up the new container IPs. Nginx refused. The config had upstream umami { server umami:3000; } defined at the top level, and the Umami analytics container was crash-looping on an unrelated issue. Nginx resolves the hostnames in upstream blocks when it loads the config. If any of them fails to resolve, which is exactly what happens when a container is down and its name has dropped out of Docker's DNS, the entire config is rejected. The analytics service I barely think about had taken down the reverse proxy for the entire site.
The local version of the nginx config had already been fixed weeks earlier. It used variables like set $umami_upstream "umami" with Docker's embedded DNS resolver, which defers resolution to request time. If umami is down, those two analytics endpoints return 502 but everything else keeps working. The production server had the old config cached from before that change. A container restart picked up the new mount and the site came back.
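The working pattern, as a config excerpt (the location path and port are placeholders; 127.0.0.11 is Docker's embedded DNS server):

```nginx
# Excerpt from the server block; path and port are placeholders.
resolver 127.0.0.11 valid=30s;

location /analytics/ {
    # A variable forces nginx to resolve the name per request.
    # If umami is down, this location returns 502 and the rest
    # of the site keeps serving.
    set $umami_upstream "umami";
    proxy_pass http://$umami_upstream:3000;
}
```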
Three layers, three failures
What looked like one deploy was actually three independent problems stacked on top of each other:
- Stale database volume: schema changes in init.sql are not applied to existing volumes. Requires an explicit migration.
- Broken migration script: multi-line SQL strings need a proper parser, not line-by-line extraction. Test migrations against a database that actually needs the migration.
- Nginx upstream resolution: top-level upstream blocks are resolved at startup. A crashed optional service can block the entire proxy. Use variables with a resolver directive instead.
None of these were bugs in the application code. The solver worked. The catalogue page worked. The API worked. Every piece of application logic was correct. The failures were all in the seams between components: between the deploy process and the database, between the migration generator and multi-line data, between nginx and a service it did not even need.
What I would do differently
The obvious answer is “run migrations as part of the deploy pipeline.” A startup script in the backend container that checks for pending migrations and applies them before the app starts. Every real production system I have worked on does this. I skipped it because the schema had been stable since launch and init.sql felt like enough.
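A minimal version is a few lines of shell at the front of the container entrypoint. This is a sketch under assumptions, a migrations directory and a tracking table that do not currently exist, not what is running now:

```sh
#!/bin/sh
set -e
# Hypothetical migrate-on-boot entrypoint. Assumes DATABASE_URL is set,
# migrations live in /app/migrations, and a schema_migrations table
# tracks what has already run.
psql "$DATABASE_URL" -c \
  "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"

for f in /app/migrations/*.sql; do
  name=$(basename "$f")
  applied=$(psql "$DATABASE_URL" -tAc \
    "SELECT 1 FROM schema_migrations WHERE name = '$name'")
  if [ -z "$applied" ]; then
    echo "Applying $name"
    psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f "$f"
    psql "$DATABASE_URL" -c \
      "INSERT INTO schema_migrations (name) VALUES ('$name')"
  fi
done

exec "$@"   # then start the app as usual
```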
The less obvious answer is about testing assumptions. I tested the migration against a database where it was a no-op. That is like testing a parachute by jumping off a chair. The conditions that matter are exactly the conditions you cannot easily reproduce locally once your local environment has already moved past them.
The nginx issue is just configuration hygiene. If a service is optional, its failure should not cascade. The variable-based resolver pattern exists specifically for this. I had already applied it locally. I just had not deployed it.
370 courses
The catalogue page now shows every golf course in New Zealand. From Tara Iti at the top of the North Island to Stewart Island at the bottom of the South. The solver picks from the full set. The share pages show richer course details. The thing I set out to deploy is deployed.
It just took three attempts, two SQL rewrites, and a reminder that the distance between “works locally” and “works in production” is measured in the assumptions you forgot to check.