| author | St33v <github@f3rr3t.com> | 2026-06-09 17:19:24 +1000 |
|---|---|---|
| committer | St33v <github@f3rr3t.com> | 2026-06-09 17:19:24 +1000 |
| commit | 5eff385770b3a7ddcfb9fcf4349be28013251a75 (patch) | |
| tree | 569af08e518fd79a3ba0f6a70231880938b93c36 | |
| parent | 258a3bec5192106e1e24e65763649b7f6943af9e (diff) | |
Memoir: BOM radar rollout session
Captures the spec → deployed-and-looping arc for IDR713 and IDR663,
the three real bugs encountered (silent IM single-frame PNG, ffmpeg
palette mismatch, git remote user mix-up), and a temp-file race
side-quest worth hardening later.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| -rw-r--r-- | doc/bom-radar-rollout.md | 150 |
1 files changed, 150 insertions, 0 deletions
diff --git a/doc/bom-radar-rollout.md b/doc/bom-radar-rollout.md
new file mode 100644
index 0000000..5c0a8a2
--- /dev/null
+++ b/doc/bom-radar-rollout.md
@@ -0,0 +1,150 @@
+---
+title: "BOM radar rollout — Sydney and Brisbane on cremonde"
+date-created: "2026-06-09"
+type: "memoir"
+status: "complete"
+author:
+  - "st33v"
+  - "claude-opus-4-7"
+model:
+  - "claude-opus-4-7"
+tldr: "Took the bomsynoptic side-spec for IDR713 from a draft to a live, looping APNG on radar.pestrel.com in one session — and added Brisbane (IDR663) at the end. Most of the work was wiring, not coding; the spec turned out to be the easy part."
+chat-url: "https://claude.ai/chat/..."
+session-kind: "infrastructure"
+side-quests: 1
+reader-targets:
+  - "st33v"
+  - "claude-code"
+  - "scrivener"
+related:
+  entities:
+    - "cremonde"
+    - "pestrel.com"
+  concepts:
+    - "static layers + dynamic layers"
+    - "lower plate / upper plate"
+  songs: []
+tags:
+  - "bom"
+  - "radar"
+  - "deployment"
+  - "ffmpeg"
+  - "systemd"
+provenance:
+  - "doc/bom-radar-spec.md (v0.1, drafted same day)"
+---
+
+# BOM radar rollout — Sydney and Brisbane on cremonde
+
+## TL;DR
+
+A spec drafted earlier on 2026-06-09 (`doc/bom-radar-spec.md`) called for adding a 6-minute rain-radar loop to the existing `bomsynoptic` deployment, beginning with Sydney (IDR713) and parameterising on product code so other radars could follow. By end of session both Sydney and Brisbane were live at `radar.pestrel.com/sydney/` and `/brisbane/`, served as proper animated APNGs from a 6-minute systemd timer. The spec was correct in every architectural choice; the friction was entirely in the plumbing — git remote shuffling, file ownership, ImageMagick lying about APNG support, and a palette-mismatch in ffmpeg's apng encoder.
+
+## Context
+
+`bomsynoptic` was already deployed on cremonde: a shell script on a 6-**hourly** systemd timer fetching the BOM MSLP synoptic PDF, rasterising it to PNG, and serving the latest as a single `<img>` on `pestrel.com`. Spec proposed a second product class — rain radar — on a 6-**minute** cadence, layered (transparencies + echo frames), with a rolling buffer of the last six frames as a loopable animation.
+
+The spec opened with four §12 questions to the implementing agent. Those were worked through against the existing code first, then implementation proceeded.
+
+## Spec triage — the §12 answers
+
+In order:
+
+1. **Scheduler reuse:** no shared in-code scheduler exists — each product is its own systemd timer. The radar pattern mirrors synoptic's: `radar.{service,timer}` + `radar-retry.{service,timer}` with `OnFailure` chaining, just at `OnUnitActiveSec=6min` instead of `OnCalendar=*:10:00`.
+2. **Asset convention:** existing pattern is `/srv/www/pestrel/synopticLatest.png` served by nginx from `/srv/www/pestrel`. Decided radar gets its own subdomain (`radar.pestrel.com`), own web root (`/srv/www/radar/`), own working dir (`/var/lib/radar/<id>/`), own `/opt/radar/` for scripts. Mirroring the synoptic shape, but isolated.
+3. **Pillow vs ImageMagick:** existing stack is pure shell + curl + magick + gs. Staying in shell+IM keeps the footprint identical. Recommended this; st33v confirmed.
+4. **Loop format:** existing front-end is one `<img>` tag, no JS. APNG drops in unchanged; sequence+manifest would have meant introducing JS. Picked APNG.
+
+The §10 copyright concern (BOM FTP is personal-use, not commercial-republish) was flagged and dismissed: *"No one is looking at the site right now."* Attribution to BOM is implicit by content; no formal gating.
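
Editorial sketch for answer 1 (not part of the commit): the function below prints what a 6-minute timer unit might look like. Only `OnUnitActiveSec=6min` and the unit name `radar.service` come from the memoir. The `render_timer` helper, the `OnBootSec=1min` initial trigger, and the `[Install]` section are assumptions for illustration, and the repository's actual `radar.timer` may differ.

```shell
#!/bin/sh
set -eu

# render_timer SERVICE INTERVAL
# Prints a hypothetical systemd timer unit that re-runs SERVICE every
# INTERVAL after its previous activation. Illustrative only; this is not
# the unit file shipped in the repository.
render_timer() {
    service=$1
    interval=$2
    cat <<EOF
[Unit]
Description=Run ${service} every ${interval}

[Timer]
# Assumed initial trigger, so the first run does not depend on a manual start.
OnBootSec=1min
OnUnitActiveSec=${interval}
Unit=${service}

[Install]
WantedBy=timers.target
EOF
}

render_timer radar.service 6min
```

Redirecting the output to a scratch file and running `systemd-analyze verify` on it is one way to check such a unit before installing it.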
+
+## Implementation — first pass
+
+Wrote in roughly this order:
+
+- `radarFetch.sh`: parameterised on `RADAR_ID` (defaulting `IDR713`); refresh transparencies on 24h TTL with `.last_refreshed` marker; build two cached plates (lower = background + topography + optional feature overlays; upper = range + locations); fetch top-N echo frames by lexical sort; composite each as `lower → echo → upper → legend@SE`; evict frames outside the rolling buffer by set membership.
+- Four systemd units (`radar.service`, `radar.timer`, `radar-retry.service`, `radar-retry.timer`).
+- `nginx/radar.pestrel.com.conf` with `Cache-Control: no-cache` on `.apng`.
+- A black-page `radar.index.html` with the single `<img>`.
+- Extended `setup.sh` to provision radar dirs/units/scripts/index. Extended `deploy.sh` to take a `synoptic|radar` argument.
+
+DNS: st33v added an `A` record for `radar` → 139.162.32.70 (cremonde) alongside the apex and `www`.
+
+## Git remote shuffle
+
+Existing remote was github (`f3rr3t/bomSynoptic`). User wanted cremonde as origin. First attempt set the URL to `cremonde:/home/git/bomSynoptic.git` — but cremonde's ssh alias logs in as `st33v`, and the bare repo is owned by user `git`. Hit two consecutive failures:
+
+1. **"dubious ownership"** — git's anti-tampering check. Added `safe.directory` exception.
+2. **"unable to create temporary object directory"** — st33v can't write into git-owned dirs. Real fix: change the URL to `git@cremonde:/home/git/bomSynoptic.git`, matching how the other ~30 bare repos in `/home/git/*.git` are reached. *"You can — I just set the remote wrong."* Lesson logged: "use ssh cremonde" can mean two things, host alias and user; check the existing convention rather than guess.
+
+Reverted the safe.directory entry as no longer needed.
+
+## Deployment — three real bugs
+
+The deploy was bumpy and instructive. In order encountered:
+
+### Bug 1 — empty deployment
+
+After provisioning everything, ran setup.sh on cremonde and was told *"setup.sh did not install the radar units."* The clone on cremonde was at `c132b39` — predating all the radar work, because the radar files had never been committed or pushed. Stupid but easy fix: stage, commit (`f058e83`), push, pull on cremonde, re-run setup.
+
+### Bug 2 — ImageMagick silently writing single-frame PNGs
+
+First successful service run produced `/srv/www/radar/idr713-loop.apng` at 35,106 bytes — identical to `frame.00.png`. `file` confirmed: plain PNG, not animated. ImageMagick on Arch is built against an upstream libpng that lacks the APNG patch — so `magick -delay 50 -loop 0 frame*.png out.apng` happily produces a single-frame PNG with no warning. Switched to `ffmpeg`'s `concat` demuxer + `apng` muxer (also unlocking variable per-frame durations cleanly via the demuxer text format). Added `ffmpeg` to setup.sh's pacman line. st33v's reaction was just "but ffmpeg is useful" — fair, took the install.
+
+### Bug 3 — ffmpeg palette mismatch
+
+ffmpeg succeeded silently in test but exited 255 in production with:
+
+> `Input contains more than one unique palette. APNG does not support multiple palettes.`
+
+The composited frames were 8-bit palettized PNGs, and each frame's palette differed slightly. ffmpeg's apng encoder won't accept that. Single-character fix: `-vf format=rgba` in the ffmpeg invocation. The file went from a stuck 35 KB single-frame PNG to a proper 91 KB 7-frame RGBA APNG (six frames + the trailing duplicate the concat demuxer requires).
+
+Important meta-observation: between bug 2 and bug 3, the user noticed *"seems stuck on the first image"*. Without that, the failed `systemctl` retries would have just kept hammering the BOM FTP every two minutes and the published file would have remained stuck at 13:05. The error was in the journal but nobody was watching the journal.
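
Editorial sketch for bugs 2 and 3 (not part of the commit): one way to build the concat list, encode it, and refuse to publish a file that is not actually animated. `write_concat_list`, `build_apng`, and `is_animated_png` are hypothetical helper names, not functions from `radarFetch.sh`. Apart from `-vf format=rgba`, the ffmpeg flags shown are assumptions about a reasonable invocation. The sketch assumes frame names contain no single quotes, and the `acTL` test is a heuristic: an APNG carries an `acTL` animation-control chunk that a plain PNG lacks.

```shell
#!/bin/sh
set -eu

# write_concat_list DURATION FRAME...
# Prints an ffmpeg concat-demuxer list: each frame with its duration, then
# the last frame once more without a duration so its timing is honoured.
write_concat_list() {
    duration=$1
    shift
    last=
    for frame in "$@"; do
        printf "file '%s'\n" "$frame"
        printf 'duration %s\n' "$duration"
        last=$frame
    done
    if [ -n "$last" ]; then
        printf "file '%s'\n" "$last"
    fi
}

# build_apng LIST OUT
# Encodes the listed frames as a looping APNG. format=rgba converts every
# frame to RGBA so differing per-frame palettes cannot reach the encoder.
build_apng() {
    ffmpeg -y -loglevel error -f concat -safe 0 -i "$1" \
        -vf format=rgba -plays 0 -f apng "$2"
}

# is_animated_png FILE
# Heuristic check: succeeds only if the file contains an acTL chunk name.
is_animated_png() {
    LC_ALL=C grep -q 'acTL' "$1"
}
```

A caller could write the list with `write_concat_list 0.5 frame.*.png`, run `build_apng`, and publish only if `is_animated_png` succeeds, which would turn the silent single-frame case into a visible failure. Relative paths in the list are resolved against the list file's own directory.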
+
+## Brisbane
+
+Once Sydney was confirmed working, st33v provided the Brisbane code (IDR663, Mt Stapylton) and asked the design question: select between, or show both? Considered three shapes (subdirs, subdomains, stacked on one page), recommended subdirs (`/sydney/`, `/brisbane/`) with cross-links — minimal new infra, no JS, matches the existing one-img-per-page aesthetic. st33v picked that.
+
+Implementation: changed `radar.service` to have two `ExecStart` lines (one per `RADAR_ID` arg), prefixed both with `-` so one radar's outage doesn't skip the other. Added `radar.sydney.html` and `radar.brisbane.html` (corner cross-link nav) and rewrote `radar.index.html` as a centred two-button landing. setup.sh provisions the subdirs and installs all three pages.
+
+Deployed clean. Both APNGs publishing on the same 6-minute timer.
+
+## Resolutions
+
+- **Subdomain not subpath.** radar.pestrel.com instead of pestrel.com/radar — clean nginx separation, independent evolution.
+- **APNG over GIF, despite GIF requiring no new deps.** st33v's "ffmpeg is useful" weighed against the GIF size penalty.
+- **Sequence+manifest dropped.** Would have meant the first JS in the project. APNG keeps the black-page-one-img idiom intact.
+- **Per-radar invocation of one script over a template service.** `ExecStart=` × N with `-` prefix is simpler than `radar@.service` template units for the current scale of two.
+- **TLS deferred.** Plain HTTP. To revisit when adding certbot to cover both pestrel.com and radar.pestrel.com.
+
+## Operational learnings
+
+- **Arch ImageMagick has no APNG writer.** It does not warn; it just writes a single-frame PNG. If APNG matters, use ffmpeg or apngasm.
+- **ffmpeg's apng muxer rejects palette mismatches.** Composite chains that go through palette-PNG intermediates need `-vf format=rgba` to be safe.
+- **`git remote = git@host:path` for cremonde bare repos.** Not `host:path`, which logs in as st33v and can't write the git-owned object dirs.
+- **systemd `ExecStart=-...` prefix** ignores the exit code of that one line — useful when a service runs several independent jobs and you want partial-success semantics.
+- **The `concat` demuxer wants the last file repeated** without a `duration` line, otherwise the final frame's duration is ignored.
+- **`Cache-Control: no-cache` on `.apng`** matters when the file updates every 6 minutes and the URL stays stable.
+
+## Cheat sheet — adding a third radar
+
+1. Confirm the IDR code (lookup BOM's radar codes page).
+2. Add `ExecStart=-/opt/radar/radarFetch.sh IDRxxx` to `systemd/radar.service` and `radar-retry.service`.
+3. Write `radar.<city>.html` (copy of an existing one, swap the apng filename and the cross-link target).
+4. Add `install -d /srv/www/radar/<city>` and the html install line to `setup.sh`.
+5. Update `radar.index.html` to add a third button (and rebalance the gap).
+6. Commit; push; pull on cremonde; `sudo ./setup.sh`.
+7. `sudo systemctl start radar.service` to publish immediately rather than waiting up to 6 min for the next timer fire.
+
+Everything else — directories, naming, compositing — is identical across radars.
+
+## Appendix A — side-quest: temp-file race hardening
+
+*Surfaced when a manual `/opt/radar/radarFetch.sh` invocation collided with a service-triggered run at 14:05:31. Both wrote to the same `loop.apng.tmp`; one won, the other lost with `install: cannot stat …`.*
+
+In production only the timer should invoke the script, so this race is unlikely under normal operation. But the failure mode is silent corruption rather than a clean fail, which makes it worth hardening:
+
+- **Option a:** `mktemp` per invocation. `TMP_APNG=$(mktemp "${OUT_DIR}/loop.XXXXXX.apng")`, then `trap 'rm -f "$TMP_APNG"' EXIT`. Minimal change; each invocation has its own temp.
+- **Option b:** `flock` over the whole script. `exec 9>/var/lock/radar-${RADAR_ID,,}.lock; flock -n 9 || exit 0`. Cleaner — prevents concurrent runs entirely, including the spec's stated "skip a poll cleanly if the previous one is still running" requirement.
+
+Option b is closer to what the spec asked for. Worth doing.
+
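
Editorial sketch for Appendix A (not part of the commit): options a and b combined in a form that runs unprivileged. The paths, the `acquire_lock` and `publish_from_stdin` helpers, and the final rename step are assumptions; the real script publishes with `install` and would lock under `/var/lock`. The sketch uses `tr` instead of `${RADAR_ID,,}` and a temp template ending in `XXXXXX` so that it runs under plain `sh` without relying on GNU `mktemp` suffix handling.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins so the sketch runs without root; a real script
# would point these at /var/lock and the radar's output directory.
RADAR_ID=${RADAR_ID:-IDR713}
WORK_DIR=$(mktemp -d)
radar_lc=$(printf '%s' "$RADAR_ID" | tr '[:upper:]' '[:lower:]')
LOCK_FILE="$WORK_DIR/radar-$radar_lc.lock"
OUT_FILE="$WORK_DIR/loop.apng"

# Option b: take an exclusive, non-blocking lock on fd 9 and keep it for the
# life of the process. Returns non-zero if another run already holds it.
acquire_lock() {
    exec 9>"$LOCK_FILE"
    flock -n 9
}

# Option a: write stdin to a per-invocation temp file, then rename it over
# the published name so readers never see a partial file.
publish_from_stdin() {
    tmp=$(mktemp "$OUT_FILE.XXXXXX")
    if cat >"$tmp"; then
        chmod 0644 "$tmp"   # mktemp creates 0600; a web server needs read access
        mv -f "$tmp" "$OUT_FILE"
    else
        rm -f "$tmp"
        return 1
    fi
}

if ! acquire_lock; then
    echo "previous run still active; skipping this poll" >&2
    exit 0
fi
printf 'demo payload\n' | publish_from_stdin
```

With the lock held, a second invocation exits 0 without touching the output, which matches the "skip a poll cleanly" wording quoted above. The per-invocation temp file then only matters if the lock is ever bypassed.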
