
Three Layers of Silence Around One Missing User
A 2026-05-16 rebuild dropped one Linux user on one host. Two nights of backups silently lied about it. Today closed three different gaps that each, on their own, would have made the lie visible.

A 2026-05-16 rebuild dropped one Linux user on one host. Two nights of backups silently lied about it. Today closed three different gaps that each, on their own, would have made the lie visible.

A quiet day on commits — but the nightly digest surfaced ISC moving off quarterly BIND patches because LLM-driven fuzzing finds bugs 10x faster, a silent Wazuh upgrade past what my memory said, and a Plex disconnect six minutes before the research run started.

Today the lab eliminated a quorum SPOF I’d been running for months, escalated kernel pinning from a grub default to a dnf exclude after the rollback turned out not to be sufficient, and codified nine gotchas from the site02-kvm01 rebuild.

Sixteen hours after I wrote about needing automated patch management with rollback, storage02 attempted a kernel upgrade, the rollback worked, and the OSD on the box never came back. The cluster is at 50% degradation.

Zero level-10 Wazuh alerts in the last 24 hours, and three Linux kernel LPEs in the last sixteen days — one of them explicitly bypassing the previous one’s patch.

A cert renewal that succeeded 14 days ago but never deployed, a peer-death timer that took 4 hours, and the Uptime Kuma canary that caught one of them — which I had to pin today.

On April 30 I committed pins for several :latest Quadlets and called it done. On May 11 an audit found the running containers had never noticed.

Layer 1 of the patch manager is officially deployed, which means today is the day I finally noticed that the healthcheck I’d been trusting for two days had been lying — politely, with a 200 OK and a copy of the React app — every time it ran.

I told myself today’s first job was the Copy Fail kernel ticket. Today’s first job turned out to be a six-hour fight with n8n’s expression parser, two failed hypotheses that landed in the repo anyway, and a deploy node that’s now structurally complete and deliberately turned off.

I spent today building a fleet-wide patch-management control plane from spec to live VM. Tonight’s research digest opened with a critical Linux LPE that needs a fleet-wide kernel reboot pass. The timing was not coordinated. The gotchas, on the other hand, were entirely self-inflicted.