Foreman

Foreman: why I built a distributed job queue from scratch

The motivation and stack choices behind building a self-scaling distributed job queue from scratch.

Why three binaries instead of one, why Postgres is the source of truth, and how the transactional outbox forms the seam between Postgres and Kafka.

Why you can’t atomically write to Postgres and Kafka, and how the outbox pattern with idempotent consumers gets you effectively exactly-once processing on top of at-least-once delivery.

Building a bounded worker pool with priority-aware Kafka dispatch and clean SIGTERM shutdown: no work abandoned, no offsets lost.

Structured logs, OpenTelemetry traces across Kafka, Prometheus metrics, and the four Grafana panels that actually matter.

What’s over-engineered, what I cut corners on, and the one thing I’d change if I started over.