fix: log startup errors to stderr and exit non-zero #436

Merged
argoyle merged 2 commits from fix/log-and-exit-on-startup-error into master 2026-05-26 10:39:05 +00:00
Owner

Why

A bad dependency bump (eventsourced/pg v1.19.0) made the service exit during startup, but the pod showed Exit Code: 0 / Completed with no error in the logs, which made the crash loop very hard to diagnose.

Two issues hid the failure:

  • start() defers the OTel SDK shutdown, so the log exporter is already torn down by the time main() logs "process error"; with LOG_FORMAT=otel that record never reaches Alloy.
  • main() returned normally (exit 0) on error, so a crash-looping container was reported as Completed instead of failed.
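
The ordering problem in the first bullet can be reproduced with a minimal sketch. This is not the service's code: `start()` here is a stand-in, the `events` slice and `run()` helper are hypothetical, and the OTel SDK is replaced by plain strings so that only the order of operations is visible.

```go
package main

import (
	"errors"
	"fmt"
)

// events records the order in which things happen (illustration only).
var events []string

// start is a hypothetical stand-in for the service's start(): it brings up
// telemetry, defers its shutdown, and then fails during startup.
func start() error {
	events = append(events, "otel: started")
	// The deferred shutdown runs when start() returns, i.e. before the
	// caller has had a chance to log the returned error.
	defer func() { events = append(events, "otel: shut down") }()
	return errors.New("startup failed")
}

// run mirrors what main() did before this fix: log the error after
// start() has already returned.
func run() {
	if err := start(); err != nil {
		events = append(events, "main: log process error")
	}
}

func main() {
	run()
	for _, e := range events {
		fmt.Println(e)
	}
}
```

The shutdown entry is recorded before the log entry, which is why a record sent only through the OTel exporter has nowhere to go.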

What

On start() error, also write the error to stderr (always captured by kubectl logs, independent of OTel state) and os.Exit(1) so the container is correctly reported as failed (CrashLoopBackOff).

This is a diagnosability fix; it does not change the bad-dependency root cause (fixed separately in eventsourced/pg).
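
A rough sketch of the pattern described above, under stated assumptions: `start()`, `errStartup`, the `failure()` helper, and the `DEMO_FAIL_STARTUP` environment variable are all hypothetical and exist only to make the example runnable; the actual change in this PR may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
)

// errStartup is a hypothetical startup error used for the demo.
var errStartup = errors.New("connect to database: connection refused")

// start is a hypothetical stand-in for the service's start(). It fails
// only when DEMO_FAIL_STARTUP=1 so that both paths can be tried.
func start() error {
	if os.Getenv("DEMO_FAIL_STARTUP") == "1" {
		return errStartup
	}
	return nil
}

// failure turns a startup error into the line to print on stderr and the
// exit code to use. A nil error yields an empty message and exit code 0.
func failure(err error) (msg string, code int) {
	if err == nil {
		return "", 0
	}
	return fmt.Sprintf("process error: %v\n", err), 1
}

func main() {
	err := start()
	if err != nil {
		// Still log through slog; in the real service this record may be
		// lost if the log exporter has already been shut down.
		slog.Error("process error", "error", err)
	}
	msg, code := failure(err)
	// Write directly to stderr so the message does not depend on the
	// logging pipeline, then exit with a status that reflects the failure.
	fmt.Fprint(os.Stderr, msg)
	os.Exit(code)
}
```

Keeping the message and exit code in a small pure function makes the behaviour easy to check without terminating the process. Note that `os.Exit` does not run deferred functions, so it is called only at the very end of `main()`, after `start()` and its deferred cleanup have finished.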

argoyle added 1 commit 2026-05-25 19:32:49 +00:00
fix: log startup errors to stderr and exit non-zero
dancefinder / check (pull_request) Successful in 1m53s
dancefinder / build (pull_request) Failing after 9m21s
dancefinder / deploy-prod (pull_request) Has been skipped
7b0d031511
When start() returns an error, main() logged it via the slog logger and
then returned normally (exit 0). Two problems made startup failures nearly
invisible:

  - start() defers the OTel SDK shutdown, so the log exporter is already
    torn down by the time main() logs "process error"; with LOG_FORMAT=otel
    that record never reaches Alloy.
  - Exiting 0 makes the container show as "Completed", so a crash-looping
    pod looks like a clean exit instead of a failure.

Now also write the error to stderr (always captured by `kubectl logs`,
independent of OTel state) and os.Exit(1) so the container is correctly
reported as failed.
argoyle added 1 commit 2026-05-26 08:42:23 +00:00
Merge branch 'master' into fix/log-and-exit-on-startup-error
dancefinder / check (pull_request) Successful in 2m0s
dancefinder / build (pull_request) Successful in 13m20s
dancefinder / deploy-prod (pull_request) Has been skipped
fd91bc7e10
argoyle merged commit cd40222ced into master 2026-05-26 10:39:05 +00:00
argoyle deleted branch fix/log-and-exit-on-startup-error 2026-05-26 10:39:06 +00:00

Reference: dancefinder/dancefinder#436