fix: log startup errors to stderr and exit non-zero #436
Why
A bad dependency bump (eventsourced/pg v1.19.0) made the service exit during startup, but the pod showed `Exit Code: 0 / Completed` with no error in the logs, which made the crash loop very hard to diagnose.

Two issues hid the failure:
- `start()` defers the OTel SDK shutdown, so the log exporter is torn down before `main()` logs `process error`; with `LOG_FORMAT=otel` that record never reaches Alloy.
- `main()` returned normally (exit 0) on error, so a crash-looping container was reported as `Completed` instead of failed.

What
On `start()` error, also write the error to stderr (always captured by `kubectl logs`, independent of OTel state) and `os.Exit(1)` so the container is correctly reported as failed (CrashLoopBackOff).

This is a diagnosability fix; it does not change the bad-dependency root cause (fixed separately in eventsourced/pg).
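The change described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: the `start()` body, the `errStartup` value, the `FAIL_STARTUP` environment variable, and the `startupFailure` helper are all hypothetical names introduced here so the example is self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
)

// errStartup is a hypothetical startup failure used only for illustration.
var errStartup = errors.New("startup failed")

// start is a stand-in for the service's start(). It fails only when the
// hypothetical FAIL_STARTUP environment variable is set.
func start() error {
	if os.Getenv("FAIL_STARTUP") != "" {
		return errStartup
	}
	return nil
}

// startupFailure returns the line to print on stderr and the exit code to use.
func startupFailure(err error) (string, int) {
	return fmt.Sprintf("process error: %v\n", err), 1
}

func main() {
	if err := start(); err != nil {
		// Keep the structured log record; with an OTel exporter it may
		// already be shut down at this point and never be delivered.
		slog.Error("process error", "error", err)

		// stderr does not depend on the OTel SDK, so the message is still
		// visible in the container logs.
		msg, code := startupFailure(err)
		fmt.Fprint(os.Stderr, msg)

		// A non-zero exit code marks the container as failed.
		os.Exit(code)
	}
}
```

Note that `os.Exit` terminates the process immediately without running deferred functions, so in a sketch like this any cleanup has to happen inside `start()` before it returns.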
When start() returned an error, main() logged it via the slog logger and then returned normally (exit 0). Two problems made startup failures nearly invisible:

- start() defers the OTel SDK shutdown, so the log exporter is already torn down by the time main() logs "process error"; with LOG_FORMAT=otel that record never reaches Alloy.
- Exiting 0 makes the container show as "Completed", so a crash-looping pod looks like a clean exit instead of a failure.

Now also write the error to stderr (always captured by `kubectl logs`, independent of OTel state) and os.Exit(1) so the container is correctly reported as failed.
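The first problem follows from Go's defer ordering: functions deferred inside start() run when start() returns, which is before the caller gets to handle the returned error. The sketch below shows only that ordering; the "exporter shutdown" step is a hypothetical placeholder for the OTel SDK shutdown, and `events` and `run` are names made up for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// events records the order in which the steps happen.
var events []string

// start mimics a function that defers a (hypothetical) exporter shutdown
// and then fails.
func start() error {
	defer func() { events = append(events, "exporter shutdown") }()
	return errors.New("startup failed")
}

// run plays the role of main(): it handles the error only after start()
// has returned, i.e. after the deferred shutdown has already run.
func run() []string {
	events = nil
	if err := start(); err != nil {
		events = append(events, "main logs process error")
	}
	return events
}

func main() {
	for _, e := range run() {
		fmt.Println(e)
	}
	// Prints:
	// exporter shutdown
	// main logs process error
}
```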