Commit Graph

10 Commits

Author SHA1 Message Date
argoyle 1549538c70 perf(graph): warm schema cache on startup to kill cold-start spikes
schemas / vulnerabilities (pull_request) Successful in 2m11s
schemas / check-release (pull_request) Successful in 3m8s
schemas / check (pull_request) Successful in 3m30s
pre-commit / pre-commit (pull_request) Successful in 7m13s
schemas / build (pull_request) Successful in 6m30s
schemas / deploy-prod (pull_request) Has been skipped
Following the schema cache PR, warm pods serve from cache (~24/25 hits
on a long-running pod). New pods, however, start cold: the first
LatestSchema query per (orgId, ref) still runs the wgc router compose
subprocess, which costs 100-300m CPU per call.

That cold-start cost is what kept tripping the HPA into TooManyReplicas:
HPA scales up → new pod added → new pod runs wgc on first query →
metrics spike → HPA scales up further → cycle repeats. Even after the
caching PR landed, observed pods cycling 2→4→2→4 in production, with
fresh pods showing 2 'Fetching latest schema' (cold) entries and 0
cache hits within their first minute.

Add Cache.AllOrgRefs() exposing every tracked (orgId, ref) pair, and
Resolver.WarmCache(ctx) which iterates them after the event-sourced
caches have been populated. For each ref it fetches the subgraphs,
runs sdlmerge, runs CosmoGenerator.Generate, and stores both results
in the cache. Errors per ref are logged and skipped so a single bad
ref does not block warming the rest.

Service startup calls WarmCache right after the Resolver is wired,
before the HTTP server starts accepting traffic, so the first
LatestSchema query a pod receives is already a cache hit.
2026-05-21 17:10:30 +02:00
argoyle 28aa32ad8c fix: prevent OOM on rapid schema publishing
schemas / vulnerabilities (pull_request) Successful in 6m43s
schemas / check-release (pull_request) Successful in 11m27s
schemas / check (pull_request) Successful in 14m51s
pre-commit / pre-commit (pull_request) Successful in 19m39s
schemas / build (pull_request) Successful in 8m26s
schemas / deploy-prod (pull_request) Has been skipped
Add concurrency-limited CosmoGenerator (semaphore limit=1, 60s timeout)
to prevent unbounded concurrent wgc process spawning. Add debouncer
(500ms) to coalesce rapid schema updates per org+ref. Fix double
subgraph fetch in Supergraph resolver and goroutine leak in
SchemaUpdates subscription.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 08:05:47 +01:00
argoyle 73eae98929 feat: migrate from GitLab CI to Gitea Actions
schemas / vulnerabilities (pull_request) Successful in 2m15s
schemas / check-release (pull_request) Successful in 2m17s
schemas / check (pull_request) Successful in 4m48s
pre-commit / pre-commit (pull_request) Successful in 5m58s
schemas / build (pull_request) Successful in 3m36s
schemas / deploy-prod (pull_request) Has been skipped
- Update git remote to git.unbound.se
- Add Gitea workflows: ci.yaml, pre-commit.yaml, release.yaml, goreleaser.yaml
- Delete .gitlab-ci.yml
- Update Go module path to gitea.unbound.se/unboundsoftware/schemas
- Update all imports to new module path
- Update Docker registry to oci.unbound.se
- Update .goreleaser.yml for Gitea releases with internal cluster URL
- Remove GitLab CI linter from pre-commit config
- Use shared release workflow with tag_only for versioning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 22:53:46 +01:00
argoyle 80daed081d feat: add Cosmo Router config generation and PubSub support
Creates a new `GenerateCosmoRouterConfig` function to build and 
serialize a Cosmo Router configuration from subgraphs. Implements 
PubSub mechanism for managing schema updates, allowing 
subscription to updates. Adds Subscription resolver and updates 
existing structures to accommodate new functionalities. This 
enhances the system's capabilities for dynamic updates and 
configuration management.
2025-11-19 11:29:30 +01:00
argoyle aaa111dd20 feat(service): implement graceful shutdown for HTTP server
Add a context with timeout to handle graceful shutdown of the HTTP 
server. Update error handling during the server's close to include 
context-aware shutdown. Ensure that the server properly logs only 
non-closed errors when listening.
2025-04-12 13:41:02 +02:00
argoyle 6cd3b935e4 feat: add full SDL to SupGraph response 2024-03-25 21:46:15 +01:00
argoyle a46895081e chore: actually validate API key privileges and refs 2023-05-29 22:13:25 +02:00
argoyle 554a6c252f feat: organizations and API keys 2023-04-27 07:46:52 +02:00
argoyle 98a679d2d3 chore: add context and error handling 2022-12-17 14:36:42 +01:00
argoyle a1b4d4fc27 feat: initial commit 2022-10-09 20:37:48 +02:00