I built this after reading too many incident reports of agent loops spending $200 in 4 minutes because a quality threshold was never met.
The pattern is always the same: an agent retries, fans out, or loops. Each iteration passes individual rate-limit checks. Observability fires an alert after the money is gone. Provider caps are per-provider, not cross-provider. None of these stop the spend before it happens.
RunCycles takes a different approach: reserve budget before the call, commit actual spend after, release the remainder if the work is cancelled. The reservation is atomic across all affected budget scopes — tenant, workspace, agent — using Redis Lua scripts so concurrent agents sharing a budget can't collectively overrun it.
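The all-or-nothing reservation across nested scopes is the core trick. Here's a minimal in-memory sketch of that semantics — the real server does this in a Redis Lua script, and the class and method names here are illustrative assumptions, not the RunCycles API:

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when any scope in the hierarchy would be overrun."""


@dataclass
class ScopedBudget:
    # Sketch only: the server keeps this state in Redis, mutated atomically
    # by a Lua script so concurrent agents can't interleave partial updates.
    limits: dict                                   # scope name -> cap
    spent: dict = field(default_factory=dict)      # committed spend
    reserved: dict = field(default_factory=dict)   # outstanding reservations

    def reserve(self, scopes, amount):
        # Check every scope before touching any of them, so a denial in one
        # scope (e.g. agent) leaves the others (tenant, workspace) untouched.
        for s in scopes:
            used = self.spent.get(s, 0) + self.reserved.get(s, 0)
            if used + amount > self.limits[s]:
                raise BudgetExceeded(s)
        for s in scopes:
            self.reserved[s] = self.reserved.get(s, 0) + amount

    def commit(self, scopes, reserved, actual):
        # Commit actual spend and release the unused remainder of the
        # reservation in the same step.
        for s in scopes:
            self.reserved[s] -= reserved
            self.spent[s] = self.spent.get(s, 0) + actual
```

Because the check phase completes before any mutation, two agents sharing a tenant budget can each pass their own per-agent check and still be denied at the tenant scope — which is exactly the cross-scope overrun the per-provider caps miss.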
The integration surface is small:
@cycles(estimate=50_000, action_kind="llm.completion", action_name="gpt-4o")
def call_llm(prompt: str) -> str:
    return openai.complete(prompt)
When budget is exhausted, the next reservation attempt gets a 409 BUDGET_EXCEEDED before the downstream call is made.

The architecture is three pieces:
- Cycles Protocol: an open OpenAPI spec defining the reservation lifecycle, idempotency semantics, scope hierarchy, and overage policies.
- RunCycles Server: Spring Boot + Redis; implements the spec. Runs in Docker.
- Clients: Python, TypeScript, Java/Spring Boot.
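The 409 turns a runaway loop into a clean exit: the loop halts at the reservation step, before the next call is paid for. A hedged sketch of what that looks like on the caller side — the exception name and guard shape are assumptions, not the real client API:

```python
class BudgetExceeded(Exception):
    """Stand-in for the client-side error raised on a 409 reservation denial."""


def run_until_quality(step, good_enough, budget_guard, max_iters=100):
    """Retry loop that stops cleanly when the guard denies a reservation.

    budget_guard() models the pre-call reservation: it raises
    BudgetExceeded instead of letting another iteration spend money.
    """
    result = None
    for _ in range(max_iters):
        try:
            budget_guard()          # reservation attempt; raises on 409
        except BudgetExceeded:
            break                   # stop BEFORE the downstream call
        result = step()
        if good_enough(result):
            break
    return result
```

The point of the design: the loop that never meets its quality threshold exits with whatever it has, instead of burning through iterations until an alert fires.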
The hardest part was idempotency under retries — if a commit fails transiently and retries with the same key, it should get the original response back, not double-charge. The Lua scripts handle this atomically.
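The contract that makes this safe under retries: a commit replayed with the same idempotency key returns the stored original response and charges nothing. An in-memory stand-in for what the Lua script enforces — names here are illustrative, not the server's API:

```python
class CommitLedger:
    """Sketch of idempotent commit semantics keyed by idempotency key."""

    def __init__(self):
        self._responses = {}   # idempotency key -> first response
        self.total_spent = 0

    def commit(self, idempotency_key, amount):
        if idempotency_key in self._responses:
            # Retry path: replay the original response, charge nothing.
            return self._responses[idempotency_key]
        # First commit: record spend and remember the response atomically
        # (the real server does both inside one Lua script).
        self.total_spent += amount
        response = {"status": "committed", "amount": amount}
        self._responses[idempotency_key] = response
        return response
```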
What it's not: a billing system, observability dashboard, or agent framework. It's the layer that decides whether an action may proceed before it proceeds.
Org: https://github.com/runcycles Docs: https://runcycles.github.io/docs
Comments URL: https://news.ycombinator.com/item?id=47382742