Everyone's talking about Ralph Wiggum loops — Claude Code running autonomously in a while loop until the task is done. And it works surprisingly well for single tasks.
But here's the thing nobody's addressing: a loop doesn't know if it achieved your goal. It knows it stopped erroring. Those are different things.
```python
# Ralph Wiggum pattern (simplified)
while not done:
    result = agent.do_next_thing()
    if agent.thinks_its_done:
        feed_prompt_back()  # nope, keep going
```

This is great for "migrate tests from Jest to Vitest" — there's a clear finish line (all tests pass). It falls apart on "add user notification preferences with email digest settings," where "done" involves an API endpoint, a DB migration, a frontend component, and tests that validate business logic, not just compilation.
The tools I've seen tackling multi-agent coordination right now:
What none of these solve well: how does the system know if the combined output of multiple parallel agents actually satisfies the original requirement?
Ralph loops check "did I stop erroring." Loki Mode checks individual agent completion. But who checks that the API endpoint the backend agent wrote actually matches the frontend component the other agent built? Who verifies that the DB migration supports the query patterns the API needs?
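A minimal sketch of what that cross-agent check could look like. Everything here is a hypothetical illustration, not OmoiOS code: the OpenAPI-style `backend` dict, the naive `fetch()` regex, and the route names are all assumptions. The point is that the contract between two agents' outputs is mechanically comparable.

```python
# Hypothetical sketch: compare the routes a backend agent declared
# against the endpoints a frontend agent's component actually calls.
import re

def declared_routes(openapi_paths: dict) -> set:
    """Routes the backend agent exposed, from its OpenAPI-style paths."""
    return {(method.upper(), path)
            for path, methods in openapi_paths.items()
            for method in methods}

def called_routes(frontend_source: str) -> set:
    """Endpoints the frontend calls (naive regex over fetch() calls)."""
    pattern = r"fetch\(['\"](/[\w/{}-]*)['\"].*?method:\s*['\"](\w+)['\"]"
    return {(m.group(2).upper(), m.group(1))
            for m in re.finditer(pattern, frontend_source, re.DOTALL)}

# Illustrative agent outputs (assumed, not real project artifacts)
backend = {"/api/preferences": {"get": {}, "put": {}}}
frontend = """
  fetch('/api/preferences', { method: 'GET' })
  fetch('/api/preferences/digest', { method: 'PUT' })
"""

missing = called_routes(frontend) - declared_routes(backend)
print(missing)  # routes the frontend needs but the backend never wrote
```

A real implementation would parse the frontend AST instead of regexing source, but even this toy version catches the mismatch the post describes: the frontend expects `PUT /api/preferences/digest` and no backend agent built it.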
I've been working on this specific problem. My approach: specs as a machine-checkable source of truth.
The key insight: if the system has a structured definition of "done," it can answer "are we there yet?" without asking you. The oversight loop that Ralph Wiggum leaves to persistence, and that Loki Mode leaves to agent count, gets replaced by spec verification.
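To make the idea concrete, here is a minimal sketch of that oversight loop. The spec items, the stubbed `check()` results, and the item IDs are all invented for illustration; the real tool would run actual verifiers (test suites, schema diffs, HTTP probes) behind each item.

```python
# Hypothetical sketch: "done" is a structured spec the system can
# evaluate itself, not the agent's own opinion of its progress.

SPEC = [
    ("api", "PUT /api/preferences returns 200 with updated settings"),
    ("db", "migration adds email_digest column"),
    ("ui", "preferences form renders and submits"),
    ("tests", "business-logic tests pass, not just compilation"),
]

def check(item_id: str) -> bool:
    """Run the machine check for one spec item (stubbed results here)."""
    results = {"api": True, "db": True, "ui": False, "tests": True}
    return results[item_id]

def unmet(spec):
    """Spec items the current agent output does not yet satisfy."""
    return [desc for item_id, desc in spec if not check(item_id)]

remaining = unmet(SPEC)
if remaining:
    print("not done yet:", remaining)  # feed these back to the agents
else:
    print("spec satisfied")
```

The loop terminates on verified facts rather than on the agent declaring victory; unmet items become the next round's prompt instead of a generic "keep going."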
It's not a new idea — it's basically CI/CD applied to AI agent output. But it's the piece I see missing from the current agent orchestration conversation.
I've been building this into an open source tool: https://github.com/kivo360/OmoiOS
Built with FastAPI, PostgreSQL + pgvector, Redis pub/sub, and Next.js with React Flow for DAG visualization. Licensed Apache 2.0.
Still early, has rough edges, and I'm not claiming it's solved. The git merge problem across parallel agent branches is genuinely hard, and Claude sometimes "agrees with itself" during validation. But the core loop works.
What's your take — is spec-based verification the right layer to add, or is the Ralph Wiggum "just keep looping" approach sufficient for most real work?