Everyone's talking about Ralph Wiggum loops — Claude Code running autonomously in a while loop until the task is done. And it works surprisingly well for single tasks.
But here's the thing nobody's addressing: a loop doesn't know if it achieved your goal. It knows it stopped erroring. Those are different things.
```python
# Ralph Wiggum pattern (simplified)
while not done:
    result = agent.do_next_thing()
    if agent.thinks_its_done:
        feed_prompt_back()  # nope, keep going
```

This is great for "migrate tests from Jest to Vitest" — there's a clear finish line (all tests pass). It falls apart on "add user notification preferences with email digest settings," where "done" involves an API endpoint, a DB migration, a frontend component, and tests that validate business logic, not just compilation.
The tools I've seen tackling multi-agent coordination right now:
What none of these solve well: how does the system know if the combined output of multiple parallel agents actually satisfies the original requirement?
Ralph loops check "did I stop erroring." Loki Mode checks individual agent completion. But who checks that the API endpoint the backend agent wrote actually matches the frontend component the other agent built? Who verifies that the DB migration supports the query patterns the API needs?
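A minimal sketch of what that cross-agent check could look like. Everything here is a hypothetical illustration, not OmoiOS code: the OpenAPI-style `backend` dict, the naive `fetch()` regex, and the route names are all assumptions. The point is that the contract between two agents' outputs is mechanically comparable.

```python
# Hypothetical sketch: compare the routes a backend agent declared
# against the endpoints a frontend agent's component actually calls.
import re

def declared_routes(openapi_paths: dict) -> set:
    """Routes the backend agent exposed, from its OpenAPI-style paths."""
    return {(method.upper(), path)
            for path, methods in openapi_paths.items()
            for method in methods}

def called_routes(frontend_source: str) -> set:
    """Endpoints the frontend calls (naive regex over fetch() calls)."""
    pattern = r"fetch\(['\"](/[\w/{}-]*)['\"].*?method:\s*['\"](\w+)['\"]"
    return {(m.group(2).upper(), m.group(1))
            for m in re.finditer(pattern, frontend_source, re.DOTALL)}

# Illustrative agent outputs (assumed, not real project artifacts)
backend = {"/api/preferences": {"get": {}, "put": {}}}
frontend = """
  fetch('/api/preferences', { method: 'GET' })
  fetch('/api/preferences/digest', { method: 'PUT' })
"""

missing = called_routes(frontend) - declared_routes(backend)
print(missing)  # routes the frontend needs but the backend never wrote
```

A real implementation would parse the frontend AST instead of regexing source, but even this toy version catches the mismatch the post describes: the frontend expects `PUT /api/preferences/digest` and no backend agent built it.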
I've been working on this specific problem. My approach: specs as a machine-checkable source of truth.
The key insight: if the system has a structured definition of "done," it can answer "are we there yet?" without asking you. The oversight loop that Ralph Wiggum leaves to persistence, and that Loki Mode leaves to agent count, gets replaced by spec verification.
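To make the idea concrete, here is a minimal sketch of that oversight loop. The spec items, the stubbed `check()` results, and the item IDs are all invented for illustration; the real tool would run actual verifiers (test suites, schema diffs, HTTP probes) behind each item.

```python
# Hypothetical sketch: "done" is a structured spec the system can
# evaluate itself, not the agent's own opinion of its progress.

SPEC = [
    ("api", "PUT /api/preferences returns 200 with updated settings"),
    ("db", "migration adds email_digest column"),
    ("ui", "preferences form renders and submits"),
    ("tests", "business-logic tests pass, not just compilation"),
]

def check(item_id: str) -> bool:
    """Run the machine check for one spec item (stubbed results here)."""
    results = {"api": True, "db": True, "ui": False, "tests": True}
    return results[item_id]

def unmet(spec):
    """Spec items the current agent output does not yet satisfy."""
    return [desc for item_id, desc in spec if not check(item_id)]

remaining = unmet(SPEC)
if remaining:
    print("not done yet:", remaining)  # feed these back to the agents
else:
    print("spec satisfied")
```

The loop terminates on verified facts rather than on the agent declaring victory; unmet items become the next round's prompt instead of a generic "keep going."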
It's not a new idea — it's basically CI/CD applied to AI agent output. But it's the piece I see missing from the current agent orchestration conversation.
I've been building this into an open source tool: https://github.com/kivo360/OmoiOS
Built with FastAPI, PostgreSQL + pgvector, Redis pub/sub, and Next.js with React Flow for DAG visualization. Licensed Apache 2.0.
Still early, has rough edges, and I'm not claiming it's solved. The git merge problem across parallel agent branches is genuinely hard, and Claude sometimes "agrees with itself" during validation. But the core loop works.
What's your take — is spec-based verification the right layer to add, or is the Ralph Wiggum "just keep looping" approach sufficient for most real work?