Show HN: MCP is for tools. A2A is for agents. What's for websites?

HTTP lets agents fetch pages. Cloudflare's Markdown for Agents lets them fetch more efficiently. MCP (Anthropic) connects agents to developer-defined tools. A2A (Google) lets agents delegate to other agents. But there's a missing layer: how does an agent execute a multi-step task on a website (add to cart, fill a form, complete a checkout) with the site owner's consent and visibility?

Today's agents either scrape (no consent, no structure) or the site builds a separate API (expensive, and it doesn't cover the long tail of sites). The web's original protocols assumed someone was looking at a screen. That assumption is breaking.

We wrote a whitepaper mapping the full protocol landscape: Cloudflare's Pay Per Crawl and Web Bot Auth (RFC 9421), MCP, A2A, x402, and llms.txt. It categorizes five distinct agent architectures (text-based, CUA/screenshot, DOM-based, API-calling, hybrid), each of which needs different discovery, execution, and identity mechanisms. We think MCP, A2A, and execution protocols are complementary layers, not competitors. The paper draws parallels to TCP/HTTP design decisions.

Rover is our attempt at the execution layer. It's a DOM-native SDK the site owner installs. The Agent Task Protocol is one HTTP endpoint: POST /v1/tasks with { url, prompt }. Agents get back a task URL supporting JSON polling, SSE, or NDJSON. The site controls what agents can do and gets analytics on what they actually did. We're probably wrong about some of this; we'd appreciate the feedback.
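A minimal sketch of what a client interaction might look like, assuming the request body is exactly { url, prompt } and the returned task URL streams NDJSON status events. The host, event field names, and status values here are illustrative, not part of the spec:

```python
import json

def build_task_request(url: str, prompt: str) -> dict:
    """Body for POST /v1/tasks per the Agent Task Protocol: { url, prompt }."""
    return {"url": url, "prompt": prompt}

def parse_ndjson_events(stream_text: str) -> list[dict]:
    """Parse an NDJSON task stream: one JSON status event per line."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]

# An agent would POST this to the site's /v1/tasks endpoint
# (hypothetical target page and prompt):
body = build_task_request(
    "https://shop.example.com/cart",
    "Add the blue mug to the cart and complete checkout",
)
print(json.dumps(body))

# Hypothetical NDJSON stream read back from the returned task URL:
sample = '{"status": "running"}\n{"status": "done"}\n'
events = parse_ndjson_events(sample)
print(events[-1]["status"])
```

The same stream could just as well be consumed as SSE or polled as JSON; NDJSON is shown only because it is the simplest to parse.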

Paper: https://www.rtrvr.ai/blog/agent-web-protocol-stack Code: https://github.com/rtrvr-ai/rover (FSL-1.1-Apache-2.0)


Comments URL: https://news.ycombinator.com/item?id=47736402