Building browser infrastructure for AI agents

Most browser automation breaks the moment a site deploys Cloudflare or checks your TLS fingerprint. We built Reduck to fix that.

The problem with existing tools

Selenium, Playwright, and Puppeteer share the same fundamental flaw: by default, they launch a new browser instance with a clean profile, no cookies, no history, and a dozen detectable signals that scream “bot.”

Sites have gotten good at catching this. Cloudflare, Akamai, PerimeterX — the arms race between bot detection and headless browsers has been going on for years. Every time a framework patches one signal, detection vendors find three more.

The result: your automation works in dev, fails in prod, and you spend more time fighting detection than building features.

Our approach: use the real browser

The Reduck team

Instead of fighting detection, we sidestepped it entirely. Reduck runs inside your actual Chrome browser through a lightweight extension. From the site’s perspective, there’s nothing to detect: it’s a real person’s browser with real cookies, real history, and a real TLS fingerprint.

This has four practical consequences:

  • No detection — sites see a genuine Chrome browser
  • Authenticated sessions — you’re already logged in
  • Zero setup — no proxy configuration, no profile management
  • Multi-tab orchestration — the extension manages tabs natively

The architecture

The system has three layers:

1. The Chrome Extension

A thin execution layer that receives commands via web push and translates them into browser actions. It handles tab management, navigation, form filling, clicking, and data extraction.

The extension is intentionally minimal. It doesn’t make decisions — it executes steps. The intelligence lives elsewhere.
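A minimal sketch of that "execute, don't decide" shape. The `Command` union, its field names, and the `BrowserActions` interface are hypothetical, chosen for illustration; they are not Reduck's actual protocol. The point is that the dispatcher maps each command to exactly one browser action and carries no planning logic of its own.

```typescript
// Hypothetical command shapes; the real extension protocol is not described in this post.
type Command =
  | { kind: "navigate"; url: string }
  | { kind: "click"; target: string }
  | { kind: "fill"; target: string; value: string }
  | { kind: "extract"; target: string };

// Abstract browser surface (an assumption) so the sketch can run outside an extension.
interface BrowserActions {
  navigate(url: string): Promise<void>;
  click(target: string): Promise<void>;
  fill(target: string, value: string): Promise<void>;
  extract(target: string): Promise<string>;
}

// Executes one command. Returns extracted text, or null for commands that produce no data.
async function execute(cmd: Command, browser: BrowserActions): Promise<string | null> {
  switch (cmd.kind) {
    case "navigate":
      await browser.navigate(cmd.url);
      return null;
    case "click":
      await browser.click(cmd.target);
      return null;
    case "fill":
      await browser.fill(cmd.target, cmd.value);
      return null;
    case "extract":
      return browser.extract(cmd.target);
  }
}
```

Because the union is exhaustive, adding a new command kind forces a matching `case` at compile time, which helps keep the execution layer small and predictable.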

2. The Runner

A Python service that receives natural-language prompts, breaks them into browser actions using an LLM, and streams those actions to the extension. The runner is where the “AI” part happens.

It handles:

  • Prompt → action plan conversion
  • Error recovery (element not found? try a different selector)
  • Layout adaptation (site redesigned? the LLM figures it out)
  • Result extraction and formatting
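The error-recovery bullet can be sketched as a fallback loop over candidate selectors. This is an illustration only: the post says the runner is a Python service, and TypeScript is used here just to match the SDK example. `Lookup`, `Resolution`, and `resolveWithFallback` are hypothetical names, and the lookup is a stand-in for whatever the extension actually uses to find elements.

```typescript
// Hypothetical lookup: returns the matched element's id, or null if the selector matches nothing.
type Lookup = (selector: string) => string | null;

interface Resolution {
  selector: string;   // the selector that finally matched
  elementId: string;  // id of the matched element
  attempts: string[]; // every selector tried, in order
}

// Tries each candidate selector in order and stops at the first one that matches.
function resolveWithFallback(candidates: string[], lookup: Lookup): Resolution | null {
  const attempts: string[] = [];
  for (const selector of candidates) {
    attempts.push(selector);
    const elementId = lookup(selector);
    if (elementId !== null) {
      return { selector, elementId, attempts };
    }
  }
  // Nothing matched; a caller could ask the LLM for a fresh set of candidates here.
  return null;
}

// Usage: the first selector is stale after a redesign, the second still matches.
const pageIndex = new Map<string, string>([["button[type=submit]", "el-7"]]);
const found = resolveWithFallback(
  ["#submit-btn", "button[type=submit]"],
  (selector) => pageIndex.get(selector) ?? null,
);
console.log(found);
```

Recording the attempts is a design choice worth keeping in a real implementation: it gives the planner something concrete to reason about when every candidate fails.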

3. The SDK / CLI

The developer-facing interface. Send a prompt, get structured data back. The SDK handles WebSocket connections, event streaming, and error handling so you don’t have to.

import { ReduckClient } from "@reduck-ai/sdk"

// Authenticate with your API key, then take the first device the account lists.
const client = new ReduckClient({ apiKey: "rk_..." })
const [device] = await client.listDevices()

// Start a run from a natural-language prompt; events stream back as the run progresses.
const run = client.run("Download my latest invoice from Stripe", { deviceId: device.id })
for await (const event of run) {
	if (event.type === "progress") console.log(event.text)
	if (event.type === "done") console.log(event.success ? "Done" : "Failed")
}

What we learned building this

Sessions are harder than they look

Managing authenticated sessions across multiple automations is surprisingly complex. You need to handle:

  • Session expiry and re-authentication
  • Multi-factor auth prompts that appear mid-flow
  • Sites that rotate CSRF tokens on every request
  • Cookie consent banners that block the entire page

Our solution: let the human handle auth once, then ride that session. The extension preserves whatever state the browser has — cookies, localStorage, IndexedDB — without needing to understand any of it.

LLMs are bad at pixel coordinates

Early versions tried to use vision models to click exact coordinates. This was slow, expensive, and brittle. A button that moves 10 pixels to the left in a redesign would break everything.

We switched to a semantic approach: the LLM identifies what to interact with (the “Submit” button, the email input field), and the extension resolves the actual DOM element. This is both faster and more resilient to layout changes.
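A rough sketch of what semantic resolution can look like, assuming the LLM emits a role plus a human-readable name and the extension matches that against the page. `SemanticTarget`, `PageElement`, and `resolveTarget` are hypothetical names for illustration, and `PageElement` is a plain-object stand-in for a DOM element so the example runs anywhere.

```typescript
// What the LLM emits in this sketch: a description of the target, not coordinates.
interface SemanticTarget {
  role: "button" | "textbox" | "link";
  name: string; // visible label or accessible name, e.g. "Submit"
}

// Simplified stand-in for a DOM element. The resolver never reads x or y.
interface PageElement {
  id: string;
  role: string;
  label: string;
  x: number;
  y: number;
}

function normalizeLabel(text: string): string {
  return text.trim().toLowerCase();
}

// Among elements with the requested role, prefers an exact label match,
// then falls back to a substring match. Returns null if nothing fits.
function resolveTarget(target: SemanticTarget, elements: PageElement[]): PageElement | null {
  const wanted = normalizeLabel(target.name);
  const sameRole = elements.filter((el) => el.role === target.role);
  return (
    sameRole.find((el) => normalizeLabel(el.label) === wanted) ??
    sameRole.find((el) => normalizeLabel(el.label).includes(wanted)) ??
    null
  );
}
```

Since position plays no part in the match, a button that shifts by 10 pixels resolves to the same element before and after the redesign.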

Streaming matters

Nobody wants to wait 30 seconds in silence for a result. We stream events as they happen: navigation started, form filled, button clicked, data extracted. This lets developers build responsive UIs and debug issues in real time.

{"type": "progress", "step": "navigating", "url": "https://dashboard.stripe.com"}
{"type": "progress", "step": "clicking", "target": "Invoices tab"}
{"type": "progress", "step": "extracting", "data": "Invoice #2024-0142"}
{"type": "done", "result": {"invoice_id": "2024-0142", "amount": "$299.00"}}
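One way a consumer might turn a stream like this into typed events. The sketch assumes the events arrive as newline-delimited JSON with the fields shown in the sample above; the post does not specify the wire format, and `RunEvent`, `parseEvents`, and `describeEvent` are hypothetical names rather than part of the SDK.

```typescript
// Event shapes inferred from the sample events; treat them as assumptions.
type RunProgressEvent = { type: "progress"; step: string; url?: string; target?: string; data?: string };
type RunDoneEvent = { type: "done"; result: Record<string, unknown> };
type RunEvent = RunProgressEvent | RunDoneEvent;

// Runtime check so malformed or unknown events never reach typed code.
function isRunEvent(value: unknown): value is RunEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v["type"] === "progress") return typeof v["step"] === "string";
  if (v["type"] === "done") return typeof v["result"] === "object" && v["result"] !== null;
  return false;
}

// Parses newline-delimited JSON, skipping blank lines, invalid JSON, and unknown event types.
function parseEvents(stream: string): RunEvent[] {
  const events: RunEvent[] = [];
  for (const line of stream.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue;
    }
    if (isRunEvent(parsed)) events.push(parsed);
  }
  return events;
}

// Formats an event as a single log line.
function describeEvent(event: RunEvent): string {
  if (event.type === "done") return `done: ${JSON.stringify(event.result)}`;
  const detail = event.url ?? event.target ?? event.data ?? "";
  return `${event.step}: ${detail}`;
}

// Usage with two of the sample events.
const sampleStream = [
  '{"type": "progress", "step": "navigating", "url": "https://dashboard.stripe.com"}',
  '{"type": "done", "result": {"invoice_id": "2024-0142", "amount": "$299.00"}}',
].join("\n");
for (const event of parseEvents(sampleStream)) console.log(describeEvent(event));
```

Modeling the events as a discriminated union on `type` lets the compiler narrow each branch, so a UI handler cannot read `result` from a progress event by mistake.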

What’s next

We’re working on three things:

  1. MCP server — so Claude, ChatGPT, and any MCP-compatible agent can use Reduck natively
  2. Standalone mode — run automations without needing an external agent
  3. Multi-device orchestration — distribute work across multiple browser sessions

If you’re building AI agents that need to interact with the web, we’d love to hear from you. Join us on Discord or check out the docs.

The best browser automation is the one that sites can’t distinguish from a human.
