How AI agents browse the web (and why your site might be invisible to them)

Feb 25, 2026

Most sites are built for humans sitting in front of a browser. That's fine, except AI agents aren't humans, and they don't use browsers.

When an agent needs to interact with your product, it sends an HTTP request, reads the response, and tries to extract something useful. No JavaScript engine, no mouse clicks, no scroll events. If your content only exists after a React component mounts, the agent sees an empty shell. If your site is behind Cloudflare's bot challenge, the agent never gets past the waiting room.
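To make the "empty shell" failure concrete, here's a minimal sketch using only Python's standard library. It extracts the text a non-JavaScript client would actually see; the example HTML snippets are invented.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text a non-JS HTTP client would see."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

# A server-rendered page carries its content in the HTML itself...
ssr = "<html><body><h1>Pricing</h1><p>$29/mo</p></body></html>"
# ...while a client-rendered app ships an empty shell.
csr = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(visible_text(ssr))  # "Pricing $29/mo"
print(visible_text(csr))  # "" -- nothing for an agent to read
```

Run this against your own rendered HTML and you'll see exactly what an agent sees before any JavaScript executes.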

How an agent actually 'sees' a website

The sequence is roughly this: HTTP GET to a URL, parse the HTML or JSON response, check for an llms.txt file at the root, and look for any documented API it can call. That's the whole pipeline.
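That pipeline can be sketched in a few lines. This is an illustration, not a real agent: `fetch` is injected so the sketch runs without a live site, and the probe paths (`/llms.txt`, `/openapi.json`) are common conventions, not guarantees.

```python
from urllib.parse import urljoin

def probe_site(base_url, fetch):
    """Rough sketch of the agent pipeline described above.

    `fetch(url)` must return (status_code, body); it's injected so the
    sketch can be exercised without network access.
    """
    report = {}

    status, body = fetch(base_url)
    report["reachable"] = status == 200
    report["has_content"] = bool(body.strip()) if report["reachable"] else False

    status, body = fetch(urljoin(base_url, "/llms.txt"))
    report["llms_txt"] = body if status == 200 else None

    # A documented API is often advertised via an OpenAPI spec.
    status, _ = fetch(urljoin(base_url, "/openapi.json"))
    report["openapi"] = status == 200
    return report

# Stub fetcher standing in for a real HTTP client; the site is invented.
pages = {
    "https://example.com": (200, "<html><body>Acme widgets</body></html>"),
    "https://example.com/llms.txt": (200, "# Acme\nAPI: /openapi.json"),
    "https://example.com/openapi.json": (200, "{}"),
}
def stub_fetch(url):
    return pages.get(url, (404, ""))

print(probe_site("https://example.com", stub_fetch))
```

In practice the fetch step is where most sites fail: the request never returns real content, and everything downstream is moot.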

llms.txt is a simple convention, similar in spirit to robots.txt, that gives agents a curated map of what your site contains and how to use it. If you haven't heard of it, that's part of the problem. Most sites don't have one. Agents fall back to guessing, and guessing usually fails.
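For reference, a minimal llms.txt might look like the following. The format follows the proposed llms.txt convention (a markdown file with a title, a short summary, and annotated links); the product and URLs here are invented.

```markdown
# Acme Widgets

> Acme sells self-assembling widgets, with a REST API for ordering.

## Docs

- [Quickstart](https://example.com/docs/quickstart): get an API key and place a first order
- [API reference](https://example.com/docs/api): OpenAPI spec for all endpoints

## Optional

- [Pricing](https://example.com/pricing): plan tiers and limits
```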

The three ways it breaks

In our data, agent failures almost always fall into one of three categories.

Access problems come first. Your robots.txt blocks known AI crawlers. Cloudflare fingerprints the client and serves a challenge page instead of content. Some sites block ClaudeBot in robots.txt while simultaneously using the Claude API in their own backend. That's incoherent, and it's more common than you'd expect.

Parsing problems come second. The page loads fine, technically, but all the meaningful content is rendered client-side. A simple HTTP client gets back a <div id="root"></div> and nothing else. Navigation, pricing, documentation: none of it is visible. Of the sites we've analyzed, 50.4% are in this state: discoverable, but useless to an agent.

Action problems come third. Even if an agent can read your site, there's nothing it can actually do. No API. No structured way to initiate a trial, submit a form, or retrieve account data. The agent can look, but not touch.

The spectrum from broken to useful

Think of it as levels. At L0, your site actively blocks agents. At L1, it's accessible, but the content is unreadable. At L2, it's readable but static, with no way to act. At L3, you have an API, but it isn't documented in a way agents can discover. At L4, you have a documented, agent-accessible API with an llms.txt pointing to it. L5 adds things like OAuth flows and structured error messages that make autonomous operation reliable.

Only 4.2% of sites we've seen are at L4 or above. Most are stuck at L2.

What you can add in a week

None of this requires a rewrite. A few specific additions get you most of the way.

  • Add or update robots.txt to allow the major AI crawlers: GPTBot, ClaudeBot, PerplexityBot, Googlebot-Extended
  • Create llms.txt at your domain root with a plain-language description of what your product does and links to your API docs
  • Add JSON-LD structured data to your homepage, pricing page, and any key product pages; Product, SoftwareApplication, and WebPage schemas all work
  • Make sure your core documentation is server-rendered or available as static HTML, not just client-side rendered
  • If you have a public API, document it in OpenAPI format and link to it from llms.txt
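The robots.txt change from the first item is the smallest of these. A minimal version that allows the crawlers named above looks like this (the Robots Exclusion Protocol matches `User-agent` tokens case-insensitively, and `Allow: /` permits everything):

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot-Extended
Allow: /
```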

These are not difficult changes. Most can be done in an afternoon. The payoff is that agents can actually use your product when they're trying to help someone who needs what you offer.
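For the JSON-LD item, the markup goes in a script tag in your page head. A sketch using the schema.org SoftwareApplication type; the product, description, and price are invented:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme Widgets",
  "applicationCategory": "BusinessApplication",
  "description": "Self-assembling widgets with a REST API for ordering.",
  "offers": {
    "@type": "Offer",
    "price": "29.00",
    "priceCurrency": "USD"
  }
}
</script>
```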

Check where you stand

If you want to see how your site scores across these levels, run it through siliconfriendly.com. It'll tell you exactly which layer you're blocked at and what to fix.
