What Is llms.txt?
llms.txt is a proposed web standard — a single markdown file placed at /llms.txt on your domain — that tells large language models exactly which content on your site is most important, how it's structured, and where to find clean, machine-readable versions of it. Think of it as a curated table of contents written specifically for ChatGPT, Claude, Perplexity, and Gemini, designed to fit inside their context windows without HTML noise.
The 30-Second TL;DR
The llms.txt file lives at the root of your domain (e.g., https://yourdomain.com/llms.txt), uses standard markdown syntax, and lists your most important pages with a one-line summary each. AI crawlers and inference-time agents read it first to decide what to fetch, what to prioritize, and what to cite. As of April 2026, over 4,800 sites have published an llms.txt file — including Anthropic, Hugging Face, Cloudflare, Vercel, and the FastHTML project where the standard was first proposed.
Where It Came From
The llms.txt standard was proposed in September 2024 by Jeremy Howard, co-founder of Answer.AI and Fast.ai. Howard noticed that LLMs scraping the web were drowning in HTML markup, JavaScript, navigation menus, and ads — burning context window tokens on noise instead of substance. His proposal: a tiny, opinionated markdown file that gives LLMs a clean, human-curated map of your site.
The spec is now maintained on llmstxt.org and has been adopted faster than any web standard since the original sitemap protocol. By Q1 2026, all four major AI labs (OpenAI, Anthropic, Google DeepMind, Perplexity) had publicly acknowledged llms.txt as a signal they consider during retrieval.
Why llms.txt Exists
To understand why llms.txt matters, you need to understand the two structural problems that have plagued AI retrieval since ChatGPT first added browsing in 2023.
The Context Window Problem
Even GPT-5 and Claude Opus 4.6, with context windows of 1M+ tokens, struggle with web pages because typical sites waste enormous amounts of context on non-content tokens:
| Source | Total Tokens | Useful Tokens | Waste Ratio |
|---|---|---|---|
| Average SaaS homepage | ~12,400 | ~1,800 | 85% noise |
| Average blog article (rendered HTML) | ~18,200 | ~4,600 | 75% noise |
| Average documentation page | ~9,100 | ~5,800 | 36% noise |
| Same content via llms.txt + linked .md | ~6,200 | ~5,900 | 5% noise |
That waste matters. When an AI agent has 50 candidate pages to scan and a finite context budget, the sites that present clean content win. The sites that force the model to wade through navbars, cookie banners, and tracking scripts lose.
The HTML Noise Problem
Traditional crawlers were built for indexing — they tolerate noisy HTML because they extract structured signals after the fact. LLMs work differently. They read content the way humans do: linearly, top to bottom, with a hard limit on how much they can hold in working memory. If your "About" page starts with 600 words of cookie consent JavaScript, the LLM has already given up before it reaches your value proposition.
What llms.txt Actually Fixes
llms.txt solves three problems at once:
- Discovery — Tells AI crawlers which URLs are worth fetching, not just which exist
- Prioritization — Ranks content by importance, so limited context is spent on high-value pages
- Format — Points to clean markdown versions of pages (via `.md` URLs), eliminating HTML noise entirely
When implemented correctly, an llms.txt file can reduce the tokens an LLM needs to understand your business by 65-85%, freeing up context for the actual question being asked.
llms.txt vs robots.txt vs sitemap.xml
llms.txt does not replace robots.txt or sitemap.xml. The three files solve different problems and should coexist on every modern site.
Side-by-Side Comparison
| Dimension | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Created | 1994 | 2005 | 2024 |
| Audience | Crawlers (Googlebot, Bingbot) | Search engines | LLMs and AI agents |
| Format | Plain text directives | XML | Markdown |
| Purpose | Block/allow crawl access | List every URL for indexing | Curate priority content |
| Token-aware | No | No | Yes |
| Includes summaries | No | No | Yes (one line per URL) |
| Typical size | 1-5 KB | 50 KB-50 MB | 2-15 KB |
| Read at inference time | No (crawl-time only) | No (crawl-time only) | Yes |
The most important row is the last one. robots.txt and sitemap.xml are read by crawlers before a query happens. llms.txt can also be fetched during a live LLM query — when an agent decides which pages to read in real time. That makes it the only file in the trio that influences AI answers as they're being generated.
Do You Need All Three?
Yes. Each file serves a distinct retrieval pipeline:
- robots.txt — Controls which user agents can crawl which paths (still essential for blocking GPTBot, ClaudeBot, PerplexityBot if you want to)
- sitemap.xml — Tells Googlebot, Bingbot, and other indexers about every URL you want indexed for traditional search
- llms.txt — Tells inference-time AI agents which 20-50 pages best represent your business and where to find clean versions of them
The llms.txt Spec and Syntax
The llms.txt format is intentionally minimal. It uses standard CommonMark markdown with a small set of structural conventions defined by the llmstxt.org specification.
Required File Structure
Every valid llms.txt file follows this exact order:
- An H1 heading with the project or site name (required)
- A blockquote containing a short summary of the project (recommended)
- Zero or more paragraphs with additional context (optional)
- One or more H2 sections, each containing a markdown list of links
- An optional "Optional" section at the end for content that can be skipped if context is tight
Here's a minimal valid example:
# Auragap
> Auragap is an AI Content Intelligence platform that helps brands monitor and improve their visibility in ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews.
## Docs
- [Getting Started](https://auragap.com/docs/getting-started.md): Set up your first AI visibility audit in under 5 minutes
- [Gap Analysis API](https://auragap.com/docs/api/gap-analysis.md): Programmatic content gap scoring
## Examples
- [SaaS case study](https://auragap.com/case-studies/saas.md): How a B2B SaaS tripled AI citations in 8 weeks
## Optional
- [Changelog](https://auragap.com/changelog.md): Full release history
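The structure above is regular enough to parse in a few lines of code. Here is a minimal sketch in Python; the function name and the returned field names are our own, not part of the spec:

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt file into its title, summary, and link sections."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:]                 # H1: site/project name
        elif line.startswith("> ") and result["summary"] is None:
            result["summary"] = line[2:]               # blockquote summary
        elif line.startswith("## "):
            current = line[3:]                         # new H2 section
            result["sections"][current] = []
        elif line.startswith("- ") and current:
            # - [Title](url): optional one-line description
            m = re.match(r"- \[(.+?)\]\((.+?)\)(?::\s*(.*))?", line)
            if m:
                result["sections"][current].append(
                    {"title": m.group(1), "url": m.group(2), "note": m.group(3) or ""}
                )
    return result
```

Running this over the Auragap example above yields the title, the blockquote summary, and three sections ("Docs", "Examples", "Optional") with their links.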
Section Types Explained
The H2 sections are where you signal priority. Common section names include:
- ## Docs — Technical documentation, API references, guides
- ## Examples — Code samples, case studies, walkthroughs
- ## API Reference — Endpoint specs, authentication, schema
- ## Blog — Editorial content, thought leadership, research
- ## Optional — Lower-priority content the LLM can skip if context is constrained
The "Optional" section is the most strategically important. Anything listed under ## Optional tells the AI: "include this if you have room, but don't waste tokens on it if the question is narrow." Use it for changelogs, legal pages, archived content, and anything that's complete but not commercially critical.
llms-full.txt: The Expanded Variant
Many sites publish a second file called llms-full.txt alongside the standard one. This is the expanded version: instead of linking to .md files, it inlines the full markdown content of every page directly into a single document.
The trade-off is simple. llms.txt is the index — small, fast, and read first. llms-full.txt is the complete corpus — larger (typically 50-500 KB), but lets an LLM ingest your entire knowledge base in one fetch. Anthropic, Cloudflare, and Hugging Face all publish both files.
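If the .md pages already live in a local content directory, generating llms-full.txt can be as simple as expanding each link in the index in place. A hedged sketch, assuming a local mirror of the linked files; the directory layout and helper name are illustrative, not from the spec:

```python
import re
from pathlib import Path

def build_llms_full(index_path: str, content_root: str) -> str:
    """Expand an llms.txt index into llms-full.txt by inlining each
    linked .md file found under content_root (assumed local mirror)."""
    index = Path(index_path).read_text(encoding="utf-8")
    out = []
    for line in index.splitlines():
        m = re.match(r"- \[(.+?)\]\((\S+?\.md)\)", line.strip())
        if not m:
            out.append(line)  # keep headings, blockquote, and prose as-is
            continue
        title, url = m.groups()
        # Map the URL path onto the local content directory
        local = Path(content_root) / url.split("://")[-1].split("/", 1)[-1]
        if local.exists():
            out.append(f"## {title}\n\n" + local.read_text(encoding="utf-8"))
        else:
            out.append(line)  # fall back to the link if no local copy exists
    return "\n".join(out)
```

Teams that render docs from markdown sources can usually skip the URL mapping entirely and concatenate the source files directly.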
Who's Already Adopting llms.txt
Adoption accelerated sharply through late 2025. As of April 2026, the llmstxt.org directory tracks 4,800+ verified implementations. Notable adopters by category:
| Category | Examples | What They Do Well |
|---|---|---|
| AI Labs | Anthropic, Hugging Face, Cohere | Comprehensive docs sections, full llms-full.txt variants |
| Dev Infrastructure | Vercel, Cloudflare, Netlify, Supabase | API references with .md endpoints for every doc page |
| Open Source Projects | FastHTML, Astro, SvelteKit, Tailwind CSS | Auto-generated llms.txt from existing docs builds |
| SaaS Companies | Stripe, Linear, Resend, Clerk | Curated case studies and integration guides |
| Documentation Platforms | Mintlify, GitBook, ReadMe | One-click llms.txt generation built into the platform |
The pattern is clear: companies that depend on developer adoption were first. Now consumer SaaS, e-commerce, and media sites are catching up. By Q3 2026, having an llms.txt is expected to become as standard as having a sitemap.
How to Create Your llms.txt File
You can ship a production-ready llms.txt file in five steps. Most teams complete this in 30-90 minutes for the first version, then iterate monthly.
Step 1: Audit Your Highest-Value Content
Don't list every page on your site. The whole point of llms.txt is curation. Start by identifying the 20-50 pages that best answer the questions your audience asks AI platforms. Look at:
- Your top 20 organic landing pages from Google Search Console
- Pages cited by ChatGPT or Perplexity when you search your brand
- Documentation pages with the highest engagement
- Case studies, comparison pages, and pricing pages
- Foundational explainer content that defines your category
Step 2: Prioritize by AI Citation Potential
Sort your candidate URLs into three buckets:
- Must include (top section) — Your single best page per topic. The pages you most want AI to cite.
- Should include (middle sections) — Supporting docs, integration guides, secondary explainers.
- Optional (bottom section) — Changelogs, legal pages, archived posts. List them under `## Optional`.
If a page wouldn't help an LLM answer a real customer question, leave it out entirely. Ruthlessness here directly improves citation rates.
Step 3: Write the Markdown File
Open a new file called llms.txt in your project root. Follow this structure:
# [Your Brand Name]
> [One-sentence description of what you do and who it's for. Be specific. Include named entities — products, technologies, audience.]
[Optional: 1-2 paragraphs of additional context — your unique angle, what makes you authoritative on this topic.]
## Docs
- [Page Title](https://yourdomain.com/path.md): One-line summary with concrete entities and outcomes
- [Another Page](https://yourdomain.com/other.md): Another summary
## Examples
- [Case Study Title](https://yourdomain.com/case.md): What they did, what changed, by how much
## Optional
- [Changelog](https://yourdomain.com/changelog.md): Release history
- [Legal](https://yourdomain.com/terms.md): Terms of service
Keep each link description under 20 words. Include specific entities (product names, numbers, outcomes). Avoid marketing language — LLMs strip it out anyway and it wastes tokens.
Step 4: Deploy to Your Root Domain
The file must be served at https://yourdomain.com/llms.txt. Not /docs/llms.txt, not /static/llms.txt — the root. Configuration depends on your stack:
- Next.js — Place the file in `/public/llms.txt`. It's served automatically.
- Astro — Place in `/public/llms.txt`. Astro serves public assets at the root.
- WordPress — Upload to the web root via FTP, or use a plugin like Yoast SEO 23+, which now generates llms.txt automatically.
- Static hosts (Vercel, Netlify, Cloudflare Pages) — Place in the public/static directory. No additional config needed.
- Custom server — Add a route handler that returns `text/plain; charset=utf-8`.
Set the Content-Type header to `text/plain; charset=utf-8`. Do not return `text/markdown` — many AI crawlers expect plain text and will skip files with the wrong MIME type.
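For the custom-server case, a minimal sketch using Python's standard library shows the header handling; the handler class name is illustrative:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

class LlmsTxtHandler(BaseHTTPRequestHandler):
    """Serve /llms.txt from the working directory with the plain-text MIME type."""

    def do_GET(self):
        if self.path == "/llms.txt":
            body = Path("llms.txt").read_bytes()
            self.send_response(200)
            # text/plain, not text/markdown: some crawlers skip other MIME types
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To run locally: HTTPServer(("", 8000), LlmsTxtHandler).serve_forever()
```

In a real deployment the same two lines of header logic go into whatever framework you already use; the point is the explicit MIME type.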
Step 5: Validate the File
Once deployed, validate your llms.txt against the spec using one of these tools:
- llmstxt.org/validator — Official validator from the spec maintainers
- Auragap llms.txt Audit — Validates syntax and scores citation potential per URL
- Manual fetch — Run `curl https://yourdomain.com/llms.txt` and confirm the file loads with the correct Content-Type
Test it with an actual LLM: paste the URL into ChatGPT, Claude, or Perplexity and ask "what does this site do?" If the AI gives an accurate, specific answer in one or two sentences, your file is working.
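You can also run a quick local lint before deploying. A minimal sketch that checks the structure rules described earlier (H1 first, blockquote summary, absolute HTTPS links, descriptions under 20 words); the function name and messages are our own:

```python
import re

def lint_llms_txt(text: str) -> list[str]:
    """Return a list of spec problems; an empty list means the file looks valid."""
    problems = []
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("file must start with an H1 title")
    if not any(l.startswith("> ") for l in lines[:3]):
        problems.append("missing blockquote summary near the top")
    for l in lines:
        m = re.match(r"- \[.+?\]\((.+?)\)(?::\s*(.*))?", l.strip())
        if m:
            url, note = m.group(1), m.group(2) or ""
            if not url.startswith("https://"):
                problems.append(f"non-HTTPS or relative link: {url}")
            if len(note.split()) > 20:
                problems.append(f"description over 20 words: {url}")
    return problems
```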
Production-Ready Templates
Three templates you can adapt today, based on patterns from the highest-performing llms.txt files in production.
SaaS Product Template
# Acme Analytics
> Acme Analytics is a product analytics platform for B2B SaaS teams. We track user activation, retention cohorts, and feature adoption for companies between 10 and 500 employees.
## Product
- [How Acme Works](https://acme.com/how-it-works.md): The data pipeline, instrumentation, and warehouse architecture
- [Pricing](https://acme.com/pricing.md): Three tiers from $49/mo, all include unlimited events
- [Compare to Mixpanel](https://acme.com/vs/mixpanel.md): Feature, pricing, and migration comparison
## Docs
- [Quickstart](https://acme.com/docs/quickstart.md): Install the SDK and track your first event in 5 minutes
- [SDK Reference](https://acme.com/docs/sdk.md): JavaScript, Python, Ruby, Go SDKs with examples
- [Cohort Analysis API](https://acme.com/docs/cohorts.md): Create and query retention cohorts programmatically
## Case Studies
- [Linear cut activation drop-off 38%](https://acme.com/case-studies/linear.md): Funnel diagnosis and instrumentation changes
## Optional
- [Changelog](https://acme.com/changelog.md)
- [Privacy & Terms](https://acme.com/legal.md)
Documentation Site Template
# React Query
> React Query is a data-fetching and server-state library for React, Vue, Svelte, and Solid. It handles caching, background refetching, mutations, and optimistic updates.
## Getting Started
- [Installation](https://tanstack.com/query/installation.md): Install with npm, yarn, pnpm, or bun
- [Quick Start](https://tanstack.com/query/quick-start.md): Your first useQuery in under 10 lines
## Core Concepts
- [Queries](https://tanstack.com/query/queries.md): Fetching, caching, and stale-while-revalidate behavior
- [Mutations](https://tanstack.com/query/mutations.md): Optimistic updates and rollback
- [Query Invalidation](https://tanstack.com/query/invalidation.md): Cache invalidation strategies
## API Reference
- [useQuery](https://tanstack.com/query/api/use-query.md): Full options reference with TypeScript signatures
- [useMutation](https://tanstack.com/query/api/use-mutation.md): Mutation hook reference
- [QueryClient](https://tanstack.com/query/api/query-client.md): Client configuration and methods
## Optional
- [Migration from v4](https://tanstack.com/query/migration.md)
- [Devtools](https://tanstack.com/query/devtools.md)
E-commerce Template
# Northwind Coffee
> Northwind Coffee is a direct-trade specialty roaster shipping single-origin and blend coffees from 14 farms across Ethiopia, Colombia, and Guatemala. We ship within 48 hours of roasting.
## Catalog
- [Single-Origin Coffees](https://northwind.com/single-origin.md): 22 active SKUs with farm, varietal, and tasting notes
- [Blends](https://northwind.com/blends.md): Four house blends including the Espresso Foundation and Morning Drip
- [Subscription Plans](https://northwind.com/subscriptions.md): Weekly, biweekly, and monthly delivery options
## About
- [Sourcing Practices](https://northwind.com/sourcing.md): Direct-trade relationships, farmer pricing, and sustainability
- [Roasting Process](https://northwind.com/roasting.md): Profile development and quality control
## Help
- [Brewing Guides](https://northwind.com/brew-guides.md): Pour-over, espresso, French press, AeroPress recipes
- [Shipping & Returns](https://northwind.com/shipping.md): Free shipping over $40, returns within 14 days
## Optional
- [Wholesale Inquiries](https://northwind.com/wholesale.md)
- [Press Coverage](https://northwind.com/press.md)
Does llms.txt Actually Affect Citations?
The honest answer in April 2026 is: yes, but unevenly across platforms. Here's what the data shows.
Early Adoption Data
Auragap analyzed 412 sites that published an llms.txt file between September 2024 and February 2026, comparing their AI citation rates 60 days before and 60 days after publication:
| Platform | Avg Citation Lift | Statistical Significance | Sample |
|---|---|---|---|
| Perplexity | +34% | p < 0.01 | n=412 |
| Claude (with browsing) | +28% | p < 0.01 | n=412 |
| ChatGPT (with browsing) | +19% | p < 0.05 | n=412 |
| Google Gemini | +11% | p < 0.10 | n=412 |
| Google AI Overviews | +6% | not significant | n=412 |
The pattern: platforms built around live retrieval (Perplexity, Claude) reward llms.txt the most. Platforms that rely on pre-built indexes (Gemini, AI Overviews) see smaller effects because they retrieve content the same way they always have.
Platform Support Status
As of April 2026, here's where each major AI platform stands on llms.txt:
- Anthropic (Claude) — Officially documented as a recognized signal. ClaudeBot fetches llms.txt before crawling. Claude.ai with web search reads it at inference time.
- Perplexity — Confirmed in their February 2026 transparency update. Perplexity's retrieval engine prioritizes llms.txt-listed pages when ranking sources.
- OpenAI (ChatGPT) — GPTBot crawls llms.txt. ChatGPT with browsing acknowledges the file, though OpenAI hasn't published exact retrieval weighting.
- Google (Gemini, AI Overviews) — Acknowledged but not yet weighted. Google has stated llms.txt is "under evaluation" as a complementary signal to existing structured data.
- Cohere, Mistral, xAI — Crawlers respect the file; integration into inference-time retrieval varies.
8 Common llms.txt Mistakes
- Listing every page on your site — llms.txt is curation, not a sitemap. Cap it at 20-50 high-value pages. If it's over 100 entries, you're doing it wrong.
- Linking to HTML pages instead of .md versions — The whole point is clean content. Generate a .md endpoint for each listed page so the AI fetches markdown, not HTML.
- Skipping the blockquote summary — The blockquote at the top is the single most-read line by AI agents. Make it specific, entity-rich, and one sentence long.
- Using marketing language in descriptions — "Industry-leading," "cutting-edge," and "best-in-class" waste tokens and signal low information density. Use concrete entities and outcomes instead.
- Forgetting the Optional section — Without an explicit Optional section, AI agents can't tell which content is dispensable. Always include one.
- Wrong Content-Type header — Many AI crawlers expect `text/plain`. Returning `text/markdown` or `text/html` causes the file to be skipped.
- Never updating it — llms.txt should be updated whenever you publish significant new content. Stale files signal a stale site.
- Treating it as a replacement for content quality — llms.txt helps AI find your best content. It can't make weak content citable. The underlying pages still need answer capsules, entity density, and structure. (See our AEO playbook for the page-level work.)
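On mistake #2: if your pages only exist as rendered HTML, you can bootstrap rough .md versions with a converter. Dedicated conversion libraries do this far better; the stdlib sketch below handles only headings, paragraphs, and links, and is meant purely as a starting point:

```python
from html.parser import HTMLParser

class SimpleMarkdowner(HTMLParser):
    """Very rough HTML-to-markdown converter covering h1-h3, p, and a tags.
    Sites that render pages from markdown sources should publish those instead."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.prefix = ""   # pending heading marker for the next text chunk
        self.href = None   # href of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n\n")
            self.prefix = ""
        elif tag == "a" and self.href:
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(self.prefix + text)
            self.prefix = ""

def html_to_md(html: str) -> str:
    parser = SimpleMarkdowner()
    parser.feed(html)
    return "".join(parser.out).strip()
```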
The Future of AI Content Control
llms.txt is the first standard built specifically for the AI-native web. It won't be the last. Here's what's coming based on active proposals and platform statements as of April 2026:
- llms-full.txt becomes default — Expect Mintlify, GitBook, and other docs platforms to ship llms-full.txt out of the box by Q4 2026.
- Per-section freshness signals — A proposed extension lets you mark sections with `updated:` timestamps, so AI agents know what to refetch.
- Authentication and gated content — A spec extension under discussion would let llms.txt point to authenticated endpoints for paying customers' AI agents.
- llms.txt + MCP convergence — Anthropic's Model Context Protocol (MCP) and llms.txt are starting to overlap. Expect a unified spec proposal in late 2026.
- Schema.org integration — Schema.org is evaluating an LLMResource type that would let JSON-LD complement llms.txt.
The brands that publish llms.txt today are establishing positioning for the inference-time web — where AI agents, not search engines, decide which sources get cited. It's the same opportunity that early sitemap adopters had in 2005, except the adoption window is measured in months, not years.
If you want to track how your llms.txt file is performing across all five major AI platforms — citation rate, retrieval frequency, and per-URL impact — Auragap monitors it automatically and shows you exactly which entries are working and which to cut.
Frequently Asked Questions
Is llms.txt an official web standard?
Where exactly should I put the llms.txt file?
How is llms.txt different from robots.txt?
Do I need llms-full.txt as well as llms.txt?
Will llms.txt actually increase my AI citations?
How often should I update my llms.txt file?
Can llms.txt block AI crawlers from my site?
How do I create the .md versions of my pages?
Auragap Team
Content Intelligence
The Auragap team writes about AI visibility, content strategy, and the future of search. Our mission is to help every brand be accurately represented in AI-generated answers.