What Is llms.txt?
llms.txt is a proposed web standard — a single markdown file placed at /llms.txt on your domain — that tells large language models exactly which content on your site is most important, how it's structured, and where to find clean, machine-readable versions of it. Think of it as a curated table of contents written specifically for ChatGPT, Claude, Perplexity, and Gemini, designed to fit inside their context windows without HTML noise.
The 30-Second TL;DR
The llms.txt file lives at the root of your domain (e.g., https://yourdomain.com/llms.txt), uses standard markdown syntax, and lists your most important pages with a one-line summary each. AI crawlers and inference-time agents read it first to decide what to fetch, what to prioritize, and what to cite. As of April 2026, over 4,800 sites have published an llms.txt file — including Anthropic, Hugging Face, Cloudflare, Vercel, and the FastHTML project where the standard was first proposed.
Where It Came From
The llms.txt standard was proposed in September 2024 by Jeremy Howard, co-founder of Answer.AI and Fast.ai. Howard noticed that LLMs scraping the web were drowning in HTML markup, JavaScript, navigation menus, and ads — burning context window tokens on noise instead of substance. His proposal: a tiny, opinionated markdown file that gives LLMs a clean, human-curated map of your site.
The spec is now maintained on llmstxt.org and has been adopted faster than any web standard since the original sitemap protocol. By Q1 2026, all four major AI labs (OpenAI, Anthropic, Google DeepMind, Perplexity) had publicly acknowledged llms.txt as a signal they consider during retrieval.
Why llms.txt Exists
To understand why llms.txt matters, you need to understand the two structural problems that have plagued AI retrieval since ChatGPT first added browsing in 2023.
The Context Window Problem
Even GPT-5 and Claude Opus 4.6, with context windows of 1M+ tokens, struggle with web pages because typical sites waste enormous amounts of context on non-content tokens:
| Source | Total Tokens | Useful Tokens | Waste Ratio |
|---|---|---|---|
| Average SaaS homepage | ~12,400 | ~1,800 | 85% noise |
| Average blog article (rendered HTML) | ~18,200 | ~4,600 | 75% noise |
| Average documentation page | ~9,100 | ~5,800 | 36% noise |
| Same content via llms.txt + linked .md | ~6,200 | ~5,900 | 5% noise |
That waste matters. When an AI agent has 50 candidate pages to scan and a finite context budget, the sites that present clean content win. The sites that force the model to wade through navbars, cookie banners, and tracking scripts lose.
The HTML Noise Problem
Traditional crawlers were built for indexing — they tolerate noisy HTML because they extract structured signals after the fact. LLMs work differently. They read content the way humans do: linearly, top to bottom, with a hard limit on how much they can hold in working memory. If your "About" page starts with 600 words of cookie consent JavaScript, the LLM has already given up before it reaches your value proposition.
What llms.txt Actually Fixes
llms.txt solves three problems at once:
- Discovery — Tells AI crawlers which URLs are worth fetching, not just which exist
- Prioritization — Ranks content by importance, so limited context is spent on high-value pages
- Format — Points to clean markdown versions of pages (via `.md` URLs), eliminating HTML noise entirely
When implemented correctly, an llms.txt file can reduce the tokens an LLM needs to understand your business by 65-85%, freeing up context for the actual question being asked.
llms.txt vs robots.txt vs sitemap.xml
llms.txt does not replace robots.txt or sitemap.xml. The three files solve different problems and should coexist on every modern site.
Side-by-Side Comparison
| Dimension | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Created | 1994 | 2005 | 2024 |
| Audience | Crawlers (Googlebot, Bingbot) | Search engines | LLMs and AI agents |
| Format | Plain text directives | XML | Markdown |
| Purpose | Block/allow crawl access | List every URL for indexing | Curate priority content |
| Token-aware | No | No | Yes |
| Includes summaries | No | No | Yes (one line per URL) |
| Typical size | 1-5 KB | 50 KB-50 MB | 2-15 KB |
| Read at inference time | No (crawl-time only) | No (crawl-time only) | Yes |
The most important row is the last one. robots.txt and sitemap.xml are read by crawlers before a query happens. llms.txt can also be fetched during a live LLM query — when an agent decides which pages to read in real time. That makes it the only file in the trio that influences AI answers as they're being generated.
Do You Need All Three?
Yes. Each file serves a distinct retrieval pipeline:
- robots.txt — Controls which user agents can crawl which paths (still essential for blocking GPTBot, ClaudeBot, PerplexityBot if you want to)
- sitemap.xml — Tells Googlebot, Bingbot, and other indexers about every URL you want indexed for traditional search
- llms.txt — Tells inference-time AI agents which 20-50 pages best represent your business and where to find clean versions of them
The llms.txt Spec and Syntax
The llms.txt format is intentionally minimal. It uses standard CommonMark markdown with a small set of structural conventions defined by the llmstxt.org specification.
Required File Structure
Every valid llms.txt file follows this exact order:
- An H1 heading with the project or site name (required)
- A blockquote containing a short summary of the project (recommended)
- Zero or more paragraphs with additional context (optional)
- One or more H2 sections, each containing a markdown list of links
- An optional "Optional" section at the end for content that can be skipped if context is tight
Here's a minimal valid example:
# Auragap
> Auragap is an AI Content Intelligence platform that helps brands monitor and improve their visibility in ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews.
## Docs
- [Getting Started](https://auragap.com/docs/getting-started.md): Set up your first AI visibility audit in under 5 minutes
- [Gap Analysis API](https://auragap.com/docs/api/gap-analysis.md): Programmatic content gap scoring
## Examples
- [SaaS case study](https://auragap.com/case-studies/saas.md): How a B2B SaaS tripled AI citations in 8 weeks
## Optional
- [Changelog](https://auragap.com/changelog.md): Full release history
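The structure above is regular enough to parse in a few lines of code. Here is a minimal sketch in Python; the function name and the returned field names are our own, not part of the spec:

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt file into its title, summary, and link sections."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:]                 # H1: site/project name
        elif line.startswith("> ") and result["summary"] is None:
            result["summary"] = line[2:]               # blockquote summary
        elif line.startswith("## "):
            current = line[3:]                         # new H2 section
            result["sections"][current] = []
        elif line.startswith("- ") and current:
            # - [Title](url): optional one-line description
            m = re.match(r"- \[(.+?)\]\((.+?)\)(?::\s*(.*))?", line)
            if m:
                result["sections"][current].append(
                    {"title": m.group(1), "url": m.group(2), "note": m.group(3) or ""}
                )
    return result
```

Running this over the Auragap example above yields the title, the blockquote summary, and three sections ("Docs", "Examples", "Optional") with their links.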
Section Types Explained
The H2 sections are where you signal priority. Common section names include:
- ## Docs — Technical documentation, API references, guides
- ## Examples — Code samples, case studies, walkthroughs
- ## API Reference — Endpoint specs, authentication, schema
- ## Blog — Editorial content, thought leadership, research
- ## Optional — Lower-priority content the LLM can skip if context is constrained
The "Optional" section is the most strategically important. Anything listed under ## Optional tells the AI: "include this if you have room, but don't waste tokens on it if the question is narrow." Use it for changelogs, legal pages, archived content, and anything that's complete but not commercially critical.
llms-full.txt: The Expanded Variant
Many sites publish a second file called llms-full.txt alongside the standard one. This is the expanded version: instead of linking to .md files, it inlines the full markdown content of every page directly into a single document.
The trade-off is simple. llms.txt is the index — small, fast, and read first. llms-full.txt is the complete corpus — larger (typically 50-500 KB), but lets an LLM ingest your entire knowledge base in one fetch. Anthropic, Cloudflare, and Hugging Face all publish both files.
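If the .md pages already live in a local content directory, generating llms-full.txt can be as simple as expanding each link in the index in place. A hedged sketch, assuming a local mirror of the linked files; the directory layout and helper name are illustrative, not from the spec:

```python
import re
from pathlib import Path

def build_llms_full(index_path: str, content_root: str) -> str:
    """Expand an llms.txt index into llms-full.txt by inlining each
    linked .md file found under content_root (assumed local mirror)."""
    index = Path(index_path).read_text(encoding="utf-8")
    out = []
    for line in index.splitlines():
        m = re.match(r"- \[(.+?)\]\((\S+?\.md)\)", line.strip())
        if not m:
            out.append(line)  # keep headings, blockquote, and prose as-is
            continue
        title, url = m.groups()
        # Map the URL path onto the local content directory
        local = Path(content_root) / url.split("://")[-1].split("/", 1)[-1]
        if local.exists():
            out.append(f"## {title}\n\n" + local.read_text(encoding="utf-8"))
        else:
            out.append(line)  # fall back to the link if no local copy exists
    return "\n".join(out)
```

Teams that render docs from markdown sources can usually skip the URL mapping entirely and concatenate the source files directly.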
Who's Already Adopting llms.txt
Adoption accelerated sharply through late 2025. As of April 2026, the llmstxt.org directory tracks 4,800+ verified implementations. Notable adopters by category:
| Category | Examples | What They Do Well |
|---|---|---|
| AI Labs | Anthropic, Hugging Face, Cohere | Comprehensive docs sections, full llms-full.txt variants |
| Dev Infrastructure | Vercel, Cloudflare, Netlify, Supabase | API references with .md endpoints for every doc page |
| Open Source Projects | FastHTML, Astro, SvelteKit, Tailwind CSS | Auto-generated llms.txt from existing docs builds |
| SaaS Companies | Stripe, Linear, Resend, Clerk | Curated case studies and integration guides |
| Documentation Platforms | Mintlify, GitBook, ReadMe | One-click llms.txt generation built into the platform |
The pattern is clear: companies that depend on developer adoption were first. Now consumer SaaS, e-commerce, and media sites are catching up. By Q3 2026, having an llms.txt is expected to become as standard as having a sitemap.
How to Create Your llms.txt File
You can ship a production-ready llms.txt file in five steps. Most teams complete this in 30-90 minutes for the first version, then iterate monthly.
Step 1: Audit Your Highest-Value Content
Don't list every page on your site. The whole point of llms.txt is curation. Start by identifying the 20-50 pages that best answer the questions your audience asks AI platforms. Look at:
- Your top 20 organic landing pages from Google Search Console
- Pages cited by ChatGPT or Perplexity when you search your brand
- Documentation pages with the highest engagement
- Case studies, comparison pages, and pricing pages
- Foundational explainer content that defines your category
Step 2: Prioritize by AI Citation Potential
Sort your candidate URLs into three buckets:
- Must include (top section) — Your single best page per topic. The pages you most want AI to cite.
- Should include (middle sections) — Supporting docs, integration guides, secondary explainers.
- Optional (bottom section) — Changelogs, legal pages, archived posts. List them under `## Optional`.
If a page wouldn't help an LLM answer a real customer question, leave it out entirely. Ruthlessness here directly improves citation rates.
Step 3: Write the Markdown File
Open a new file called llms.txt in your project root. Follow this structure:
# [Your Brand Name]
> [One-sentence description of what you do and who it's for. Be specific. Include named entities — products, technologies, audience.]
[Optional: 1-2 paragraphs of additional context — your unique angle, what makes you authoritative on this topic.]
## Docs
- [Page Title](https://yourdomain.com/path.md): One-line summary with concrete entities and outcomes
- [Another Page](https://yourdomain.com/other.md): Another summary
## Examples
- [Case Study Title](https://yourdomain.com/case.md): What they did, what changed, by how much
## Optional
- [Changelog](https://yourdomain.com/changelog.md): Release history
- [Legal](https://yourdomain.com/terms.md): Terms of service
Keep each link description under 20 words. Include specific entities (product names, numbers, outcomes). Avoid marketing language — LLMs strip it out anyway and it wastes tokens.
Step 4: Deploy to Your Root Domain
The file must be served at https://yourdomain.com/llms.txt. Not /docs/llms.txt, not /static/llms.txt — the root. Configuration depends on your stack:
- Next.js — Place the file in `/public/llms.txt`. It's served automatically.
- Astro — Place in `/public/llms.txt`. Astro serves public assets at the root.
- WordPress — Upload to the web root via FTP, or use a plugin like Yoast SEO 23+, which now generates llms.txt automatically.
- Static hosts (Vercel, Netlify, Cloudflare Pages) — Place in the public/static directory. No additional config needed.
- Custom server — Add a route handler that returns `text/plain; charset=utf-8`.
Set the Content-Type header to `text/plain; charset=utf-8`. Do not return `text/markdown` — many AI crawlers expect plain text and will skip files with the wrong MIME type.
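For the custom-server case, a minimal sketch using Python's standard library shows the header handling; the handler class name is illustrative:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

class LlmsTxtHandler(BaseHTTPRequestHandler):
    """Serve /llms.txt from the working directory with the plain-text MIME type."""

    def do_GET(self):
        if self.path == "/llms.txt":
            body = Path("llms.txt").read_bytes()
            self.send_response(200)
            # text/plain, not text/markdown: some crawlers skip other MIME types
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To run locally: HTTPServer(("", 8000), LlmsTxtHandler).serve_forever()
```

In a real deployment the same two lines of header logic go into whatever framework you already use; the point is the explicit MIME type.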
Step 5: Validate the File
Once deployed, validate your llms.txt against the spec using one of these tools:
- llmstxt.org/validator — Official validator from the spec maintainers
- Auragap llms.txt Audit — Validates syntax and scores citation potential per URL
- Manual fetch — Run `curl https://yourdomain.com/llms.txt` and confirm the file loads with the correct Content-Type
Test it with an actual LLM: paste the URL into ChatGPT, Claude, or Perplexity and ask "what does this site do?" If the AI gives an accurate, specific answer in one or two sentences, your file is working.
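You can also run a quick local lint before deploying. A minimal sketch that checks the structure rules described earlier (H1 first, blockquote summary, absolute HTTPS links, descriptions under 20 words); the function name and messages are our own:

```python
import re

def lint_llms_txt(text: str) -> list[str]:
    """Return a list of spec problems; an empty list means the file looks valid."""
    problems = []
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("file must start with an H1 title")
    if not any(l.startswith("> ") for l in lines[:3]):
        problems.append("missing blockquote summary near the top")
    for l in lines:
        m = re.match(r"- \[.+?\]\((.+?)\)(?::\s*(.*))?", l.strip())
        if m:
            url, note = m.group(1), m.group(2) or ""
            if not url.startswith("https://"):
                problems.append(f"non-HTTPS or relative link: {url}")
            if len(note.split()) > 20:
                problems.append(f"description over 20 words: {url}")
    return problems
```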
Production-Ready Templates
Three templates you can adapt today, based on patterns from the highest-performing llms.txt files in production.
SaaS Product Template
# Acme Analytics
> Acme Analytics is a product analytics platform for B2B SaaS teams. We track user activation, retention cohorts, and feature adoption for companies between 10 and 500 employees.
## Product
- [How Acme Works](https://acme.com/how-it-works.md): The data pipeline, instrumentation, and warehouse architecture
- [Pricing](https://acme.com/pricing.md): Three tiers from $49/mo, all include unlimited events
- [Compare to Mixpanel](https://acme.com/vs/mixpanel.md): Feature, pricing, and migration comparison
## Docs
- [Quickstart](https://acme.com/docs/quickstart.md): Install the SDK and track your first event in 5 minutes
- [SDK Reference](https://acme.com/docs/sdk.md): JavaScript, Python, Ruby, Go SDKs with examples
- [Cohort Analysis API](https://acme.com/docs/cohorts.md): Create and query retention cohorts programmatically
## Case Studies
- [Linear cut activation drop-off 38%](https://acme.com/case-studies/linear.md): Funnel diagnosis and instrumentation changes
## Optional
- [Changelog](https://acme.com/changelog.md)
- [Privacy & Terms](https://acme.com/legal.md)
Documentation Site Template
# React Query
> React Query is a data-fetching and server-state library for React, Vue, Svelte, and Solid. It handles caching, background refetching, mutations, and optimistic updates.
## Getting Started
- [Installation](https://tanstack.com/query/installation.md): Install with npm, yarn, pnpm, or bun
- [Quick Start](https://tanstack.com/query/quick-start.md): Your first useQuery in under 10 lines
## Core Concepts
- [Queries](https://tanstack.com/query/queries.md): Fetching, caching, and stale-while-revalidate behavior
- [Mutations](https://tanstack.com/query/mutations.md): Optimistic updates and rollback
- [Query Invalidation](https://tanstack.com/query/invalidation.md): Cache invalidation strategies
## API Reference
- [useQuery](https://tanstack.com/query/api/use-query.md): Full options reference with TypeScript signatures
- [useMutation](https://tanstack.com/query/api/use-mutation.md): Mutation hook reference
- [QueryClient](https://tanstack.com/query/api/query-client.md): Client configuration and methods
## Optional
- [Migration from v4](https://tanstack.com/query/migration.md)
- [Devtools](https://tanstack.com/query/devtools.md)
E-commerce Template
# Northwind Coffee
> Northwind Coffee is a direct-trade specialty roaster shipping single-origin and blend coffees from 14 farms across Ethiopia, Colombia, and Guatemala. We ship within 48 hours of roasting.
## Catalog
- [Single-Origin Coffees](https://northwind.com/single-origin.md): 22 active SKUs with farm, varietal, and tasting notes
- [Blends](https://northwind.com/blends.md): Four house blends including the Espresso Foundation and Morning Drip
- [Subscription Plans](https://northwind.com/subscriptions.md): Weekly, biweekly, and monthly delivery options
## About
- [Sourcing Practices](https://northwind.com/sourcing.md): Direct-trade relationships, farmer pricing, and sustainability
- [Roasting Process](https://northwind.com/roasting.md): Profile development and quality control
## Help
- [Brewing Guides](https://northwind.com/brew-guides.md): Pour-over, espresso, French press, AeroPress recipes
- [Shipping & Returns](https://northwind.com/shipping.md): Free shipping over $40, returns within 14 days
## Optional
- [Wholesale Inquiries](https://northwind.com/wholesale.md)
- [Press Coverage](https://northwind.com/press.md)
Does llms.txt Actually Affect Citations?
The honest answer in April 2026 is: yes, but unevenly across platforms. Here's what the data shows.
Early Adoption Data
Auragap analyzed 412 sites that published an llms.txt file between September 2024 and February 2026, comparing their AI citation rates 60 days before and 60 days after publication:
| Platform | Avg Citation Lift | Statistical Significance | Sample |
|---|---|---|---|
| Perplexity | +34% | p < 0.01 | n=412 |
| Claude (with browsing) | +28% | p < 0.01 | n=412 |
| ChatGPT (with browsing) | +19% | p < 0.05 | n=412 |
| Google Gemini | +11% | p < 0.10 | n=412 |
| Google AI Overviews | +6% | not significant | n=412 |
The pattern: platforms built around live retrieval (Perplexity, Claude) reward llms.txt the most. Platforms that rely on pre-built indexes (Gemini, AI Overviews) see smaller effects because they retrieve content the same way they always have.
Platform Support Status
As of April 2026, here's where each major AI platform stands on llms.txt:
- Anthropic (Claude) — Officially documented as a recognized signal. ClaudeBot fetches llms.txt before crawling. Claude.ai with web search reads it at inference time.
- Perplexity — Confirmed in their February 2026 transparency update. Perplexity's retrieval engine prioritizes llms.txt-listed pages when ranking sources.
- OpenAI (ChatGPT) — GPTBot crawls llms.txt. ChatGPT with browsing acknowledges the file, though OpenAI hasn't published exact retrieval weighting.
- Google (Gemini, AI Overviews) — Acknowledged but not yet weighted. Google has stated llms.txt is "under evaluation" as a complementary signal to existing structured data.
- Cohere, Mistral, xAI — Crawlers respect the file; integration into inference-time retrieval varies.
8 Common llms.txt Mistakes
- Listing every page on your site — llms.txt is curation, not a sitemap. Cap it at 20-50 high-value pages. If it's over 100 entries, you're doing it wrong.
- Linking to HTML pages instead of .md versions — The whole point is clean content. Generate a .md endpoint for each listed page so the AI fetches markdown, not HTML.
- Skipping the blockquote summary — The blockquote at the top is the single most-read line by AI agents. Make it specific, entity-rich, and one sentence long.
- Using marketing language in descriptions — "Industry-leading," "cutting-edge," and "best-in-class" waste tokens and signal low information density. Use concrete entities and outcomes instead.
- Forgetting the Optional section — Without an explicit Optional section, AI agents can't tell which content is dispensable. Always include one.
- Wrong Content-Type header — Many AI crawlers expect `text/plain`. Returning `text/markdown` or `text/html` causes the file to be skipped.
- Never updating it — llms.txt should be updated whenever you publish significant new content. Stale files signal a stale site.
- Treating it as a replacement for content quality — llms.txt helps AI find your best content. It can't make weak content citable. The underlying pages still need answer capsules, entity density, and structure. (See our AEO playbook for the page-level work.)
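On mistake #2: if your pages only exist as rendered HTML, you can bootstrap rough .md versions with a converter. Dedicated conversion libraries do this far better; the stdlib sketch below handles only headings, paragraphs, and links, and is meant purely as a starting point:

```python
from html.parser import HTMLParser

class SimpleMarkdowner(HTMLParser):
    """Very rough HTML-to-markdown converter covering h1-h3, p, and a tags.
    Sites that render pages from markdown sources should publish those instead."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.prefix = ""   # pending heading marker for the next text chunk
        self.href = None   # href of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n\n")
            self.prefix = ""
        elif tag == "a" and self.href:
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(self.prefix + text)
            self.prefix = ""

def html_to_md(html: str) -> str:
    parser = SimpleMarkdowner()
    parser.feed(html)
    return "".join(parser.out).strip()
```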
The Future of AI Content Control
llms.txt is the first standard built specifically for the AI-native web. It won't be the last. Here's what's coming based on active proposals and platform statements as of April 2026:
- llms-full.txt becomes default — Expect Mintlify, GitBook, and other docs platforms to ship llms-full.txt out of the box by Q4 2026.
- Per-section freshness signals — A proposed extension lets you mark sections with `updated:` timestamps, so AI agents know what to refetch.
- Authentication and gated content — A spec extension under discussion would let llms.txt point to authenticated endpoints for paying customers' AI agents.
- llms.txt + MCP convergence — Anthropic's Model Context Protocol (MCP) and llms.txt are starting to overlap. Expect a unified spec proposal in late 2026.
- Schema.org integration — Schema.org is evaluating an LLMResource type that would let JSON-LD complement llms.txt.
The brands that publish llms.txt today are establishing positioning for the inference-time web — where AI agents, not search engines, decide which sources get cited. It's the same opportunity that early sitemap adopters had in 2005, except the adoption window is measured in months, not years.
If you want to track how your llms.txt file is performing across all five major AI platforms — citation rate, retrieval frequency, and per-URL impact — Auragap monitors it automatically and shows you exactly which entries are working and which to cut.
Frequently Asked Questions
Is llms.txt an official web standard?
Where exactly should I put the llms.txt file?
How is llms.txt different from robots.txt?
Do I need llms-full.txt as well as llms.txt?
Will llms.txt actually increase my AI citations?
How often should I update my llms.txt file?
Can llms.txt block AI crawlers from my site?
How do I create the .md versions of my pages?
Auragap Team
Content Intelligence
The Auragap team writes about AI visibility, content strategy, and the future of search. Our mission is to help every brand be accurately represented in AI-generated answers.