The Complete llms.txt Guide for 2026

robots.txt tells search engine crawlers what they can access. llms.txt aims to do a similar job for AI systems, and it goes further: it gives language models structured context about who you are, what your site covers, and how you'd like to be cited. If you're thinking about AI discoverability, it is one of the highest-signal files you can add to your site.

What is llms.txt?

llms.txt is a plain-text, Markdown-formatted file placed at the root of your domain (yourdomain.com/llms.txt) that provides AI crawlers and language models with structured information about your site. It was proposed by Answer.AI co-founder Jeremy Howard in late 2024 and has since gained traction among AI search engines including Perplexity, You.com, and a growing list of LLM-powered tools.

Standard status

llms.txt is a community-driven proposal, not an official W3C standard. Adoption is growing but not universal, and no AI system is obliged to honour the file. Implementing it costs little and may pay off as more AI systems formalise their crawling policies.

First: check your robots.txt isn't blocking AI crawlers

There's no point adding llms.txt if your robots.txt is blocking the crawlers that would read it. GPTBot and ClaudeBot follow robots.txt directives. Check yours first.
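One way to run that check yourself is with the robots.txt parser in Python's standard library. This is a minimal sketch, not a definitive audit: the helper name `blocked_ai_bots`, the sample robots.txt, and the list of user-agent tokens (the crawlers named in this guide) are assumptions, and real crawlers may treat edge cases differently from `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens for the AI crawlers named in this guide (assumed list; extend as needed).
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_ai_bots(robots_txt: str, url: str) -> list[str]:
    """Return the AI bots that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt that blocks GPTBot but allows every other crawler.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_bots(SAMPLE_ROBOTS, "https://yourdomain.com/llms.txt"))
# prints ['GPTBot']
```

To test a live site instead of a pasted string, `RobotFileParser` can also load the file itself via `set_url()` followed by `read()`.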


The core llms.txt structure

markdown

# llms.txt for ExampleCorp
# Place at: https://yourdomain.com/llms.txt

## Identity
name: ExampleCorp
url: https://yourdomain.com
description: B2B SaaS platform for technical SEO infrastructure.
language: en
last_updated: 2026-05-28

## Permissions
allow_training: no
allow_inference: yes
allow_citation: yes
preferred_citation_format: "ExampleCorp (yourdomain.com)"

## Content
primary_topics:
  - technical SEO
  - AI search discoverability
  - indexing infrastructure
content_types:
  - blog posts
  - technical guides
update_frequency: weekly

## Key URLs
homepage: https://yourdomain.com
blog: https://yourdomain.com/blog
sitemap: https://yourdomain.com/sitemap.xml
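To sanity-check a file in the layout shown above, it can help to parse it back into a dictionary and inspect the result. The sketch below assumes exactly that layout (`## Section` headings, `key: value` lines, and `- item` lists under a bare `key:`); `parse_llms_txt` is a hypothetical helper written for this guide, not part of any official tooling.

```python
def parse_llms_txt(text: str) -> dict:
    """Parse the sectioned key/value layout into {section: {key: value}}."""
    data: dict = {}
    section = None     # name of the current "## Section"
    open_list = None   # list currently being filled by "- item" lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            section = line[3:].strip()
            data[section] = {}
            open_list = None
        elif line.startswith("#"):
            continue  # top-level comment line, ignored
        elif line.startswith("- "):
            if open_list is not None:
                open_list.append(line[2:].strip())
        elif ":" in line and section is not None:
            # Split on the first colon only, so URLs in values stay intact.
            key, _, value = line.partition(":")
            value = value.strip().strip('"')
            if value:
                data[section][key.strip()] = value
                open_list = None
            else:
                # A bare "key:" starts a list of "- item" lines.
                open_list = data[section].setdefault(key.strip(), [])
    return data


# Shortened copy of the example file, for illustration.
SAMPLE = """\
# llms.txt for ExampleCorp

## Identity
name: ExampleCorp
url: https://yourdomain.com

## Permissions
allow_training: no
preferred_citation_format: "ExampleCorp (yourdomain.com)"

## Content
primary_topics:
  - technical SEO
  - AI search discoverability
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed["Permissions"]["allow_training"])
print(parsed["Content"]["primary_topics"])
```

If a field comes back missing or empty, the file probably deviates from the layout somewhere, for example through a stray indent or a missing colon.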

The allow_training decision

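The most consequential field is allow_training. Setting it to "no" signals that you don't consent to your content being used to train new AI models. It does not affect inference-time retrieval (being fetched and cited in responses), which allow_inference and allow_citation control separately. Most sites should set allow_inference: yes and allow_citation: yes while carefully considering their training preference. Bear in mind that these permission fields go beyond the original proposal, which specifies only a Markdown outline (a title, a short summary, and sections of links), and that they are advisory signals: they work only if an AI system chooses to honour them.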

The llms-full.txt variant

There's a growing convention of also providing llms-full.txt, a more detailed companion file that includes your most important content, pre-structured for AI consumption.

Think of llms-full.txt as your AI press kit

If an AI system is trying to decide whether to cite you on a topic, llms-full.txt gives it everything it needs without crawling dozens of pages. Include your strongest credibility signals and most authoritative content excerpts.
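As a rough illustration of how such a file could be assembled, the sketch below concatenates a site summary and a few page excerpts into one document. There is no fixed schema for llms-full.txt, so the layout here (a title, a summary, then one section per page with its source URL) is an assumption, as are the helper name `build_llms_full` and the sample page.

```python
def build_llms_full(site_name: str, summary: str, pages: list[dict]) -> str:
    """Join a summary and page excerpts into a single plain-text document."""
    parts = [f"# {site_name}", "", summary.strip(), ""]
    for page in pages:
        parts += [
            f"## {page['title']}",
            f"Source: {page['url']}",
            "",
            page["excerpt"].strip(),
            "",
        ]
    # End with exactly one trailing newline.
    return "\n".join(parts).rstrip() + "\n"


# Hypothetical page data; in practice this would come from your CMS or a site export.
pages = [
    {
        "title": "Blog",
        "url": "https://yourdomain.com/blog",
        "excerpt": "Technical guides on SEO infrastructure and AI search discoverability.",
    },
]

print(build_llms_full(
    "ExampleCorp",
    "B2B SaaS platform for technical SEO infrastructure.",
    pages,
))
```

Keeping the source URL next to each excerpt gives an AI system something concrete to cite.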

Verify your AI visibility after implementation

Once llms.txt is live, confirm that the file is reachable and being read, then check whether your overall discoverability has improved, for example by tracking your AI Visibility Score.
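A basic first check is whether the file is served correctly at all. The sketch below covers only that part: it cannot tell you whether any AI system has read the file, which is something your server access logs are better placed to show. The function names and the accepted content types (`text/plain` or `text/markdown`) are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def check_llms_response(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of problems with an llms.txt HTTP response (empty if none)."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if not content_type.lower().startswith(("text/plain", "text/markdown")):
        problems.append(f"unexpected Content-Type: {content_type!r}")
    if not body.strip():
        problems.append("file is empty")
    return problems


def fetch_and_check(url: str) -> list[str]:
    """Fetch `url` and run the checks above; HTTP errors are reported, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            content_type = resp.headers.get("Content-Type", "")
            return check_llms_response(resp.status, content_type, body)
    except urllib.error.HTTPError as err:
        return check_llms_response(err.code, err.headers.get("Content-Type", ""), "")


# Offline demonstration with made-up response values.
print(check_llms_response(200, "text/plain; charset=utf-8", "# llms.txt for ExampleCorp"))
print(check_llms_response(404, "text/html", ""))
```

Against a live site, the call would be `fetch_and_check("https://yourdomain.com/llms.txt")`.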


Common mistakes to avoid

- Inconsistency between llms.txt and robots.txt: a crawler that robots.txt blocks may never fetch llms.txt, so the more restrictive rule effectively wins
- Vague descriptions: specific topic definitions are far more useful than generic ones
- Stale last_updated dates: an outdated timestamp signals unmaintained content
- Missing contact information: AI systems and operators need a way to reach you
- Forgetting to submit the llms.txt URL to AI platform developer portals as they become available
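Several of these mistakes can be caught automatically. The sketch below lints an llms.txt that has already been parsed into a nested dictionary keyed by section name. The thresholds (a description shorter than five words counts as vague, a date older than 180 days counts as stale) and the `Contact` section name are assumptions chosen for illustration; the guide's example file does not define a contact field.

```python
from datetime import date


def lint_llms(data: dict, today: date, max_age_days: int = 180) -> list[str]:
    """Return warnings for common llms.txt mistakes (empty list if none found)."""
    warnings = []
    identity = data.get("Identity", {})

    # Vague description: assumed threshold of fewer than five words.
    if len(identity.get("description", "").split()) < 5:
        warnings.append("description is missing or too vague")

    # Stale timestamp: last_updated is expected in ISO format (YYYY-MM-DD).
    last_updated = identity.get("last_updated")
    if last_updated is None:
        warnings.append("last_updated is missing")
    else:
        age = (today - date.fromisoformat(last_updated)).days
        if age > max_age_days:
            warnings.append(f"last_updated is {age} days old")

    # Missing contact information: assumed to live in a "Contact" section.
    if not data.get("Contact"):
        warnings.append("no contact information")

    return warnings


# Values taken from the example file in this guide.
example = {
    "Identity": {
        "name": "ExampleCorp",
        "description": "B2B SaaS platform for technical SEO infrastructure.",
        "last_updated": "2026-05-28",
    },
}

print(lint_llms(example, today=date(2026, 6, 1)))
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check reproducible in tests.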

Anita R.

CEO · SEOVentra

Co-founder and CEO of SEOVentra. Product, growth, and go-to-market. Writes about SEO strategy, AI search, and what it actually takes to rank and get cited by AI systems.