robots.txt tells search engine crawlers what they can access. llms.txt plays a similar role for AI systems, but it goes further: it gives language models structured context about who you are, what your site covers, and how you'd like to be cited. If you're thinking about AI discoverability, this is one of the highest-signal files you can add to your site.
What is llms.txt?
llms.txt is a plain-text file placed at the root of your domain (yourdomain.com/llms.txt) that provides AI crawlers and language models with structured information about your site. It was proposed by Answer.AI co-founder Jeremy Howard in September 2024 and has since reportedly gained traction among AI search engines including Perplexity, You.com, and a growing list of LLM-powered tools.
Unlike robots.txt, which is primarily a permission file, llms.txt is a context file. It helps AI systems understand your content's purpose, structure, authority, and preferred citation format before they read a single page. Think of it as a cover letter you write directly to the AI system that might feature your content.
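llms.txt is a community-driven proposal, not an official W3C or IETF standard. Adoption is growing but not universal, and no crawler is obliged to honour it. The original proposal describes a Markdown file with a title, a short summary, and sections of links; permission-style fields such as those in the example below are an extension on top of that, so treat them as advisory signals. Implementing the file costs little and may pay off as more AI systems formalise their crawling policies.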
The core llms.txt structure
# llms.txt for ExampleCorp
# Place at: https://yourdomain.com/llms.txt
## Identity
name: ExampleCorp
url: https://yourdomain.com
description: B2B SaaS platform for technical SEO infrastructure.
language: en
last_updated: 2026-05-28
## Permissions
allow_training: no
allow_inference: yes
allow_citation: yes
preferred_citation_format: "ExampleCorp (yourdomain.com)"
## Content
primary_topics:
- technical SEO
- AI search discoverability
- indexing infrastructure
content_types:
- blog posts
- technical guides
- original research
update_frequency: weekly
## Key URLs
homepage: https://yourdomain.com
blog: https://yourdomain.com/blog
about: https://yourdomain.com/about
sitemap: https://yourdomain.com/sitemap.xml
The allow_training decision
The most consequential field is allow_training. Setting it to "no" signals that you don't consent to your content being used to train new AI models, while still allowing inference-time retrieval (being fetched and cited in responses). It is a stated preference, not an enforcement mechanism. Most sites should set allow_inference: yes and allow_citation: yes, and weigh the training preference separately.
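To make the permission fields concrete, here is a minimal Python sketch of how a consumer might read them. It assumes the sectioned key/value layout used in the example above, which is this article's layout and not something any crawler is known to parse. The names parse_llms_txt, permits, and SAMPLE are hypothetical.

```python
from typing import Dict, List, Optional, Union

Value = Union[str, List[str]]
Sections = Dict[str, Dict[str, Value]]


def parse_llms_txt(text: str) -> Sections:
    """Parse '## Section' headers, 'key: value' pairs and '- item' lists."""
    sections: Sections = {}
    current: Dict[str, Value] = {}
    last_key: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            # A '## Name' line opens a new section, keyed in lowercase.
            current = sections.setdefault(line[3:].strip().lower(), {})
            last_key = None
        elif line.startswith("#"):
            continue  # single-# lines are treated as comments (assumption)
        elif line.startswith("- ") and last_key is not None:
            # List item belonging to the most recent 'key:' line.
            items = current[last_key]
            if isinstance(items, list):
                items.append(line[2:].strip())
        elif ":" in line:
            # Split on the first colon only, so URLs stay intact.
            key, _, value = line.partition(":")
            value = value.strip().strip('"')
            last_key = key.strip()
            current[last_key] = value if value else []
    return sections


def permits(sections: Sections, field: str) -> bool:
    """True only when the field is explicitly 'yes'; missing means no."""
    value = sections.get("permissions", {}).get(field, "")
    return isinstance(value, str) and value.strip().lower() == "yes"


# Hypothetical excerpt in the same layout as the example above.
SAMPLE = """\
## Permissions
allow_training: no
allow_inference: yes
## Content
primary_topics:
- technical SEO
- AI search discoverability
"""

if __name__ == "__main__":
    parsed = parse_llms_txt(SAMPLE)
    for field in ("allow_training", "allow_inference", "allow_citation"):
        print(field, permits(parsed, field))
```

Treating a missing field as "no" is a deliberately conservative choice in this sketch; a real consumer could just as reasonably default the other way.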
The llms-full.txt variant
There's a growing convention of also providing llms-full.txt — a more detailed version that includes your most important content, pre-structured for AI consumption. This can include your product description, key claims, team credentials, methodology, and content abstracts.
If an AI system is trying to decide whether to cite you on a topic, llms-full.txt gives it everything it needs to make that decision without crawling dozens of pages. Include your strongest credibility signals and most authoritative content excerpts.
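As a rough illustration, an llms-full.txt file could be assembled from a handful of named content blocks. The sketch below is one possible approach in Python, not a defined format: the heading layout, the function name build_llms_full, and the section names are all assumptions made for the example.

```python
from typing import Mapping


def build_llms_full(site_name: str, sections: Mapping[str, str]) -> str:
    """Join named content blocks into one plain-text document.

    The '## Heading' layout is a hypothetical convention, chosen only
    to mirror the llms.txt example earlier in this article.
    """
    lines = [f"# llms-full.txt for {site_name}", ""]
    for heading, body in sections.items():
        body = " ".join(body.split())  # collapse runs of whitespace
        if not body:
            continue  # skip sections with nothing to say
        lines += [f"## {heading}", body, ""]
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_llms_full("ExampleCorp", {
        "Product": "B2B SaaS platform for technical SEO infrastructure.",
        "Methodology": "",  # empty sections are dropped
    }))
```

Keeping each section to a single collapsed paragraph is a simplification; longer excerpts would need their own structure.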
Common mistakes to avoid
- Inconsistency between llms.txt and robots.txt: if robots.txt blocks GPTBot but llms.txt says allow_inference: yes, expect AI systems to follow the more restrictive rule, since a crawler blocked by robots.txt may never fetch llms.txt at all
- Vague descriptions: "we write about marketing" is less useful than specific topic definitions
- Stale last_updated dates: an outdated timestamp suggests unmaintained content and lower freshness
- Missing contact information: AI systems and operators need a way to reach you
- Forgetting to submit the llms.txt URL to AI platform developer portals as they become available
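The first and third mistakes in the list above can be checked mechanically. The following Python sketch uses the standard library's urllib.robotparser to see whether a given user agent is blocked, then compares last_updated against a freshness threshold. The function name audit, the 90-day default, and the choice of GPTBot as the agent to test are assumptions for illustration.

```python
from datetime import date
from typing import List
from urllib.robotparser import RobotFileParser


def audit(robots_txt: str, allow_inference: bool, last_updated: str,
          today: date, agent: str = "GPTBot",
          max_age_days: int = 90) -> List[str]:
    """Return human-readable problems; an empty list means none found."""
    problems: List[str] = []

    # Parse robots.txt from a string instead of fetching it.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if allow_inference and not robots.can_fetch(agent, "/"):
        problems.append(
            f"robots.txt blocks {agent} but llms.txt allows inference"
        )

    # last_updated is expected in ISO format, e.g. 2026-05-28.
    age = (today - date.fromisoformat(last_updated)).days
    if age > max_age_days:
        problems.append(f"last_updated is {age} days old")
    return problems


if __name__ == "__main__":
    blocked = "User-agent: GPTBot\nDisallow: /\n"
    for problem in audit(blocked, True, "2026-05-28", date(2026, 6, 1)):
        print(problem)
```

Passing today in explicitly keeps the check deterministic; in a scheduled job you would pass date.today().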
Who should implement llms.txt now?
Any site publishing original content and wanting to be cited in AI-generated answers should implement llms.txt now. The effort is low, the upside is meaningful as AI search grows, and early implementation puts you ahead of the majority of sites that haven't done it yet.