97% of LLMS.txt Files Receive Zero Requests – Surprising Ahrefs Data Revealed

by Rohan Mehta

97% Of LLMS.txt Files Got No Requests, Ahrefs Data Shows – Search Engine Journal

Ahrefs data reveals that 97% of websites implementing the proposed llms.txt file received zero requests from AI crawlers. This finding highlights a significant disconnect between site owners attempting to guide Large Language Models (LLMs) and the actual crawling behavior of major AI agents currently operating across the web.

What are the Ahrefs findings on llms.txt adoption?

Data analyzed by Ahrefs indicates a stark gap between the adoption of the llms.txt standard and its actual utility. While a growing number of webmasters have added these files to their root directories to provide AI-friendly summaries of their content, the vast majority of these files remain untouched by the bots they are designed to attract. According to the findings, only 3% of the sampled llms.txt files recorded any activity.

This lack of engagement suggests that the primary AI crawlers—those operated by companies like OpenAI, Google, and Anthropic—are not yet prioritizing this specific file format when indexing or retrieving information. The data indicates that site owners are preparing for a standard that the industry’s biggest players have not yet integrated into their automated workflows.

Key statistics from the analysis include:

  • 97% of identified llms.txt files saw no requests.
  • 3% of files were accessed at least once.
  • The discrepancy suggests a lack of standardization in how LLM agents discover “AI-optimized” site maps.
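Site owners can run the same kind of check on their own server. The sketch below is not Ahrefs' methodology, which the data summary does not describe; it simply counts requests for /llms.txt in an access log, assuming the common "combined" log format. The sample lines, IP addresses, and the `ExampleBot` user agent are made up for illustration.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a combined-format log line.
# Assumption: the server writes the standard "combined" access log format.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_requests(log_lines):
    """Count requests for /llms.txt, grouped by user-agent string."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path == "/llms.txt":
            hits[match.group("agent")] += 1
    return hits


# Hypothetical log excerpt.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [01/Jan/2025:00:00:03 +0000] "GET /llms.txt?x=1 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
]
hits = count_llms_txt_requests(sample_log)
```

An empty result over a long log window would put a site in the 97% group; any non-empty result shows which agents actually asked for the file.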

How does an llms.txt file differ from robots.txt?

To understand why the lack of requests is significant, it is necessary to distinguish between the long-standing robots.txt standard and the newly proposed llms.txt format. While both reside in the root directory of a website, they serve opposite purposes.

The robots.txt file is a restrictive tool. It tells crawlers where they are not allowed to go. It is a set of boundaries designed to prevent server overload or protect private directories. Most major search engines and AI bots, including GPTBot, respect these directives to avoid legal or technical friction.

Conversely, llms.txt is a suggestive tool. It is designed to be a “welcome mat” for AI. Instead of blocking access, it provides a curated, Markdown-formatted summary of the website’s most important information. The goal is to help an LLM quickly understand the site’s purpose and locate the highest-quality data for training or Retrieval-Augmented Generation (RAG) without having to crawl thousands of potentially irrelevant HTML pages.

| Feature | robots.txt | llms.txt |
| --- | --- | --- |
| Primary Purpose | Restriction and exclusion | Guidance and optimization |
| Format | Plain text (key-value pairs) | Markdown |
| Bot Interaction | Tells bots what to ignore | Tells bots what to prioritize |
| Industry Status | Universally adopted standard | Community-proposed experiment |
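The restrictive side of this comparison can be illustrated with Python's standard-library `urllib.robotparser`, which answers the question a well-behaved crawler asks before each fetch. This is a minimal sketch: the robots.txt content and the example.com URLs are invented for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is kept out of /private/, everyone else is unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/private/report.html")
open_page = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/blog/post.html")
```

There is no equivalent yes/no check for llms.txt. The file contains nothing to enforce, only content a bot may choose to read.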

Why are AI bots ignoring these guidance files?

The fact that 97% of these files are ignored points to several technical and strategic reasons why AI companies may not be utilizing llms.txt.

First, the llms.txt proposal is not an official IETF or W3C standard. It is a community-driven effort. Until the creators of the most powerful LLMs, such as OpenAI or Google, explicitly program their bots to request this specific filename in the root directory, the files remain invisible to the software. Most bots are programmed to look for robots.txt and sitemap.xml; they do not “guess” whether a site has an AI-specific summary file.
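The discovery gap can be sketched in a few lines. The function below is a simplified, hypothetical model of a crawler's start-up queue, not any vendor's actual logic: it learns about sitemaps from `Sitemap:` lines in robots.txt, and requests /llms.txt only if that path was explicitly configured. The `discovery_queue` name and the example.com URLs are assumptions; `RobotFileParser.site_maps()` requires Python 3.8 or later.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def discovery_queue(origin, robots_txt, extra_paths=()):
    """List the URLs a crawler learns about from robots.txt, plus any
    fixed paths it was explicitly configured to try."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    queue = [urljoin(origin, "/robots.txt")]
    queue.extend(parser.site_maps() or [])  # Sitemap: lines, if present
    queue.extend(urljoin(origin, path) for path in extra_paths)
    return queue


# Hypothetical robots.txt that advertises a sitemap.
robots = "User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml\n"

default_queue = discovery_queue("https://example.com", robots)
opted_in_queue = discovery_queue("https://example.com", robots, extra_paths=["/llms.txt"])
```

In this model the sitemap is found automatically, while /llms.txt appears only in `opted_in_queue`, which is the step the major crawler operators have not taken.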

Second, AI companies have already invested heavily in their own scraping and parsing technologies. Modern LLM crawlers use sophisticated HTML parsing and “headless browsers” that can navigate a site’s structure effectively without needing a Markdown summary. If a bot can already extract the necessary data from a well-structured page, the incentive to check for a secondary text file is low.

Third, there is the issue of trust and verification. A site owner could potentially use an llms.txt file to misrepresent their content or “prompt engineer” the bot into seeing the site in a specific light. AI developers generally prefer to crawl the actual content a human sees to ensure the training data is authentic and not a curated “advertisement” for the AI.

“The gap between implementation and usage shows that the web is currently in a transitional phase where site owners are guessing what AI wants, while AI developers are building their own proprietary ways to get it.”

What are the implications for SEOs and site owners?

For search engine optimization (SEO) professionals, the Ahrefs data serves as a cautionary tale about “speculative optimization.” The urge to optimize for AI is high, but the tools available are often ahead of the actual bot behavior.

However, this does not mean that creating an llms.txt file is useless. The file serves as a signal of intent. If a major AI provider decides to adopt the standard tomorrow, sites that already have the file in place will be immediately “AI-ready.” It is a low-effort, low-risk move that potentially offers a high reward if the standard gains traction.

The more immediate concern for site owners is the tension between visibility and protection. While some want to be found by LLMs to appear in AI Overviews or ChatGPT citations, others are using robots.txt to block AI bots entirely to prevent their intellectual property from being used for training without compensation.

Site owners currently face a strategic choice:

  • The Open Approach: Implement llms.txt and keep robots.txt open to maximize the chance of being cited in AI-generated answers.
  • The Protective Approach: Use robots.txt to block GPTBot and CCBot, prioritizing copyright over AI visibility.
  • The Hybrid Approach: Block training crawlers but allow the user-triggered fetchers and search bots that surface real-time citations (as in Perplexity or Google’s AI Overviews).
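As a rough illustration of how these choices translate into robots.txt, the sketch below renders a policy from a list of blocked user agents and checks the result with the standard-library parser. The `build_robots_txt` helper is hypothetical. GPTBot and CCBot are the agents named above; which additional agents count as "training" crawlers for a hybrid policy is a judgment each site owner has to make from the operators' own documentation.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents):
    """Render a robots.txt that disallows the named agents and allows everyone else."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt, user_agent, url="https://example.com/"):
    """Check a rendered policy the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


open_policy = build_robots_txt([])                      # the open approach
protective_policy = build_robots_txt(["GPTBot", "CCBot"])  # the protective approach
```

A hybrid policy is the same call with the owner's own list of training crawlers. Any agent not on the list falls through to the final `User-agent: *` group and stays allowed.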

For those pursuing the open approach, the Ahrefs data suggests that relying solely on llms.txt is insufficient. Traditional on-page SEO—clear headings, structured data (Schema.org), and fast load times—remains the only verified way to ensure content is digestible for both human users and AI agents.

How to properly implement an llms.txt file

Despite the current lack of requests, those who wish to be early adopters should follow the proposed guidelines to ensure their files are usable if and when bots begin seeking them. The standard suggests a two-tiered approach: a primary llms.txt file and an optional llms-full.txt file.

The primary llms.txt file

This file should be a concise Markdown document located at /llms.txt. It should include:

  • Site Name and Description: A brief explanation of what the site provides.
  • Core Links: A curated list of the most important pages, often with short descriptions of what each page contains.
  • Contextual Guidance: Instructions for the LLM on how to interpret the site’s data.
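The three elements above can be assembled programmatically. This is a minimal sketch under stated assumptions: the `build_llms_txt` helper is hypothetical, the layout (an H1 title, a blockquote description, optional free-form guidance, then `##` sections of links) follows the community proposal as commonly described, and the site name, URLs, and notes are placeholders.

```python
def build_llms_txt(name, description, sections, guidance=None):
    """Render an llms.txt body.

    sections maps a section heading to a list of (title, url, note) tuples.
    guidance is optional free-form text placed after the description.
    """
    lines = [f"# {name}", "", f"> {description}", ""]
    if guidance:
        lines += [guidance, ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for a hypothetical documentation site.
example = build_llms_txt(
    "Example Docs",
    "Reference documentation for a hypothetical API.",
    {"Core Documentation": [("Quick start", "/docs/quick-start", "Install and first request.")]},
    guidance="Prefer the versioned pages over blog posts when they disagree.",
)
```

Generating the file from the same source as the site navigation keeps the curated list from drifting out of date, which matters if bots ever do start reading it.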

The extended llms-full.txt file

For larger sites, a second file—/llms-full.txt—can be used. This is essentially a comprehensive version of the primary file, containing more exhaustive lists of links and deeper summaries. The primary llms.txt file should link to this full version, allowing the bot to decide if it needs a high-level summary or a deep dive.

Example structure for a primary llms.txt:

# Site Name
> Brief description of the site's purpose.

## Core Documentation

- [Page Title](/url): Description of the page's value.
- [Page Title](/url): Description of the page's value.

## Full Index

- [Full Site Map](/llms-full.txt)
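To see what a consumer could do with such a file, here is a small parser sketch. It assumes section titles are written as Markdown `##` headings and links use the `- [title](url): note` form, as in the proposal; the `parse_llms_txt` function is hypothetical and no AI vendor is known to parse the file this way.

```python
import re

# One list item: "- [title](url)" with an optional ": note" suffix.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$")


def parse_llms_txt(text):
    """Split an llms.txt body into title, summary, and per-section link lists."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc["sections"][current].append(
                    (match.group("title"), match.group("url"), match.group("note") or "")
                )
    return doc


sample = """\
# Site Name
> Brief description of the site's purpose.

## Core Documentation

- [Page Title](/url): Description of the page's value.

## Full Index

- [Full Site Map](/llms-full.txt)
"""
parsed = parse_llms_txt(sample)
```

Running the parser against your own file is a quick way to confirm that every link line is well formed before publishing it.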

Comparing AI crawling trends: The shift from indexing to synthesis

The struggle for llms.txt adoption reflects a broader shift in how the internet is consumed. For three decades, the goal of the web was indexing—creating a massive library where a search engine could point a user to a specific page. The “request” was a bridge to a destination.

AI agents operate on synthesis. They do not necessarily want to send a user to a page; they want to ingest the information on that page and provide the answer directly. This changes the value proposition of the root-directory file. While a sitemap helps a bot index pages for a search engine, an llms.txt file helps a bot synthesize a knowledge base for a generative response.

This shift creates a paradox. Site owners want the “traffic” that comes from being a source for an AI answer, but the very nature of the AI answer reduces the need for the user to ever click through to the website. This is likely why AI companies are not rushing to adopt a standard that makes it easier for site owners to control how their data is synthesized.

Common misconceptions about AI-ready files

There are several prevailing myths regarding the use of llms.txt and other AI-centric optimizations that site owners should avoid.

Myth 1: Adding an llms.txt file will immediately boost my visibility in AI Overviews.
As the Ahrefs data suggests, most bots aren’t even requesting the file. Visibility in AI responses is driven by content quality, authority, and the bot’s ability to parse your existing HTML, not the presence of a specific text file.

Myth 2: If I block GPTBot in robots.txt, I don’t need an llms.txt file.
While it seems redundant, some developers use llms.txt to provide a “safe” version of their content for those bots they do allow. It allows for a controlled version of the site’s identity to be presented to AI agents without exposing the entire site structure to every crawler.

Myth 3: llms.txt is a replacement for Schema markup.
Schema.org structured data is a globally recognized standard used by Google and Bing. llms.txt is a community proposal. They serve different purposes: Schema provides machine-readable metadata for specific entities (products, reviews, people), while llms.txt provides a narrative summary for LLMs.
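For contrast with the narrative llms.txt format, here is what entity-level Schema.org metadata looks like when emitted as JSON-LD. This is an illustrative sketch: the helper names are hypothetical and the headline, author, and date are placeholder values, while `@context`, `@type`, `headline`, `author`, and `datePublished` are standard Schema.org vocabulary for an Article.

```python
import json


def article_json_ld(headline, author_name, date_published):
    """Build Schema.org Article metadata as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }
    return json.dumps(data, indent=2)


def as_script_tag(json_ld):
    """Wrap JSON-LD in the script element used to embed it in a page."""
    return f'<script type="application/ld+json">\n{json_ld}\n</script>'


# Placeholder values for illustration only.
markup = article_json_ld("Example headline", "Example Author", "2025-01-01")
snippet = as_script_tag(markup)
```

Each field here describes one entity in a fixed vocabulary, whereas llms.txt carries free-form prose and links. That is why neither replaces the other.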

Frequently Asked Questions

Why does Ahrefs data show so few requests for llms.txt?

The primary reason is a lack of official adoption. Major AI developers (OpenAI, Google, Meta) have not yet programmed their crawlers to seek out and prioritize the llms.txt filename. Most bots continue to rely on traditional sitemaps and HTML parsing.

Should I delete my llms.txt file if it has no requests?

There is no technical reason to delete it. It does not slow down your site or harm your SEO. Keeping it ensures that if the standard is adopted by major AI agents in the future, your site is already optimized for them.

Does llms.txt help with ChatGPT or Perplexity citations?

Currently, there is no evidence that llms.txt directly increases citations. These tools primarily use their own web crawlers to find the most relevant, authoritative content based on the user’s query, regardless of whether a summary file exists.

Is llms.txt a legal way to protect my content from AI training?

No. llms.txt is a guidance file, not a legal document or a technical barrier. To attempt to block AI training, you must use the robots.txt file to disallow specific user-agents (like GPTBot) or implement technical blocks such as CAPTCHAs and paywalls.

What is the best way to make my site “AI-friendly” right now?

Focus on high-quality, structured content. Use clear H1-H4 heading hierarchies, implement comprehensive Schema.org markup, and ensure your content directly answers common user questions. These are the factors that AI crawlers currently use to determine relevance and authority.
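A heading hierarchy is easy to audit automatically. The sketch below uses Python's built-in `html.parser` to flag pages with a missing or repeated H1 or with skipped heading levels. The rules it applies are a common convention, assumed here for illustration, and not a documented ranking factor of any AI crawler.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1-h6 start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human-readable problems with the heading outline."""
    collector = HeadingCollector()
    collector.feed(html)
    problems = []
    h1_count = collector.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    previous = 0
    for level in collector.levels:
        if previous and level > previous + 1:
            problems.append(f"h{level} skips a level after h{previous}")
        previous = level
    return problems


clean = heading_problems("<h1>Guide</h1><h2>Setup</h2><h3>Install</h3>")
skipped = heading_problems("<h1>Guide</h1><h3>Install</h3>")
```

An empty list means the outline descends one level at a time from a single H1; anything else points at the heading to fix.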

The evolution of the web’s relationship with artificial intelligence is still in its early stages. While the finding that 97% of llms.txt files received no requests suggests the standard is currently ignored, it also highlights the volatility of the current SEO landscape. As AI agents move from simple scraping to more sophisticated understanding, the tools used to communicate with them will continue to shift.
