Automate AI-Ready llms.txt Generation from Screaming Frog Crawls
Automatically transform raw crawl data into structured `llms.txt` files, reducing manual data preparation time by over 80% and enhancing AI content discovery.
Manually extracting and structuring website content for LLM training or content discovery is complex and time-intensive. This workflow automates the generation of `llms.txt` files from Screaming Frog crawls, so AI models are pointed at the pages you want them to find, without the manual effort.

Documentation
Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls
This n8n workflow streamlines the creation of `llms.txt` files, which guide Large Language Models (LLMs) to valuable content on your website. Using Screaming Frog SEO Spider exports as input, it helps AI models discover and prioritize high-quality web pages for training or content generation.
Key Features
- Automated `llms.txt` generation from Screaming Frog `internal_html.csv` or `internal_all.csv` exports.
- Intelligent filtering of URLs based on status, indexability, and content type to include only relevant, high-quality pages.
- Multi-language compatibility for Screaming Frog exports (English, French, Italian, German, Spanish).
- Optional AI-powered text classification (using OpenAI/LangChain) for advanced content filtering, ensuring LLMs focus on the most valuable pages.
- Customizable llms.txt output format, including website name, description, URL, title, and meta description.
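To make the output format concrete, here is a minimal Python sketch of how the listed fields (website name, description, URL, title, meta description) could be rendered. It is not the workflow's own node code. The `Page` class, the `render_llms_txt` function, and the `## Pages` section heading are hypothetical names chosen for illustration, and the layout assumes the common `llms.txt` convention of a title line, a blockquote summary, and a markdown link list.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One crawled page (hypothetical container for this sketch)."""
    url: str
    title: str
    description: str = ""


def render_llms_txt(site_name: str, site_description: str, pages: list[Page]) -> str:
    """Render an llms.txt document: title, summary, then one link per page."""
    lines = [f"# {site_name}", "", f"> {site_description}", "", "## Pages", ""]
    for page in pages:
        entry = f"- [{page.title}]({page.url})"
        # Append the meta description only when the crawl provided one.
        if page.description:
            entry += f": {page.description}"
        lines.append(entry)
    return "\n".join(lines) + "\n"
```

Calling `render_llms_txt("Example", "Demo site", [Page("https://example.com/", "Home", "Welcome")])` would yield a title line, a summary line, and a single `- [Home](https://example.com/): Welcome` entry. Keeping the renderer separate from the filtering step makes it easy to change the output layout without touching the crawl logic.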
How It Works
The workflow begins with a simple form where users upload their Screaming Frog CSV export and provide basic website details. It then extracts the key fields (URL, title, description, status, indexability, content type, word count) from the CSV, resolving column names across the supported export languages. A filter keeps only indexable HTML pages that return HTTP 200. Optionally, an AI text classifier refines this selection further by labeling each page as 'useful_content' or 'other_content'. Finally, the workflow compiles the filtered data into the specified `llms.txt` format, ready for download or automated upload to cloud storage.
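The extraction and filtering steps described above can be sketched in standalone Python as follows. This is an illustration under stated assumptions, not the workflow's actual implementation: `COLUMN_ALIASES`, `pick`, `keep`, and `filter_pages` are hypothetical names, the English column names follow Screaming Frog's default export, and the French aliases are illustrative guesses that should be checked against a real export. The sketch also assumes the indexability value reads `Indexable` regardless of export language.

```python
import csv
import io

# Hypothetical alias table. English names follow the default Screaming Frog
# export; the French names are illustrative assumptions, not verified.
COLUMN_ALIASES = {
    "url": ("Address", "Adresse"),
    "title": ("Title 1", "Titre 1"),
    "description": ("Meta Description 1",),
    "status": ("Status Code", "Code HTTP"),
    "indexability": ("Indexability", "Indexabilité"),
    "content_type": ("Content Type", "Type de contenu"),
}


def pick(row: dict, field: str) -> str:
    """Return the first non-empty value found under any alias of `field`."""
    for name in COLUMN_ALIASES[field]:
        value = row.get(name)
        if value:
            return value.strip()
    return ""


def keep(row: dict) -> bool:
    """Keep only indexable HTML pages that returned HTTP 200."""
    return (
        pick(row, "status") == "200"
        and pick(row, "indexability").lower() == "indexable"
        and pick(row, "content_type").lower().startswith("text/html")
    )


def filter_pages(csv_text: str) -> list[dict]:
    """Parse the export and return url/title/description for kept pages."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "url": pick(row, "url"),
            "title": pick(row, "title"),
            "description": pick(row, "description"),
        }
        for row in reader
        if keep(row)
    ]
```

Resolving aliases in one lookup function keeps the filter itself language-agnostic, so supporting another export language only requires extending the alias table. The optional AI classification step is omitted here because it depends on an external model.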