Sitemap URL Extraction and Filtering

This workflow fetches an XML sitemap, parses its contents, and filters the URLs by custom criteria. It suits SEO specialists, content managers, and developers who need to quickly find specific content types on a website.

Key Features

Fetches sitemap.xml from any URL you specify.
Converts the XML sitemap data into JSON that is easy to parse.
Extracts every individual URL from the sitemap for analysis.
Applies customizable filtering logic to find URLs matching specific patterns (e.g., .pdf files, specific subdirectories).
Returns the filtered URLs as structured output for further processing or reporting.
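
The filtering patterns mentioned above can be sketched outside the workflow as plain predicates. This is a minimal Python illustration, not the workflow's own implementation: the helper names `is_pdf` and `in_subdirectory` and the example.com URLs are hypothetical. Unlike a plain "ends with" check, this `is_pdf` ignores letter case and any query string, which is an assumption about what you may want.

```python
from urllib.parse import urlparse


def is_pdf(url: str) -> bool:
    """True when the URL path ends with .pdf (case-insensitive, query string ignored)."""
    return urlparse(url).path.lower().endswith(".pdf")


def in_subdirectory(url: str, prefix: str) -> bool:
    """True when the URL path starts with a directory prefix such as '/blog/'."""
    return urlparse(url).path.startswith(prefix)


# Hypothetical URLs standing in for sitemap entries.
urls = [
    "https://example.com/docs/guide.pdf",
    "https://example.com/blog/post-1",
    "https://example.com/files/report.PDF?download=1",
]

pdf_urls = [u for u in urls if is_pdf(u)]
blog_urls = [u for u in urls if in_subdirectory(u, "/blog/")]
```

Matching on the parsed path rather than the raw string keeps query parameters and fragments from hiding a file extension.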

How It Works

1. The workflow starts with a manual trigger, allowing you to run it on demand.

2. A 'Set sitemap URL' node defines the target sitemap.xml URL, which you can easily configure.

3. An 'HTTP Request' node fetches the XML content from the specified sitemap URL.

4. The 'Convert Sitemap to JSON' (XML) node transforms the fetched XML data into a structured JSON object for easier manipulation.

5. The 'Split Out' node then splits the 'url' array nested under 'urlset' into individual items, one per sitemap entry, preparing them for filtering.

6. Finally, a 'Filter URLs' node applies a user-defined condition (by default, checking if the URL ends with '.pdf') to return only the desired URLs.
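
For readers who want to see the same steps outside the workflow editor, they can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the workflow's actual node code: the function names (`fetch_sitemap`, `sitemap_to_dict`, `split_out`, `filter_urls`) and the sample sitemap are hypothetical, and the `urlset` / `url` nesting simply mirrors the structure described in the steps above.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_sitemap(sitemap_url: str) -> bytes:
    """Step 3: download the raw XML (not called in the offline example below)."""
    with urllib.request.urlopen(sitemap_url, timeout=30) as response:
        return response.read()


def sitemap_to_dict(xml_data: bytes) -> dict:
    """Step 4: convert the XML into a nested dict shaped like urlset -> url -> [entries]."""
    root = ET.fromstring(xml_data)
    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        entry = {}
        for child in url_el:
            tag = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
            entry[tag] = (child.text or "").strip()
        entries.append(entry)
    return {"urlset": {"url": entries}}


def split_out(data: dict) -> list:
    """Step 5: turn the nested 'url' array into a flat list of items."""
    return list(data["urlset"]["url"])


def filter_urls(items: list, suffix: str = ".pdf") -> list:
    """Step 6: keep only items whose 'loc' ends with the given suffix."""
    return [item for item in items if item.get("loc", "").endswith(suffix)]


# Hypothetical sitemap used instead of a live HTTP request.
SAMPLE = b"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/files/report.pdf</loc></url>
</urlset>"""

items = split_out(sitemap_to_dict(SAMPLE))
filtered = filter_urls(items)
print([item["loc"] for item in filtered])
```

To run it against a real site, replace `SAMPLE` with `fetch_sitemap(...)` and pass your own sitemap URL. Note that this sketch handles a plain `urlset` only; a sitemap index file that points to other sitemaps would need an extra step.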

Automate Sitemap Reading & Filter URLs for Targeted Content
