📦 Wayback Archive Engine

Preserve entire sites, individual pages, or bulk URL lists via the Wayback Machine Save Page Now API. Build forensic baselines before pentests or archive evidence before it disappears.

Proxied via Cloudflare Worker with automatic failover · Last updated February 10, 2026

Archive a Single Page
Bulk Archive — Paste URL List
Crawl Site via Wayback CDX → Archive All Found URLs

This mode first discovers all known URLs for a domain from the Wayback CDX index, then feeds them into Save Page Now to create fresh snapshots, producing a full forensic baseline.

Step 1: CDX discovery → Step 2: Archive each URL

Why do OSINT investigators need to archive web pages?

Web content is ephemeral — pages are modified, taken offline, or deliberately scrubbed during investigations. Max Intel's Archive Engine uses the Internet Archive's Save Page Now API to create timestamped, immutable snapshots stored at web.archive.org. According to the SANS Institute Digital Forensics and Incident Response (DFIR) framework, preserving volatile digital evidence in its original state is a foundational step in any investigation. The Wayback Machine provides a neutral, third-party record that courts and compliance teams recognize as credible documentation of what a website displayed at a specific point in time.

What are the three archiving modes?

Single URL mode submits one page for immediate archiving — useful for capturing a specific piece of evidence such as a social media post, product listing, or terms of service page. Bulk Archive mode accepts a list of URLs (one per line) and processes them with configurable concurrency and rate limiting to avoid overwhelming the Internet Archive API. Site Crawl + Archive first queries the Wayback Machine CDX index to discover all known URLs for a domain, then feeds each discovered endpoint into Save Page Now to create fresh snapshots. This mode creates a comprehensive forensic baseline of an entire website's current state, which the OWASP Web Security Testing Guide recommends before beginning any authorized security assessment.
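In its simplest public form, a Save Page Now submission is a GET request to web.archive.org/save/ followed by the target URL, and each successful capture lives at a timestamped web.archive.org/web/ address. A minimal Python sketch of building those URLs (the tool itself runs in the browser; function names here are illustrative, not part of the tool):

```python
from urllib.parse import quote

SPN_ENDPOINT = "https://web.archive.org/save/"

def spn_request_url(target: str) -> str:
    """Build the simple GET-form Save Page Now submission URL.
    (An authenticated POST form of the API offers more options.)"""
    return SPN_ENDPOINT + quote(target, safe=":/?&=")

def snapshot_url(timestamp: str, target: str) -> str:
    """Construct the canonical Wayback snapshot URL from a
    14-digit timestamp (YYYYMMDDhhmmss) and the original URL."""
    return f"https://web.archive.org/web/{timestamp}/{target}"
```

Bulk mode is then just this submission repeated over each line of the pasted list, with a delay between requests.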

How does the Cloudflare Worker proxy handle CORS restrictions?

Browsers enforce Cross-Origin Resource Sharing (CORS) policies that prevent direct JavaScript calls to the Internet Archive API from a different domain. Max Intel routes requests through a Cloudflare Worker that adds the necessary CORS headers, with automatic failover to three backup proxies (AllOrigins, CORSProxy.io, CodeTabs) if the primary worker is unavailable. The proxy chain tries direct fetch first (for local development), then cycles through proxies until one succeeds — ensuring the tool works reliably regardless of network conditions.
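The failover chain described above is a simple pattern: try each fetch strategy in order and return the first success. A minimal Python sketch of that pattern (the actual tool implements this in browser JavaScript; the function name and error handling are illustrative):

```python
from typing import Callable, Sequence

def fetch_with_failover(url: str,
                        fetchers: Sequence[Callable[[str], str]]) -> str:
    """Try each fetch strategy in order -- direct fetch first, then
    each CORS proxy -- and return the first successful response body."""
    last_error: Exception | None = None
    for fetch in fetchers:
        try:
            return fetch(url)
        except Exception as err:  # a real client would narrow this
            last_error = err      # remember why this strategy failed
    raise RuntimeError(f"all proxies failed for {url}") from last_error
```

In the real tool, the `fetchers` list would contain the direct fetch, the Cloudflare Worker, and the three backup proxies, in that order.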

Save Page Now (SPN) API
The Internet Archive's public endpoint for submitting web pages to be crawled and archived by the Wayback Machine. Returns the archive URL and HTTP status for each submission, with rate limits that vary by request volume.
CDX Index API
A programmatic interface to the Wayback Machine's URL index, returning all archived URLs for a domain with timestamps, status codes, and MIME types. Used in Crawl + Archive mode to discover endpoints before archiving them.
Forensic Baseline
A timestamped record of a website's complete state at a specific point in time, created before a security assessment or investigation begins. Provides a reference point for detecting changes, proving content existed, or documenting the scope of an engagement.
Evidence Preservation
The practice of capturing and storing digital content in its current state before it can be modified or removed. Critical in legal proceedings, intellectual property disputes, threat intelligence, and investigative journalism where web content may be deliberately altered or deleted.

Wayback Archive Engine — Frequently Asked Questions

What is the Wayback Machine Save Page Now API?

The Save Page Now (SPN) API allows anyone to request that the Internet Archive capture and preserve a snapshot of a web page. This tool automates that process for single URLs, bulk lists, or entire domains discovered through the CDX index, creating timestamped forensic records.

Why archive web pages for OSINT?

Web content can be modified or deleted at any time. Archiving creates timestamped evidence that content existed in a specific state at a specific time. This is critical for legal proceedings, threat intelligence, investigative journalism, and security assessments where proving prior content is essential.

What is CDX discovery and site crawling?

The CDX index is the Wayback Machine's capture index: a record of every URL it has ever archived for a domain. The Site Crawl feature queries this index to discover every known URL, then systematically archives each one, ensuring comprehensive preservation without needing a sitemap.
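A CDX discovery query can be built against the public cdx/search/cdx endpoint, whose JSON output is a header row followed by data rows. A sketch assuming that format (function names are illustrative):

```python
from urllib.parse import urlencode

CDX_BASE = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(domain: str) -> str:
    """Build a CDX query listing distinct archived URLs for a domain."""
    params = {
        "url": f"{domain}/*",  # match every path under the domain
        "output": "json",
        "fl": "timestamp,original,statuscode,mimetype",
        "collapse": "urlkey",  # one row per distinct URL
    }
    return f"{CDX_BASE}?{urlencode(params)}"

def parse_cdx_json(rows: list[list[str]]) -> list[dict[str, str]]:
    """CDX JSON output is a header row followed by data rows;
    zip each data row with the header to get named fields."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

Each parsed row's `original` field is a discovered URL that Crawl + Archive mode then hands to Save Page Now.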

Are there rate limits on the Save Page Now API?

Yes, the Internet Archive rate-limits SPN requests. This tool includes configurable delays between requests and automatic retry logic with exponential backoff. For bulk operations, a delay of 10-15 seconds between requests is recommended to avoid throttling.
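The retry behavior described above — a base delay that doubles on each failed attempt — can be sketched in a few lines. A minimal illustration (parameter names are illustrative; the injectable `sleep` exists only to make the schedule testable):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(op: Callable[[], T],
                       attempts: int = 4,
                       base_delay: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry op with exponential backoff: wait base_delay, then 2x,
    4x, ... between attempts. The 10 s base matches the bulk-mode
    delay recommended above."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```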