Why do OSINT investigators need to archive web pages?
Web content is ephemeral: pages are modified, taken offline, or deliberately scrubbed during investigations. Max Intel's Archive Engine uses the Internet Archive's Save Page Now API to create timestamped, immutable snapshots stored at web.archive.org. According to the SANS Institute Digital Forensics and Incident Response (DFIR) framework, preserving volatile digital evidence in its original state is a foundational step in any investigation. The Wayback Machine provides a neutral, third-party record that courts and compliance teams generally treat as credible documentation of what a website displayed at a specific point in time.
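To make the "timestamped snapshot" idea concrete, here is a minimal Python sketch of the two URL shapes involved: the Save Page Now request URL and the resulting snapshot URL with its 14-digit UTC timestamp. The helper names `save_url` and `parse_snapshot` are hypothetical and are not taken from Max Intel's code; only the public web.archive.org URL layout is assumed.

```python
import re
from datetime import datetime, timezone

WAYBACK = "https://web.archive.org"

# Snapshot URLs look like /web/<YYYYMMDDhhmmss>/<original URL>; an optional
# two-letter modifier such as "id_" may follow the timestamp.
SNAPSHOT_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)$"
)


def save_url(target: str) -> str:
    """Build the Save Page Now URL for a target page (hypothetical helper)."""
    return f"{WAYBACK}/save/{target}"


def parse_snapshot(url: str) -> tuple[datetime, str]:
    """Split a snapshot URL into its capture time (UTC) and original URL."""
    match = SNAPSHOT_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {url}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured.replace(tzinfo=timezone.utc), match.group(2)
```

Recording the parsed capture time next to the original URL gives an investigator a simple evidence log entry that can be checked against the snapshot itself.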
What are the three archiving modes?
Single URL mode submits one page for immediate archiving, which is useful for capturing a specific piece of evidence such as a social media post, product listing, or terms of service page. Bulk Archive mode accepts a list of URLs (one per line) and processes them with configurable concurrency and rate limiting to avoid overwhelming the Internet Archive API. Site Crawl + Archive mode first queries the Wayback Machine CDX index to discover all known URLs for a domain, then feeds each discovered URL into Save Page Now to create fresh snapshots. This mode creates a comprehensive forensic baseline of an entire website's current state, which the OWASP Web Security Testing Guide recommends before beginning any authorized security assessment.
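The Bulk Archive behavior described above (one URL per line, bounded concurrency, a pause between submissions) can be sketched roughly as follows. This is an illustration under stated assumptions, not Max Intel's implementation: `parse_url_list`, `archive_bulk`, and the injected `submit` coroutine are hypothetical names, and the default concurrency and delay values are placeholders.

```python
import asyncio
from typing import Awaitable, Callable


def parse_url_list(text: str) -> list[str]:
    """Read one URL per line, skipping blank lines and duplicates."""
    seen: set[str] = set()
    urls: list[str] = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


async def archive_bulk(
    urls: list[str],
    submit: Callable[[str], Awaitable[str]],  # hypothetical: archives one URL
    concurrency: int = 2,                     # assumed default, not from the tool
    delay_s: float = 1.0,                     # assumed pause per submission
) -> dict[str, str]:
    """Submit URLs with at most `concurrency` requests in flight."""
    slots = asyncio.Semaphore(concurrency)

    async def worker(url: str) -> tuple[str, str]:
        async with slots:
            try:
                result = await submit(url)
            except Exception as exc:  # record the failure, keep the batch going
                result = f"error: {exc}"
            # Hold the slot briefly so requests are spaced out.
            await asyncio.sleep(delay_s)
            return url, result

    return dict(await asyncio.gather(*(worker(u) for u in urls)))
```

Passing the submit function in as a parameter keeps the throttling logic independent of how a single page is actually archived, so the same loop could serve both Bulk Archive and the archiving step of Site Crawl + Archive.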
How does the Cloudflare Worker proxy handle CORS restrictions?
Browsers enforce Cross-Origin Resource Sharing (CORS) policies that block JavaScript on one domain from reading Internet Archive API responses unless those responses carry permissive CORS headers. Max Intel routes requests through a Cloudflare Worker that adds the necessary CORS headers, with automatic failover to three backup proxies (AllOrigins, CORSProxy.io, CodeTabs) if the primary worker is unavailable. The proxy chain tries a direct fetch first (for local development), then cycles through the proxies until one succeeds, so the tool keeps working when any single proxy is down.
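The failover order described above (direct fetch, then the primary worker, then the backups) comes down to a simple loop. The sketch below shows only that control flow in Python; the real tool runs in a browser, and every proxy URL here is a placeholder on a `.example` host, not the actual address or query format of the Cloudflare Worker or any backup proxy. `ROUTES`, `fetch_with_failover`, and the injected `fetch` callable are hypothetical names.

```python
from typing import Callable
from urllib.parse import quote

Route = tuple[str, Callable[[str], str]]

# Ordered routes: each entry maps a target URL to the URL actually requested.
# All hosts below are placeholders (assumptions), not real proxy endpoints.
ROUTES: list[Route] = [
    ("direct", lambda u: u),
    ("worker", lambda u: "https://worker.example/?url=" + quote(u, safe="")),
    ("backup-1", lambda u: "https://backup-1.example/?url=" + quote(u, safe="")),
    ("backup-2", lambda u: "https://backup-2.example/?url=" + quote(u, safe="")),
    ("backup-3", lambda u: "https://backup-3.example/?url=" + quote(u, safe="")),
]


def fetch_with_failover(
    target: str,
    fetch: Callable[[str], str],  # hypothetical: performs one HTTP request
    routes: list[Route] = ROUTES,
) -> tuple[str, str]:
    """Try each route in order; return (route name, response body)."""
    errors: list[str] = []
    for name, build in routes:
        try:
            return name, fetch(build(target))
        except Exception as exc:  # any failure moves on to the next route
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all routes failed: " + "; ".join(errors))
```

Returning the name of the route that succeeded makes it easy to log which proxy served a given request, which helps when diagnosing an outage in one of them.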
- Save Page Now (SPN) API
- The Internet Archive's public endpoint for submitting web pages to be crawled and archived by the Wayback Machine. It returns the archive URL and HTTP status for each submission and enforces rate limits that vary by request volume.
- CDX Index API
- A programmatic interface to the Wayback Machine's URL index, returning all archived URLs for a domain with timestamps, status codes, and MIME types. Used in Site Crawl + Archive mode to discover URLs before archiving them.
- Forensic Baseline
- A timestamped record of a website's complete state at a specific point in time, created before a security assessment or investigation begins. Provides a reference point for detecting changes, proving content existed, or documenting the scope of an engagement.
- Evidence Preservation
- The practice of capturing and storing digital content in its current state before it can be modified or removed. Critical in legal proceedings, intellectual property disputes, threat intelligence, and investigative journalism where web content may be deliberately altered or deleted.
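
As an illustration of the CDX Index API entry in the list above, the following Python sketch builds a domain-wide CDX query and parses its JSON output, whose first row is a header naming the fields. The function names and the `limit` default are assumptions made for this example; the query parameters (`url`, `matchType`, `output`, `fl`, `collapse`, `limit`) are those of the public Wayback CDX server.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str, limit: int = 1000) -> str:
    """Build a CDX query for every indexed URL under a domain."""
    params = {
        "url": domain,
        "matchType": "domain",      # include subdomains
        "output": "json",
        "fl": "original,timestamp,statuscode,mimetype",
        "collapse": "urlkey",       # one row per distinct URL
        "limit": str(limit),        # assumed cap for the example
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> list[dict[str, str]]:
    """Turn CDX JSON (header row + data rows) into a list of dicts."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]
```

The `original` field of each parsed row is the URL that Site Crawl + Archive mode would then submit to Save Page Now.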