AI Dorks — Document & Dataset
Claude prompts for squeezing intelligence out of files you upload — selector extraction, redaction and metadata-leak checks, table extraction, and unknown-string triage.
Turn uploaded files into structured intelligence
These prompts use Claude’s document toolchain (pdfplumber, pikepdf, camelot, python-docx, phonenumbers, pyWhat) to extract and validate selectors, surface residual data hidden under redactions or in metadata, pull tables to CSV, and classify unknown strings. Everything runs on files you upload, and the results are saved to the sandbox outputs folder. The prompts work hand in hand with Google dorking: Claude can build filetype:pdf and site: queries to find the files, then read and analyse them.
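As a rough illustration of the table-to-CSV step, the sketch below uses pdfplumber's `extract_tables()` instead of camelot, purely to keep the example short. The function names `rows_to_csv` and `pdf_tables_to_csv` and the output file naming are assumptions made for illustration, not the prompts' actual implementation.

```python
import csv
import io
from pathlib import Path


def rows_to_csv(rows):
    """Serialise extracted table rows to CSV text, blanking empty cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # Extractors commonly return None for empty cells; flatten inner newlines.
        writer.writerow([(cell or "").replace("\n", " ").strip() for cell in row])
    return buf.getvalue()


def pdf_tables_to_csv(pdf_path, out_dir):
    """Write each table pdfplumber detects to its own CSV (hypothetical naming)."""
    import pdfplumber  # third-party; imported lazily so rows_to_csv works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table_no, rows in enumerate(page.extract_tables(), start=1):
                target = out / f"page{page_no}_table{table_no}.csv"
                target.write_text(rows_to_csv(rows), encoding="utf-8")
                written.append(target)
    return written
```

Table detection is heuristic in any extractor, so it is worth spot-checking a few CSVs against the source pages before relying on them.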
Frequently asked questions
Can Claude find text hidden under a redaction?
It can flag selectable text or layers left behind a redaction box — a common mistake. It does not claim to defeat a properly flattened redaction.
What selectors does it extract?
Emails, URLs, IPv4 addresses, @handles, and phone numbers. Phone numbers are validated and formatted with the phonenumbers library, and all results are deduplicated into CSV files.
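For readers who want a feel for what selector extraction involves, here is a minimal standard-library sketch. The regular expressions, the `extract_selectors` and `selectors_to_csv` names, and the two-column CSV layout are all assumptions for illustration; the patterns are deliberately simple and will miss some edge cases. Phone numbers are left out because validating them properly needs the third-party phonenumbers library.

```python
import csv
import io
import ipaddress
import re

# Deliberately simple patterns (assumed for this sketch, not exhaustive).
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "url": re.compile(r"https?://[^\s<>\"')]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # The lookbehind stops the domain part of an email being read as a handle.
    "handle": re.compile(r"(?<![\w@.])@[A-Za-z0-9_]{2,30}\b"),
}


def extract_selectors(text):
    """Return {kind: sorted unique matches}; IPv4 candidates are validated."""
    found = {}
    for kind, pattern in PATTERNS.items():
        values = set()
        for match in pattern.findall(text):
            value = match
            if kind == "url":
                value = value.rstrip(".,;:")  # drop trailing sentence punctuation
            elif kind == "email":
                value = value.lower()  # dedupe case variants
            elif kind == "ipv4":
                try:
                    ipaddress.IPv4Address(value)
                except ValueError:
                    continue  # e.g. 999.1.1.1 is not a valid address
            values.add(value)
        found[kind] = sorted(values)
    return found


def selectors_to_csv(found):
    """Flatten the result into CSV text with a kind,value header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["kind", "value"])
    for kind, values in found.items():
        for value in values:
            writer.writerow([kind, value])
    return buf.getvalue()


sample = (
    "Contact Alice@Example.com or alice@example.com, see "
    "https://example.com/report.pdf. Host 192.168.0.1 and 999.1.1.1; "
    "follow @osint_bot."
)
found = extract_selectors(sample)
print(selectors_to_csv(found))
```

Phone candidates could be passed through `phonenumbers.parse()` and `phonenumbers.is_valid_number()` in the same loop, with a default region chosen to suit the document.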
Are my files uploaded anywhere public?
Files go into Claude’s sandbox for processing and outputs are saved there for you; they are not posted publicly by these prompts.
How do I extract metadata from a PDF?
Upload the file and the document prompt reads its metadata with pikepdf and pdfplumber — author, software, creation and modification times, and XMP fields — saving the result so you can spot authorship and timeline leaks.
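The metadata read described above might look roughly like the following sketch. It uses pikepdf's `docinfo` dictionary and `open_metadata()` for XMP; the helper names `parse_pdf_date` and `read_pdf_metadata` and the shape of the returned dictionary are assumptions for illustration. PDF date strings follow the `D:YYYYMMDDHHmmSS` pattern with an optional timezone suffix, so a small parser is included to make timelines comparable.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:20240131120000+02'00'; trailing parts are optional.
PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)(?:00'?00'?)?|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(raw):
    """Parse a PDF date string into a datetime, or None if it is malformed."""
    m = PDF_DATE.match(raw.strip())
    if not m:
        return None
    defaults = (1, 1, 1, 0, 0, 0)
    parts = [int(g) if g else d for g, d in zip(m.groups()[:6], defaults)]
    zulu, sign, off_h, off_m = m.groups()[6:]
    tz = None  # stays naive when the file records no timezone
    if zulu:
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(off_h), minutes=int(off_m or 0))
        tz = timezone(-offset if sign == "-" else offset)
    try:
        return datetime(*parts, tzinfo=tz)
    except ValueError:  # e.g. month 13
        return None


def read_pdf_metadata(path):
    """Collect document-info and XMP fields from a PDF (illustrative shape)."""
    import pikepdf  # third-party; imported lazily so the parser works without it

    with pikepdf.open(path) as pdf:
        info = {str(key): str(value) for key, value in pdf.docinfo.items()}
        xmp_meta = pdf.open_metadata()
        xmp = {key: str(xmp_meta[key]) for key in xmp_meta}
    created = info.get("/CreationDate")
    modified = info.get("/ModDate")
    return {
        "docinfo": info,
        "xmp": xmp,
        "created": parse_pdf_date(created) if created else None,
        "modified": parse_pdf_date(modified) if modified else None,
    }
```

A modification time that precedes the creation time, or an author and producer that do not match the claimed origin, are the kinds of leaks worth a closer look.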
How do I check a PDF for hidden or improperly removed text?
The redaction-check prompt looks for selectable text or layers left behind a redaction box — a common mistake — and flags them. It does not claim to recover text from a properly flattened redaction.
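One way such a check could work is to compare the position of every selectable character with the dark filled rectangles on the same page. The sketch below does this with pdfplumber's `page.rects` and `page.chars`; the function names, the darkness threshold, and the centre-point overlap rule are assumptions for illustration. A hit is only a flag for manual review, since a dark table header with text on it would also match.

```python
def is_dark(color, threshold=0.2):
    """Heuristic: treat near-black fills as possible redaction boxes."""
    if color is None:
        return False
    values = list(color) if isinstance(color, (list, tuple)) else [color]
    if not values or not all(isinstance(v, (int, float)) for v in values):
        return False
    if len(values) == 4:  # CMYK: dark when the K (black) channel is high
        return values[3] >= 1 - threshold
    return max(values) <= threshold  # grayscale or RGB: all channels near 0


def chars_under_boxes(rects, chars):
    """Return characters whose centre point lies inside a dark filled rectangle."""
    boxes = [
        r for r in rects
        if r.get("fill") and is_dark(r.get("non_stroking_color"))
    ]
    hits = []
    for ch in chars:
        if not ch["text"].strip():
            continue  # ignore whitespace
        cx = (ch["x0"] + ch["x1"]) / 2
        cy = (ch["top"] + ch["bottom"]) / 2
        if any(
            b["x0"] <= cx <= b["x1"] and b["top"] <= cy <= b["bottom"]
            for b in boxes
        ):
            hits.append(ch)
    return hits


def flag_redaction_leaks(pdf_path):
    """Map page number -> selectable text found under dark boxes."""
    import pdfplumber  # third-party; imported lazily

    report = {}
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            hits = chars_under_boxes(page.rects, page.chars)
            if hits:
                report[page_no] = "".join(ch["text"] for ch in hits)
    return report
```

This only catches boxes drawn as vector rectangles over live text. Redactions applied as annotations or images would need separate handling, and a properly flattened redaction leaves nothing to find.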
Is there a Google dork to find PDFs (filetype:pdf)?
Yes — filetype:pdf site:example.com finds published PDFs. The document prompts go further: Claude can build those filetype: dorks and then read what it finds, pulling metadata, redaction leaks and tables. See the Google Dorks list for syntax.
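Building such dorks programmatically is straightforward. The sketch below assembles a query from optional `filetype:` and `site:` operators plus free-text terms; `build_dork` and `search_url` are hypothetical helper names introduced here for illustration.

```python
from urllib.parse import quote_plus


def build_dork(site=None, filetype=None, terms=()):
    """Assemble a search query from optional operators and keywords."""
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype.lstrip('.')}")
    if site:
        parts.append(f"site:{site}")
    for term in terms:
        # Quote multi-word terms so they are searched as an exact phrase.
        parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for a browser search."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_dork(site="example.com", filetype="pdf", terms=["annual report"])
print(query)
print(search_url(query))
```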