
AI Dorks — Document & Dataset

Claude prompts for squeezing intelligence out of files you upload — selector extraction, redaction and metadata-leak checks, table extraction, and unknown-string triage.

Public-source, authorized use. These are prompt-engineering aids, not jailbreaks. Use them on subjects and infrastructure you’re authorized to investigate; they keep to public sources, respect site terms, and exclude breach data and private-individual targeting. Paste a prompt into Claude, fill the highlighted fields, and have it show its work and cite sources.

Turn uploaded files into structured intelligence

These prompts use Claude’s document toolchain (pdfplumber, pikepdf, camelot, python-docx, phonenumbers, PyWhat) to extract and validate selectors, surface residual data hidden under redactions or in metadata, pull tables to CSV, and classify unknown strings. Processing runs on the files you upload, and results are saved to the sandbox outputs folder. The prompts also work hand-in-hand with Google dorking: Claude can build filetype:pdf and site: queries to find the files, then read and analyse them.

Frequently asked questions

Can Claude find text hidden under a redaction?

It can flag selectable text or layers left behind a redaction box — a common mistake. It does not claim to defeat a properly flattened redaction.

What selectors does it extract?

Emails, URLs, IPv4 addresses, @handles, and phone numbers. Phone numbers are validated and formatted with the phonenumbers library, and all selectors are deduplicated and written to CSV files.
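As a rough illustration of that kind of extraction, here is a minimal standard-library sketch. The regular expressions, the function names extract_selectors and selectors_to_csv, and the two-column CSV layout are assumptions made for this example, not the prompt's actual implementation. Phone validation with phonenumbers is left out so the block runs without third-party packages.

```python
import csv
import io
import ipaddress
import re

# Illustrative patterns only; real-world selectors are messier than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
IPV4_RE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")
HANDLE_RE = re.compile(r"(?<![\w@.])@[A-Za-z0-9_]{2,30}\b")


def _is_ipv4(candidate):
    """Reject dotted quads with out-of-range octets such as 999.1.1.1."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def extract_selectors(text):
    """Return deduplicated selectors by type, keeping first-seen order."""
    found = {
        "email": [e.lower() for e in EMAIL_RE.findall(text)],
        "url": [u.rstrip(".,;") for u in URL_RE.findall(text)],
        "ipv4": [ip for ip in IPV4_RE.findall(text) if _is_ipv4(ip)],
        "handle": HANDLE_RE.findall(text),
    }
    # dict.fromkeys drops duplicates while preserving insertion order
    return {kind: list(dict.fromkeys(values)) for kind, values in found.items()}


def selectors_to_csv(selectors):
    """Serialise the selectors as CSV text with a type,value header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["type", "value"])
    for kind, values in selectors.items():
        for value in values:
            writer.writerow([kind, value])
    return buffer.getvalue()
```

Calling extract_selectors on the text of an uploaded file and passing the result to selectors_to_csv yields one row per unique selector. Lowercasing emails before deduplication is a design choice of this sketch, so that the same mailbox written in different cases collapses to a single row.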

Are my files uploaded anywhere public?

Files go into Claude’s sandbox for processing and outputs are saved there for you; they are not posted publicly by these prompts.

How do I extract metadata from a PDF?

Upload the file and the document prompt reads its metadata with pikepdf and pdfplumber — author, software, creation and modification times, and XMP fields — saving the result so you can spot authorship and timeline leaks.
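Creation and modification times in a PDF's document info are usually stored as strings such as D:20240315093012+02'00', so they need normalising before a timeline comparison. The sketch below shows one way to do that with the standard library. The function name parse_pdf_date and the sample values are made up for illustration, and the info dictionary only stands in for values a library such as pikepdf would read from the file.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:YYYYMMDDHHmmSS+HH'mm'; every field after the
# year is optional, and the offset may be replaced by a bare Z for UTC.
PDF_DATE_RE = re.compile(
    r"^(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<tz_hour>\d{2})'?(?P<tz_minute>\d{2})?'?)?)?$"
)


def parse_pdf_date(raw):
    """Convert a PDF date string to a datetime, or None if it is malformed."""
    match = PDF_DATE_RE.match(raw.strip())
    if match is None:
        return None
    p = match.groupdict()
    try:
        if p["sign"] is None:
            tz = None  # no offset recorded, so the result stays naive
        elif p["sign"] == "Z":
            tz = timezone.utc
        else:
            offset = timedelta(
                hours=int(p["tz_hour"] or 0),
                minutes=int(p["tz_minute"] or 0),
            )
            tz = timezone(offset if p["sign"] == "+" else -offset)
        return datetime(
            int(p["year"]), int(p["month"] or 1), int(p["day"] or 1),
            int(p["hour"] or 0), int(p["minute"] or 0), int(p["second"] or 0),
            tzinfo=tz,
        )
    except ValueError:
        # Out-of-range values such as month 13 or an offset of 99 hours
        return None


# Hypothetical metadata values, standing in for a real document's info fields
info = {
    "/CreationDate": "D:20240315093012+02'00'",
    "/ModDate": "D:20240316101500+02'00'",
}
created = parse_pdf_date(info["/CreationDate"])
modified = parse_pdf_date(info["/ModDate"])
edit_gap = modified - created  # time between creation and last modification
```

A long gap between the two timestamps, or a timezone offset that does not match the claimed origin of the document, is the kind of timeline detail worth noting. Subtracting a naive datetime from an offset-aware one raises TypeError, so mixed cases need handling before comparison.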

How do I check a PDF for hidden or improperly removed text?

The redaction-check prompt looks for selectable text or layers left behind a redaction box — a common mistake — and flags them. It does not claim to recover text from a properly flattened redaction.
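The core of such a check is geometric: does any selectable character sit inside a rectangle drawn over the page? The sketch below works on plain dictionaries whose x0, x1, top, and bottom keys mirror the bounding-box fields pdfplumber reports for characters and rectangles. The function names, the 0.5 overlap threshold, and the sample coordinates are assumptions for illustration. Narrowing the rectangles down to filled, dark boxes is deliberately left out.

```python
def overlap_ratio(char, box):
    """Fraction of the character's bounding box that the rectangle covers."""
    width = min(char["x1"], box["x1"]) - max(char["x0"], box["x0"])
    height = min(char["bottom"], box["bottom"]) - max(char["top"], box["top"])
    if width <= 0 or height <= 0:
        return 0.0
    char_area = (char["x1"] - char["x0"]) * (char["bottom"] - char["top"])
    return (width * height) / char_area if char_area > 0 else 0.0


def text_under_boxes(chars, boxes, min_overlap=0.5):
    """List each box that still has selectable text underneath it."""
    hits = []
    for box in boxes:
        covered = [c for c in chars if overlap_ratio(c, box) >= min_overlap]
        # Characters may arrive in drawing order, so sort into reading order
        covered.sort(key=lambda c: (round(c["top"]), c["x0"]))
        text = "".join(c["text"] for c in covered).strip()
        if text:
            hits.append({"box": box, "text": text})
    return hits


# Made-up coordinates: three characters under a box, one well outside it
sample_chars = [
    {"text": "S", "x0": 10, "x1": 16, "top": 100, "bottom": 110},
    {"text": "E", "x0": 16, "x1": 22, "top": 100, "bottom": 110},
    {"text": "C", "x0": 22, "x1": 28, "top": 100, "bottom": 110},
    {"text": "x", "x0": 200, "x1": 206, "top": 100, "bottom": 110},
]
sample_boxes = [{"x0": 8, "x1": 30, "top": 98, "bottom": 112}]
findings = text_under_boxes(sample_chars, sample_boxes)
```

An empty result only means no selectable characters were found under the supplied boxes. It says nothing about a properly flattened redaction, where the text no longer exists in the file.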

Is there a Google dork to find PDFs (filetype:pdf)?

Yes — filetype:pdf site:example.com finds published PDFs. The document prompts go further: Claude can build those filetype: dorks and then read what it finds, pulling metadata, redaction leaks and tables. See the Google Dorks list for syntax.
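Composing such a query is simple string assembly, sketched below. The helper name build_dork and its parameters are hypothetical, and only the filetype: and site: operators mentioned above are used.

```python
def build_dork(site, filetype="pdf", terms=()):
    """Compose a search query like: filetype:pdf site:example.com "annual report"."""
    parts = [f"filetype:{filetype}", f"site:{site}"]
    for term in terms:
        # Quote multi-word terms so they are searched as an exact phrase
        parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)
```

For example, build_dork("example.com", terms=["annual report"]) produces a query restricted to PDFs on example.com that contain the phrase "annual report".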