All current AI-voice detectors are probabilistic. Treat their output as one signal — even the best report meaningful false-positive rates on heavily compressed or low-quality genuine audio, and false-negative rates on lightly edited AI output. Cross-check against multiple detectors and corroborate with classical forensics and provenance evidence before drawing conclusions.
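A minimal triage sketch of that cross-checking step, assuming each detector returns a "probability synthetic" score in [0, 1]. The detector names and thresholds below are illustrative, not calibrated values:

```python
def aggregate_detector_scores(scores: dict[str, float]) -> str:
    """Combine per-detector 'probability synthetic' scores into a
    conservative triage label. Thresholds are illustrative only."""
    if not scores:
        return "no-signal"
    high = sum(1 for s in scores.values() if s >= 0.8)
    low = sum(1 for s in scores.values() if s <= 0.2)
    if high == len(scores):
        return "likely-synthetic"   # unanimous strong signal
    if low == len(scores):
        return "likely-authentic"   # unanimous weak signal
    return "inconclusive"           # detectors disagree: corroborate further

print(aggregate_detector_scores({"resemble": 0.91, "truthscan": 0.88, "hive": 0.95}))
# → likely-synthetic
```

The point of the "inconclusive" branch is that disagreement between detectors is itself information: it should trigger the classical-forensics and provenance steps, not a coin-flip verdict.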
Watermarks embedded at synthesis time are more reliable than post-hoc detection — but only when the generator co-operates. Major commercial providers (Google, Meta, Microsoft) embed watermarks; open-source generators generally do not.
These tools predate AI synthesis but remain useful baselines. Spectrogram analysis catches splices, frequency-cutoff fingerprints, and codec/recompression artefacts that AI detectors may miss.
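One of those baselines, the frequency-cutoff fingerprint, can be sketched with a plain FFT: a hard energy cutoff well below Nyquist (for example ~8 kHz in a 44.1 kHz file) often betrays codec-limited or upsampled material. The floor threshold and demo signal here are illustrative:

```python
import numpy as np

def estimate_cutoff_hz(signal: np.ndarray, sr: int, floor_db: float = -60.0) -> float:
    """Estimate the highest frequency with meaningful energy.
    A cutoff far below sr/2 suggests the audio was generated or
    transcoded at a lower bandwidth than the container claims."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    db = 20 * np.log10(spectrum / (spectrum.max() + 1e-12) + 1e-12)
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    active = freqs[db > floor_db]
    return float(active.max()) if active.size else 0.0

# Demo: a pure 4 kHz tone in a 44.1 kHz "file" reports a cutoff near 4 kHz,
# far below the 22.05 kHz Nyquist limit
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 4000 * t)
print(round(estimate_cutoff_hz(tone, sr)))
```

In Sonic Visualiser or Audacity the same check is visual: the spectrogram simply goes dark above the cutoff.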
AI voice detection & deepfake audio investigation hub
A curated directory of free and commercial tools for verifying audio authenticity in 2026 — AI voice detectors (Resemble, TruthScan, Hive, Pindrop), watermarking systems (Google SynthID, Meta AudioSeal, IPTC Digital Source Type), classical spectrogram forensics (Sonic Visualiser, Audacity, Praat), and Bellingcat-style investigation workflows for the verification step that comes after detection.
For OSINT investigators, journalists, and trust-and-safety teams: voice cloning is a documented red-team and criminal technique — attackers harvest target voice samples from publicly available video, then synthesize convincing fakes with consumer tools like ElevenLabs and Murf. Detection alone is not sufficient. Combine multiple AI detectors, classical spectrogram inspection, and channel-level corroboration.
Companion to the AI Provenance & C2PA hub which covers AI image, video, and text. The C2PA standard now extends to audio; major commercial AI-audio providers embed C2PA manifests at synthesis time, but most open-source generators do not.
Frequently asked questions
How reliable are AI voice detectors in 2026?
For research-grade demos with clean inputs, vendors claim 95–99% accuracy. In real conditions — phone recordings, room noise, codec artefacts, brief snippets — accuracy drops significantly. The arms race favours the generators (Sora-class audio models, real-time voice conversion) over the detectors. Use detectors as one signal among many, never as standalone proof.
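The base-rate problem behind that advice is easy to quantify: even at the claimed accuracy, a detector screening a stream where fakes are rare will mostly flag genuine recordings. A worked Bayes example with assumed numbers (95% sensitivity and specificity, 1% of clips actually synthetic):

```python
def posterior_synthetic(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(synthetic | detector flags it), via Bayes' rule."""
    true_pos = sensitivity * prevalence           # fakes correctly flagged
    false_pos = (1 - specificity) * (1 - prevalence)  # genuine clips wrongly flagged
    return true_pos / (true_pos + false_pos)

# A "95%-accurate" detector, applied where only 1% of clips are fake:
print(round(posterior_synthetic(0.95, 0.95, 0.01), 2))  # → 0.16
```

A flagged clip is still genuine about five times out of six under these assumptions, which is exactly why the detector score is one signal among many rather than proof.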
Which detector should I use first?
For free quick checks: Resemble Detect or TruthScan. For deep technical analysis where you want to inspect the spectrogram yourself: Sonic Visualiser or Audacity. For enterprise call-center protection: Pindrop Pulse. For research-grade transparent detection: the open-source AudioSeal from Meta.
How does AudioSeal differ from a regular detector?
AudioSeal is a watermarking system, not a detector — it embeds an imperceptible signal at synthesis time that Meta's code can later identify with sample-level precision. Critically, it can identify which model generated a given clip, useful for attribution. The catch: it only works for audio generated by AudioSeal-enabled pipelines (currently Meta's research models). It doesn't detect audio from generators that don't use it.
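The embed-then-correlate idea can be illustrated with a toy spread-spectrum scheme in NumPy. This is not AudioSeal's algorithm — AudioSeal uses a learned, sample-localised watermark — but it shows why a cooperating generator plus a shared key makes detection far easier than post-hoc analysis:

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.05) -> np.ndarray:
    """Toy spread-spectrum watermark: add a key-derived pseudo-noise
    sequence at low amplitude at 'synthesis' time."""
    rng = np.random.default_rng(key)
    return audio + strength * rng.standard_normal(audio.shape)

def detect_watermark(audio: np.ndarray, key: int, threshold: float = 4.0) -> bool:
    """Correlate against the same key's pseudo-noise sequence; the
    normalised score behaves like a z-statistic on unmarked audio,
    so a threshold of 4 is rarely crossed by chance."""
    rng = np.random.default_rng(key)
    pn = rng.standard_normal(audio.shape)
    score = np.dot(audio, pn) / (np.linalg.norm(audio) + 1e-12)
    return bool(score > threshold)

sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # one second of a 440 Hz tone
marked = embed_watermark(clean, key=42)
print(detect_watermark(marked, key=42), detect_watermark(clean, key=42))  # → True False
```

Running the detector with a different key leaves the score near zero, which is the toy version of AudioSeal's attribution property: only the pipeline holding the matching key (or model) produces a positive match.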
What about detecting deepfakes of specific public figures?
No reliable public tool does this in 2026. Speaker-recognition services (Pindrop, Voice Pulse) can verify whether audio matches a stored voiceprint when one exists, but they need a clean reference sample. For public figures the better approach is corroboration: was the speech expected? Was it released through known channels? Do witnesses confirm it? Does environmental audio match the claimed location?
Can I detect cloning in real-time during a phone call?
Yes, increasingly. Pindrop Pulse, Reality Defender, and similar enterprise products process call audio in real-time and flag suspicious calls. Consumer/free real-time detection isn't reliable yet. The current best practice for high-stakes voice calls (CEO impersonation, voice-authorised wire transfers, family-emergency scams): use a pre-arranged code phrase or a callback to a known number, not voice characteristics.
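That best practice amounts to a policy in which how the voice sounds carries no weight at all. A hypothetical approval rule, with made-up parameter names, makes the design explicit:

```python
def approve_voice_request(voice_sounds_genuine: bool,
                          code_phrase_ok: bool,
                          callback_confirmed: bool) -> bool:
    """Hypothetical approval policy for high-stakes voice requests
    (wire transfers, credential resets). How the voice sounds is
    deliberately ignored: cloned voices pass human listening tests,
    so only out-of-band checks count."""
    del voice_sounds_genuine  # never an input to the decision
    return code_phrase_ok or callback_confirmed

print(approve_voice_request(True, False, False))  # → False
print(approve_voice_request(True, False, True))   # → True
```

The `del` line is the whole point: a perfect-sounding voice with neither the code phrase nor a confirmed callback is rejected.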
How do I corroborate an audio recording?
Bellingcat's framework: (1) Source vetting — who originally posted it, are they known, what's their track record? (2) Environmental matching — does ambient noise, language, accent fit the claimed location/event? (3) Technical analysis — spectrogram, AI detector results, codec consistency. (4) Cross-corroboration — independent recordings of the same event, witnesses, official statements. AI-detection results are step 3 of 4; never the only step.
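The four steps above can be tracked as a simple checklist per clip; the field names here are illustrative, not an official Bellingcat schema:

```python
from dataclasses import dataclass

@dataclass
class CorroborationLog:
    """Track the four verification steps for one audio clip."""
    source_vetted: bool = False        # (1) original poster identified and assessed
    environment_matches: bool = False  # (2) ambient noise, language, accent fit the claim
    technical_checks: bool = False     # (3) spectrogram, detectors, codec consistency
    cross_corroborated: bool = False   # (4) independent recordings, witnesses, statements

    def verdict(self) -> str:
        done = [self.source_vetted, self.environment_matches,
                self.technical_checks, self.cross_corroborated]
        if all(done):
            return "corroborated"
        return f"incomplete ({sum(done)}/4 steps)"

log = CorroborationLog(source_vetted=True, technical_checks=True)
print(log.verdict())  # → incomplete (2/4 steps)
```

Note that a clip with only the technical step completed (detector plus spectrogram) still reads as incomplete, which mirrors the point above: AI detection is one step of four.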