Audio search, for your AI.
Trusted by teams building the next generation of audio-aware AI
The internet produces tens of millions of hours of audio every week — podcasts, radio broadcasts, congressional hearings, earnings calls, interviews, and court proceedings. Until now, that signal was invisible to AI agents.
Sonar indexes, transcribes, and semantically embeds public audio at scale. Your agent queries it the same way it would query a web search API — and gets back ranked, timestamped, speaker-attributed results in milliseconds.
Measured across transcript accuracy, semantic relevance, speaker diarization, and end-to-end retrieval latency against all available baselines.
View full benchmarks →
Four core capabilities, one API surface. No stitching together pipelines.
Query by meaning, not keywords. Sonar finds the clips where your concept is discussed, even when the exact words aren't used.
Results include identified speakers with confidence scores. Know who said it, not just what was said.
Every result links directly to the exact moment in the source audio. Agents can cite the precise clip, not just the episode.
Breaking broadcasts, live hearings, and real-time podcasts are indexed within minutes of airing. Agents stay current.
Real-world use cases, each powered by a handful of API calls. If your agent needs to hear the world, Sonar is the layer underneath.
Query across millions of indexed recordings in natural language. Sonar returns the most relevant clips with full transcripts, source metadata, and relevance scores — no keyword gymnastics required.
✓ Production-ready
"We are holding the policy rate steady while the committee evaluates whether monetary policy is restrictive enough to bring inflation down sustainably."
"The chair was clear that rate cuts depend on confidence in the inflation path, not just one softer month of economic data."
"Powell said the Fed can move carefully, but he pushed back on the idea that easier monetary policy is already guaranteed this year."
{
"query": "Powell monetary policy and rate cuts",
"utterance_text": "We are holding the policy rate steady while the committee evaluates whether monetary policy is restrictive enough.",
"audio_files": [
{ "id": "aud_9f42c1", "source": "federalreserve.gov", "duration": "58m" },
{ "id": "aud_61f3a8", "source": "brookings.edu", "duration": "74m" }
],
"speakers": ["Jerome Powell", "Lael Brainard", "Tom Keene"],
"clips": [
{ "file_id": "aud_9f42c1", "speaker": "Jerome Powell", "t": "14:22", "score": 0.97 }
]
}
Submit any public audio URL and get back a fully speaker-diarized, timestamped transcript in seconds. Ideal for agents that need to reason over specific recordings not yet in the Sonar index.
◎ In beta
{
  "type": "transcript.segment.created",
  "audio_url": "podcasts.apple.com/.../ai-a16z",
  "entry": {
    "speaker_name": "Martin",
    "timestamp": "00:04.120",
    "confidence": 0.97,
    "verbatim_utterance": "AI apps are becoming systems, not just prompts."
  },
  "format": "json",
  "word_timestamps": true
}
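Consuming a transcript event like this one is straightforward. The field names below follow the example payload; how events are delivered (webhook, polling) is an assumption left open here.

```python
# Sketch of handling a transcript.segment.created event. Field names follow
# the example payload above; delivery mechanism is not specified.
def timestamp_to_seconds(ts: str) -> float:
    """Convert an "MM:SS.mmm" (or "HH:MM:SS.mmm") timestamp to seconds."""
    seconds = 0.0
    for part in ts.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def format_segment(event: dict) -> str:
    """Render one segment as a speaker-attributed transcript line."""
    entry = event["entry"]
    t = timestamp_to_seconds(entry["timestamp"])
    return f'[{t:.3f}s] {entry["speaker_name"]}: {entry["verbatim_utterance"]}'
```

An agent could accumulate these lines into a running transcript, or reason over each segment as it arrives.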
Subscribe to live audio streams and receive transcript chunks as they're spoken. Agents can react to breaking news, live hearings, or earnings calls the moment words are said — not hours later.
⟡ Early access
{
"event": "senate hearing",
"time": "12:04:08.240",
"text": "Mr. President, I rise today because our nation stands at a critical crossroads. The choices we make in this chamber will echo for generations. Every day, working families are asking whether we can still solve problems together. We came here to deliver real, tangible progress."
}
{
"event": "policy_watch.triggered",
"matched_segment": "floor vote later today",
"speaker": "Sen. Cantwell",
"timestamp": "12:10",
"latency_ms": 840
}
Sonar runs a continuous pipeline that crawls, processes, and indexes public audio across the web — so by the time your agent makes a query, the work is already done.
Sonar continuously discovers public audio across podcast feeds, broadcast archives, government streams, and radio APIs.
Every recording is transcribed with word-level timestamps and speaker separation using a proprietary multi-model stack.
Transcript chunks are embedded into Sonar's audio-native semantic index, optimized for spoken-word retrieval patterns.
Queries return ranked clips with metadata in under 200ms. Your agent gets structured JSON — not raw audio it has to make sense of on its own.
Every result includes source URL, timestamp, speaker, and a clip playback link — so your agent's outputs are fully verifiable.
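Turning one of those results into a verifiable citation is a one-liner. The field names follow the metadata listed above; the clip-link value in the example is a hypothetical illustration, not Sonar's documented URL scheme.

```python
# Sketch of formatting a result as a human-checkable citation using the
# per-result metadata described above (speaker, timestamp, source, clip link).
def cite(result: dict) -> str:
    """Format a result as "Speaker at T, source: clip link"."""
    return (f'{result["speaker"]} at {result["t"]}, '
            f'{result["source"]}: {result["clip_url"]}')
```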
Spoken language is disfluent, non-linear, and speaker-dependent. We explain the architecture choices that make semantic audio retrieval work where BM25 breaks down.
The biggest release yet. Real-time broadcast monitoring, improved speaker diarization accuracy, and a new streaming endpoint that fires every three seconds.
We're releasing the evaluation suite we use internally. 4,200 queries across 18 audio domains. Every API provider can now be compared on the same standard.
Sonar is building new interfaces, infrastructure, and business models for AIs to work with the spoken web.