
Deepgram
Your AI agent gains ears and a voice. Deepgram transcribes customer call recordings in real time, summarizes audio content, and converts text to natural-sounding speech. Build voice-enabled support that handles inquiries across 30+ languages without a human on the line.




From transcribing customer calls to generating audio responses, your AI agent taps into Deepgram's Nova models to process voice data during live interactions.
Deepgram Use Cases
Real examples of how businesses use Deepgram-powered AI agents to convert spoken customer interactions into actionable data and automated responses.
A customer leaves a voicemail at midnight about a billing dispute. Your AI Agent picks up the recording, sends it through Deepgram's Speech-to-Text API with smart formatting and diarization, and produces a clean transcript. The agent then categorizes the issue, creates a support ticket with the full transcript attached, and prioritizes it for the morning shift. Your team starts the day with organized, actionable voicemail data instead of a cluttered inbox.
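Under the hood, that transcription step is a single API call. Here is a minimal sketch using Deepgram's hosted-URL transcription endpoint; the API key and recording URL are placeholders, and the ticket-creation logic is left out:

```python
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"   # placeholder credential
VOICEMAIL_URL = "https://example.com/voicemail.mp3"  # hypothetical recording URL

# Pre-recorded transcription with smart formatting and speaker diarization
response = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={"model": "nova-3", "smart_format": "true", "diarize": "true"},
    headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
    json={"url": VOICEMAIL_URL},
)
response.raise_for_status()

result = response.json()
transcript = result["results"]["channels"][0]["alternatives"][0]["transcript"]
print(transcript)
```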
A support center handles calls in Spanish, French, and English. After each call, the AI Agent sends the recording to Deepgram's Summarize Audio endpoint with automatic language detection. The agent extracts a concise summary of the customer's issue and resolution, then logs it in your CRM. Managers across regions review standardized summaries without listening to full recordings, saving hours of manual review every week.
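A rough sketch of that summarization request, assuming the summarize and detect_language parameters on the standard listen endpoint; the recording URL is a placeholder and the CRM logging step is omitted:

```python
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"   # placeholder credential
CALL_RECORDING_URL = "https://example.com/support-call.wav"  # hypothetical recording URL

# Request a summary plus automatic language detection in one pass
response = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={"summarize": "v2", "detect_language": "true"},
    headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
    json={"url": CALL_RECORDING_URL},
)
response.raise_for_status()

result = response.json()
print("Detected language:", result["results"]["channels"][0].get("detected_language"))
print("Summary:", result["results"]["summary"]["short"])
```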
A visually impaired customer interacts with your website chat. Your AI Agent composes the answer and converts it to audio using Deepgram's Text-to-Speech API with the Aura voice model. The customer hears a natural spoken response instead of reading text. Your business delivers inclusive, accessible support that serves every customer, regardless of how they prefer to consume information.
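A minimal sketch of that text-to-speech call; aura-asteria-en is just one of the available Aura voices, the reply text is an example, and delivery of the audio back into the chat thread is left out:

```python
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"   # placeholder credential

# Convert the agent's reply to speech with an Aura voice; MP3 audio is returned in the body
response = requests.post(
    "https://api.deepgram.com/v1/speak",
    params={"model": "aura-asteria-en"},
    headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
    json={"text": "Your refund was processed this morning and should arrive within 3-5 business days."},
)
response.raise_for_status()

with open("reply.mp3", "wb") as f:
    f.write(response.content)
```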

Deepgram FAQs
The agent defaults to Deepgram's Nova-3 model, which delivers a 53% lower word error rate than competing services. You can also configure it to use specialized models like the phonecall model for call center audio or the general model for broader use cases. The model parameter is adjustable per request.
Yes. The agent enables Deepgram's diarize parameter when processing multi-speaker recordings. Each word in the transcript gets assigned a speaker label, so your team can see exactly who said what. This works with all Deepgram ASR models including Nova and is included at no extra charge.
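As a sketch of how those per-word labels can be consumed (assuming a parsed response from a diarize=true request, with field names matching the current response shape):

```python
def transcript_by_speaker(result: dict) -> str:
    """Fold per-word speaker labels from a diarized /v1/listen response
    into a 'who said what' transcript. `result` is the parsed JSON body."""
    words = result["results"]["channels"][0]["alternatives"][0]["words"]
    lines, speaker, chunk = [], None, []
    for w in words:
        if w["speaker"] != speaker and chunk:
            lines.append(f"Speaker {speaker}: {' '.join(chunk)}")
            chunk = []
        speaker = w["speaker"]
        chunk.append(w.get("punctuated_word", w["word"]))
    if chunk:
        lines.append(f"Speaker {speaker}: {' '.join(chunk)}")
    return "\n".join(lines)
```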
Deepgram supports WAV, MP3, FLAC, OGG, and most common audio formats. The content_type parameter specifies the MIME type. Language support covers 30+ languages including English, Spanish, French, German, Portuguese, and Mandarin. Enable detect_language for automatic language identification.
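For audio stored on your own systems rather than at a URL, the raw bytes can be posted directly, with the format supplied as the request's Content-Type header. A sketch, with the file name and API key as placeholders:

```python
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"   # placeholder credential

# Send raw audio bytes: the Content-Type header tells Deepgram the format,
# and detect_language asks it to identify the spoken language automatically.
with open("inquiry.wav", "rb") as f:
    audio_bytes = f.read()

response = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={"detect_language": "true"},
    headers={
        "Authorization": f"Token {DEEPGRAM_API_KEY}",
        "Content-Type": "audio/wav",
    },
    data=audio_bytes,
)
response.raise_for_status()
print(response.json()["results"]["channels"][0].get("detected_language"))
```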
The agent calls Deepgram's TTS REST API with the text content, selecting from available Aura voice models. You can customize pitch (0.5 to 2.0), speed (0.25 to 4.0), output format (MP3, WAV), and sample rate. The audio file is generated and delivered to the customer within the conversation thread.
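A sketch of how the output format and sample rate might be selected on the speak endpoint, requesting 16 kHz WAV instead of the default MP3; the voice, reply text, and key are placeholders:

```python
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"   # placeholder credential

# Request 16 kHz linear PCM in a WAV container instead of the default MP3
response = requests.post(
    "https://api.deepgram.com/v1/speak",
    params={
        "model": "aura-asteria-en",
        "encoding": "linear16",
        "sample_rate": 16000,
        "container": "wav",
    },
    headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
    json={"text": "Thanks for calling. Is there anything else I can help with today?"},
)
response.raise_for_status()

with open("reply.wav", "wb") as f:
    f.write(response.content)
```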
Tars does not store your audio files. Audio URLs are sent to Deepgram for processing, and the resulting transcripts or summaries are used within the active conversation. Deepgram processes audio in-memory by default. You control data retention policies through your Deepgram project settings.
Topic Detection transcribes the audio and then identifies the main subjects discussed using Deepgram's audio intelligence layer. The agent receives a list of detected topics with confidence scores. This enables automatic categorization of customer calls, such as tagging billing inquiries, technical issues, or sales conversations without manual review.
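A sketch of a combined transcription and topic-detection request; the recording URL is a placeholder, and the response fields shown follow the current topics payload and may differ:

```python
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"   # placeholder credential
CALL_RECORDING_URL = "https://example.com/support-call.mp3"  # hypothetical recording URL

# Transcribe and run topic detection in a single request
response = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={"model": "nova-3", "topics": "true"},
    headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
    json={"url": CALL_RECORDING_URL},
)
response.raise_for_status()

# Each detected segment lists its topics with confidence scores
for segment in response.json()["results"]["topics"]["segments"]:
    for t in segment["topics"]:
        print(t["topic"], t["confidence_score"])
```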
Deepgram's Nova-3 model is specifically optimized for noisy, real-world audio. You can boost accuracy further using the keywords parameter to prioritize industry-specific terms such as product names or technical jargon. The enhanced tier option provides additional accuracy for particularly challenging recordings.
Yes. The agent can query Deepgram's Usage Summary API to retrieve minutes submitted, processed, and billed for any time range. Filter by model, accessor, or tag to break down usage across different agent workflows. This helps you monitor costs and optimize which audio gets processed.
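A sketch of that usage query; the project ID, date range, and support-agent tag are placeholder values:

```python
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"   # placeholder credential
PROJECT_ID = "YOUR_PROJECT_ID"               # Deepgram project to report on

# Pull a usage summary for January, filtered by the tag applied to this agent's requests
response = requests.get(
    f"https://api.deepgram.com/v1/projects/{PROJECT_ID}/usage",
    params={"start": "2025-01-01", "end": "2025-01-31", "tag": "support-agent"},
    headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
)
response.raise_for_status()
print(response.json())
```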
Don't limit your AI Agent to basic conversations. Watch how to configure and add powerful tools that make your agent smarter and more functional.

Privacy & Security
At Tars, we take privacy and security very seriously. We are compliant with GDPR, ISO, SOC 2, and HIPAA.