Quick summary
Audio Dictation converts spoken conversations into text in real time. Use Partner APIs for full control, the Web SDK for browser apps with Suki dictation components, or the Dictation SDK for a hosted iframe experience with in-field and scratchpad modes.
Audio Dictation is supported by: Ambient Partner APIs, Web SDK, Dictation SDK
Audio Dictation lets providers speak naturally and receive transcribed text in real time. You can use it to capture clinical conversations, dictate notes, or populate fields in your application without manual transcription.

The Audio Dictation workflow focuses only on speech-to-text transcription. Unlike ambient documentation workflows, it does not generate structured clinical notes or summaries. This makes it a good fit when you need fast, lightweight transcription that integrates directly into your own workflows and user experience.

You can use Audio Dictation in both in-person and virtual care scenarios. Stream audio through the SDKs or APIs, receive intermediate and final transcripts, and manage dictation sessions in your application. The APIs and SDKs give you control over audio capture, session lifecycle, transcript handling, and downstream processing.

Common use cases

Support in-person clinical dictation

Capture provider dictation during in-person clinical encounters and retrieve transcribed text for downstream documentation workflows.

Support telehealth dictation workflows

Capture provider dictation during virtual visits and return transcribed text for use in telehealth or remote care workflows.

Populate fields in your application

Stream dictation into note fields, forms, or scratchpad areas in a web application using Web SDK or Dictation SDK patterns.

Build custom transcription pipelines

Own session lifecycle, audio capture, and transcript handling end to end with Partner REST and WebSocket APIs.

Key features

Real-time Transcription

See text appear as people speak, without waiting for the conversation to end.

Multiple Sessions

Run multiple dictation sessions under one parent session for complex workflows.

WebSocket Streaming

Low-latency audio streaming for fast, responsive dictation.

Clean Transcripts

Transcripts are automatically formatted with proper punctuation and capitalization, and filler words are removed.

Intermediate and Final Texts

Receive both intermediate (partial) transcripts as speech is processed and final transcripts when segments are complete.
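
One way to handle the two transcript kinds in client code is to treat each intermediate transcript as a replaceable preview and each final transcript as a committed segment. The sketch below assumes a hypothetical message shape with `type` and `text` keys; those names and the `TranscriptBuffer` class are illustrative only and are not taken from the Suki API, so map them to the fields your integration actually receives.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Accumulates finalized segments and tracks the latest partial text."""

    final_segments: list = field(default_factory=list)
    partial: str = ""

    def apply(self, message: dict) -> None:
        # "type" and "text" are hypothetical keys used for illustration.
        kind = message["type"]
        text = message["text"].strip()
        if kind == "intermediate":
            # A newer partial replaces the previous one for the open segment.
            self.partial = text
        elif kind == "final":
            # A final transcript commits the segment and clears the partial.
            if text:
                self.final_segments.append(text)
            self.partial = ""
        else:
            raise ValueError(f"unknown transcript type: {kind!r}")

    def display_text(self) -> str:
        # Committed segments first, then the in-progress partial (if any).
        parts = self.final_segments + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

Keeping the partial separate from the committed segments means the UI can redraw the in-progress text freely without rewriting text the user has already seen finalized.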

How it works

If you use the REST-based Partner APIs, the dictation workflow has three steps:
  1. Create a session - Create a dictation session and receive a transcription_session_id.
  2. Stream audio - Open a WebSocket connection and stream audio data to the session.
  3. Receive transcripts - Receive transcribed text in real time as audio is processed.
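
Step 2 typically means sending audio in small, evenly sized chunks so that transcripts can come back while the speaker is still talking. The following is a minimal sketch of that chunking. This page does not state the required audio format or chunk size, so the 16 kHz, 16-bit mono PCM format and the 100 ms chunk duration below are assumptions, as are the function names.

```python
from typing import Iterator

# Assumed audio format; check the streaming guide for the real requirements.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM
CHUNK_MS = 100


def chunk_size_bytes(
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    chunk_ms: int = CHUNK_MS,
) -> int:
    """Return how many bytes of PCM audio cover chunk_ms milliseconds."""
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000


def iter_audio_chunks(pcm: bytes, chunk_bytes: int) -> Iterator[bytes]:
    """Yield consecutive slices of pcm; the last slice may be shorter."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

Each yielded chunk would then be written to the open WebSocket connection as it becomes available.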
Handling multiple audio streams

You can connect multiple WebSocket streams to the same dictation session by using the same transcription_session_id. This allows you to:
  • Stream audio from multiple sources
  • Reconnect dropped WebSocket connections
  • Support complex audio workflows
Transcripts from all streams are combined into a single dictation session.
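
Because any stream that presents the same transcription_session_id joins the same session, recovering from a dropped connection can be as simple as opening a new stream with the original ID and continuing to send. The sketch below shows that retry loop in isolation. The `connect` callable and the `AudioStream` interface are hypothetical stand-ins for whatever WebSocket client you use; they are not part of the Partner APIs.

```python
from typing import Callable, Iterable, Protocol


class AudioStream(Protocol):
    """Hypothetical minimal interface for an open audio stream."""

    def send(self, chunk: bytes) -> None: ...


def stream_with_reconnect(
    transcription_session_id: str,
    chunks: Iterable[bytes],
    connect: Callable[[str], AudioStream],
    max_reconnects: int = 3,
) -> int:
    """Send every chunk, reopening the stream when a send fails.

    Returns the number of reconnects that were needed.
    """
    reconnects = 0
    stream = connect(transcription_session_id)
    for chunk in chunks:
        while True:
            try:
                stream.send(chunk)
                break
            except ConnectionError:
                if reconnects >= max_reconnects:
                    raise
                reconnects += 1
                # Reuse the same ID so the new stream joins the same session.
                stream = connect(transcription_session_id)
    return reconnects
```

This sketch resends the chunk that failed, which could duplicate a small amount of audio if the original send was partially delivered. Whether that matters depends on how the service handles overlapping audio, which this page does not specify.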

Workflow for Partner APIs

Available integrations

Use Partner APIs to build custom dictation workflows with direct control over audio streaming, session management, and transcript handling. Partner APIs are best for server-side integrations or applications that manage their own UI and workflow orchestration.
For WebSocket handshake and wire format details, refer to the Dictation streaming guide.

Usage scenarios

If you’re not sure which integration fits your use case, use the following table to map your scenario to the recommended integration. Best fit indicates the integration we recommend for that scenario.
| Scenario | Partner APIs | Web SDK | Dictation SDK |
| --- | --- | --- | --- |
| You create sessions, stream on /ws/transcribe, and handle transcripts in your own code | Best fit | | |
| Your backend or non-browser client owns audio capture and storage | Best fit | | |
| You need multiple WebSocket streams on one transcription_session_id | Best fit | | |
| Browser app in JavaScript or React with your own page layout and fields | | Best fit | |
| Dictation into a specific input (in-field mode) with a Suki-hosted iframe | | | Best fit |
| Dictation in a scratchpad panel, not tied to one field | | | Best fit |

Next steps

Refer to the Basic usage guide to get started with dictation using our Partner APIs.
Last modified on May 29, 2026