Audio Dictation

Quick summary

Audio Dictation converts spoken conversations into text in real time. Use Partner APIs for full control, the Web SDK for browser apps with Suki dictation components, or the Dictation SDK for a hosted iframe experience with in-field and scratchpad modes.

Last updated: May 2026

Audio Dictation is supported by: Ambient Partner APIs, Web SDK, Dictation SDK

For standalone transcription with your own UI and server logic, use the REST-based Dictation APIs and follow the Basic usage guide.
For browser apps with Suki Web SDK packages, use the Web SDK for audio dictation.
To embed dictation through a Suki-hosted iframe, use the Dictation SDK (Beta).

Audio Dictation lets providers speak naturally and receive transcribed text in real time. Use it to capture clinical conversations, dictate notes, or populate fields in your application without manual transcription.

The Audio Dictation workflow covers speech-to-text transcription only. Unlike ambient documentation workflows, it does not generate structured clinical notes or summaries, which makes it a good fit when you need fast, lightweight transcription that integrates directly into your own workflows and user experience.

Audio Dictation works in both in-person and virtual care scenarios. You stream audio through the SDKs or APIs, receive intermediate and final transcripts, and manage dictation sessions in your application. The APIs and SDKs give you control over audio capture, session lifecycle, transcript handling, and downstream processing.

Common use cases

Support in-person clinical dictation

Capture provider dictation during in-person clinical encounters and retrieve transcribed text for downstream documentation workflows.

Support telehealth dictation workflows

Capture provider dictation during virtual visits and return transcribed text for use in telehealth or remote care workflows.

Populate fields in your application

Stream dictation into note fields, forms, or scratchpad areas in a web application using Web SDK or Dictation SDK patterns.

Build custom transcription pipelines

Own session lifecycle, audio capture, and transcript handling end to end with Partner REST and WebSocket APIs.

Key features

Real-time Transcription

See text appear as people speak, without waiting for the conversation to end.

Multiple Sessions

Run multiple dictation sessions under one parent session for complex workflows.

WebSocket Streaming

Low-latency audio streaming for fast, responsive dictation.

Clean Transcripts

Transcripts are automatically formatted with punctuation and capitalization, and filler words are removed.

Intermediate and Final Texts

Receive both intermediate (partial) transcripts as speech is processed and final transcripts when segments are complete.
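The following TypeScript sketch shows one way a client might keep intermediate and final transcripts apart: each intermediate result replaces the previous partial text, and each final result is committed. The `TranscriptMessage` shape and the `TranscriptBuffer` class are hypothetical and are not part of the Partner APIs or SDKs; map them onto the message format your integration actually receives.

```typescript
// Hypothetical message shape for illustration; the real schema is defined
// by the API or SDK you integrate with.
type TranscriptMessage =
  | { kind: "intermediate"; text: string }
  | { kind: "final"; text: string };

class TranscriptBuffer {
  private finals: string[] = [];
  private pending = "";

  apply(msg: TranscriptMessage): void {
    if (msg.kind === "intermediate") {
      // A new partial replaces the previous partial for the segment in progress.
      this.pending = msg.text;
    } else {
      // A final result commits the segment and clears the partial.
      this.finals.push(msg.text);
      this.pending = "";
    }
  }

  // Committed text only: safe to save, because it will not change.
  committed(): string {
    return this.finals.join(" ");
  }

  // Committed text plus the in-progress partial, for a live preview.
  display(): string {
    return [...this.finals, this.pending].filter((s) => s.length > 0).join(" ");
  }
}
```

Persisting only `committed()` avoids saving partial text that may still be revised, while `display()` suits a field that updates as the provider speaks.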

How it works

If you use the REST-based Partner APIs, the dictation workflow has three steps:

Create a session - Create a dictation session and receive a transcription_session_id.
Stream audio - Open a WebSocket connection and stream audio data to the session.
Receive transcripts - Receive transcribed text in real time as audio is processed.
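The three steps above can be sketched in TypeScript as follows. This is a minimal sketch, assuming a hypothetical `DictationClient` interface whose `createSession` and `openStream` methods stand in for the real REST call and the `/ws/transcribe` WebSocket connection; the actual request, handshake, and message formats are described in the Basic usage and Dictation streaming guides.

```typescript
// Hypothetical transport interfaces; they stand in for the real REST call and
// the /ws/transcribe WebSocket, whose formats are not reproduced here.
interface AudioStream {
  send(chunk: Uint8Array): void;
  close(): void;
}

interface DictationClient {
  // Step 1: create a dictation session; resolves to its transcription_session_id.
  createSession(): Promise<string>;
  // Step 2: open a WebSocket stream for that session. Step 3: transcripts are
  // delivered to onText as the server processes the audio.
  openStream(
    transcriptionSessionId: string,
    onText: (text: string) => void,
  ): AudioStream;
}

async function dictate(
  client: DictationClient,
  audio: Iterable<Uint8Array>,
  onText: (text: string) => void,
): Promise<string> {
  const transcriptionSessionId = await client.createSession();
  const stream = client.openStream(transcriptionSessionId, onText);
  try {
    for (const chunk of audio) {
      stream.send(chunk);
    }
  } finally {
    // Close the stream even if sending fails partway through.
    stream.close();
  }
  return transcriptionSessionId;
}
```

Passing the client in as an interface keeps the session lifecycle separate from transport details, so the same flow can be exercised against a fake client in tests.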

Handling multiple audio streams

You can connect multiple WebSocket streams to the same dictation session by using the same transcription_session_id. This allows you to:

Stream audio from multiple sources
Reconnect dropped WebSocket connections
Support complex audio workflows

Transcripts from all streams are combined into a single dictation session.
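Because several WebSocket streams can share one `transcription_session_id`, a client can recover from a dropped connection by reconnecting with the same ID. The sketch below retries with exponential backoff. `OpenStream`, `connectWithRetry`, and the delay values are hypothetical choices for illustration, not part of any Suki API.

```typescript
// Hypothetical: opens a WebSocket stream bound to a session and resolves once
// the connection is established. Not a real SDK function.
type OpenStream = (transcriptionSessionId: string) => Promise<void>;

// Exponential backoff: 250 ms, 500 ms, 1 s, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(() => resolve(), ms));

// Resolves with the number of attempts used; rejects with the last error
// if every attempt fails.
async function connectWithRetry(
  open: OpenStream,
  transcriptionSessionId: string,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<number> {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  for (let attempt = 0; ; attempt++) {
    try {
      // Reuse the same transcription_session_id so transcripts from the new
      // connection join those already received for the session.
      await open(transcriptionSessionId);
      return attempt + 1;
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Capping the delay keeps a long outage from pushing retries minutes apart, and injecting `sleep` lets tests run without real timers.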

Workflow for Partner APIs

Available integrations

Partner APIs
Web SDK
Dictation SDK

Use Partner APIs to build custom dictation workflows with direct control over audio streaming, session management, and transcript handling. Partner APIs are best for server-side integrations or applications that manage their own UI and workflow orchestration. With Partner APIs, you can:

For WebSocket handshake and wire format details, refer to the Dictation streaming guide.
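Streaming audio in small, fixed-duration frames, rather than as one large buffer, lets the server start transcribing sooner. The sketch below splits a PCM buffer into 100 ms frames. It assumes 16 kHz, 16-bit mono PCM purely for illustration; the encoding and sample rate the service actually expects are described in the Dictation streaming guide, and `frameBytes` and `frames` are hypothetical helpers.

```typescript
// Assumed audio format for illustration only; check the Dictation streaming
// guide for the encoding and sample rate the service actually expects.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit mono PCM
const FRAME_MS = 100;

// Bytes per frame, rounded down to a whole number of samples.
function frameBytes(sampleRateHz = SAMPLE_RATE_HZ, frameMs = FRAME_MS): number {
  return Math.floor((sampleRateHz * frameMs) / 1000) * BYTES_PER_SAMPLE;
}

// Yields consecutive frames; the last frame may be shorter.
function* frames(pcm: Uint8Array, size = frameBytes()): Generator<Uint8Array> {
  if (size <= 0) throw new RangeError("frame size must be positive");
  for (let offset = 0; offset < pcm.length; offset += size) {
    // subarray returns a view, so no audio bytes are copied.
    yield pcm.subarray(offset, offset + size);
  }
}
```

Each yielded frame can then be sent as one WebSocket message.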

Use the Web SDK to add dictation to browser-based JavaScript or React applications. The Web SDK provides prebuilt dictation components and shared authentication through @suki-sdk/core. You can launch dictation with dictationClient.show({ ... }) or render the React Dictation component. To get started, refer to the Web SDK for Audio Dictation guide, then follow the JavaScript or React integration guides.

Usage scenarios

If you’re not sure which integration fits your use case, use the following table to map your scenario to the recommended integration. Best fit indicates the integration we recommend for that scenario.

| Scenario | Partner APIs | Web SDK | Dictation SDK |
| --- | --- | --- | --- |
| You create sessions, stream on `/ws/transcribe`, and handle transcripts in your own code | Best fit | | |
| Your backend or non-browser client owns audio capture and storage | Best fit | | |
| You need multiple WebSocket streams on one `transcription_session_id` | Best fit | | |
| Browser app in JavaScript or React with your own page layout and fields | | Best fit | |
| Dictation into a specific input (in-field mode) with a Suki-hosted iframe | | | Best fit |
| Dictation in a scratchpad panel, not tied to one field | | | Best fit |

Next steps

Refer to the Basic usage guide to get started with dictation using our Partner APIs.
