Quick summary
Audio Dictation converts spoken conversations into text in real time. Use Partner APIs for full control, the Web SDK for browser apps with Suki dictation components, or the Dictation SDK for a hosted iframe experience with in-field and scratchpad modes.
Audio Dictation is supported by: Ambient Partner APIs, Web SDK, Dictation SDK
- For standalone transcription with your own UI and server logic, use the REST-based Dictation APIs and follow Basic usage.
- For browser apps with Suki Web SDK packages, use the Web SDK for audio dictation.
- To embed dictation through a Suki-hosted iframe, use the Dictation SDK (Beta).
Common use cases
Support in-person clinical dictation
Capture provider dictation during in-person clinical encounters and retrieve transcribed text for downstream documentation workflows.
Support telehealth dictation workflows
Capture provider dictation during virtual visits and return transcribed text for use in telehealth or remote care workflows.
Populate fields in your application
Stream dictation into note fields, forms, or scratchpad areas in a web application using Web SDK or Dictation SDK patterns.
Build custom transcription pipelines
Own session lifecycle, audio capture, and transcript handling end to end with Partner REST and WebSocket APIs.
Key features
Real-time Transcription
See text appear as people speak, without waiting for the conversation to end.
Multiple Sessions
Run multiple dictation sessions under one parent session for complex workflows.
WebSocket Streaming
Low-latency audio streaming for fast, responsive dictation.
Clean Transcripts
Transcripts are automatically formatted with proper punctuation and capitalization, with filler words removed.
Intermediate and Final Texts
Receive both intermediate (partial) transcripts as speech is processed and final transcripts when segments are complete.
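The relationship between intermediate and final texts can be illustrated with a small client-side buffer. This is a minimal sketch, not the actual Suki payload format: the message shape (`text`, `is_final`) and the `TranscriptBuffer` class are assumptions made for illustration. It shows each intermediate transcript replacing the previous partial text, and each final transcript being committed as a completed segment.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Accumulates final segments and tracks the latest intermediate text."""

    final_segments: list[str] = field(default_factory=list)
    pending: str = ""  # most recent intermediate (partial) text

    def handle(self, message: dict) -> None:
        # Assumed (hypothetical) message shape: {"text": str, "is_final": bool}.
        text = message["text"]
        if message["is_final"]:
            # A final transcript closes the segment and supersedes any partial.
            self.final_segments.append(text)
            self.pending = ""
        else:
            # Each intermediate transcript replaces the previous partial.
            self.pending = text

    def display_text(self) -> str:
        # Finalized segments followed by the in-progress partial, if any.
        parts = self.final_segments + ([self.pending] if self.pending else [])
        return " ".join(parts)


buffer = TranscriptBuffer()
for msg in [
    {"text": "patient reports", "is_final": False},
    {"text": "patient reports mild", "is_final": False},
    {"text": "Patient reports mild headache.", "is_final": True},
    {"text": "no fever", "is_final": False},
]:
    buffer.handle(msg)

print(buffer.display_text())
```

Keeping the partial text separate from the committed segments lets a UI show provisional text (for example, in a lighter style) without ever storing it as part of the final note.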
How it works
If you use the REST-based Partner APIs, the dictation workflow has three steps:
- Create a session - Create a dictation session and receive a transcription_session_id.
- Stream audio - Open a WebSocket connection and stream audio data to the session.
- Receive transcripts - Receive transcribed text in real time as audio is processed.
You can also open multiple WebSocket streams on the same transcription_session_id to:
- Stream audio from multiple sources
- Reconnect dropped WebSocket connections
- Support complex audio workflows
Transcripts from all streams are combined into a single dictation session.
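Reconnecting a dropped stream on the same session can be sketched as follows. This is an illustration under stated assumptions, not the Partner API client: `open_stream`, `StreamDropped`, and `stream_with_reconnect` are hypothetical names, and the transport is injected so that the retry logic can be read (and run) without a network. The only detail taken from the text above is that the new connection reuses the same transcription_session_id.

```python
from typing import Callable, Iterable


class StreamDropped(Exception):
    """Raised by the (hypothetical) transport when the WebSocket closes early."""


def stream_with_reconnect(
    session_id: str,
    chunks: Iterable[bytes],
    open_stream: Callable[[str], Callable[[bytes], None]],
    max_reconnects: int = 3,
) -> int:
    """Send every chunk, reopening a stream on the same session after a drop.

    `open_stream(session_id)` is a placeholder for your WebSocket client; it
    returns a `send(chunk)` callable that raises StreamDropped on disconnect.
    Returns the number of reconnects used.
    """
    reconnects = 0
    send = open_stream(session_id)
    for chunk in chunks:
        while True:
            try:
                send(chunk)
                break
            except StreamDropped:
                if reconnects == max_reconnects:
                    raise
                reconnects += 1
                # New connection, same transcription_session_id.
                send = open_stream(session_id)
    return reconnects


# Fake transport for demonstration: the first connection drops on its second chunk.
opened: list[str] = []
received: list[bytes] = []


def fake_open_stream(session_id: str) -> Callable[[bytes], None]:
    opened.append(session_id)
    connection_number = len(opened)
    sent_on_this_connection = 0

    def send(chunk: bytes) -> None:
        nonlocal sent_on_this_connection
        if connection_number == 1 and sent_on_this_connection == 1:
            raise StreamDropped()
        sent_on_this_connection += 1
        received.append(chunk)

    return send


reconnects = stream_with_reconnect("session-123", [b"a", b"b", b"c"], fake_open_stream)
print(reconnects, opened, received)
```

The sketch resends the chunk that failed, which assumes a failed send delivered nothing. A real client would need to decide how to handle audio that was partially delivered before the drop.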
Workflow for Partner APIs
Available integrations
- Partner APIs
- Web SDK
- Dictation SDK
Use Partner APIs to build custom dictation workflows with direct control over audio streaming, session management, and transcript handling. Partner APIs are best for server-side integrations or applications that manage their own UI and workflow orchestration.
With Partner APIs, you can:
- Create Audio Dictation sessions
- Stream audio through WebSocket connections
- End Audio Dictation sessions
For WebSocket handshake and wire format details, refer to the Dictation streaming guide.
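Because a session is created, streamed to, and then ended, it can help to tie the end call to the create call so that a failure during streaming does not leave a session open. The following is a minimal sketch of that pattern. `dictation_session`, `create_session`, and `end_session` are hypothetical placeholders for your own HTTP calls, not functions provided by the Partner APIs.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def dictation_session(
    create_session: Callable[[], str],
    end_session: Callable[[str], None],
) -> Iterator[str]:
    """Yield a transcription_session_id and always end the session afterwards.

    Both callables are placeholders for your own HTTP calls to the Partner APIs.
    """
    session_id = create_session()
    try:
        yield session_id
    finally:
        # Runs even if streaming raises, so sessions are not left open.
        end_session(session_id)


# Stand-ins that record the call order instead of calling a real API.
calls: list[str] = []


def fake_create() -> str:
    calls.append("create")
    return "session-123"


def fake_end(session_id: str) -> None:
    calls.append(f"end:{session_id}")


with dictation_session(fake_create, fake_end) as sid:
    # Streaming over the WebSocket would happen here.
    calls.append(f"stream:{sid}")

print(calls)
```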
Usage scenarios
If you’re not sure which integration fits your use case, use the following table to map your scenario to the recommended integration. Best fit indicates the integration we recommend for that scenario.

| Scenario | Partner APIs | Web SDK | Dictation SDK |
|---|---|---|---|
| You create sessions, stream on /ws/transcribe, and handle transcripts in your own code | Best fit | | |
| Your backend or non-browser client owns audio capture and storage | Best fit | | |
| You need multiple WebSocket streams on one transcription_session_id | Best fit | | |
| Browser app in JavaScript or React with your own page layout and fields | | Best fit | |
| Dictation into a specific input (in-field mode) with a Suki-hosted iframe | | | Best fit |
| Dictation in a scratchpad panel, not tied to one field | | | Best fit |