This guide covers: Ambient dictation REST and WebSocket APIs.
- For a browser app, use the Web SDK for audio dictation or the Dictation SDK Beta instead of building this REST and WebSocket flow yourself.
- For wire format, handshake rules, and troubleshooting, refer to the Dictation streaming guide.
With these APIs, you can:
- Create a dictation session
- Stream audio over WebSocket
- Receive transcript updates
- End the session when dictation is complete
All requests require an sdp_suki_token. Refer to Provider authentication and Partner authentication for more information.
How the dictation workflow works
The dictation workflow uses both REST APIs and a WebSocket connection.
- Authenticate and create a session with the REST API.
- Stream audio to the session over WebSocket and process transcript updates.
- End the session.
Create a dictation session
Create a parent transcription session before you open the WebSocket connection. The response includes a transcription_session_id, which identifies the session across the workflow.
Use the transcription_session_id when you:
- Connect to /ws/transcribe
- End the dictation session with End dictation session
The API returns 201 Created when the session is created successfully.
Request details
- Include sdp_suki_token in every REST request and during the WebSocket handshake.
- transcription_session_id is optional in the create request. If you omit it, Suki generates one.
- audio_config is optional.
- audio_encoding must be LINEAR16.
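As a rough illustration of the request details above, the create request body could be assembled as follows. This is a sketch, not the documented schema: the helper name build_create_session_body is hypothetical, and placing audio_encoding inside audio_config is an assumption. Check the Create dictation session reference for the actual field layout.

```python
from typing import Optional

ALLOWED_ENCODING = "LINEAR16"  # the only encoding this guide lists


def build_create_session_body(
    transcription_session_id: Optional[str] = None,
    audio_config: Optional[dict] = None,
) -> dict:
    """Build a create-session body, leaving out optional fields that are unset."""
    body: dict = {}
    if transcription_session_id is not None:
        # Optional: when omitted, Suki generates the ID.
        body["transcription_session_id"] = transcription_session_id
    if audio_config is not None:
        # Assumption: audio_encoding is a key inside audio_config.
        encoding = audio_config.get("audio_encoding")
        if encoding is not None and encoding != ALLOWED_ENCODING:
            raise ValueError(
                f"audio_encoding must be {ALLOWED_ENCODING}, got {encoding!r}"
            )
        body["audio_config"] = dict(audio_config)  # copy to avoid aliasing
    return body
```

Validating the encoding on the client side means a bad configuration fails before any network call is made.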
Stream audio over WebSocket
After you create the session, connect to GET wss://sdp.suki-stage.com/ws/transcribe when the session is READY or IDLE.
If the session is already RUNNING, COMPLETED, or in another state, the WebSocket handshake fails with FailedPrecondition.
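A client can check the state before it attempts the handshake. The sketch below assumes you already have the session state as a string; how you obtain it is not covered in this guide, and the helper name can_open_websocket is hypothetical.

```python
# States in which this guide says the WebSocket handshake is accepted.
CONNECTABLE_STATES = frozenset({"READY", "IDLE"})


def can_open_websocket(session_state: str) -> bool:
    """Return True only for states where /ws/transcribe accepts a connection."""
    return session_state in CONNECTABLE_STATES
```

Any other state, such as RUNNING or COMPLETED, should be treated as "do not connect" so the client avoids a FailedPrecondition handshake failure.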
For push-to-talk workflows that use multiple speech sessions with the same transcription_session_id, refer to Dictation streaming for more information.
Authentication
For non-browser clients, send these headers during the upgrade request:
- sdp_suki_token
- transcription_session_id
Sec-WebSocket-Protocol:
Send audio messages
Send audio data as JSON text frames.
Example audio message:
Streaming requirements
- Send one JSON object per WebSocket send.
- Do not send raw binary frames.
- Use audioData for dictation audio payloads.
- Base64-encode PCM_S16LE audio bytes in audioData.
- Do not use ambient streaming fields such as data or RU9G.
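The requirements above can be sketched as a pair of message builders. Only the audioData field and the Base64 encoding come from this guide. The "type" key and its "AUDIO" and "AUDIO_END" values are assumptions about the message envelope, and both function names are hypothetical. Confirm the exact shape in the Dictation streaming guide.

```python
import base64
import json


def build_audio_message(pcm_bytes: bytes) -> str:
    """Wrap raw PCM_S16LE bytes in one JSON text frame."""
    if len(pcm_bytes) % 2 != 0:
        # 16-bit samples are 2 bytes each, so an odd length means a torn sample.
        raise ValueError("PCM_S16LE audio must contain whole 16-bit samples")
    payload = {
        "type": "AUDIO",  # assumption: envelope key and value are illustrative
        "audioData": base64.b64encode(pcm_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def build_audio_end_message() -> str:
    """Build the end-of-audio frame (assumed envelope, see note above)."""
    return json.dumps({"type": "AUDIO_END"})
```

Each returned string is meant to be passed to a single WebSocket send as a text frame, never as binary.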
Receive transcript events
Parse transcript messages from event.data in your WebSocket onmessage handler. Refer to Stream audio to dictation session for Python and TypeScript code examples, and Dictation streaming for partial and final frames, EOF, and filtering rules.
Example partial frame:
- is_final: false: partial (interim) text that may change in later messages.
- is_final: true: final text for that segment; the recognizer will not revise it.
- After the speech stream ends, the server sends { "transcript": { "transcript": "EOF" } }. Treat that as end-of-results for that WebSocket; the connection closes shortly after.
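One way to handle these three cases is a small classifier for inbound frames. The EOF shape is taken from this guide. Placing is_final next to the text inside the inner transcript object is an assumption, and classify_frame is a hypothetical helper name.

```python
import json
from typing import Optional, Tuple


def classify_frame(raw: str) -> Tuple[str, Optional[str]]:
    """Return (kind, text), where kind is 'partial', 'final', 'eof', or 'other'."""
    message = json.loads(raw)
    inner = message.get("transcript")
    if not isinstance(inner, dict):
        return ("other", None)  # not a transcript frame
    text = inner.get("transcript")
    if text == "EOF":
        # End-of-results marker for this WebSocket.
        return ("eof", None)
    # Assumption: is_final sits beside the text in the inner object.
    kind = "final" if inner.get("is_final") else "partial"
    return (kind, text)
```

A UI would typically replace the displayed interim text on each partial frame and append the text to the committed transcript on each final frame.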
Do not deduplicate messages by transcript_id. The server assigns a new ID per frame, including partials.
End the dictation session
When dictation is complete:
- Send AUDIO_END
- Close the WebSocket connection
- End the session with the REST API
The API returns 200 OK on success.
Possible status values include:
- completed
- cancelled
- failed
Use final_transcript as the complete transcript for the session.
Common integration patterns
Pattern 1: Standard Ambient dictation flow
A typical Ambient dictation workflow follows these steps:
Create the session
Call Create dictation session and save the transcription_session_id.
Stream audio
Open a WebSocket connection to /ws/transcribe when the session is READY or IDLE. Send AUDIO messages, handle partial and final inbound frames, and finish with AUDIO_END.
Process transcripts
Update the UI with partial and final transcript updates from incoming WebSocket messages.
End the session
Call End dictation session and store the final transcript output.
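The four steps can be sketched as one function that takes the network pieces as parameters. Everything passed in is hypothetical: create_session, open_stream, and end_session stand in for your own REST and WebSocket wrappers, and the stream object is assumed to expose send_audio, send_audio_end, frames, and close. For brevity the sketch sends all audio before reading frames; a live client would normally read frames concurrently while it sends.

```python
from typing import Any, Callable, Iterable


def run_dictation(
    create_session: Callable[[], str],
    open_stream: Callable[[str], Any],
    end_session: Callable[[str], dict],
    audio_chunks: Iterable[bytes],
    on_update: Callable[[str, str], None],
) -> dict:
    # Step 1: create the session and keep its transcription_session_id.
    session_id = create_session()
    # Step 2: open the WebSocket (the caller ensures READY or IDLE).
    stream = open_stream(session_id)
    try:
        for chunk in audio_chunks:
            stream.send_audio(chunk)
        stream.send_audio_end()
        # Step 3: forward partial and final text until the EOF marker.
        for kind, text in stream.frames():
            if kind == "eof":
                break
            on_update(kind, text)
    finally:
        stream.close()
    # Step 4: end the parent session over REST and return its response.
    return end_session(session_id)
```

Injecting the transport keeps the ordering logic testable without a live connection.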
Pattern 2: Push-to-talk or reconnect
For push-to-talk, open /ws/transcribe when the session is READY or IDLE, stream one utterance, send AUDIO_END, wait for EOF, then close the WebSocket. When the session is READY or IDLE again, open a new WebSocket on the same transcription_session_id for the next utterance.
If a connection drops mid-utterance, open a new WebSocket with the same transcription_session_id and sdp_suki_token only when the session accepts a new speech session. End the parent session with REST only when the full dictation workflow is complete.
Pattern 3: Configure audio settings
Pass audio_config in the create request when you need to specify:
- Audio encoding
- Language
- Sample rate
Pattern 4: Stream audio in chunks
For lower latency, send smaller AUDIO messages as audio becomes available instead of buffering the entire recording.
After the final audio chunk, send AUDIO_END.
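A minimal chunking sketch, assuming mono PCM_S16LE audio: the 16000 Hz sample rate and 100 ms chunk length are illustrative defaults, not values from this guide, and iter_pcm_chunks is a hypothetical helper.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # PCM_S16LE uses 16-bit (2-byte) samples


def iter_pcm_chunks(
    pcm: bytes, sample_rate_hz: int = 16000, chunk_ms: int = 100
) -> Iterator[bytes]:
    """Yield consecutive slices of mono PCM audio, each about chunk_ms long."""
    if sample_rate_hz <= 0 or chunk_ms <= 0:
        raise ValueError("sample_rate_hz and chunk_ms must be positive")
    samples_per_chunk = max(1, sample_rate_hz * chunk_ms // 1000)
    # Multiplying by the sample size keeps every boundary on a whole sample.
    chunk_bytes = samples_per_chunk * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

Each yielded slice would be Base64-encoded into its own AUDIO message, with AUDIO_END sent once the iterator is exhausted.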
What you can build
Live Dictation Experiences
Display partial and final transcript updates in real time while the provider speaks.
Server-side Transcription Pipelines
Capture audio on a backend service, stream it through /ws/transcribe, and store the final transcript output.
Telehealth and In-person Workflows
Add transcription workflows to virtual or in-person clinical experiences without ambient note generation.
EHR and Form Integrations
Send transcript output into forms, notes, or downstream clinical systems after the session ends.
Related API references
Create Dictation Session
Create a transcription session
Stream Audio for Dictation
Connect and stream audio over WebSocket
End Dictation Session
End a transcription session
Best practices
FAQs
What's the difference between dictation and ambient clinical documentation?
| Feature | Description |
|---|---|
| Dictation | Converts speech into transcript text with formatting such as punctuation and capitalization. |
| Ambient clinical documentation | Transcribes conversations and generates structured clinical documentation outputs. |
How is /ws/transcribe different from ambient /ws/stream?
Dictation uses /ws/transcribe with audioData payloads and AUDIO_END events. Ambient clinical documentation uses /ws/stream with different authentication flows and message formats.
For more information, refer to Audio streaming and Dictation streaming.