Use the dictation APIs when you need speech-to-text without the full ambient clinical note flow. You create a transcription session, open a WebSocket to stream audio, then end the session when capture is finished so resources can close cleanly.

Usage scenarios

  • You need speech-to-text but not the full ambient clinical note flow.
  • You want to manage the session lifecycle yourself: create a transcription session, stream audio over a WebSocket, and end the session when capture is finished.

How it works

The dictation APIs work in four steps:

1. Authenticate

Register the provider and obtain an sdp_suki_token. Refer to Provider authentication and Partner authentication.

2. Create a transcription session

Call Create dictation session and save the returned transcription_session_id.

3. Open a WebSocket to stream audio

Connect to Stream audio to dictation session when the session is READY or IDLE. Refer to Dictation streaming for outbound audio messages and inbound transcript frames.

4. End the session when capture is finished

Call End dictation session so that resources can close cleanly.
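
The four steps above can be sketched as a small orchestration helper. This is an illustrative sketch only, not the official client. The `DictationClient` class, its three callables, and the `status` field on the create response are hypothetical stand-ins for the real HTTP and WebSocket calls. Only `sdp_suki_token`, `transcription_session_id`, and the READY and IDLE states come from this page. Refer to the endpoint references for the actual URLs, headers, and payloads.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# States in which this page says the audio WebSocket may be opened.
STREAMABLE_STATUSES = {"READY", "IDLE"}


def can_stream(status: str) -> bool:
    """Return True when the session state allows opening the audio WebSocket."""
    return status in STREAMABLE_STATUSES


@dataclass
class DictationClient:
    """Hypothetical wrapper: each callable stands in for one real API call."""

    # Assumed to return {"transcription_session_id": ..., "status": ...}.
    create_session: Callable[[str], dict]
    # Assumed to send audio chunks and return the transcript frames received.
    stream_audio: Callable[[str, str, Iterable[bytes]], List[str]]
    # Assumed to close the session on the server.
    end_session: Callable[[str, str], None]

    def dictate(self, sdp_suki_token: str, chunks: Iterable[bytes]) -> List[str]:
        # Step 1 (authentication) is assumed done; the token is passed in.
        # Step 2: create the session and keep its identifier.
        session = self.create_session(sdp_suki_token)
        session_id = session["transcription_session_id"]
        try:
            # Step 3: stream only when the session is READY or IDLE.
            if not can_stream(session["status"]):
                raise RuntimeError(f"session not streamable: {session['status']}")
            return self.stream_audio(sdp_suki_token, session_id, chunks)
        finally:
            # Step 4: always end the session, even if streaming failed.
            self.end_session(sdp_suki_token, session_id)
```

Ending the session in a `finally` block is a design choice of this sketch: it ensures that a dropped connection or a rejected state still releases the session.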

Endpoints

Last modified on May 22, 2026