Audio Streaming Overview

After you create a session and, for ambient sessions, seed the session context, you can stream audio to it over WebSocket. Use the GET /ws/stream endpoint for ambient and Form filling sessions, and the GET /ws/transcribe endpoint for dictation sessions.

This guide applies to: Direct HTTP and WebSocket integrations with Suki for Partner Ambient, Form filling, and Dictation APIs.

Endpoint	Purpose	API reference
`GET` /ws/stream	Stream PCM audio into an ambient or Form filling session for note generation and related processing	Audio streaming for ambient and Form filling sessions
`GET` /ws/transcribe	Stream PCM_S16LE audio into a dictation (transcription) session for real-time text	Stream audio to dictation session

Use the same base host for REST and WebSocket calls in a given environment (for example, in staging: https://sdp.suki-stage.com for REST and wss://sdp.suki-stage.com for WebSocket). Your partnership team confirms which host and credentials apply.
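As a minimal sketch of the same-host rule above, the WebSocket URL can be derived from the REST base URL by swapping the scheme. The helper name `websocket_url` is hypothetical and not part of any Suki SDK; the staging host is the example host from this page, and your own host may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url(rest_base: str, path: str) -> str:
    """Return the ws(s):// URL for `path` on the same host as `rest_base`.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(rest_base)
    # https maps to wss, http maps to ws; any other scheme raises KeyError.
    scheme = {"https": "wss", "http": "ws"}[parts.scheme]
    return urlunsplit((scheme, parts.netloc, path, "", ""))


# Example staging host taken from this page.
STAGING_BASE = "https://sdp.suki-stage.com"

ambient_url = websocket_url(STAGING_BASE, "/ws/stream")
dictation_url = websocket_url(STAGING_BASE, "/ws/transcribe")
print(ambient_url)
print(dictation_url)
```

Deriving both URLs from one configured base keeps REST and WebSocket calls pointed at the same environment.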

Ambient Streaming

Stream PCM audio into an ambient session for note generation and related processing. Use this endpoint when you already have an ambient session and want to push live audio for that session.

Dictation Streaming

Stream PCM_S16LE audio into a dictation (transcription) session for real-time text. Use this endpoint when you have a dictation session and want to push audio for that session.

Do not send raw audio as binary WebSocket frames on either endpoint. On both /ws/stream and /ws/transcribe, outbound audio is sent as JSON text frames with Base64 payloads, not raw binary PCM frames.
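The framing rule above can be sketched as follows: each PCM chunk is Base64-encoded and wrapped in a JSON object that is sent as a text frame. The field names `data` and `audioData` and the ambient `AUDIO` type come from the comparison on this page; the function names are hypothetical, and the `type` value used for dictation audio is an assumption, so confirm it against the Dictation Streaming API reference.

```python
import base64
import json


def ambient_audio_frame(pcm: bytes) -> str:
    """Build the JSON text frame for one PCM chunk on /ws/stream."""
    payload = base64.b64encode(pcm).decode("ascii")
    return json.dumps({"type": "AUDIO", "data": payload})


def dictation_audio_frame(pcm_s16le: bytes) -> str:
    """Build the JSON text frame for one PCM_S16LE chunk on /ws/transcribe."""
    payload = base64.b64encode(pcm_s16le).decode("ascii")
    # ASSUMPTION: the "type" value for dictation audio is not stated in this
    # overview; only the "audioData" field name is documented here.
    return json.dumps({"type": "AUDIO", "audioData": payload})


# Two 16-bit little-endian samples used as stand-in audio.
chunk = b"\x00\x01\x02\x03"
print(ambient_audio_frame(chunk))
print(dictation_audio_frame(chunk))
```

Send the returned string as a WebSocket text message, never the raw `bytes` as a binary message.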

Side-by-side comparison

Topic	Ambient Streaming	Dictation Streaming
Endpoint	`GET /ws/stream`	`GET /ws/transcribe`
Browser `Sec-WebSocket-Protocol`	`SukiAmbientAuth,<ambient_session_id>,<sdp_suki_token>`	`SukiAmbientAuth,<sdp_suki_token>,<transcription_session_id>`
Non-browser headers	`sdp_suki_token`, `ambient_session_id`	`sdp_suki_token`, `transcription_session_id`
Audio field name	`data` (Base64 PCM)	`audioData` (Base64 PCM_S16LE)
Start-of-stream	`START_TIME` required first	No `START_TIME` in the dictation contract documented here
End-of-stream	`AUDIO` with `data`: `RU9G` (Base64 of bytes `EOF`)	`EVENT` with `event`: `AUDIO_END`
Control messages	`type`: `EVENT`, `event`: enum (see FAQ)	`EVENT` / `AUDIO_END` for end of audio
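The differences in the table can be captured in a few small helpers, shown here as a sketch. The subprotocol ordering, header names, and end-of-stream messages are taken from the rows above; the function names and the `kind` parameter are hypothetical. The `START_TIME` message is omitted because its fields are not described on this page.

```python
import base64
import json
from typing import Dict, List


def browser_subprotocols(kind: str, token: str, session_id: str) -> List[str]:
    """Values to offer in Sec-WebSocket-Protocol from a browser client."""
    if kind == "ambient":
        return ["SukiAmbientAuth", session_id, token]
    if kind == "dictation":
        # Note the order: token first, then the transcription session ID.
        return ["SukiAmbientAuth", token, session_id]
    raise ValueError(f"unknown session kind: {kind}")


def non_browser_headers(kind: str, token: str, session_id: str) -> Dict[str, str]:
    """Handshake headers for clients that can set custom headers."""
    id_header = {"ambient": "ambient_session_id",
                 "dictation": "transcription_session_id"}[kind]
    return {"sdp_suki_token": token, id_header: session_id}


def end_of_stream_frame(kind: str) -> str:
    """JSON text frame that signals the end of audio."""
    if kind == "ambient":
        # Base64 of the three bytes b"EOF" is "RU9G".
        eof = base64.b64encode(b"EOF").decode("ascii")
        return json.dumps({"type": "AUDIO", "data": eof})
    if kind == "dictation":
        return json.dumps({"type": "EVENT", "event": "AUDIO_END"})
    raise ValueError(f"unknown session kind: {kind}")


print(browser_subprotocols("ambient", "<sdp_suki_token>", "<ambient_session_id>"))
print(end_of_stream_frame("ambient"))
print(end_of_stream_frame("dictation"))
```

Keeping the per-endpoint differences in one place makes it harder to send, for example, an ambient-style `RU9G` marker to a dictation session.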

Next steps

Learn more about the Audio Streaming API reference
Learn more about the Dictation Streaming API reference
Learn more about the Audio Capture & Streaming FAQs
