Use this guide when you already have an Ambient session and need to stream live audio to GET /ws/stream. The WebSocket is only for sending live audio and control messages for the active stream. After streaming, use the Ambient REST APIs to end the session, poll status, and retrieve transcripts, notes, and structured data. For endpoint details and code examples, refer to the Audio streaming API.

How Ambient streaming works

Suki for Partners Ambient streaming works as follows:
  1. Create an Ambient session.
  2. Open a WebSocket connection to GET /ws/stream.
  3. Send one START_TIME message for the stream segment.
  4. Send one JSON message per audio chunk.
  5. Optionally send EVENT messages, such as PAUSE, RESUME, or KEEP_ALIVE, when control is needed.
  6. Send the ambient end marker as the final AUDIO message.
  7. Close the socket, then use End Ambient session to end the session and retrieve results.
Ambient streaming uses JSON text frames. Do not send raw binary audio frames to this endpoint.

Send JSON text frames

Every message you send on /ws/stream must be a UTF-8 JSON text frame.
  • Each WebSocket frame must contain exactly one JSON object.
  • Each client send should contain one logical message.
  • Audio bytes go inside a JSON string field, not in a binary WebSocket frame.
Do not:
  • Send binary WebSocket frames.
  • Send multiple JSON objects in one frame.
  • Stream raw audio through HTTP, for example with Content-Type: application/json.
If the server receives non-JSON payloads, it returns parsing errors, such as invalid character or null byte errors.
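As a minimal sketch of the rules above, the following Python helper serializes one logical message into one JSON text payload. The name `to_text_frame` is hypothetical and not part of any Suki SDK; the sketch only assumes that your WebSocket client sends a `str` as a text frame, which is how many Python clients behave.

```python
import json


def to_text_frame(message: dict) -> str:
    """Serialize exactly one JSON object for one WebSocket text frame.

    Hypothetical helper for illustration only.
    """
    if not isinstance(message, dict) or "type" not in message:
        raise ValueError("each frame must carry one JSON object with a 'type' field")
    # json.dumps returns a str. Keep it a str (not bytes) so that the
    # client library sends it as a text frame rather than a binary frame.
    return json.dumps(message)


frame = to_text_frame({"type": "EVENT", "event": "KEEP_ALIVE"})
```

Call the helper once per send, so that each frame contains a single JSON object.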

Message format

Each outbound message is a JSON object with a type field. For messages that carry a payload, use the data field. The data value must be:
  • Standard Base64 (RFC 4648)
  • An encoding of the raw bytes you intend to send
  • Sent as a JSON string, regardless of the programming language you use
Do not use:
  • Hex encoding
  • URL-safe Base64
  • Raw binary inside JSON strings
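To illustrate why the encoding choice matters, this short Python sketch encodes the same three bytes with standard Base64, URL-safe Base64, and hex. The byte values are arbitrary and were picked only because they make the two Base64 alphabets produce different output.

```python
import base64

raw = bytes([0xFB, 0xFF, 0xFE])  # arbitrary bytes where the alphabets differ

standard = base64.b64encode(raw).decode("ascii")          # "+//+"  (use this)
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # "-__-"  (do not send)
as_hex = raw.hex()                                        # "fbfffe" (do not send)

# Only the standard form belongs in the data field.
message = {"type": "AUDIO", "data": standard}
```

Standard Base64 uses `+` and `/` where the URL-safe variant uses `-` and `_`, so the two are not interchangeable even when most characters match.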

Start the stream segment

Send one START_TIME message before the audio chunks for a stream segment:
{ "type": "START_TIME", "data": "<base64>" }
  • data: Base64 of UTF-8 bytes of an RFC 3339 timestamp
  • Example timestamp: 2026-04-25T12:34:56Z
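A START_TIME message could be built as in the sketch below. The function name `start_time_message` is hypothetical, and the choice to normalize to UTC with a trailing `Z` is an assumption based on the example timestamp above, not a documented requirement.

```python
import base64
import json
from datetime import datetime, timezone


def start_time_message(ts: datetime) -> str:
    """Build a START_TIME frame from a timezone-aware datetime (hypothetical helper)."""
    if ts.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # RFC 3339 timestamp in UTC, for example 2026-04-25T12:34:56Z.
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Base64 of the UTF-8 bytes of the timestamp string.
    data = base64.b64encode(stamp.encode("utf-8")).decode("ascii")
    return json.dumps({"type": "START_TIME", "data": data})


msg = start_time_message(datetime(2026, 4, 25, 12, 34, 56, tzinfo=timezone.utc))
```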

Send audio chunks

Send audio with type set to AUDIO:
{ "type": "AUDIO", "data": "<base64>" }
  • data: Base64 of raw PCM audio bytes

Send control events

Send control events with type set to EVENT:
{ "type": "EVENT", "event": "PAUSE" }
Use the event field, not data. Supported values are PAUSE, RESUME, KEEP_ALIVE, CANCEL, and ABORT. Common use cases:
  • Pause or resume audio.
  • Keep the connection alive.
  • Cancel or abort a stream.
EVENT messages can appear at any point in the stream where control is needed.
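One way to guard against malformed control messages on the client is a small builder like the following sketch. `SUPPORTED_EVENTS` and `event_message` are hypothetical names; the set of values is taken from the list above.

```python
import json

# Event values listed in this guide.
SUPPORTED_EVENTS = {"PAUSE", "RESUME", "KEEP_ALIVE", "CANCEL", "ABORT"}


def event_message(event: str) -> str:
    """Build an EVENT frame (hypothetical helper)."""
    if event not in SUPPORTED_EVENTS:
        raise ValueError(f"unsupported event: {event!r}")
    # The value goes in the "event" field, not in "data".
    return json.dumps({"type": "EVENT", "event": event})


pause = event_message("PAUSE")
```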

End the stream segment

Send the final AUDIO message with data set to RU9G:
{ "type": "AUDIO", "data": "RU9G" }
RU9G is Base64 for ASCII EOF (0x45 0x4F 0x46).
Do not:
  • Send EOF as plain text.
  • Use custom types like end_of_stream.
  • Use binary EOF signaling.
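The end marker can be derived rather than hard-coded, which makes the relationship between `EOF` and `RU9G` explicit. This is a small illustrative sketch; `END_MARKER` and `end_message` are hypothetical names.

```python
import base64
import json

# Standard Base64 of the three ASCII bytes E, O, F (0x45 0x4F 0x46).
END_MARKER = base64.b64encode(b"EOF").decode("ascii")  # "RU9G"

# The end marker travels as an ordinary AUDIO message.
end_message = json.dumps({"type": "AUDIO", "data": END_MARKER})
```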

Required message order

For each stream segment:
  1. Send one START_TIME message.
  2. Send one or more AUDIO messages with Base64 PCM audio.
  3. Send any EVENT messages when control is needed.
  4. Send a final AUDIO message with data set to RU9G.
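The ordering above can be checked on the client before or during sending. The sketch below is one possible client-side check for a normally completed segment; `check_segment_order` is a hypothetical name, and nothing here claims to mirror how the server validates a stream.

```python
import json

END_MARKER = "RU9G"  # Base64 of ASCII "EOF"


def check_segment_order(frames):
    """Raise ValueError if a completed segment breaks the order listed above."""
    kinds = []
    for frame in frames:
        msg = json.loads(frame)
        if msg["type"] == "AUDIO" and msg.get("data") == END_MARKER:
            kinds.append("END")  # track the end marker separately from PCM chunks
        else:
            kinds.append(msg["type"])
    if kinds.count("START_TIME") != 1:
        raise ValueError("expected exactly one START_TIME")
    if "AUDIO" not in kinds:
        raise ValueError("expected at least one AUDIO chunk")
    if kinds.index("START_TIME") > kinds.index("AUDIO"):
        raise ValueError("START_TIME must precede the first AUDIO chunk")
    if kinds.count("END") != 1 or kinds[-1] != "END":
        raise ValueError("the RU9G end marker must be the single, final message")


good = [
    '{"type": "START_TIME", "data": "MjAyNg=="}',
    '{"type": "AUDIO", "data": "AAAAAA=="}',
    '{"type": "EVENT", "event": "PAUSE"}',
    '{"type": "AUDIO", "data": "RU9G"}',
]
check_segment_order(good)  # passes silently
```

The `data` values in `good` are placeholders that only exercise the ordering logic. A stream that ends with CANCEL or ABORT is outside the scope of this check.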

Complete the session after streaming

After sending the final RU9G message, close the WebSocket connection and complete the ambient session with REST.
1. Close the WebSocket connection

Close the WebSocket connection from the client when you are done sending audio for that segment.

2. End the session using REST

End the session using End Ambient session.

3. Retrieve results using REST

Poll session status and fetch the transcript, notes, and structured data using the Ambient REST APIs.

Final transcripts and notes are not guaranteed to arrive over WebSocket. Treat REST APIs as the source of truth.

Audio format and chunking

The data field of each AUDIO message must decode from Base64 to raw PCM audio bytes.

PCM vs WAV

Raw PCM is audio data without a file container. A .wav file is a container and includes headers. If your source is WAV, remove the RIFF header (44 bytes in a canonical PCM WAV file, but longer when the file carries extra chunks), or decode the file to raw PCM before sending. Sending WAV headers as PCM reduces recognition quality and makes debugging harder. Send audio in this format:
  • Encoding: LINEAR16, PCM signed 16-bit little-endian
  • Channels: Mono
  • Sample rate: 16 kHz
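Rather than slicing off a fixed number of header bytes, a WAV source can be decoded with Python's standard `wave` module, which parses the container and returns only the sample data. The sketch below also checks the format listed above. `wav_to_pcm` is a hypothetical helper name, and the in-memory WAV is generated silence used purely for demonstration.

```python
import io
import wave


def wav_to_pcm(wav_bytes: bytes) -> bytes:
    """Return raw PCM frames from a WAV file, rejecting unexpected formats."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, 2, 16000):
            raise ValueError("expected mono, 16-bit, 16 kHz audio")
        # readframes returns the sample bytes only, without any header.
        return wav.readframes(wav.getnframes())


# Demonstration input: 100 ms of silence written as a WAV file in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 1600)

pcm = wav_to_pcm(buf.getvalue())
```

Parsing the container avoids sending stray header or metadata bytes as audio when a file carries chunks beyond the canonical header.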

Chunk size

Send audio in small chunks during streaming. For optimal performance, chunk the audio into 100 ms packets. For 16 kHz, mono, 16-bit audio, 100 ms is 3200 bytes per chunk (16,000 samples per second × 2 bytes per sample × 0.1 s).
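The chunk size follows directly from the audio format, as this sketch shows. The constant names and `iter_chunks` are hypothetical and exist only for illustration.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
CHUNK_MS = 100

# 16000 samples/s * 2 bytes * 1 channel * 0.1 s = 3200 bytes
CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * CHUNK_MS // 1000


def iter_chunks(pcm: bytes, size: int = CHUNK_BYTES):
    """Yield consecutive slices of raw PCM; the last slice may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


# 250 ms of silence splits into two full chunks and one half chunk.
sizes = [len(chunk) for chunk in iter_chunks(b"\x00" * 8000)]
```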

Encode each chunk

For every AUDIO message:
  1. Take raw PCM bytes.
  2. Encode the bytes using standard Base64 (RFC 4648).
  3. Send the encoded string as data.
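These three steps can be sketched as a single function. `audio_message` is a hypothetical helper name, and the even-length check is an added safeguard based on the 16-bit sample size, not a documented server rule.

```python
import base64
import json


def audio_message(pcm_chunk: bytes) -> str:
    """Build an AUDIO frame from raw PCM bytes (hypothetical helper)."""
    # 16-bit samples occupy two bytes each, so an odd length suggests the
    # chunk was cut mid-sample or is not raw PCM.
    if len(pcm_chunk) % 2 != 0:
        raise ValueError("PCM chunk length must be a multiple of 2 bytes")
    data = base64.b64encode(pcm_chunk).decode("ascii")  # standard Base64
    return json.dumps({"type": "AUDIO", "data": data})


# Two samples of silence (four zero bytes) encode to "AAAAAA==".
msg = audio_message(b"\x00\x00\x00\x00")
```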

Example flow

This example shows the outbound message sequence for one ambient stream segment: one START_TIME message, multiple AUDIO chunks, an optional EVENT, and the final RU9G end marker.
{ "type": "START_TIME", "data": "<base64(timestamp)>" }

{ "type": "AUDIO", "data": "<base64(pcm chunk 1)>" }
{ "type": "AUDIO", "data": "<base64(pcm chunk 2)>" }

{ "type": "EVENT", "event": "PAUSE" }

{ "type": "AUDIO", "data": "<base64(pcm chunk 3)>" }

{ "type": "AUDIO", "data": "RU9G" }
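The same sequence can be generated programmatically. The sketch below yields the JSON text for each frame of one segment, in order, and leaves the actual sending to whichever WebSocket client you use. `segment_frames` and `b64` are hypothetical names; the connection URL, authentication, and any optional EVENT messages are deliberately left out.

```python
import base64
import json


def b64(raw: bytes) -> str:
    """Standard Base64 (RFC 4648) as an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


def segment_frames(pcm: bytes, rfc3339_start: str, chunk_bytes: int = 3200):
    """Yield the JSON text frames for one stream segment, in order."""
    # 1. One START_TIME message.
    yield json.dumps({"type": "START_TIME", "data": b64(rfc3339_start.encode("utf-8"))})
    # 2. One AUDIO message per PCM chunk.
    for offset in range(0, len(pcm), chunk_bytes):
        yield json.dumps({"type": "AUDIO", "data": b64(pcm[offset:offset + chunk_bytes])})
    # 3. The RU9G end marker as the final AUDIO message.
    yield json.dumps({"type": "AUDIO", "data": "RU9G"})


# 200 ms of silence: START_TIME, two AUDIO chunks, then the end marker.
frames = list(segment_frames(b"\x00" * 6400, "2026-04-25T12:34:56Z"))
```

Each yielded string would be passed to the client's text-send call, one string per frame.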

Troubleshooting

  • Symptom: Poor transcription or failures. Fix: Strip the WAV header or decode to raw PCM.
  • Symptom: Server parse errors. Fix: Use standard Base64 (RFC 4648), not URL-safe Base64 or hex.
  • Symptom: Server parse errors. Fix: Send one JSON object per WebSocket frame.
  • Symptom: Stream does not finalize. Fix: Send RU9G as the final AUDIO message.
  • Symptom: Ignored or invalid control messages. Fix: Use the event field.
  • Symptom: Missing final transcript. Fix: Always fetch results through REST APIs.
Last modified on May 22, 2026