Streaming

Set "stream": true to receive Server-Sent Events (SSE). Each event contains a chat.completion.chunk object with a delta of the response.

curl https://api.embercloud.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMBER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Stream Event Format

Each SSE event is prefixed with data: followed by a JSON chunk. The stream ends with data: [DONE].

Stream chunks
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
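If you are not using an SDK, the raw SSE body above can be parsed line by line. A minimal sketch; the `parseSSELine` helper name is ours, not part of the API:

```javascript
// Parse one line of an SSE response body.
// Returns the parsed chunk object, the string "DONE" for the
// `data: [DONE]` terminator, or null for blank/unrecognized lines.
function parseSSELine(line) {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return "DONE";
  return JSON.parse(payload);
}
```

In practice you would feed this from a line-buffered read of the HTTP response and stop once it returns `"DONE"`.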

Chunk Fields

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| delta | object | Required | Partial message content. Contains role on the first chunk, content on subsequent chunks. |
| delta.content | string \| null | Optional | The next piece of generated text. |
| delta.role | string \| null | Optional | Present only in the first chunk. Always "assistant". |
| delta.reasoning | string \| null | Optional | Reasoning content from reasoning-capable models (for example GLM and MiniMax), if applicable. |
| delta.tool_calls | array \| null | Optional | Tool call deltas, if the model is invoking a function. |
| finish_reason | string \| null | Required | null until the final content chunk, then "stop", "length", or "tool_calls". |
| usage | object \| null | Optional | Included in the final chunk when stream_options.include_usage is true. |
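Putting the fields together, a client typically accumulates chunks into one complete message. A sketch of that accumulation; the `accumulateChunks` function name is illustrative:

```javascript
// Accumulate parsed chat.completion.chunk objects into a complete
// assistant message, following the field semantics in the table above.
function accumulateChunks(chunks) {
  const message = { role: "assistant", content: "" };
  let finishReason = null;
  let usage = null;
  for (const chunk of chunks) {
    // usage, when requested via stream_options.include_usage,
    // arrives on the final chunk.
    if (chunk.usage) usage = chunk.usage;
    const choice = chunk.choices?.[0];
    if (!choice) continue;
    if (choice.delta?.content) message.content += choice.delta.content;
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { message, finishReason, usage };
}
```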

Reasoning in streaming

Some reasoning-capable models (including GLM and MiniMax) emit delta.reasoning chunks before any final content. If your UI renders only delta.content, the stream can appear idle until the model finishes reasoning and begins emitting its answer.

Handle reasoning + content chunks
// `stream` is the async iterable returned by a streaming
// chat completion request (e.g. an SDK call with stream: true).
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) continue;

  if (delta.reasoning) {
    // Reasoning tokens: surface in a debug panel or log
    // rather than the main transcript.
    process.stderr.write(delta.reasoning);
  }

  if (delta.content) {
    process.stdout.write(delta.content);
  }
}
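When the model invokes a function, delta.tool_calls arrives in fragments that must be stitched back together. A sketch assuming OpenAI-style tool_call deltas (each fragment carries an `index`, with `id` and `function.name` on the first fragment and `function.arguments` streamed as string pieces); the source does not spell out this wire format, so verify against real responses:

```javascript
// Merge streamed tool_call deltas into complete tool calls.
// `deltas` is an array of delta objects collected from chunks.
function mergeToolCallDeltas(deltas) {
  const calls = [];
  for (const delta of deltas) {
    for (const tc of delta.tool_calls ?? []) {
      // Each fragment addresses a tool call slot by index.
      const slot = (calls[tc.index] ??= {
        id: "",
        type: "function",
        function: { name: "", arguments: "" },
      });
      if (tc.id) slot.id = tc.id;
      if (tc.function?.name) slot.function.name += tc.function.name;
      if (tc.function?.arguments) slot.function.arguments += tc.function.arguments;
    }
  }
  return calls;
}
```

Once finish_reason is "tool_calls", the merged `function.arguments` string should parse as complete JSON.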