Streaming
Set "stream": true to receive Server-Sent Events (SSE). Each event contains a chat.completion.chunk object with a delta of the response.
curl https://api.embercloud.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMBER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
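From Node, the same request works through any OpenAI-compatible SDK client. The sketch below assumes the openai npm package pointed at the EmberCloud base URL; treat it as illustrative rather than an officially supported client.
Stream a completion (Node sketch)
import OpenAI from "openai";

// Assumption: an OpenAI-compatible SDK client aimed at the EmberCloud base URL.
const client = new OpenAI({
  apiKey: process.env.EMBER_API_KEY,
  baseURL: "https://api.embercloud.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "glm-4.7",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

// Each iteration yields one chat.completion.chunk object.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Stream Event Format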
Each SSE event is prefixed with data: followed by a JSON chunk. The stream ends with data: [DONE].
Stream chunks
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Chunk Fields
| Parameter | Type | Required | Description |
|---|---|---|---|
| delta | object | Required | Partial message content. Contains role on the first chunk, content on subsequent chunks, and is empty on the final chunk. |
| → delta.content | string \| null | Optional | The next piece of generated text. |
| → delta.role | string \| null | Optional | Present only in the first chunk. Always "assistant". |
| → delta.reasoning | string \| null | Optional | Reasoning content from reasoning-capable models (for example GLM and MiniMax), if applicable. |
| → delta.tool_calls | array \| null | Optional | Tool call deltas, if the model is invoking a function. |
| finish_reason | string \| null | Required | null until the final chunk, then "stop", "length", or "tool_calls". |
| usage | object \| null | Optional | Included in the final chunk when stream_options.include_usage is true. |
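Without an SDK, you can parse the SSE events directly: read the response body, strip the data: prefix from each line, stop at [DONE], and JSON.parse the rest. The sketch below assumes Node 18+, where the fetch response body is async-iterable; the line buffering is simplified, not a full SSE parser.
Parse the raw SSE stream
const response = await fetch("https://api.embercloud.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.EMBER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "glm-4.7",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true,
    stream_options: { include_usage: true }, // usage arrives in the final chunk
  }),
});

const decoder = new TextDecoder();
let buffer = "";
for await (const bytes of response.body) {
  buffer += decoder.decode(bytes, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep any partial line for the next read
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") continue; // end of stream
    const chunk = JSON.parse(payload);
    if (chunk.usage) console.log("\nusage:", chunk.usage);
    const choice = chunk.choices[0];
    if (choice?.delta?.content) process.stdout.write(choice.delta.content);
    if (choice?.finish_reason) console.log("\nfinish_reason:", choice.finish_reason);
  }
}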
Reasoning in streaming
Some reasoning-capable models (including GLM and MiniMax) emit delta.reasoning before any final content. If your UI renders only delta.content, the stream can look quiet until the model finishes reasoning and begins the final answer.
Handle reasoning + content chunks
// `stream` is the async iterable returned by
// client.chat.completions.create({ ..., stream: true }), as in the sketch above.
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) continue;
  if (delta.reasoning) {
    // Reasoning tokens: surface in a hidden debug panel if needed.
    process.stderr.write(delta.reasoning);
  }
  if (delta.content) {
    // Final answer tokens.
    process.stdout.write(delta.content);
  }
}
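To give users feedback during that quiet phase, buffer delta.reasoning separately and show a lightweight "thinking" indicator (or the debug panel above) until the first delta.content arrives.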