Streaming

Set "stream": true to receive Server-Sent Events (SSE). Each event contains a chat.completion.chunk object with a delta of the response.

curl https://api.embercloud.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMBER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Stream Event Format

Each SSE event is prefixed with data: followed by a JSON chunk. The stream ends with data: [DONE].

Stream chunks
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1707436800,"model":"glm-4.7","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
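If you are not using an SDK, the raw SSE body above can be parsed line by line. A minimal sketch; the `parseSSELine` helper name is ours, not part of the API:

```javascript
// Parse one line of an SSE response body.
// Returns the parsed chunk object, the string "DONE" for the
// `data: [DONE]` terminator, or null for blank/unrecognized lines.
function parseSSELine(line) {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return "DONE";
  return JSON.parse(payload);
}
```

In practice you would feed this from a line-buffered read of the HTTP response and stop once it returns `"DONE"`.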

Chunk Fields

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| delta | object | Required | Partial message content. Contains role on the first chunk, content on subsequent chunks. |
| delta.content | string \| null | Optional | The next piece of generated text. |
| delta.role | string \| null | Optional | Present only in the first chunk. Always "assistant". |
| delta.reasoning | string \| null | Optional | Reasoning content from reasoning-capable models (for example GLM and MiniMax), if applicable. |
| delta.tool_calls | array \| null | Optional | Tool call deltas, if the model is invoking a function. |
| finish_reason | string \| null | Required | null until the final content chunk, then "stop", "length", or "tool_calls". |
| usage | object \| null | Optional | Included in the final chunk when stream_options.include_usage is true. |
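Putting the fields together, a client typically accumulates chunks into one complete message. A sketch of that accumulation; the `accumulateChunks` function name is illustrative:

```javascript
// Accumulate parsed chat.completion.chunk objects into a complete
// assistant message, following the field semantics in the table above.
function accumulateChunks(chunks) {
  const message = { role: "assistant", content: "" };
  let finishReason = null;
  let usage = null;
  for (const chunk of chunks) {
    // usage, when requested via stream_options.include_usage,
    // arrives on the final chunk.
    if (chunk.usage) usage = chunk.usage;
    const choice = chunk.choices?.[0];
    if (!choice) continue;
    if (choice.delta?.content) message.content += choice.delta.content;
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { message, finishReason, usage };
}
```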

Reasoning in streaming

Some reasoning-capable models (including GLM and MiniMax) emit delta.reasoning chunks before any final content. If your UI renders only delta.content, the stream can appear idle until the model finishes reasoning and begins emitting its answer.

Handle reasoning + content chunks
// `stream` is the async iterable returned by a streaming
// chat completion request (e.g. an SDK call with stream: true).
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) continue;

  if (delta.reasoning) {
    // Reasoning tokens: surface in a debug panel or log
    // rather than the main transcript.
    process.stderr.write(delta.reasoning);
  }

  if (delta.content) {
    process.stdout.write(delta.content);
  }
}
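When the model invokes a function, delta.tool_calls arrives in fragments that must be stitched back together. A sketch assuming OpenAI-style tool_call deltas (each fragment carries an `index`, with `id` and `function.name` on the first fragment and `function.arguments` streamed as string pieces); the source does not spell out this wire format, so verify against real responses:

```javascript
// Merge streamed tool_call deltas into complete tool calls.
// `deltas` is an array of delta objects collected from chunks.
function mergeToolCallDeltas(deltas) {
  const calls = [];
  for (const delta of deltas) {
    for (const tc of delta.tool_calls ?? []) {
      // Each fragment addresses a tool call slot by index.
      const slot = (calls[tc.index] ??= {
        id: "",
        type: "function",
        function: { name: "", arguments: "" },
      });
      if (tc.id) slot.id = tc.id;
      if (tc.function?.name) slot.function.name += tc.function.name;
      if (tc.function?.arguments) slot.function.arguments += tc.function.arguments;
    }
  }
  return calls;
}
```

Once finish_reason is "tool_calls", the merged `function.arguments` string should parse as complete JSON.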