Reasoning
Reasoning models can emit an internal reasoning text stream before they return user-facing content. This is useful for observability and output quality, but it can make streaming appear delayed if your client only reads content.
Reasoning-capable models
The models that may emit reasoning output are:
glm-5, glm-4.7, glm-4.6, minimax-m2.5
MiniMax-M2.5 emits reasoning content by default and does not support disabling reasoning in this release.
Request parameters
Use one of these request switches to control behavior. If you do not set either and use a reasoning-capable model, reasoning may be emitted.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `reasoning` | object | Optional | Controls reasoning output for models that support it. MiniMax-M2.5 reasoning is always on in this release. |
| → `reasoning.enabled` | boolean | Optional | Set to `false` to disable reasoning and emit normal content immediately. |
| → `reasoning.effort` | string | Optional | Effort setting for finer control when reasoning is enabled. |
| `thinking` | object | Optional | Legacy compatibility switch. Use `{ "type": "disabled" }` to suppress reasoning in many flows. |
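As a sketch of the two request shapes in the table, assuming an OpenAI-compatible JSON body (the helper below is hypothetical, not part of any SDK):

```python
def build_payload(model, messages, disable_reasoning=False, legacy=False):
    """Assemble a chat-completions request body using the switches
    described in the table above. Hypothetical helper for illustration."""
    payload = {"model": model, "messages": messages, "stream": True}
    if disable_reasoning:
        if legacy:
            payload["thinking"] = {"type": "disabled"}  # legacy compatibility shape
        else:
            payload["reasoning"] = {"enabled": False}   # recommended shape
    return payload
```

For example, `build_payload("glm-5", msgs, disable_reasoning=True)` produces a body containing `"reasoning": {"enabled": false}`.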
OpenRouter routing behavior
In OpenRouter-style routing, reasoning.enabled is the recommended control. Ember Cloud also accepts thinking.type: disabled as compatibility support, but behavior is primarily driven by the newer reasoning shape.
If you only set thinking: { "type": "disabled" } and your chosen model still returns silent reasoning phases, switch to explicit reasoning: { "enabled": false } for reliable non-reasoning output when using GLM models. MiniMax-M2.5 keeps reasoning enabled.
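Since the newer `reasoning` shape is what primarily drives behavior, one way to handle the migration is to translate the legacy switch client-side before sending the request. A minimal sketch (hypothetical helper, not SDK code):

```python
def normalize_reasoning_controls(payload):
    """If only the legacy thinking switch is present, add the recommended
    reasoning shape so GLM models reliably skip reasoning."""
    p = dict(payload)
    if p.get("thinking", {}).get("type") == "disabled" and "reasoning" not in p:
        p["reasoning"] = {"enabled": False}
    return p
```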
See OpenRouter's reasoning-tokens guide for general guidance on reasoning tokens.
Streaming behavior
With reasoning enabled, a long request can produce many delta.reasoning chunks first and keep delta.content empty until the model is ready to emit final text.
Ember Cloud keeps the connection open during long reasoning windows with SSE keep-alive comments, so the socket should stay alive even when real tokens are delayed.
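A client therefore needs to ignore SSE keep-alive comments (lines starting with `:`) and dispatch the reasoning and content fields separately. A minimal sketch, assuming the delta.reasoning / delta.content chunk shape described above:

```python
import json

def route_sse_line(line, on_reasoning, on_content):
    """Route one SSE line: skip keep-alive comments, then dispatch
    delta.reasoning and delta.content to separate callbacks.
    Illustrative sketch, not SDK code."""
    line = line.strip()
    if not line or line.startswith(":"):   # SSE keep-alive comment
        return
    if not line.startswith("data: "):
        return
    data = line[len("data: "):]
    if data == "[DONE]":                   # end-of-stream sentinel
        return
    delta = json.loads(data)["choices"][0]["delta"]
    if delta.get("reasoning"):
        on_reasoning(delta["reasoning"])
    if delta.get("content"):
        on_content(delta["content"])
```

This keeps the socket read loop simple: comments and blank lines are dropped, and only real deltas reach your UI.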
Disable reasoning
```shell
curl https://api.embercloud.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMBER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5",
    "stream": true,
    "messages": [{ "role": "user", "content": "Write a long report in detail." }],
    "reasoning": {
      "enabled": false
    }
  }'
```

If you are debugging model steps, you can keep reasoning enabled and render reasoning in a separate panel instead of dropping it from the stream.
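As a sketch of that separate-panel approach, assuming chunks have already been parsed from the stream (hypothetical helper, not SDK code):

```python
def split_stream(chunks):
    """Collect parsed stream chunks into two buffers so a UI can render
    reasoning in one panel and final text in another."""
    reasoning, content = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        reasoning.append(delta.get("reasoning") or "")
        content.append(delta.get("content") or "")
    return "".join(reasoning), "".join(content)
```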