Reasoning

Reasoning models can emit an internal reasoning text stream before they return user-facing content. This is useful for observability and quality, but it can make streaming appear delayed if your client only reads content.

Reasoning-capable models

The models that may emit reasoning output are:

  • glm-5
  • glm-4.7
  • glm-4.6
  • minimax-m2.5

MiniMax-M2.5 emits reasoning content by default and does not support disabling reasoning in this release.

Request parameters

Use one of these request switches to control behavior. If you do not set either and use a reasoning-capable model, reasoning may be emitted.

  • reasoning (object, optional): Controls reasoning output for models that support it. MiniMax-M2.5 reasoning is always on in this release.
  • reasoning.enabled (boolean, optional): Set to false to disable reasoning and emit normal content immediately.
  • reasoning.effort (string, optional): Effort setting for advanced control when reasoning is allowed.
  • thinking (object, optional): Legacy compatibility switch. Use { "type": "disabled" } to suppress reasoning in many flows.
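As a sketch of how these switches fit into a request body, the helper below assembles an OpenAI-style chat-completions payload. The function name build_request is hypothetical; the reasoning field shape follows the parameter list above.

```python
def build_request(model: str, prompt: str, disable_reasoning: bool = False) -> dict:
    """Hypothetical helper: build a chat-completions body using the
    `reasoning` switch described above."""
    body = {
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    if disable_reasoning:
        # Explicit opt-out; omit the field entirely to accept the default.
        body["reasoning"] = {"enabled": False}
    return body
```

Leaving both switches unset on a reasoning-capable model means reasoning may be emitted, so set the field explicitly when you need deterministic behavior.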

OpenRouter routing behavior

In OpenRouter-style routing, reasoning.enabled is the recommended control. Ember Cloud also accepts thinking.type: disabled as compatibility support, but behavior is primarily driven by the newer reasoning shape.

If you only set thinking: { "type": "disabled" } and your chosen model still returns silent reasoning phases, switch to an explicit reasoning: { "enabled": false } for reliable non-reasoning output on GLM models. MiniMax-M2.5 keeps reasoning enabled regardless.
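The per-model rules above can be sketched as a small dispatch helper. The function name reasoning_controls and the prefix matching are illustrative assumptions, not part of the API.

```python
def reasoning_controls(model: str) -> dict:
    """Hypothetical helper: return the request fields that suppress
    reasoning for a given model, per the routing guidance above."""
    if model.startswith("minimax"):
        return {}  # MiniMax-M2.5 keeps reasoning enabled; nothing to set
    if model.startswith("glm"):
        return {"reasoning": {"enabled": False}}  # preferred explicit switch
    return {"thinking": {"type": "disabled"}}  # legacy compatibility fallback
```

Merging the returned dict into your request body keeps the disable logic in one place as model support evolves.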

For background, see OpenRouter's reasoning-tokens guide.

Streaming behavior

With reasoning enabled, a long request can produce many delta.reasoning chunks first and keep delta.content empty until the model is ready to emit final text.

Ember Cloud keeps the connection open during long reasoning windows with SSE keep-alive comments, so the socket should stay alive even when real tokens are delayed.
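A minimal client-side sketch of handling such a stream is shown below. It assumes OpenAI-style chunks carrying delta.reasoning and delta.content fields; per the SSE format, keep-alive comments are lines beginning with a colon and can be ignored.

```python
import json

def split_stream(sse_lines):
    """Separate reasoning deltas from content deltas in an SSE stream.

    Sketch only: assumes chunks shaped like
    data: {"choices":[{"delta":{"reasoning":"...","content":"..."}}]}
    and a terminal "data: [DONE]" sentinel.
    """
    reasoning, content = [], []
    for line in sse_lines:
        if not line or line.startswith(":"):
            continue  # blank separator or SSE keep-alive comment: ignore
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("reasoning"):
            reasoning.append(delta["reasoning"])
        if delta.get("content"):
            content.append(delta["content"])
    return "".join(reasoning), "".join(content)
```

Reading both fields means your client sees progress during the reasoning window instead of appearing stalled until final text arrives.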

Disable reasoning

curl https://api.embercloud.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMBER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5",
    "stream": true,
    "messages": [{ "role": "user", "content": "Write a long report in detail." }],
    "reasoning": {
      "enabled": false
    }
  }'

If you are debugging model steps, you can keep reasoning enabled and render reasoning in a separate panel instead of dropping it from the stream.
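One way to keep reasoning visible without mixing it into the answer is a pair of buffers feeding separate panels. The StreamPanels class below is a hypothetical sketch; it assumes deltas shaped like the streaming chunks described above.

```python
class StreamPanels:
    """Hypothetical UI buffer: routes reasoning deltas to a side panel
    instead of dropping them, so the main panel shows only final text."""

    def __init__(self):
        self.reasoning = []
        self.answer = []

    def feed(self, delta: dict) -> None:
        # Accumulate each kind of delta into its own buffer.
        if delta.get("reasoning"):
            self.reasoning.append(delta["reasoning"])
        if delta.get("content"):
            self.answer.append(delta["content"])

    def panels(self) -> dict:
        return {
            "reasoning": "".join(self.reasoning),
            "answer": "".join(self.answer),
        }
```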