Tool Calling

EmberCloud models can call tools you define — enabling your model to take actions, query APIs, or fetch real-time data. The API is fully compatible with the OpenAI tool calling format.

OpenAI SDK compatible: If you already use the OpenAI SDK for tool calling, just change base_url and api_key. No other changes are required.

The flow has two steps:

  1. Send a request with a list of tools. The model decides when to call one.
  2. Execute the tool in your code, send the result back, and get the final reply.

1. Define Tools & Get a Tool Call

Pass a tools array to the request. Each tool has a JSON Schema parameters object so the model knows what arguments to generate.

from openai import OpenAI
import json

client = OpenAI(
    base_url="https://api.embercloud.ai/v1",
    api_key="your-api-key",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'San Francisco'",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model wants to call a tool, finish_reason is "tool_calls"
choice = response.choices[0]
if choice.finish_reason == "tool_calls":
    tool_call = choice.message.tool_calls[0]
    print(f"Tool: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
    # → Tool: get_weather
    # → Args: {"city": "Tokyo", "unit": "celsius"}

Example response when the model wants to call a tool:

Response (finish_reason: tool_calls)
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Tokyo\", \"unit\": \"celsius\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 82,
    "completion_tokens": 18,
    "total_tokens": 100
  }
}

2. Execute the Tool & Return the Result

When finish_reason is "tool_calls", run the function in your code and send the result back as a tool message. The model then generates the final user-facing reply.

import json

# --- After receiving a tool_calls response ---

# 1. Extract the tool call
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# 2. Execute your function
def get_weather(city, unit="celsius"):
    # Your real implementation here
    return {"temperature": 18, "unit": unit, "condition": "Partly cloudy"}

result = get_weather(**args)

# 3. Build the follow-up messages
messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    response.choices[0].message,  # assistant's tool_calls message
    {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    },
]

# 4. Send back to get the final answer
final = client.chat.completions.create(
    model="glm-4.7",
    messages=messages,
    tools=tools,
)

print(final.choices[0].message.content)
# → "The weather in Tokyo is 18°C and partly cloudy."
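The sketch above hardcodes a single function. In practice you usually dispatch on tool_call.function.name and guard against malformed model-generated arguments. A minimal sketch (the registry and error handling here are illustrative, not part of the EmberCloud API):

```python
import json

def get_weather(city, unit="celsius"):
    # Stand-in for your real implementation
    return {"temperature": 18, "unit": unit, "condition": "Partly cloudy"}

# Hypothetical registry mapping tool names to local implementations
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_call(name, arguments):
    """Execute one tool call; return a JSON string for the 'tool' message."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments)  # model-generated; may be malformed
    except json.JSONDecodeError as e:
        return json.dumps({"error": f"invalid arguments: {e}"})
    try:
        return json.dumps(fn(**args))
    except TypeError as e:  # e.g. unexpected keyword argument
        return json.dumps({"error": str(e)})

print(run_tool_call("get_weather", '{"city": "Tokyo"}'))
# → {"temperature": 18, "unit": "celsius", "condition": "Partly cloudy"}
```

Returning the error text as the tool result (rather than raising) lets the model see what went wrong and retry or apologize in its final reply.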

Parallel Tool Calls

The model may return multiple tool calls in a single response when it needs several pieces of information. Execute each call, then send all the results back together, one tool message per tool_call_id.

import json

# The model might call get_weather for multiple cities at once
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Compare weather in Tokyo and London."}],
    tools=tools,
)

tool_calls = response.choices[0].message.tool_calls
# tool_calls may have 2 entries: one for Tokyo, one for London

# Run all calls and collect results
tool_results = []
for tc in tool_calls:
    args = json.loads(tc.function.arguments)
    result = get_weather(**args)  # your function
    tool_results.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": json.dumps(result),
    })

messages = [
    {"role": "user", "content": "Compare weather in Tokyo and London."},
    response.choices[0].message,
    *tool_results,
]

final = client.chat.completions.create(
    model="glm-4.7",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
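The loop above executes calls one at a time. If your tool implementations do real I/O, you can run them concurrently; a sketch using asyncio.gather, where the async get_weather and the SimpleNamespace stand-ins for SDK tool-call objects are illustrative:

```python
import asyncio
import json
from types import SimpleNamespace

async def get_weather_async(city, unit="celsius"):
    await asyncio.sleep(0)  # stand-in for a real network call
    return {"city": city, "temperature": 18, "unit": unit}

async def run_all(tool_calls):
    async def run_one(tc):
        args = json.loads(tc.function.arguments)
        result = await get_weather_async(**args)
        return {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)}
    # gather preserves input order, so results line up with tool_calls
    return await asyncio.gather(*(run_one(tc) for tc in tool_calls))

# Fake tool calls shaped like the SDK objects, for illustration:
calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Tokyo"}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "London"}')),
]
tool_results = asyncio.run(run_all(calls))
print([json.loads(r["content"])["city"] for r in tool_results])
# → ['Tokyo', 'London']
```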

Controlling Tool Use

Use tool_choice to control whether the model can, must, or cannot call tools.

tool_choice accepts the following values (all optional; the default is "auto"):

  "auto" (string, default): The model decides whether to call a tool or respond directly.
  "required" (string): The model must call at least one tool. Useful when you always want structured output.
  "none" (string): The model will not call any tools. Use this to force a plain-text response even when tools are defined.
  {"type": "function", "function": {"name": "..."}} (object): Force the model to call a specific named function.
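For example, to force a specific function (the payload shape follows the OpenAI format; the function name and the commented-out request are illustrative):

```python
# Force the model to call get_weather regardless of what the user asked
forced_choice = {"type": "function", "function": {"name": "get_weather"}}

# response = client.chat.completions.create(
#     model="glm-4.7",
#     messages=[{"role": "user", "content": "How's Tokyo today?"}],
#     tools=tools,
#     tool_choice=forced_choice,  # or "auto", "required", "none"
# )
print(forced_choice["function"]["name"])
# → get_weather
```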

JSON / Structured Output

For simple structured data, use response_format={"type": "json_object"} instead of tool calling. This constrains the response to be syntactically valid JSON; make sure your prompt also tells the model to produce JSON, as the system message below does.

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {
            "role": "system",
            "content": "You extract information and return it as JSON.",
        },
        {
            "role": "user",
            "content": "Extract: name, age, city from 'Alice is 30 and lives in Paris'",
        },
    ],
    response_format={"type": "json_object"},
)

import json
data = json.loads(response.choices[0].message.content)
print(data)
# → {"name": "Alice", "age": 30, "city": "Paris"}