Serverless GPU inference for open source models with predictable latency, simple pricing, and drop-in OpenAI APIs.
Distributed cloud infrastructure with an inference engine built for speed.
Our API is fully compatible with the OpenAI SDK. Simply change the base URL and API key to switch to open-source models.
1. Sign up and generate a key in the dashboard.
2. Point your existing SDK to EmberCloud endpoints.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.embercloud.ai/v1",
    api_key="ember_sk_...",
)

completion = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Hello World!"}],
)
print(completion.choices[0].message.content)
```
Access top-tier open models through a unified, OpenAI-compatible API.
Pay only for what you generate. No idle costs.
| Model | Context | Input | Output | Cached Input |
|---|---|---|---|---|
| — | 203K | $0.720 / 1M | $2.30 / 1M | $0.144 / 1M |
| — | 200K | $0.380 / 1M | $1.98 / 1M | $0.190 / 1M |
| — | 200K | $0.060 / 1M | $0.400 / 1M | $0.010 / 1M |
| — | 131K | $0.600 / 1M | $2.20 / 1M | $0.110 / 1M |
| — | 131K | $0.130 / 1M | $0.850 / 1M | $0.025 / 1M |
| — | 262K | $0.108 / 1M | $0.675 / 1M | $0.060 / 1M |
| — | 262K | $0.405 / 1M | $1.98 / 1M | $0.225 / 1M |
| — | 196K | $0.200 / 1M | $1.20 / 1M | $0.040 / 1M |
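As a worked example of the per-token pricing above, the cost of a request follows directly from its token counts. The sketch below uses the 203K-context row ($0.720/1M input, $2.30/1M output, $0.144/1M cached input) and assumes cached input tokens are billed at the cached rate in place of the regular input rate; the helper name is illustrative, not part of the API.

```python
def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
                 input_price: float = 0.720, output_price: float = 2.30,
                 cached_price: float = 0.144) -> float:
    """Cost in USD for one request; prices are per 1M tokens.

    Assumes cached input tokens are billed at the cached rate instead of
    the regular input rate (an assumption, not confirmed pricing behavior).
    """
    uncached = input_tokens - cached_tokens
    return (uncached * input_price
            + cached_tokens * cached_price
            + output_tokens * output_price) / 1_000_000

# e.g. 10,000 input tokens (2,000 of them served from cache) and 1,500 output tokens
cost = request_cost(10_000, 1_500, cached_tokens=2_000)
print(f"${cost:.6f}")  # → $0.009498
```

Swap in the prices from whichever model row you use; the structure of the calculation is the same for every model in the table.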