GLM 5 is Live on EmberCloud. Try It Today
New: OpenAI-compatible chat + embeddings

Affordable tokens at blazing-fast speeds

Serverless GPU inference for open source models with predictable latency, simple pricing, and drop-in OpenAI APIs.

Zero cold starts · Usage + rate limits · OpenAI compatible
Global Infrastructure

Worldwide reach, blazing-fast inference

Distributed cloud infrastructure with an inference engine built for speed.

Integrate in seconds

Our API is fully compatible with the OpenAI SDK. Simply change the base URL and API key to switch to open-source models.

1. Get your API Key

Sign up and generate a key in the dashboard.

2. Configure Client

Point your existing SDK to EmberCloud endpoints.

main.py
from openai import OpenAI

# Drop-in swap: the standard OpenAI client, pointed at EmberCloud.
# Only the base URL and API key change.
client = OpenAI(
    base_url="https://api.embercloud.ai/v1",
    api_key="ember_sk_...",
)

completion = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Hello World!"}
    ],
)

print(completion.choices[0].message.content)
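The banner above also mentions embeddings. Assuming EmberCloud's embeddings endpoint follows the same OpenAI-compatible shape (the model ID below is a placeholder, not a confirmed EmberCloud name; check the dashboard for the real one), a minimal sketch:

```python
import os


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Plain-Python cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)


if os.environ.get("EMBER_API_KEY"):
    # Network portion only runs when a key is configured.
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(
        base_url="https://api.embercloud.ai/v1",
        api_key=os.environ["EMBER_API_KEY"],
    )
    # "your-embedding-model-id" is a placeholder -- substitute a model
    # listed in your dashboard.
    resp = client.embeddings.create(
        model="your-embedding-model-id",
        input=["serverless GPU inference", "managed model hosting"],
    )
    vec_a, vec_b = (d.embedding for d in resp.data)
    print(f"similarity: {cosine_similarity(vec_a, vec_b):.3f}")
```

Because the API is OpenAI-compatible, the same `embeddings.create` call works unchanged against either backend.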
Model Library

Production-ready models

Access top-tier open models through a unified, OpenAI-compatible API.

- GLM 5 (New · Chat): 745B MoE · 203K ctx · $0.720 in · $2.30 out / 1M
- GLM 4.7 (Chat): 355B MoE · 200K ctx · $0.380 in · $1.98 out / 1M
- GLM 4.7 Flash (Fast): 30B MoE · 200K ctx · $0.060 in · $0.400 out / 1M
- GLM 4.5 (Chat): 355B MoE · 131K ctx · $0.600 in · $2.20 out / 1M
- GLM 4.5 Air (Value · Chat): 131K ctx · $0.130 in · $0.850 out / 1M
- Qwen3 Coder Next (Value · Code): 262K ctx · $0.108 in · $0.675 out / 1M
- Kimi K2.5 (Chat): 262K ctx · $0.405 in · $1.98 out / 1M
- MiniMax M2.5 (Chat): 196K ctx · $0.200 in · $1.20 out / 1M
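The model library above can be treated as a small lookup table. A hypothetical helper for picking the cheapest model (by input price) that still fits a given context requirement, with two assumptions: the "K" context sizes are read as thousands of tokens, and the API model IDs other than `glm-4.7` (the one shown in the code sample) are guesses, so check the dashboard for the real names:

```python
# Pricing and context windows copied from the model library above:
# model id -> (context window in tokens, $/1M input, $/1M output)
MODELS = {
    "glm-5":            (203_000, 0.720, 2.30),
    "glm-4.7":          (200_000, 0.380, 1.98),
    "glm-4.7-flash":    (200_000, 0.060, 0.400),
    "glm-4.5":          (131_000, 0.600, 2.20),
    "glm-4.5-air":      (131_000, 0.130, 0.850),
    "qwen3-coder-next": (262_000, 0.108, 0.675),
    "kimi-k2.5":        (262_000, 0.405, 1.98),
    "minimax-m2.5":     (196_000, 0.200, 1.20),
}


def cheapest_for_context(min_ctx: int) -> str:
    """Cheapest model (by input price) whose context window is >= min_ctx."""
    candidates = {k: v for k, v in MODELS.items() if v[0] >= min_ctx}
    if not candidates:
        raise ValueError(f"no listed model offers a {min_ctx}-token context")
    return min(candidates, key=lambda k: candidates[k][1])


print(cheapest_for_context(150_000))  # glm-4.7-flash at $0.060 / 1M in
print(cheapest_for_context(250_000))  # qwen3-coder-next
```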
Transparent Pricing

Flexible token pricing

Pay only for what you generate. No idle costs.

| Model | Context | Input | Output | Cached Input |
| --- | --- | --- | --- | --- |
| GLM 5 (New) | 203K | $0.720 / 1M | $2.30 / 1M | $0.144 / 1M |
| GLM 4.7 | 200K | $0.380 / 1M | $1.98 / 1M | $0.190 / 1M |
| GLM 4.7 Flash (Fast) | 200K | $0.060 / 1M | $0.400 / 1M | $0.010 / 1M |
| GLM 4.5 | 131K | $0.600 / 1M | $2.20 / 1M | $0.110 / 1M |
| GLM 4.5 Air (Value) | 131K | $0.130 / 1M | $0.850 / 1M | $0.025 / 1M |
| Qwen3 Coder Next (Value) | 262K | $0.108 / 1M | $0.675 / 1M | $0.060 / 1M |
| Kimi K2.5 | 262K | $0.405 / 1M | $1.98 / 1M | $0.225 / 1M |
| MiniMax M2.5 | 196K | $0.200 / 1M | $1.20 / 1M | $0.040 / 1M |
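To estimate what a single request costs from the table, a quick sketch. Prices are dollars per million tokens, taken from the rows above; the billing model assumed here (cached input tokens charged at the cached rate instead of the full input rate) is an assumption, not a documented guarantee:

```python
def request_cost(input_toks: int, output_toks: int, cached_toks: int,
                 in_price: float, out_price: float,
                 cached_price: float) -> float:
    """Cost in dollars for one request; prices are $ per 1M tokens.

    Assumed billing model: cached input tokens are billed at the
    cached-input rate, the rest of the prompt at the full input rate.
    """
    fresh = input_toks - cached_toks
    return (fresh * in_price
            + cached_toks * cached_price
            + output_toks * out_price) / 1_000_000


# GLM 4.7 row: $0.380 in, $1.98 out, $0.190 cached input.
# A 10K-token prompt (4K of it cached) producing a 2K-token reply:
cost = request_cost(10_000, 2_000, 4_000, 0.380, 1.98, 0.190)
print(f"${cost:.6f}")  # prints $0.007000
```

Even at these volumes, full-context requests stay well under a cent on the mid-tier models, which is the point of the "pay only for what you generate" pricing above.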