Serverless GPU inference for open source models with predictable latency, simple pricing, and drop-in OpenAI APIs.
Distributed cloud infrastructure with an inference engine built for speed.
Our API is fully compatible with the OpenAI SDK. Simply change the base URL and API key to switch to open-source models.
1. Sign up and generate a key in the dashboard.
2. Point your existing SDK to EmberCloud endpoints.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.embercloud.ai/v1",
    api_key="ember_sk_...",
)

completion = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Hello World!"}],
)
print(completion.choices[0].message.content)
```
Access top-tier open models through a unified, OpenAI-compatible API.
Pay only for what you generate. No idle costs.
| Model | Context | Input | Output | Cached Input |
|---|---|---|---|---|
| — | 203K | $0.720 / 1M | $2.30 / 1M | $0.144 / 1M |
| — | 200K | $0.380 / 1M | $1.98 / 1M | $0.190 / 1M |
| — | 200K | $0.060 / 1M | $0.400 / 1M | $0.010 / 1M |
| — | 131K | $0.600 / 1M | $2.20 / 1M | $0.110 / 1M |
| — | 131K | $0.130 / 1M | $0.850 / 1M | $0.025 / 1M |
| — | 262K | $0.108 / 1M | $0.675 / 1M | $0.060 / 1M |
| — | 262K | $0.405 / 1M | $1.98 / 1M | $0.225 / 1M |
| — | 196K | $0.200 / 1M | $1.20 / 1M | $0.040 / 1M |
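As a worked example of the per-token pricing above, the cost of a request follows directly from its token counts. The sketch below uses the 203K-context row ($0.720/1M input, $2.30/1M output, $0.144/1M cached input) and assumes cached input tokens are billed at the cached rate in place of the regular input rate; the helper name is illustrative, not part of the API.

```python
def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
                 input_price: float = 0.720, output_price: float = 2.30,
                 cached_price: float = 0.144) -> float:
    """Cost in USD for one request; prices are per 1M tokens.

    Assumes cached input tokens are billed at the cached rate instead of
    the regular input rate (an assumption, not confirmed pricing behavior).
    """
    uncached = input_tokens - cached_tokens
    return (uncached * input_price
            + cached_tokens * cached_price
            + output_tokens * output_price) / 1_000_000

# e.g. 10,000 input tokens (2,000 of them served from cache) and 1,500 output tokens
cost = request_cost(10_000, 1_500, cached_tokens=2_000)
print(f"${cost:.6f}")  # → $0.009498
```

Swap in the prices from whichever model row you use; the structure of the calculation is the same for every model in the table.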