# Pricing
Simple per-token pricing. No minimums, no commitments. All prices in USD per 1M tokens.
| Model | Description | Context | Max Output | Input Price | Output Price | Cache Read |
|---|---|---|---|---|---|---|
| minimax-m2.5 | Efficient reasoning | 196.6K | 196.6K | $0.200 | $1.20 | $0.040 |
| glm-5 | Most capable reasoning model | 203K | 131K | $0.720 | $2.30 | $0.144 |
| glm-4.7 | Flagship reasoning model | 200K | 131K | $0.380 | $1.98 | $0.190 |
| glm-4.7-flash | Fast, cost-efficient variant | 200K | 131K | $0.060 | $0.400 | $0.010 |
| glm-4.5 | General-purpose model | 131K | 96K | $0.600 | $2.20 | $0.110 |
| glm-4.5-air | Lightweight, budget-friendly | 131K | 96K | $0.130 | $0.850 | $0.025 |
| kimi-k2.5 | MoE architecture | 262K | 262K | $0.405 | $1.98 | $0.225 |
| qwen3-coder-next | Fast code generation | 262K | 262K | $0.108 | $0.675 | $0.060 |
Pricing is subject to change. All models support text input/output, tool calling, JSON mode, and streaming. Reasoning is supported on glm-5, glm-4.7, and minimax-m2.5.
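As a rough sketch, per-request cost under the rates above is simply tokens divided by 1M times the per-1M price for each category. The assumption that cached tokens are billed at the cache-read rate *instead of* the full input rate is common but not stated here; check the billing documentation before relying on it.

```python
# Cost estimator sketch using the rates from the pricing table above.
# USD per 1M tokens: (input, output, cache read)
PRICES = {
    "minimax-m2.5": (0.200, 1.20, 0.040),
    "glm-5": (0.720, 2.30, 0.144),
    "glm-4.7": (0.380, 1.98, 0.190),
    "glm-4.7-flash": (0.060, 0.400, 0.010),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate request cost in USD.

    Assumes cached tokens are billed at the cache-read rate in place of
    the full input rate (an assumption, not confirmed by this page).
    """
    inp, out, cache = PRICES[model]
    fresh = input_tokens - cached_tokens  # input tokens not served from cache
    return (fresh * inp + cached_tokens * cache + output_tokens * out) / 1_000_000

# Example: 100K input tokens (50K cache hits) + 20K output on glm-4.7
print(f"${estimate_cost('glm-4.7', 100_000, 20_000, 50_000):.4f}")
```

At these rates, a 100K-input / 20K-output glm-4.7 request costs about eight cents without caching; cache hits on half the input shave roughly a cent off.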