Models

MiniMax-M2.5 (MiniMax)
Mode: Inceptron Optimized
Input tokens: $0.28 per 1M
Output tokens: $1.10 per 1M
Cache read: $0.03 per 1M
Tokens per sec: 40
Quantization: FP8
Size: 230B
Context: 196K
Capabilities: text, code, tool-calling, reasoning
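
Token pricing is linear in usage, so a request's cost is just each token count times its per-1M rate. A minimal sketch of that arithmetic using the MiniMax-M2.5 rates above (the function and the example request sizes are illustrative, not part of any API):

```python
# Illustrative only: estimates request cost from the per-1M-token rates
# listed above for MiniMax-M2.5. All rates are USD per 1M tokens.

RATES = {
    "input": 0.28,       # uncached input tokens
    "output": 1.10,      # output tokens
    "cache_read": 0.03,  # input tokens served from the prompt cache
}

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return estimated USD cost; cached_tokens is the cached share of the input."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * RATES["input"]
        + cached_tokens * RATES["cache_read"]
        + output_tokens * RATES["output"]
    ) / 1_000_000

# e.g. a 50K-token prompt (40K of it cache hits) with a 2K-token reply:
print(f"${estimate_cost(50_000, 2_000, cached_tokens=40_000):.4f}")  # $0.0062
```

Note how heavily the cache-read rate discounts repeated prompt prefixes: the 40K cached tokens cost $0.0012 here versus $0.0112 at the full input rate.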

Kimi-K2.5 (Moonshot AI)
Mode: Inceptron Optimized
Input tokens: $0.50 per 1M
Output tokens: $2.40 per 1M
Cache read: $0.12 per 1M
Tokens per sec: 47
Quantization: INT4
Size: —
Context: 262K
Capabilities: multimodal, tool-calling, reasoning

Llama-3.3-70B-Instruct (Meta)
Mode: Inceptron Optimized
Input tokens: $0.12 per 1M
Output tokens: $0.38 per 1M
Cache read: N/A
Tokens per sec: 40
Quantization: FP8
Size: 70B
Context: 131K
Capabilities: text, chat, tool-calling
Enterprise-Ready Inference
Run and scale Llama, Qwen, Kimi, and DeepSeek with SLA-backed uptime, zero-retention data handling, and pay-as-you-go pricing—no GPU ops.
gpt-oss-120b (OpenAI)
Mode: Inceptron Optimized
Input tokens: $0.05 per 1M
Output tokens: $0.45 per 1M
Cache read: $0.025 per 1M
Tokens per sec: 65
Quantization: FP8
Size: 120B
Context: 131K
Capabilities: text, code, tool-calling, reasoning
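
The throughput figures give a rough floor on generation latency: at a steady decode rate, wall-clock time for a completion is approximately output tokens divided by tokens per second. A minimal sketch using the listed rates (time-to-first-token and network overhead are ignored, so real latency will be higher):

```python
# Rough decode-time estimate from the listed throughput figures.
# Treat the result as a lower bound on wall-clock latency.

THROUGHPUT_TPS = {          # tokens per sec, as listed on this page
    "gpt-oss-120b": 65,
    "MiniMax-M2.5": 40,
    "DeepSeek-R1-0528": 20,
}

def decode_seconds(model: str, output_tokens: int) -> float:
    return output_tokens / THROUGHPUT_TPS[model]

# e.g. a 1,000-token completion:
for model in THROUGHPUT_TPS:
    print(f"{model}: ~{decode_seconds(model, 1_000):.0f}s")
# gpt-oss-120b: ~15s, MiniMax-M2.5: ~25s, DeepSeek-R1-0528: ~50s
```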

DeepSeek-V3.2 (DeepSeek)
Mode: Inceptron Optimized
Input tokens: $0.269 per 1M
Output tokens: $0.40 per 1M
Cache read: $0.13 per 1M
Tokens per sec: 30
Quantization: FP8
Size: 685B
Context: 163K
Capabilities: text, code, tool-calling, reasoning

GLM-5 (Z.ai)
Mode: Inceptron Optimized
Input tokens: $0.80 per 1M
Output tokens: $2.56 per 1M
Cache read: $0.20 per 1M
Tokens per sec: 50
Quantization: FP8
Size: 744B
Context: 200K
Capabilities: text, code, tool-calling, reasoning

DeepSeek-R1-0528 (DeepSeek)
Mode: Inceptron Optimized
Input tokens: $0.50 per 1M
Output tokens: $2.00 per 1M
Cache read: —
Tokens per sec: 20
Quantization: FP8
Size: 685B
Context: 164K
Capabilities: JSON mode, MoE, code, reasoning
Run any model on the fastest endpoints
Use our API to deploy any model on one of the most cost-efficient inference stacks available.
Scale seamlessly to a dedicated deployment at any time for optimal throughput.
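
As a sketch of what a call might look like, assuming an OpenAI-compatible chat completions endpoint (the base URL and environment variable below are placeholders, not a documented interface; substitute the values from your account):

```python
import os
from openai import OpenAI

# Placeholder base URL and env var: this assumes an OpenAI-compatible
# endpoint and is not the provider's documented configuration.
client = OpenAI(
    base_url="https://api.example.com/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
)

resp = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct",  # any model name from the list above
    messages=[{"role": "user", "content": "Summarize this in one line: ..."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```

Because the client is configured entirely through the base URL and model name, switching between the models listed above is a one-line change.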