Inceptron compiler, now open for early access. Auto-compile models for maximum efficiency. Join early access →


Models

MiniMax-M2.5 (MiniMax)

Mode: Inceptron Optimized
Input tokens (per 1M): $0.28
Output tokens (per 1M): $1.10
Cache read: $0.03
Tokens per sec: 40
Quantization: fp8
Size: 230B
Context: 196K
Capabilities: text, code, tool-calling, reasoning

Kimi-K2.5 (Moonshot AI)

Mode: Inceptron Optimized
Input tokens (per 1M): $0.50
Output tokens (per 1M): $2.40
Cache read: $0.12
Tokens per sec: 47
Quantization: int4
Size: 1T
Context: 262K
Capabilities: multimodal, tool-calling, reasoning

Llama-3.3-70B-Instruct (Meta)

Mode: Inceptron Optimized
Input tokens (per 1M): $0.12
Output tokens (per 1M): $0.38
Cache read: N/A
Tokens per sec: 40
Quantization: fp8
Size: 70B
Context: 131K
Capabilities: text, chat, tool-calling

Enterprise-Ready Inference

Run and scale Llama, Qwen, Kimi, and DeepSeek with SLA-backed uptime, zero-retention data handling, and pay-as-you-go pricing, with no GPU ops on your side.
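Rates in the cards on this page are quoted per million tokens, so a request's cost is each token count divided by 1,000,000 times the listed rate. A minimal sketch using the Llama-3.3-70B-Instruct rates above ($0.12 input, $0.38 output):

```python
# Cost estimate for a single request at per-million-token rates.
# Rates below are the Llama-3.3-70B-Instruct prices listed on this page.

INPUT_RATE_PER_M = 0.12   # USD per 1M input tokens
OUTPUT_RATE_PER_M = 0.38  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at per-million-token rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # → $0.000430
```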

gpt-oss-120b (OpenAI)

Mode: Inceptron Optimized
Input tokens (per 1M): $0.05
Output tokens (per 1M): $0.45
Cache read: $0.025
Tokens per sec: 65
Quantization: fp8
Size: 120B
Context: 131K
Capabilities: text, code, tool-calling, reasoning

DeepSeek-V3.2 (DeepSeek)

Mode: Inceptron Optimized
Input tokens (per 1M): $0.269
Output tokens (per 1M): $0.40
Cache read: $0.13
Tokens per sec: 30
Quantization: fp8
Size: 685B
Context: 163K
Capabilities: text, code, tool-calling, reasoning

Mode: Inceptron Optimized
Input tokens (per 1M): $0.80
Output tokens (per 1M): $2.56
Cache read: $0.20
Tokens per sec: 50
Quantization: fp8
Size: 744B
Context: 200K
Capabilities: text, code, tool-calling, reasoning

DeepSeek-R1-0528 (DeepSeek)

Mode: Inceptron Optimized
Input tokens (per 1M): $0.50
Output tokens (per 1M): $2.00
Cache read: N/A
Tokens per sec: 20
Quantization: fp8
Size: 685B
Context: 164K
Capabilities: JSON mode, MoE, code, reasoning
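DeepSeek-R1-0528 lists JSON mode among its capabilities. On OpenAI-compatible APIs this is conventionally requested with a `response_format` field; a sketch under that assumption (the field name and the exact model ID are assumptions, not confirmed by this page):

```python
# Hypothetical JSON-mode payload builder; assumes the endpoint is
# OpenAI-compatible and accepts the conventional "response_format" field.

def build_json_mode_payload(prompt: str) -> dict:
    """Build a chat-completions payload that asks for a JSON-object reply."""
    return {
        "model": "deepseek-ai/DeepSeek-R1-0528",  # assumed model ID format
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},  # assumed OpenAI-style field
    }
```

POST the JSON-encoded payload to `https://api.inceptron.io/v1/chat/completions` with the same headers as the curl example on this page.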

Run any model on the fastest endpoints

Use our API to deploy any model on one of the most cost-efficient inference stacks available.

Scale seamlessly to a dedicated deployment at any time for optimal throughput.



Curl


curl https://api.inceptron.io/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $INCEPTRON_API_KEY" \
-d '{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "How many moons are there in the Solar System?"
    }
  ]
}'
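The same request in Python, using only the standard library. This assumes the API is OpenAI-compatible, which the `/v1/chat/completions` path and response shape suggest but this page does not state outright:

```python
# Python equivalent of the curl call above; standard library only.
# Assumes an OpenAI-compatible endpoint and response shape.
import json
import os
import urllib.request

API_URL = "https://api.inceptron.io/v1/chat/completions"

def chat(prompt: str, model: str = "meta-llama/Llama-3.3-70B-Instruct") -> str:
    """Send one user message and return the assistant's reply text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['INCEPTRON_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires INCEPTRON_API_KEY to be set):
# print(chat("How many moons are there in the Solar System?"))
```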
