Model Comparison Chart (301 models)
Model Data Table
Name | Provider | Creator | Model | Price (per 1M tokens) | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Output Speed | Context Window | Latency | Quality Index |
---|---|---|---|---|---|---|---|---|---|---|
o3-mini (high) | OpenAI | OpenAI | o3-mini (high) | $1.93 | $1.10 | $4.40 | 118.5 tokens/s | 200,000 tokens | 59.59 s | 66 |
o3-mini | OpenAI | OpenAI | o3-mini | $1.93 | $1.10 | $4.40 | 183.2 tokens/s | 200,000 tokens | 12.49 s | 63 |
o3-mini | Microsoft Azure | OpenAI | o3-mini | $1.93 | $1.10 | $4.40 | 142.6 tokens/s | 200,000 tokens | 21.29 s | 63 |
o1 | Microsoft Azure | OpenAI | o1 | $26.25 | $15.00 | $60.00 | 104.5 tokens/s | 200,000 tokens | 31.34 s | 62 |
DeepSeek R1 | DeepSeek | DeepSeek | DeepSeek R1 | $0.96 | $0.55 | $2.19 | 25.1 tokens/s | 64,000 tokens | 68.28 s | 60 |
DeepSeek R1 | Hyperbolic | DeepSeek | DeepSeek R1 | $2.00 | $2.00 | $2.00 | 70 tokens/s | 128,000 tokens | 37.51 s | 60 |
DeepSeek R1 Base | Nebius AI Studio | DeepSeek | DeepSeek R1 | $1.20 | $0.80 | $2.40 | 12.5 tokens/s | 128,000 tokens | 123.9 s | 60 |
DeepSeek R1 Fast | Nebius AI Studio | DeepSeek | DeepSeek R1 | $3.00 | $2.00 | $6.00 | 72.5 tokens/s | 128,000 tokens | 19.82 s | 60 |
DeepSeek R1 | CentML | DeepSeek | DeepSeek R1 | $3.99 | $3.99 | $3.99 | 71.2 tokens/s | 128,000 tokens | 19.24 s | 60 |
DeepSeek R1 | Microsoft Azure | DeepSeek | DeepSeek R1 | $0.00 | $0.00 | $0.00 | 26.4 tokens/s | 128,000 tokens | 54.07 s | 60 |
DeepSeek R1 | Fireworks | DeepSeek | DeepSeek R1 | $4.25 | $3.00 | $8.00 | 87.5 tokens/s | 128,000 tokens | 17.68 s | 60 |
DeepSeek R1 | Deepinfra | DeepSeek | DeepSeek R1 | $1.16 | $0.75 | $2.40 | 8.5 tokens/s | 64,000 tokens | 212.83 s | 60 |
DeepSeek R1 | Novita | DeepSeek | DeepSeek R1 | $4.00 | $4.00 | $4.00 | 15.8 tokens/s | 64,000 tokens | 65.29 s | 60 |
DeepSeek R1 | Together.ai | DeepSeek | DeepSeek R1 | $4.00 | $3.00 | $7.00 | 113.6 tokens/s | 128,000 tokens | 15.23 s | 60 |
DeepSeek R1 | kluster.ai | DeepSeek | DeepSeek R1 | $7.00 | $7.00 | $7.00 | 26.6 tokens/s | 128,000 tokens | 62.07 s | 60 |
Claude 3.7 Sonnet Thinking | Anthropic | Anthropic | Claude 3.7 Sonnet Thinking | $6.00 | $3.00 | $15.00 | 79.6 tokens/s | 200,000 tokens | 0.92 s | 57 |
o1-mini | OpenAI | OpenAI | o1-mini | $1.93 | $1.10 | $4.40 | 200.8 tokens/s | 128,000 tokens | 11.33 s | 54 |
o1-mini | Microsoft Azure | OpenAI | o1-mini | $5.78 | $3.30 | $13.20 | 203.7 tokens/s | 128,000 tokens | 14.2 s | 54 |
DeepSeek R1 Distill Qwen 32B | Deepinfra | DeepSeek | DeepSeek R1 Distill Qwen 32B | $0.14 | $0.12 | $0.18 | 40 tokens/s | 128,000 tokens | 16.31 s | 51 |
DeepSeek R1 Distill Qwen 32B | Novita | DeepSeek | DeepSeek R1 Distill Qwen 32B | $0.30 | $0.30 | $0.30 | 21 tokens/s | 64,000 tokens | 32.19 s | 51 |
DeepSeek R1 Distill Qwen 32B | Groq | DeepSeek | DeepSeek R1 Distill Qwen 32B | $0.69 | $0.69 | $0.69 | 96.8 tokens/s | 128,000 tokens | 2.31 s | 51 |
Gemini 2.0 Pro Experimental (AI Studio) | Google (AI Studio) | Google | Gemini 2.0 Pro Experimental (AI Studio) | $0.00 | $0.00 | $0.00 | 123.3 tokens/s | 2,000,000 tokens | 0.56 s | 49 |
DeepSeek R1 Distill Qwen 14B | Novita | DeepSeek | DeepSeek R1 Distill Qwen 14B | $0.15 | $0.15 | $0.15 | 45.7 tokens/s | 64,000 tokens | 17.88 s | 49 |
DeepSeek R1 Distill Qwen 14B | Together.ai | DeepSeek | DeepSeek R1 Distill Qwen 14B | $1.60 | $1.60 | $1.60 | 164.8 tokens/s | 128,000 tokens | 6.62 s | 49 |
DeepSeek R1 Distill Llama 70B | Cerebras | DeepSeek | DeepSeek R1 Distill Llama 70B | $0.94 | $0.85 | $1.20 | 1864 tokens/s | 66,000 tokens | 0.82 s | 48 |
DeepSeek R1 Distill Llama 70B Base | Nebius AI Studio | DeepSeek | DeepSeek R1 Distill Llama 70B | $0.38 | $0.25 | $0.75 | 37.3 tokens/s | 128,000 tokens | 25 s | 48 |
DeepSeek R1 Distill Llama 70B | Deepinfra | DeepSeek | DeepSeek R1 Distill Llama 70B | $0.34 | $0.23 | $0.69 | 39.8 tokens/s | 128,000 tokens | 10.84 s | 48 |
DeepSeek R1 Distill Llama 70B | Novita | DeepSeek | DeepSeek R1 Distill Llama 70B | $0.39 | $0.39 | $0.39 | 22.1 tokens/s | 32,000 tokens | 15.56 s | 48 |
DeepSeek R1 Distill Llama 70B | Groq | DeepSeek | DeepSeek R1 Distill Llama 70B | $0.81 | $0.75 | $0.99 | 256.8 tokens/s | 128,000 tokens | 3.65 s | 48 |
DeepSeek R1 Distill Llama 70B (Spec decoding) | Groq | DeepSeek | DeepSeek R1 Distill Llama 70B (Spec decoding) | $0.81 | $0.75 | $0.99 | 1754.8 tokens/s | 128,000 tokens | 1.19 s | 48 |
DeepSeek R1 Distill Llama 70B | SambaNova | DeepSeek | DeepSeek R1 Distill Llama 70B | $0.88 | $0.70 | $1.40 | 124.3 tokens/s | 16,000 tokens | 12.2 s | 48 |
DeepSeek R1 Distill Llama 70B | Together.ai | DeepSeek | DeepSeek R1 Distill Llama 70B | $2.00 | $2.00 | $2.00 | 110.4 tokens/s | 128,000 tokens | 15.8 s | 48 |
Claude 3.7 Sonnet | Amazon Bedrock | Anthropic | Claude 3.7 Sonnet | $6.00 | $3.00 | $15.00 | 40.7 tokens/s | 200,000 tokens | 0.74 s | 48 |
Claude 3.7 Sonnet | Anthropic | Anthropic | Claude 3.7 Sonnet | $6.00 | $3.00 | $15.00 | 80.3 tokens/s | 200,000 tokens | 0.97 s | 48 |
Gemini 2.0 Flash Vertex | Google Vertex | Google | Gemini 2.0 Flash Vertex | $0.26 | $0.15 | $0.60 | 0 tokens/s | 1,000,000 tokens | 0.1 s | 48 |
Gemini 2.0 Flash (AI Studio) | Google (AI Studio) | Google | Gemini 2.0 Flash (AI Studio) | $0.17 | $0.10 | $0.40 | 186.8 tokens/s | 1,000,000 tokens | 0.36 s | 48 |
DeepSeek V3 | DeepSeek | DeepSeek | DeepSeek V3 | $0.48 | $0.27 | $1.10 | 27.4 tokens/s | 66,000 tokens | 7.19 s | 46 |
DeepSeek V3 (FP8) | Hyperbolic (FP8) | DeepSeek | DeepSeek V3 | $0.25 | $0.25 | $0.25 | 19.2 tokens/s | 128,000 tokens | 0.91 s | 46 |
DeepSeek V3 | Nebius AI Studio | DeepSeek | DeepSeek V3 | $0.75 | $0.50 | $1.50 | 26.1 tokens/s | 128,000 tokens | 0.86 s | 46 |
DeepSeek V3 | Fireworks | DeepSeek | DeepSeek V3 | $1.31 | $0.75 | $3.00 | 59.1 tokens/s | 128,000 tokens | 0.78 s | 46 |
DeepSeek V3 | Deepinfra | DeepSeek | DeepSeek V3 | $0.59 | $0.49 | $0.89 | 7.5 tokens/s | 64,000 tokens | 1.01 s | 46 |
DeepSeek V3 | Novita | DeepSeek | DeepSeek V3 | $0.89 | $0.89 | $0.89 | 14.7 tokens/s | 64,000 tokens | 1.9 s | 46 |
DeepSeek V3 (FP8) | Together.ai | DeepSeek | DeepSeek V3 | $1.25 | $1.25 | $1.25 | 18.6 tokens/s | 128,000 tokens | 0.39 s | 46 |
Qwen2.5 Max | Alibaba Cloud | Alibaba | Qwen2.5 Max | $2.80 | $1.60 | $6.40 | 36.1 tokens/s | 32,000 tokens | 1.26 s | 45 |
Gemini 1.5 Pro (Sep) (Vertex) | Google (Vertex) | Google | Gemini 1.5 Pro (Sep) (Vertex) | $2.19 | $1.25 | $5.00 | 0 tokens/s | 2,000,000 tokens | 0.1 s | 45 |
Gemini 1.5 Pro (Sep) (AI Studio) | Google (AI Studio) | Google | Gemini 1.5 Pro (Sep) (AI Studio) | $2.19 | $1.25 | $5.00 | 100.3 tokens/s | 2,000,000 tokens | 0.57 s | 45 |
Claude 3.5 Sonnet (Oct) | Amazon Bedrock | Anthropic | Claude 3.5 Sonnet (Oct) | $6.00 | $3.00 | $15.00 | 43 tokens/s | 200,000 tokens | 1.3 s | 44 |
Claude 3.5 Sonnet (Oct) Vertex | Google Vertex | Anthropic | Claude 3.5 Sonnet (Oct) Vertex | $6.00 | $3.00 | $15.00 | 80.1 tokens/s | 200,000 tokens | 0.82 s | 44 |
Claude 3.5 Sonnet (Oct) | Anthropic | Anthropic | Claude 3.5 Sonnet (Oct) | $6.00 | $3.00 | $15.00 | 77.2 tokens/s | 200,000 tokens | 1.27 s | 44 |
QwQ 32B-Preview | Hyperbolic | Alibaba | QwQ 32B-Preview | $0.20 | $0.20 | $0.20 | 67.9 tokens/s | 33,000 tokens | 1.15 s | 43 |
QwQ 32B-Preview Base | Nebius AI Studio | Alibaba | QwQ 32B-Preview | $0.14 | $0.09 | $0.27 | 83.5 tokens/s | 33,000 tokens | 0.6 s | 43 |
QwQ 32B-Preview | Fireworks | Alibaba | QwQ 32B-Preview | $0.90 | $0.90 | $0.90 | 68.7 tokens/s | 33,000 tokens | 0.41 s | 43 |
QwQ 32B-Preview | Deepinfra | Alibaba | QwQ 32B-Preview | $0.26 | $0.15 | $0.60 | 45.1 tokens/s | 33,000 tokens | 14.98 s | 43 |
QwQ 32B-Preview | Together.ai | Alibaba | QwQ 32B-Preview | $1.20 | $1.20 | $1.20 | 65.2 tokens/s | 33,000 tokens | 0.59 s | 43 |
Gemini 2.0 Flash-Lite (Preview) (AI Studio) | Google (AI Studio) | Google | Gemini 2.0 Flash-Lite (Preview) (AI Studio) | $0.13 | $0.07 | $0.30 | 184 tokens/s | 1,000,000 tokens | 0.24 s | 42 |
GPT-4o (Nov '24) | OpenAI | OpenAI | GPT-4o (Nov '24) | $4.38 | $2.50 | $10.00 | 93.6 tokens/s | 128,000 tokens | 0.38 s | 41 |
GPT-4o (Nov '24) | Microsoft Azure | OpenAI | GPT-4o (Nov '24) | $4.38 | $2.50 | $10.00 | 135.4 tokens/s | 128,000 tokens | 0.99 s | 41 |
Llama 3.3 70B | Cerebras | Meta | Llama 3.3 70B | $0.94 | $0.85 | $1.20 | 2196.9 tokens/s | 33,000 tokens | 0.18 s | 41 |
Llama 3.3 70B | Hyperbolic | Meta | Llama 3.3 70B | $0.40 | $0.40 | $0.40 | 38.7 tokens/s | 128,000 tokens | 1.32 s | 41 |
Llama 3.3 70B | Amazon Bedrock | Meta | Llama 3.3 70B | $0.71 | $0.71 | $0.71 | 136.2 tokens/s | 128,000 tokens | 0.58 s | 41 |
Llama 3.3 70B Fast | Nebius AI Studio | Meta | Llama 3.3 70B | $0.38 | $0.25 | $0.75 | 74.6 tokens/s | 128,000 tokens | 0.55 s | 41 |
Llama 3.3 70B Base | Nebius AI Studio | Meta | Llama 3.3 70B | $0.20 | $0.13 | $0.40 | 27.9 tokens/s | 128,000 tokens | 0.61 s | 41 |
Llama 3.3 70B | CentML | Meta | Llama 3.3 70B | $0.50 | $0.50 | $0.50 | 133.5 tokens/s | 128,000 tokens | 0.54 s | 41 |
Llama 3.3 70B | Microsoft Azure | Meta | Llama 3.3 70B | $0.71 | $0.71 | $0.71 | 61.2 tokens/s | 128,000 tokens | 0.45 s | 41 |
Llama 3.3 70B | Fireworks | Meta | Llama 3.3 70B | $0.90 | $0.90 | $0.90 | 120.2 tokens/s | 128,000 tokens | 0.5 s | 41 |
Llama 3.3 70B (Turbo, FP8) | Deepinfra (Turbo, FP8) | Meta | Llama 3.3 70B | $0.20 | $0.13 | $0.40 | 35.4 tokens/s | 128,000 tokens | 0.46 s | 41 |
Llama 3.3 70B | Deepinfra | Meta | Llama 3.3 70B | $0.27 | $0.23 | $0.40 | 21.9 tokens/s | 128,000 tokens | 0.42 s | 41 |
Llama 3.3 70B | FriendliAI | Meta | Llama 3.3 70B | $0.60 | $0.60 | $0.60 | 167.9 tokens/s | 128,000 tokens | 0.33 s | 41 |
Llama 3.3 70B | Novita | Meta | Llama 3.3 70B | $0.39 | $0.39 | $0.39 | 57.5 tokens/s | 128,000 tokens | 0.85 s | 41 |
Llama 3.3 70B (Spec decoding) | Groq | Meta | Llama 3.3 70B (Spec decoding) | $0.69 | $0.59 | $0.99 | 1603.3 tokens/s | 8,000 tokens | 0.4 s | 41 |
Llama 3.3 70B | Groq | Meta | Llama 3.3 70B | $0.64 | $0.59 | $0.79 | 275.4 tokens/s | 128,000 tokens | 0.14 s | 41 |
Llama 3.3 70B | SambaNova | Meta | Llama 3.3 70B | $0.75 | $0.60 | $1.20 | 304.3 tokens/s | 128,000 tokens | 0.51 s | 41 |
Llama 3.3 70B Turbo | Together.ai | Meta | Llama 3.3 70B | $0.88 | $0.88 | $0.88 | 174.9 tokens/s | 128,000 tokens | 0.59 s | 41 |
Llama 3.3 70B | kluster.ai | Meta | Llama 3.3 70B | $0.70 | $0.70 | $0.70 | 19.6 tokens/s | 128,000 tokens | 0.82 s | 41 |
GPT-4o (ChatGPT) | OpenAI | OpenAI | GPT-4o (ChatGPT) | $7.50 | $5.00 | $15.00 | 96.7 tokens/s | 128,000 tokens | 0.54 s | 41 |
GPT-4o (Aug '24) | OpenAI | OpenAI | GPT-4o (Aug '24) | $4.38 | $2.50 | $10.00 | 46.7 tokens/s | 128,000 tokens | 0.49 s | 41 |
GPT-4o (Aug '24) | Microsoft Azure | OpenAI | GPT-4o (Aug '24) | $4.38 | $2.50 | $10.00 | 121.3 tokens/s | 128,000 tokens | 0.65 s | 41 |
GPT-4o (May '24) | OpenAI | OpenAI | GPT-4o (May '24) | $7.50 | $5.00 | $15.00 | 44.3 tokens/s | 128,000 tokens | 0.5 s | 41 |
GPT-4o (May '24) | Microsoft Azure | OpenAI | GPT-4o (May '24) | $7.50 | $5.00 | $15.00 | 124.8 tokens/s | 128,000 tokens | 0.78 s | 41 |
Llama 3.1 405B | Replicate | Meta | Llama 3.1 405B | $9.50 | $9.50 | $9.50 | 18.9 tokens/s | 128,000 tokens | 0.39 s | 40 |
Llama 3.1 405B | Hyperbolic | Meta | Llama 3.1 405B | $4.00 | $4.00 | $4.00 | 7.1 tokens/s | 128,000 tokens | 0.86 s | 40 |
Llama 3.1 405B Latency Optimized | Amazon Bedrock Latency Optimized | Meta | Llama 3.1 405B | $3.00 | $3.00 | $3.00 | 63 tokens/s | 128,000 tokens | 0.73 s | 40 |
Llama 3.1 405B Base | Nebius AI Studio | Meta | Llama 3.1 405B | $1.50 | $1.00 | $3.00 | 34.6 tokens/s | 128,000 tokens | 0.69 s | 40 |
Llama 3.1 405B Vertex | Google Vertex | Meta | Llama 3.1 405B Vertex | $7.75 | $5.00 | $16.00 | 0 tokens/s | 128,000 tokens | 0.08 s | 40 |
Llama 3.1 405B | Microsoft Azure | Meta | Llama 3.1 405B | $8.00 | $5.33 | $16.00 | 31.8 tokens/s | 128,000 tokens | 0.5 s | 40 |
Llama 3.1 405B | Fireworks | Meta | Llama 3.1 405B | $3.00 | $3.00 | $3.00 | 64.2 tokens/s | 128,000 tokens | 0.8 s | 40 |
Llama 3.1 405B | Deepinfra | Meta | Llama 3.1 405B | $0.90 | $0.90 | $0.90 | 24 tokens/s | 33,000 tokens | 0.48 s | 40 |
Llama 3.1 405B | SambaNova | Meta | Llama 3.1 405B | $6.25 | $5.00 | $10.00 | 165.6 tokens/s | 16,000 tokens | 0.62 s | 40 |
Llama 3.1 405B | Databricks | Meta | Llama 3.1 405B | $7.50 | $5.00 | $15.00 | 36.5 tokens/s | 128,000 tokens | 0.71 s | 40 |
Llama 3.1 405B Turbo | Together.ai | Meta | Llama 3.1 405B | $3.50 | $3.50 | $3.50 | 26.3 tokens/s | 128,000 tokens | 0.64 s | 40 |
Llama 3.1 405B | kluster.ai | Meta | Llama 3.1 405B | $3.50 | $3.50 | $3.50 | 17.9 tokens/s | 128,000 tokens | 0.98 s | 40 |
Qwen2.5 72B | Hyperbolic | Alibaba | Qwen2.5 72B | $0.40 | $0.40 | $0.40 | 46.8 tokens/s | 131,000 tokens | 1.05 s | 40 |
Qwen2.5 72B Base | Nebius AI Studio | Alibaba | Qwen2.5 72B | $0.20 | $0.13 | $0.40 | 32.3 tokens/s | 131,000 tokens | 0.62 s | 40 |
Qwen2.5 72B Fast | Nebius AI Studio | Alibaba | Qwen2.5 72B | $0.38 | $0.25 | $0.75 | 70.1 tokens/s | 131,000 tokens | 0.55 s | 40 |
Qwen2.5 72B | Fireworks | Alibaba | Qwen2.5 72B | $0.90 | $0.90 | $0.90 | 44.3 tokens/s | 131,000 tokens | 0.37 s | 40 |
Qwen2.5 72B | Deepinfra | Alibaba | Qwen2.5 72B | $0.27 | $0.23 | $0.40 | 44.3 tokens/s | 33,000 tokens | 0.27 s | 40 |
Qwen2.5 72B | SambaNova | Alibaba | Qwen2.5 72B | $2.50 | $2.00 | $4.00 | 225.8 tokens/s | 16,000 tokens | 0.33 s | 40 |
Qwen2.5 72B Turbo | Together.ai | Alibaba | Qwen2.5 72B | $1.20 | $1.20 | $1.20 | 96.6 tokens/s | 131,000 tokens | 0.45 s | 40 |
Qwen2.5 72B | Alibaba Cloud | Alibaba | Qwen2.5 72B | $0.00 | $0.00 | $0.00 | 39.8 tokens/s | 131,000 tokens | 1.19 s | 40 |
Phi-4 | Nebius | Microsoft | Phi-4 | $0.15 | $0.10 | $0.30 | 119.2 tokens/s | 16,000 tokens | 0.47 s | 40 |
Phi-4 | Deepinfra | Microsoft | Phi-4 | $0.09 | $0.07 | $0.14 | 35.5 tokens/s | 16,000 tokens | 0.36 s | 40 |
Tulu3 405B | SambaNova | Allen Institute for AI | Tulu3 405B | $6.25 | $5.00 | $10.00 | 172.4 tokens/s | 16,000 tokens | 1.37 s | 40 |
MiniMax-Text-01 | MiniMax | MiniMax | MiniMax-Text-01 | $0.42 | $0.20 | $1.10 | 38.3 tokens/s | 1,000,000 tokens | 0.86 s | 40 |
Mistral Large 2 (Nov '24) | Mistral | Mistral | Mistral Large 2 (Nov '24) | $3.00 | $2.00 | $6.00 | 45.9 tokens/s | 128,000 tokens | 0.49 s | 38 |
Mistral Large 2 (Nov '24) | Microsoft Azure | Mistral | Mistral Large 2 (Nov '24) | $3.00 | $2.00 | $6.00 | 29.7 tokens/s | 128,000 tokens | 0.87 s | 38 |
Grok Beta | xAI | xAI | Grok Beta | $7.50 | $5.00 | $15.00 | 66.4 tokens/s | 128,000 tokens | 0.26 s | 38 |
Pixtral Large | Mistral | Mistral | Pixtral Large | $3.00 | $2.00 | $6.00 | 35 tokens/s | 128,000 tokens | 0.43 s | 37 |
Qwen2.5 Instruct 32B Fast | Nebius AI Studio | Alibaba | Qwen2.5 Instruct 32B | $0.00 | $0.00 | $0.00 | 86.2 tokens/s | 128,000 tokens | 0.54 s | 37 |
Qwen2.5 Instruct 32B Base | Nebius AI Studio | Alibaba | Qwen2.5 Instruct 32B | $0.00 | $0.00 | $0.00 | 60 tokens/s | 128,000 tokens | 0.55 s | 37 |
Qwen2.5 Instruct 32B | Groq | Alibaba | Qwen2.5 Instruct 32B | $0.79 | $0.79 | $0.79 | 198.2 tokens/s | 128,000 tokens | 0.22 s | 37 |
Llama 3.1 Nemotron 70B Base | Nebius AI Studio | NVIDIA | Llama 3.1 Nemotron 70B | $0.20 | $0.13 | $0.40 | 48.7 tokens/s | 128,000 tokens | 0.58 s | 37 |
Llama 3.1 Nemotron 70B Fast | Nebius AI Studio | NVIDIA | Llama 3.1 Nemotron 70B | $0.38 | $0.25 | $0.75 | 72.8 tokens/s | 128,000 tokens | 0.54 s | 37 |
Llama 3.1 Nemotron 70B | Deepinfra | NVIDIA | Llama 3.1 Nemotron 70B | $0.27 | $0.23 | $0.40 | 30 tokens/s | 128,000 tokens | 0.31 s | 37 |
Nova Pro | Amazon Bedrock | Amazon | Nova Pro | $1.40 | $0.80 | $3.20 | 80.1 tokens/s | 300,000 tokens | 0.37 s | 37 |
Mistral Large 2 (Jul '24) | Mistral | Mistral | Mistral Large 2 (Jul '24) | $3.00 | $2.00 | $6.00 | 31.6 tokens/s | 128,000 tokens | 0.48 s | 37 |
Mistral Large 2 (Jul '24) | Amazon Bedrock | Mistral | Mistral Large 2 (Jul '24) | $3.00 | $2.00 | $6.00 | 33.1 tokens/s | 128,000 tokens | 0.45 s | 37 |
Mistral Large 2 (Jul '24) | Microsoft Azure | Mistral | Mistral Large 2 (Jul '24) | $3.00 | $2.00 | $6.00 | 35.6 tokens/s | 128,000 tokens | 0.54 s | 37 |
Qwen2.5 Coder 32B | Hyperbolic | Alibaba | Qwen2.5 Coder 32B | $0.20 | $0.20 | $0.20 | 61.5 tokens/s | 131,000 tokens | 1.21 s | 36 |
Qwen2.5 Coder 32B | CentML | Alibaba | Qwen2.5 Coder 32B | $0.80 | $0.80 | $0.80 | 64.7 tokens/s | 131,000 tokens | 0.51 s | 36 |
Qwen2.5 Coder 32B | Fireworks | Alibaba | Qwen2.5 Coder 32B | $0.90 | $0.90 | $0.90 | 65.1 tokens/s | 33,000 tokens | 0.34 s | 36 |
Qwen2.5 Coder 32B | Deepinfra | Alibaba | Qwen2.5 Coder 32B | $0.10 | $0.08 | $0.18 | 49.4 tokens/s | 33,000 tokens | 0.52 s | 36 |
Qwen2.5 Coder 32B | Groq | Alibaba | Qwen2.5 Coder 32B | $0.79 | $0.79 | $0.79 | 384.5 tokens/s | 131,000 tokens | 0.35 s | 36 |
Qwen2.5 Coder 32B | SambaNova | Alibaba | Qwen2.5 Coder 32B | $1.88 | $1.50 | $3.00 | 325.1 tokens/s | 16,000 tokens | 0.29 s | 36 |
Qwen2.5 Coder 32B | Together.ai | Alibaba | Qwen2.5 Coder 32B | $0.80 | $0.80 | $0.80 | 77.7 tokens/s | 131,000 tokens | 0.53 s | 36 |
GPT-4o mini | OpenAI | OpenAI | GPT-4o mini | $0.26 | $0.15 | $0.60 | 95.4 tokens/s | 128,000 tokens | 0.51 s | 36 |
GPT-4o mini | Microsoft Azure | OpenAI | GPT-4o mini | $0.26 | $0.15 | $0.60 | 172.9 tokens/s | 128,000 tokens | 0.75 s | 36 |
Llama 3.1 70B | Hyperbolic | Meta | Llama 3.1 70B | $0.40 | $0.40 | $0.40 | 152.7 tokens/s | 128,000 tokens | 1 s | 35 |
Llama 3.1 70B Latency Optimized | Amazon Bedrock Latency Optimized | Meta | Llama 3.1 70B | $0.90 | $0.90 | $0.90 | 141.9 tokens/s | 128,000 tokens | 0.34 s | 35 |
Llama 3.1 70B Base | Nebius AI Studio | Meta | Llama 3.1 70B | $0.20 | $0.13 | $0.40 | 42.7 tokens/s | 128,000 tokens | 0.6 s | 35 |
Llama 3.1 70B Fast | Nebius AI Studio | Meta | Llama 3.1 70B | $0.38 | $0.25 | $0.75 | 147.5 tokens/s | 128,000 tokens | 0.53 s | 35 |
Llama 3.1 70B Vertex | Google Vertex | Meta | Llama 3.1 70B Vertex | $0.00 | $0.00 | $0.00 | 0 tokens/s | 128,000 tokens | 0.08 s | 35 |
Llama 3.1 70B | Fireworks | Meta | Llama 3.1 70B | $0.90 | $0.90 | $0.90 | 127.2 tokens/s | 128,000 tokens | 0.43 s | 35 |
Llama 3.1 70B (Turbo, FP8) | Deepinfra (Turbo, FP8) | Meta | Llama 3.1 70B | $0.20 | $0.13 | $0.40 | 35 tokens/s | 128,000 tokens | 0.38 s | 35 |
Llama 3.1 70B | Deepinfra | Meta | Llama 3.1 70B | $0.27 | $0.23 | $0.40 | 32.9 tokens/s | 128,000 tokens | 0.28 s | 35 |
Llama 3.1 70B | FriendliAI | Meta | Llama 3.1 70B | $0.60 | $0.60 | $0.60 | 192 tokens/s | 128,000 tokens | 0.32 s | 35 |
Llama 3.1 70B | Novita | Meta | Llama 3.1 70B | $0.35 | $0.34 | $0.39 | 84.9 tokens/s | 32,000 tokens | 0.95 s | 35 |
Llama 3.1 70B | SambaNova | Meta | Llama 3.1 70B | $0.75 | $0.60 | $1.20 | 318 tokens/s | 128,000 tokens | 0.51 s | 35 |
Llama 3.1 70B | Databricks | Meta | Llama 3.1 70B | $1.50 | $1.00 | $3.00 | 75.8 tokens/s | 128,000 tokens | 0.38 s | 35 |
Llama 3.1 70B Turbo | Together.ai | Meta | Llama 3.1 70B | $0.88 | $0.88 | $0.88 | 225.2 tokens/s | 128,000 tokens | 0.35 s | 35 |
Llama 3.1 70B | Simplismart | Meta | Llama 3.1 70B | $0.90 | $0.90 | $0.90 | 126.1 tokens/s | 128,000 tokens | 0.51 s | 35 |
Mistral Small 3 | Mistral | Mistral | Mistral Small 3 | $0.15 | $0.10 | $0.30 | 123.4 tokens/s | 32,000 tokens | 0.45 s | 35 |
Mistral Small 3 | Fireworks | Mistral | Mistral Small 3 | $0.90 | $0.90 | $0.90 | 38.5 tokens/s | 32,000 tokens | 0.49 s | 35 |
Mistral Small 3 | Deepinfra | Mistral | Mistral Small 3 | $0.09 | $0.07 | $0.14 | 81.4 tokens/s | 32,000 tokens | 0.48 s | 35 |
Mistral Small 3 | Together.ai | Mistral | Mistral Small 3 | $0.80 | $0.80 | $0.80 | 97.2 tokens/s | 32,000 tokens | 0.18 s | 35 |
Claude 3 Opus | Amazon Bedrock | Anthropic | Claude 3 Opus | $30.00 | $15.00 | $75.00 | 26.2 tokens/s | 200,000 tokens | 1.22 s | 35 |
Claude 3 Opus Vertex | Google Vertex | Anthropic | Claude 3 Opus Vertex | $30.00 | $15.00 | $75.00 | 26.3 tokens/s | 200,000 tokens | 2.3 s | 35 |
Claude 3 Opus | Anthropic | Anthropic | Claude 3 Opus | $30.00 | $15.00 | $75.00 | 27.1 tokens/s | 200,000 tokens | 1.16 s | 35 |
Claude 3.5 Haiku Vertex | Google Vertex | Anthropic | Claude 3.5 Haiku Vertex | $1.60 | $0.80 | $4.00 | 67.5 tokens/s | 200,000 tokens | 0.63 s | 35 |
Claude 3.5 Haiku | Anthropic | Anthropic | Claude 3.5 Haiku | $1.60 | $0.80 | $4.00 | 66.2 tokens/s | 200,000 tokens | 1.26 s | 35 |
DeepSeek R1 Distill Llama 8B | Novita | DeepSeek | DeepSeek R1 Distill Llama 8B | $0.04 | $0.04 | $0.04 | 54.2 tokens/s | 32,000 tokens | 9.07 s | 34 |
Gemini 1.5 Pro (May) (Vertex) | Google (Vertex) | Google | Gemini 1.5 Pro (May) (Vertex) | $2.19 | $1.25 | $5.00 | 0 tokens/s | 2,000,000 tokens | 0.1 s | 34 |
Gemini 1.5 Pro (May) (AI Studio) | Google (AI Studio) | Google | Gemini 1.5 Pro (May) (AI Studio) | $2.19 | $1.25 | $5.00 | 65.9 tokens/s | 2,000,000 tokens | 0.43 s | 34 |
Qwen Turbo | Alibaba Cloud | Alibaba | Qwen Turbo | $0.09 | $0.05 | $0.20 | 77.2 tokens/s | 1,000,000 tokens | 1.05 s | 34 |
Llama 3.2 90B (Vision) | Amazon Bedrock | Meta | Llama 3.2 90B (Vision) | $0.72 | $0.72 | $0.72 | 37.2 tokens/s | 128,000 tokens | 0.73 s | 33 |
Llama 3.2 90B (Vision) Vertex | Google Vertex | Meta | Llama 3.2 90B (Vision) Vertex | $0.00 | $0.00 | $0.00 | 0 tokens/s | 128,000 tokens | 0.08 s | 33 |
Llama 3.2 90B (Vision) | Fireworks | Meta | Llama 3.2 90B (Vision) | $0.90 | $0.90 | $0.90 | 41.1 tokens/s | 128,000 tokens | 0.36 s | 33 |
Llama 3.2 90B (Vision) | Deepinfra | Meta | Llama 3.2 90B (Vision) | $0.36 | $0.35 | $0.40 | 34.5 tokens/s | 33,000 tokens | 0.28 s | 33 |
Llama 3.2 90B (Vision) | Groq | Meta | Llama 3.2 90B (Vision) | $0.90 | $0.90 | $0.90 | 261.4 tokens/s | 8,000 tokens | 0.31 s | 33 |
Llama 3.2 90B (Vision) Turbo | Together.ai | Meta | Llama 3.2 90B (Vision) | $1.20 | $1.20 | $1.20 | 54.3 tokens/s | 128,000 tokens | 0.26 s | 33 |
Qwen2 72B | Together.ai | Alibaba | Qwen2 72B | $0.90 | $0.90 | $0.90 | 63.8 tokens/s | 33,000 tokens | 0.43 s | 33 |
Mistral Saba | Mistral | Mistral | Mistral Saba | $0.30 | $0.20 | $0.60 | 96.7 tokens/s | 32,000 tokens | 0.4 s | 32 |
Mistral Saba | Groq | Mistral | Mistral Saba | $0.79 | $0.79 | $0.79 | 381.2 tokens/s | 32,000 tokens | 0.33 s | 32 |
Jamba 1.5 Large | AI21 Labs | AI21 Labs | Jamba 1.5 Large | $3.50 | $2.00 | $8.00 | 60.5 tokens/s | 256,000 tokens | 0.51 s | 29 |
Jamba 1.5 Large | Microsoft Azure | AI21 Labs | Jamba 1.5 Large | $3.50 | $2.00 | $8.00 | 51 tokens/s | 256,000 tokens | 0.73 s | 29 |
Gemini 1.5 Flash (May) (Vertex) | Google (Vertex) | Google | Gemini 1.5 Flash (May) (Vertex) | $0.13 | $0.07 | $0.30 | 0 tokens/s | 1,000,000 tokens | 0.1 s | 28 |
Gemini 1.5 Flash (May) (AI Studio) | Google (AI Studio) | Google | Gemini 1.5 Flash (May) (AI Studio) | $0.13 | $0.07 | $0.30 | 307.9 tokens/s | 1,000,000 tokens | 0.23 s | 28 |
Nova Micro | Amazon Bedrock | Amazon | Nova Micro | $0.06 | $0.04 | $0.14 | 189.3 tokens/s | 130,000 tokens | 0.29 s | 28 |
Yi-Large | Fireworks | 01.AI | Yi-Large | $3.00 | $3.00 | $3.00 | 67.2 tokens/s | 32,000 tokens | 0.39 s | 28 |
Codestral (Jan '25) | Mistral | Mistral | Codestral (Jan '25) | $0.45 | $0.30 | $0.90 | 205.7 tokens/s | 256,000 tokens | 0.38 s | 28 |
Codestral (Jan '25) Vertex | Google Vertex | Mistral | Codestral (Jan '25) Vertex | $0.45 | $0.30 | $0.90 | 150.1 tokens/s | 128,000 tokens | 0.15 s | 28 |
Llama 3 70B | Replicate | Meta | Llama 3 70B | $1.18 | $0.65 | $2.75 | 46.5 tokens/s | 8,000 tokens | 0.37 s | 27 |
Llama 3 70B | Hyperbolic | Meta | Llama 3 70B | $0.40 | $0.40 | $0.40 | 82.4 tokens/s | 8,000 tokens | 1.09 s | 27 |
Llama 3 70B | Amazon Bedrock | Meta | Llama 3 70B | $2.86 | $2.65 | $3.50 | 54 tokens/s | 8,000 tokens | 0.43 s | 27 |
Llama 3 70B | Microsoft Azure | Meta | Llama 3 70B | $2.90 | $2.68 | $3.54 | 18.9 tokens/s | 8,000 tokens | 0.78 s | 27 |
Llama 3 70B | Fireworks | Meta | Llama 3 70B | $0.90 | $0.90 | $0.90 | 142 tokens/s | 8,000 tokens | 0.28 s | 27 |
Llama 3 70B | Deepinfra | Meta | Llama 3 70B | $0.27 | $0.23 | $0.40 | 50 tokens/s | 8,000 tokens | 0.56 s | 27 |
Llama 3 70B | Groq | Meta | Llama 3 70B | $0.64 | $0.59 | $0.79 | 339.7 tokens/s | 8,000 tokens | 0.23 s | 27 |
Llama 3 70B (Turbo, FP8) | Together.ai | Meta | Llama 3 70B | $0.88 | $0.88 | $0.88 | 57.4 tokens/s | 8,000 tokens | 0.47 s | 27 |
Mistral Small (Sep '24) | Mistral | Mistral | Mistral Small (Sep '24) | $0.30 | $0.20 | $0.60 | 71.6 tokens/s | 33,000 tokens | 0.41 s | 27 |
Mistral Large (Feb '24) | Mistral | Mistral | Mistral Large (Feb '24) | $6.00 | $4.00 | $12.00 | 37.8 tokens/s | 33,000 tokens | 0.44 s | 26 |
Mistral Large (Feb '24) | Amazon Bedrock | Mistral | Mistral Large (Feb '24) | $6.00 | $4.00 | $12.00 | 44.6 tokens/s | 33,000 tokens | 0.37 s | 26 |
Mistral Large (Feb '24) | Microsoft Azure | Mistral | Mistral Large (Feb '24) | $6.00 | $4.00 | $12.00 | 39.8 tokens/s | 33,000 tokens | 0.5 s | 26 |
Mixtral 8x22B | Mistral | Mistral | Mixtral 8x22B | $3.00 | $2.00 | $6.00 | 84.9 tokens/s | 65,000 tokens | 0.44 s | 26 |
Mixtral 8x22B Base | Nebius AI Studio | Mistral | Mixtral 8x22B | $0.60 | $0.40 | $1.20 | 91.6 tokens/s | 65,000 tokens | 0.5 s | 26 |
Mixtral 8x22B Fast | Nebius AI Studio | Mistral | Mixtral 8x22B | $1.05 | $0.70 | $2.10 | 108.6 tokens/s | 65,000 tokens | 0.52 s | 26 |
Mixtral 8x22B | Fireworks | Mistral | Mixtral 8x22B | $1.20 | $1.20 | $1.20 | 91.6 tokens/s | 65,000 tokens | 0.33 s | 26 |
Mixtral 8x22B | Together.ai | Mistral | Mixtral 8x22B | $1.20 | $1.20 | $1.20 | 70.9 tokens/s | 65,000 tokens | 0.81 s | 26 |
Qwen2.5 Coder 7B Fast | Nebius AI Studio | Alibaba | Qwen2.5 Coder 7B | $0.04 | $0.03 | $0.09 | 222.7 tokens/s | 131,000 tokens | 0.47 s | 26 |
Qwen2.5 Coder 7B Base | Nebius AI Studio | Alibaba | Qwen2.5 Coder 7B | $0.01 | $0.01 | $0.03 | 188.3 tokens/s | 131,000 tokens | 0.51 s | 26 |
Phi-3 Medium 14B | Microsoft Azure | Microsoft | Phi-3 Medium 14B | $0.30 | $0.17 | $0.68 | 53.1 tokens/s | 128,000 tokens | 0.42 s | 25 |
DeepSeek Coder V2 Lite Fast, FP8 | Nebius AI Studio | DeepSeek | DeepSeek Coder V2 Lite | $0.12 | $0.08 | $0.24 | 117.6 tokens/s | 128,000 tokens | 0.56 s | 24 |
DeepSeek Coder V2 Lite Base, FP8 | Nebius AI Studio | DeepSeek | DeepSeek Coder V2 Lite | $0.06 | $0.04 | $0.12 | 113.1 tokens/s | 128,000 tokens | 0.58 s | 24 |
Mistral Medium | Mistral | Mistral | Mistral Medium | $4.09 | $2.75 | $8.10 | 44.7 tokens/s | 33,000 tokens | 0.43 s | 24 |
Llama 3.1 8B | Cerebras | Meta | Llama 3.1 8B | $0.10 | $0.10 | $0.10 | 2223.1 tokens/s | 33,000 tokens | 0.27 s | 24 |
Llama 3.1 8B | Hyperbolic | Meta | Llama 3.1 8B | $0.10 | $0.10 | $0.10 | 104.6 tokens/s | 128,000 tokens | 1 s | 24 |
Llama 3.1 8B | Amazon Bedrock | Meta | Llama 3.1 8B | $0.22 | $0.22 | $0.22 | 92.1 tokens/s | 128,000 tokens | 0.33 s | 24 |
Llama 3.1 8B Fast | Nebius AI Studio | Meta | Llama 3.1 8B | $0.04 | $0.03 | $0.09 | 183.9 tokens/s | 128,000 tokens | 0.5 s | 24 |
Llama 3.1 8B Base | Nebius AI Studio | Meta | Llama 3.1 8B | $0.03 | $0.02 | $0.06 | 66.6 tokens/s | 128,000 tokens | 0.53 s | 24 |
Llama 3.1 8B Vertex | Google Vertex | Meta | Llama 3.1 8B Vertex | $0.00 | $0.00 | $0.00 | 0 tokens/s | 128,000 tokens | 0.08 s | 24 |
Llama 3.1 8B | Microsoft Azure | Meta | Llama 3.1 8B | $0.38 | $0.30 | $0.61 | 213.4 tokens/s | 128,000 tokens | 0.24 s | 24 |
Llama 3.1 8B | Fireworks | Meta | Llama 3.1 8B | $0.20 | $0.20 | $0.20 | 232 tokens/s | 128,000 tokens | 0.23 s | 24 |
Llama 3.1 8B | Deepinfra | Meta | Llama 3.1 8B | $0.04 | $0.03 | $0.05 | 69.8 tokens/s | 128,000 tokens | 0.27 s | 24 |
Llama 3.1 8B | FriendliAI | Meta | Llama 3.1 8B | $0.10 | $0.10 | $0.10 | 491.8 tokens/s | 128,000 tokens | 0.25 s | 24 |
Llama 3.1 8B | Novita | Meta | Llama 3.1 8B | $0.05 | $0.05 | $0.05 | 65.1 tokens/s | 16,000 tokens | 0.6 s | 24 |
Llama 3.1 8B | Groq | Meta | Llama 3.1 8B | $0.06 | $0.05 | $0.08 | 751.5 tokens/s | 128,000 tokens | 0.17 s | 24 |
Llama 3.1 8B | SambaNova | Meta | Llama 3.1 8B | $0.13 | $0.10 | $0.20 | 1068.7 tokens/s | 16,000 tokens | 0.26 s | 24 |
Llama 3.1 8B Turbo | Together.ai | Meta | Llama 3.1 8B Turbo | $0.18 | $0.18 | $0.18 | 277.3 tokens/s | 128,000 tokens | 0.19 s | 24 |
Llama 3.1 8B | Simplismart | Meta | Llama 3.1 8B | $0.15 | $0.15 | $0.15 | 458.6 tokens/s | 128,000 tokens | 0.15 s | 24 |
Llama 3.1 8B | kluster.ai | Meta | Llama 3.1 8B | $0.18 | $0.18 | $0.18 | 14 tokens/s | 128,000 tokens | 0.39 s | 24 |
Pixtral 12B | Mistral | Mistral | Pixtral 12B | $0.15 | $0.15 | $0.15 | 102.3 tokens/s | 128,000 tokens | 0.43 s | 23 |
Pixtral 12B | Hyperbolic | Mistral | Pixtral 12B | $0.10 | $0.10 | $0.10 | 75.6 tokens/s | 128,000 tokens | 0.45 s | 23 |
Mistral Small (Feb '24) | Mistral | Mistral | Mistral Small (Feb '24) | $1.50 | $1.00 | $3.00 | 130 tokens/s | 33,000 tokens | 0.44 s | 23 |
Mistral Small (Feb '24) | Microsoft Azure | Mistral | Mistral Small (Feb '24) | $1.50 | $1.00 | $3.00 | 53.9 tokens/s | 33,000 tokens | 0.38 s | 23 |
Ministral 8B | Mistral | Mistral | Ministral 8B | $0.10 | $0.10 | $0.10 | 141.9 tokens/s | 128,000 tokens | 0.4 s | 22 |
Llama 3.2 11B (Vision) | Amazon Bedrock | Meta | Llama 3.2 11B (Vision) | $0.16 | $0.16 | $0.16 | 142.5 tokens/s | 128,000 tokens | 0.36 s | 22 |
Llama 3.2 11B (Vision) | CentML | Meta | Llama 3.2 11B (Vision) | $0.15 | $0.15 | $0.15 | 81.7 tokens/s | 128,000 tokens | 0.44 s | 22 |
Llama 3.2 11B (Vision) | Fireworks | Meta | Llama 3.2 11B (Vision) | $0.20 | $0.20 | $0.20 | 69.8 tokens/s | 128,000 tokens | 0.27 s | 22 |
Llama 3.2 11B (Vision) | Deepinfra | Meta | Llama 3.2 11B (Vision) | $0.06 | $0.06 | $0.06 | 50.5 tokens/s | 128,000 tokens | 0.25 s | 22 |
Llama 3.2 11B (Vision) | Groq | Meta | Llama 3.2 11B (Vision) | $0.18 | $0.18 | $0.18 | 751.3 tokens/s | 8,000 tokens | 0.18 s | 22 |
Llama 3.2 11B (Vision) Turbo | Together.ai | Meta | Llama 3.2 11B (Vision) | $0.18 | $0.18 | $0.18 | 142.8 tokens/s | 128,000 tokens | 0.19 s | 22 |
Command-R+ | Amazon Bedrock | Cohere | Command-R+ | $6.00 | $3.00 | $15.00 | 49.5 tokens/s | 128,000 tokens | 0.49 s | 21 |
Command-R+ | Cohere | Cohere | Command-R+ | $4.38 | $2.50 | $10.00 | 73 tokens/s | 128,000 tokens | 0.24 s | 21 |
Codestral (May '24) | Mistral | Mistral | Codestral (May '24) | $0.30 | $0.20 | $0.60 | 84.1 tokens/s | 33,000 tokens | 0.42 s | 20 |
Aya Expanse 32B | Cohere | Cohere | Aya Expanse 32B | $0.75 | $0.50 | $1.50 | 121.8 tokens/s | 128,000 tokens | 0.15 s | 20 |
Command-R+ (Apr '24) | Amazon Bedrock | Cohere | Command-R+ (Apr '24) | $6.00 | $3.00 | $15.00 | 46.9 tokens/s | 128,000 tokens | 0.49 s | 20 |
Command-R+ (Apr '24) | Cohere | Cohere | Command-R+ (Apr '24) | $6.00 | $3.00 | $15.00 | 78.1 tokens/s | 128,000 tokens | 0.23 s | 20 |
Command-R+ (Apr '24) | Microsoft Azure | Cohere | Command-R+ (Apr '24) | $6.00 | $3.00 | $15.00 | 50.7 tokens/s | 128,000 tokens | 0.58 s | 20 |
DBRX | Databricks | Databricks | DBRX | $1.13 | $0.75 | $2.25 | 68.7 tokens/s | 33,000 tokens | 0.47 s | 20 |
DBRX | Together.ai | Databricks | DBRX | $1.20 | $1.20 | $1.20 | 82.9 tokens/s | 33,000 tokens | 0.32 s | 20 |
Ministral 3B | Mistral | Mistral | Ministral 3B | $0.04 | $0.04 | $0.04 | 220.1 tokens/s | 128,000 tokens | 0.38 s | 20 |
Mistral NeMo | Mistral | Mistral | Mistral NeMo | $0.15 | $0.15 | $0.15 | 120.7 tokens/s | 128,000 tokens | 0.43 s | 20 |
Mistral NeMo Fast | Nebius AI Studio | Mistral | Mistral NeMo | $0.12 | $0.08 | $0.24 | 159.4 tokens/s | 128,000 tokens | 0.48 s | 20 |
Mistral NeMo Base | Nebius AI Studio | Mistral | Mistral NeMo | $0.06 | $0.04 | $0.12 | 31.5 tokens/s | 128,000 tokens | 0.64 s | 20 |
Mistral NeMo | Deepinfra | Mistral | Mistral NeMo | $0.06 | $0.04 | $0.10 | 54.1 tokens/s | 128,000 tokens | 0.32 s | 20 |
DeepSeek R1 Distill Qwen 1.5B | Together.ai | DeepSeek | DeepSeek R1 Distill Qwen 1.5B | $0.18 | $0.18 | $0.18 | 380.2 tokens/s | 128,000 tokens | 6.65 s | 19 |
Mixtral 8x7B | Mistral | Mistral | Mixtral 8x7B | $0.70 | $0.70 | $0.70 | 98 tokens/s | 33,000 tokens | 0.41 s | 17 |
Mixtral 8x7B | Amazon Bedrock | Mistral | Mixtral 8x7B | $0.51 | $0.45 | $0.70 | 78.8 tokens/s | 33,000 tokens | 0.33 s | 17 |
Mixtral 8x7B Fast | Nebius AI Studio | Mistral | Mixtral 8x7B | $0.23 | $0.15 | $0.45 | 164.4 tokens/s | 33,000 tokens | 0.5 s | 17 |
Mixtral 8x7B Base | Nebius AI Studio | Mistral | Mixtral 8x7B | $0.12 | $0.08 | $0.24 | 135 tokens/s | 33,000 tokens | 0.47 s | 17 |
Mixtral 8x7B | Fireworks | Mistral | Mixtral 8x7B | $0.50 | $0.50 | $0.50 | 153.2 tokens/s | 33,000 tokens | 0.26 s | 17 |
Mixtral 8x7B | Deepinfra | Mistral | Mixtral 8x7B | $0.24 | $0.24 | $0.24 | 109.5 tokens/s | 33,000 tokens | 0.46 s | 17 |
Mixtral 8x7B | Groq | Mistral | Mixtral 8x7B | $0.24 | $0.24 | $0.24 | 572.1 tokens/s | 33,000 tokens | 0.27 s | 17 |
Mixtral 8x7B | Databricks | Mistral | Mixtral 8x7B | $0.63 | $0.50 | $1.00 | 92.6 tokens/s | 33,000 tokens | 0.4 s | 17 |
Mixtral 8x7B | Together.ai | Mistral | Mixtral 8x7B | $0.60 | $0.60 | $0.60 | 102.9 tokens/s | 33,000 tokens | 0.38 s | 17 |
OpenChat 3.5 | Deepinfra | OpenChat | OpenChat 3.5 | $0.06 | $0.06 | $0.06 | 77.7 tokens/s | 8,000 tokens | 0.27 s | 16 |
Command-R | Cohere | Cohere | Command-R | $0.26 | $0.15 | $0.60 | 75.9 tokens/s | 128,000 tokens | 0.19 s | 15 |
Command-R (Mar '24) | Amazon Bedrock | Cohere | Command-R (Mar '24) | $0.75 | $0.50 | $1.50 | 105.3 tokens/s | 128,000 tokens | 0.34 s | 15 |
Command-R (Mar '24) | Cohere | Cohere | Command-R (Mar '24) | $0.75 | $0.50 | $1.50 | 173.7 tokens/s | 128,000 tokens | 0.15 s | 15 |
Command-R (Mar '24) | Microsoft Azure | Cohere | Command-R (Mar '24) | $0.75 | $0.50 | $1.50 | 81.6 tokens/s | 128,000 tokens | 0.46 s | 15 |
Codestral-Mamba | Mistral | Mistral | Codestral-Mamba | $0.25 | $0.25 | $0.25 | 96 tokens/s | 256,000 tokens | 0.56 s | 14 |
Mistral 7B | Mistral | Mistral | Mistral 7B | $0.25 | $0.25 | $0.25 | 131 tokens/s | 8,000 tokens | 0.36 s | 10 |
Mistral 7B | Deepinfra | Mistral | Mistral 7B | $0.04 | $0.03 | $0.06 | 84.5 tokens/s | 8,000 tokens | 0.21 s | 10 |
Mistral 7B | Novita | Mistral | Mistral 7B | $0.06 | $0.06 | $0.06 | 99.4 tokens/s | 32,000 tokens | 0.91 s | 10 |
Mistral 7B | Together.ai | Mistral | Mistral 7B | $0.20 | $0.20 | $0.20 | 167 tokens/s | 8,000 tokens | 0.21 s | 10 |
Llama 2 Chat 7B | Replicate | Meta | Llama 2 Chat 7B | $0.10 | $0.05 | $0.25 | 122.6 tokens/s | 4,000 tokens | 0.59 s | 8 |
o1-preview | OpenAI | OpenAI | o1-preview | $26.25 | $15.00 | $60.00 | 125 tokens/s | 128,000 tokens | 22.96 s | N/A |
o1-preview | Microsoft Azure | OpenAI | o1-preview | $28.88 | $16.50 | $66.00 | 136.2 tokens/s | 128,000 tokens | 26.78 s | N/A |
Llama 3.2 3B | Hyperbolic | Meta | Llama 3.2 3B | $0.10 | $0.10 | $0.10 | 175.1 tokens/s | 128,000 tokens | 0.88 s | N/A |
Llama 3.2 3B | Amazon Bedrock | Meta | Llama 3.2 3B | $0.15 | $0.15 | $0.15 | 73 tokens/s | 128,000 tokens | 0.31 s | N/A |
Llama 3.2 3B Base | Nebius AI Studio | Meta | Llama 3.2 3B | $0.01 | $0.01 | $0.02 | 122.5 tokens/s | 128,000 tokens | 0.52 s | N/A |
Llama 3.2 3B | Fireworks | Meta | Llama 3.2 3B | $0.10 | $0.10 | $0.10 | 155.6 tokens/s | 128,000 tokens | 0.19 s | N/A |
Llama 3.2 3B | Deepinfra | Meta | Llama 3.2 3B | $0.02 | $0.02 | $0.03 | 155.2 tokens/s | 128,000 tokens | 0.45 s | N/A |
Llama 3.2 3B | Novita | Meta | Llama 3.2 3B | $0.04 | $0.03 | $0.05 | 94.7 tokens/s | 32,000 tokens | 0.58 s | N/A |
Llama 3.2 3B | Groq | Meta | Llama 3.2 3B | $0.06 | $0.06 | $0.06 | 1544.5 tokens/s | 8,000 tokens | 0.33 s | N/A |
Llama 3.2 3B | SambaNova | Meta | Llama 3.2 3B | $0.10 | $0.08 | $0.16 | 1549 tokens/s | 8,000 tokens | 0.28 s | N/A |
Llama 3.2 3B Turbo | Together.ai | Meta | Llama 3.2 3B | $0.06 | $0.06 | $0.06 | 63.7 tokens/s | 128,000 tokens | 0.37 s | N/A |
Llama 3.2 1B | Amazon Bedrock | Meta | Llama 3.2 1B | $0.10 | $0.10 | $0.10 | 122.3 tokens/s | 128,000 tokens | 0.34 s | N/A |
Llama 3.2 1B Base | Nebius AI Studio | Meta | Llama 3.2 1B | $0.01 | $0.01 | $0.01 | 267.1 tokens/s | 128,000 tokens | 0.49 s | N/A |
Llama 3.2 1B | Deepinfra | Meta | Llama 3.2 1B | $0.01 | $0.01 | $0.02 | 136.3 tokens/s | 128,000 tokens | 0.25 s | N/A |
Llama 3.2 1B | Groq | Meta | Llama 3.2 1B | $0.04 | $0.04 | $0.04 | 3114.8 tokens/s | 8,000 tokens | 0.49 s | N/A |
Llama 3.2 1B | SambaNova | Meta | Llama 3.2 1B | $0.05 | $0.04 | $0.08 | 2539.8 tokens/s | 16,000 tokens | 0.94 s | N/A |
Gemini 2.0 Flash (exp) (AI Studio) | Google (AI Studio) | Google | Gemini 2.0 Flash (exp) (AI Studio) | $0.00 | $0.00 | $0.00 | 174.6 tokens/s | 1,000,000 tokens | 0.26 s | N/A |
Gemini 1.5 Flash (Sep) (Vertex) | Google Vertex | Google | Gemini 1.5 Flash (Sep) (Vertex) | $0.13 | $0.07 | $0.30 | 0 tokens/s | 1,000,000 tokens | 0.1 s | N/A |
Gemini 1.5 Flash (Sep) (AI Studio) | Google (AI Studio) | Google | Gemini 1.5 Flash (Sep) (AI Studio) | $0.13 | $0.07 | $0.30 | 180.5 tokens/s | 1,000,000 tokens | 0.29 s | N/A |
Gemma 2 27B | Together.ai | Google | Gemma 2 27B | $0.80 | $0.80 | $0.80 | 81.8 tokens/s | 8,000 tokens | 0.45 s | N/A |
Gemma 2 9B Base | Nebius AI Studio | Google | Gemma 2 9B | $0.03 | $0.02 | $0.06 | 106.1 tokens/s | 8,000 tokens | 0.82 s | N/A |
Gemma 2 9B | Deepinfra | Google | Gemma 2 9B | $0.04 | $0.03 | $0.06 | 46.9 tokens/s | 8,000 tokens | 0.35 s | N/A |
Gemma 2 9B | Groq | Google | Gemma 2 9B | $0.20 | $0.20 | $0.20 | 662.9 tokens/s | 8,000 tokens | 0.23 s | N/A |
Gemma 2 9B | Together.ai | Google | Gemma 2 9B | $0.30 | $0.30 | $0.30 | 133 tokens/s | 8,000 tokens | 0.24 s | N/A |
Gemini 1.5 Flash-8B AI Studio | Google AI Studio | Google | Gemini 1.5 Flash-8B AI Studio | $0.07 | $0.04 | $0.15 | 276.4 tokens/s | 1,000,000 tokens | 0.19 s | N/A |
Claude 3.5 Sonnet (June) | Anthropic | Anthropic | Claude 3.5 Sonnet (June) | $6.00 | $3.00 | $15.00 | 80.6 tokens/s | 200,000 tokens | 0.77 s | N/A |
Claude 3 Haiku | Anthropic | Anthropic | Claude 3 Haiku | $0.50 | $0.25 | $1.25 | 131.7 tokens/s | 200,000 tokens | 0.58 s | N/A |
Aya Expanse 8B | Cohere | Cohere | Aya Expanse 8B | $0.75 | $0.50 | $1.50 | 166 tokens/s | 8,000 tokens | 0.21 s | N/A |
Jamba 1.5 Mini | AI21 Labs | AI21 Labs | Jamba 1.5 Mini | $0.25 | $0.20 | $0.40 | 183.5 tokens/s | 256,000 tokens | 0.3 s | N/A |
Jamba 1.5 Mini | Microsoft Azure | AI21 Labs | Jamba 1.5 Mini | $0.25 | $0.20 | $0.40 | 80.9 tokens/s | 256,000 tokens | 0.49 s | N/A |
GPT-4 Turbo | OpenAI | OpenAI | GPT-4 Turbo | $15.00 | $10.00 | $30.00 | 49.2 tokens/s | 128,000 tokens | 0.52 s | N/A |
Llama 3 8B | Replicate | Meta | Llama 3 8B | $0.10 | $0.05 | $0.25 | 73 tokens/s | 8,000 tokens | 0.39 s | N/A |
Llama 3 8B | Amazon Bedrock | Meta | Llama 3 8B | $0.38 | $0.30 | $0.60 | 103.3 tokens/s | 8,000 tokens | 0.3 s | N/A |
Llama 3 8B | Microsoft Azure | Meta | Llama 3 8B | $0.38 | $0.30 | $0.61 | 73.5 tokens/s | 8,000 tokens | 0.36 s | N/A |
Llama 3 8B | Fireworks | Meta | Llama 3 8B | $0.20 | $0.20 | $0.20 | 126 tokens/s | 8,000 tokens | 0.23 s | N/A |
Llama 3 8B | Deepinfra | Meta | Llama 3 8B | $0.04 | $0.03 | $0.06 | 112.4 tokens/s | 8,000 tokens | 0.17 s | N/A |
Llama 3 8B | Novita | Meta | Llama 3 8B | $0.04 | $0.04 | $0.04 | 43.3 tokens/s | 8,000 tokens | 1.19 s | N/A |
Llama 3 8B | Groq | Meta | Llama 3 8B | $0.06 | $0.05 | $0.08 | 1198.6 tokens/s | 8,000 tokens | 0.34 s | N/A |
Gemini 1.0 Pro Vertex | Google Vertex | Google | Gemini 1.0 Pro Vertex | $0.19 | $0.13 | $0.38 | 0 tokens/s | 33,000 tokens | 0.15 s | N/A |
Claude 3 Sonnet | Amazon Bedrock | Anthropic | Claude 3 Sonnet | $6.00 | $3.00 | $15.00 | 62.4 tokens/s | 200,000 tokens | 0.68 s | N/A |
Claude 3 Sonnet | Anthropic | Anthropic | Claude 3 Sonnet | $6.00 | $3.00 | $15.00 | 58 tokens/s | 200,000 tokens | 0.54 s | N/A |
Claude 2.1 | Amazon Bedrock | Anthropic | Claude 2.1 | $12.00 | $8.00 | $24.00 | 28.9 tokens/s | 200,000 tokens | 1.9 s | N/A |
Claude 2.1 | Anthropic | Anthropic | Claude 2.1 | $12.00 | $8.00 | $24.00 | 13.3 tokens/s | 200,000 tokens | 0.82 s | N/A |
Claude 2.0 | Anthropic | Anthropic | Claude 2.0 | $12.00 | $8.00 | $24.00 | 29.3 tokens/s | 100,000 tokens | 0.82 s | N/A |
Jamba Instruct | AI21 Labs | AI21 Labs | Jamba Instruct | $0.55 | $0.50 | $0.70 | 184 tokens/s | 256,000 tokens | 0.29 s | N/A |
Jamba Instruct | Microsoft Azure | AI21 Labs | Jamba Instruct | $0.55 | $0.50 | $0.70 | 76.9 tokens/s | 256,000 tokens | 0.52 s | N/A |
Showing 301 of 303 models
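The page does not say how the "Price (per M/Token)" column is derived. The listed values appear consistent with a 3:1 blend of input and output prices, rounded to the nearest cent. The sketch below checks that assumption against a few rows from the table; the function name `blended_price` and the 3:1 ratio are assumptions for illustration, not something the page confirms.

```python
from decimal import Decimal, ROUND_HALF_UP


def blended_price(input_price: str, output_price: str) -> Decimal:
    """Blend prices at an ASSUMED 3:1 input-to-output token ratio.

    Prices are USD per million tokens, passed as strings so Decimal
    keeps them exact. The result is rounded half-up to the cent.
    """
    blended = (3 * Decimal(input_price) + Decimal(output_price)) / 4
    return blended.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (input price, output price, listed blended price), copied from the table.
samples = [
    ("2.50", "10.00", "4.38"),    # Command-R+ on Cohere
    ("0.45", "0.70", "0.51"),     # Mixtral 8x7B on Amazon Bedrock
    ("15.00", "60.00", "26.25"),  # o1-preview on OpenAI
]

for input_price, output_price, listed in samples:
    computed = blended_price(input_price, output_price)
    print(f"input={input_price} output={output_price} "
          f"computed={computed} listed={listed}")
```

Because the displayed input and output prices are themselves rounded, a recomputed value could differ from the listed one by a cent on some rows.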
About This Tool
This interactive tool compares LLM providers and models on price, output speed, latency, context window, and quality index.
Data is sourced from artificialanalysis.ai and is updated regularly.
Use the quality and price filters and the chart configuration options to narrow the view to the models that fit your needs.
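For readers who want to reproduce the filters offline, here is a minimal sketch that parses rows copied from the table and applies a quality floor and a price ceiling. The names `ModelRow`, `parse_row`, and `filter_models` are hypothetical, and the handling of `N/A` quality scores (kept only when no quality floor is set) is an assumption; the page does not document how the tool treats unscored models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRow:
    name: str
    provider: str
    price: float            # blended USD per million tokens
    quality: Optional[int]  # None when the table shows N/A


def parse_row(line: str) -> ModelRow:
    """Parse one complete 11-column pipe-delimited row from the table."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 11:
        raise ValueError(f"expected 11 cells, got {len(cells)}")
    quality = None if cells[10] == "N/A" else int(cells[10])
    return ModelRow(cells[0], cells[1], float(cells[4].lstrip("$")), quality)


def filter_models(rows: List[ModelRow],
                  min_quality: int = 0,
                  max_price: float = 30.0) -> List[ModelRow]:
    """Keep rows within the price ceiling and at or above the quality floor.

    Assumption: unscored (N/A) rows pass only when min_quality is 0.
    """
    kept = []
    for row in rows:
        if row.price > max_price:
            continue
        if row.quality is None:
            if min_quality == 0:
                kept.append(row)
        elif row.quality >= min_quality:
            kept.append(row)
    return kept


SAMPLE = """
Command-R+ | Cohere | Cohere | Command-R+ | $4.38 | $2.50 | $10.00 | 73 tokens/s | 128,000 tokens | 0.24 s | 21 |
Ministral 3B | Mistral | Mistral | Ministral 3B | $0.04 | $0.04 | $0.04 | 220.1 tokens/s | 128,000 tokens | 0.38 s | 20 |
Mixtral 8x7B | Groq | Mistral | Mixtral 8x7B | $0.24 | $0.24 | $0.24 | 572.1 tokens/s | 33,000 tokens | 0.27 s | 17 |
GPT-4 Turbo | OpenAI | OpenAI | GPT-4 Turbo | $15.00 | $10.00 | $30.00 | 49.2 tokens/s | 128,000 tokens | 0.52 s | N/A |
"""

rows = [parse_row(line) for line in SAMPLE.strip().splitlines()]
for row in filter_models(rows, min_quality=20, max_price=1.0):
    print(row.name, row.provider, row.price, row.quality)
```

The default bounds mirror the slider ranges shown at the top of the page ($0 to $30 for price, 0 meaning any quality).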