| Claude Opus 4.6 | 4.46 | Too expensive for production ($5.00/$25.00 per MTok) |
| Claude Sonnet 4.6 | 4.33 | Too expensive for production |
| GPT-5.4 | 4.05 | Dominated by GPT-5.4 Mini on quality-per-dollar |
| Kimi K2.5 | 3.93 | Removed from active evaluation |
| MiniMax M2.7 | 4.18 | Superseded by M2.7 HighSpeed |
| Gemini 2.5 Flash | 3.80 | Superseded by Gemini 3.1 Flash Lite |
| Gemini 3 Flash Preview | 3.81 | Superseded by Gemini 3.1 Flash Lite |
| Step 3.5 Flash | 3.34 | Removed from active evaluation |
| GLM-5 Turbo | 3.99 | Redundant with GLM-5 at higher cost |
| GLM-5 | 3.93 | $73/mo, mid-pack quality |
| DeepSeek R1 | 3.65 | Reasoning tokens inflate cost without PE benefit |
| Llama 4 Maverick | 3.48 | Lowest composite score |