| Model | Composite | Reason |
|---|---|---|
| Claude Opus 4.6 | 4.46 | Too expensive for production ($5.00 input / $25.00 output per MTok) |
| Claude Sonnet 4.6 | 4.33 | Too expensive for production |
| GPT-5.4 | 4.05 | Dominated by GPT-5.4 Mini on quality-per-dollar |
| Kimi K2.5 | 3.93 | Removed from active evaluation |
| MiniMax M2.7 | 4.18 | Superseded by M2.7 HighSpeed |
| Gemini 2.5 Flash | 3.80 | Superseded by Gemini 3.1 Flash Lite |
| Gemini 3 Flash Preview | 3.81 | Superseded by Gemini 3.1 Flash Lite |
| Step 3.5 Flash | 3.34 | Removed from active evaluation |
| GLM-5 Turbo | 3.99 | Redundant with GLM-5 at higher cost |
| GLM-5 | 3.93 | $73/mo, mid-pack quality |
| DeepSeek R1 | 3.65 | Reasoning tokens inflate cost without PE benefit |
| Llama 4 Maverick | 3.48 | Among the lowest composite scores |
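
Several of the reasons above rest on a quality-per-dollar comparison. The sketch below shows one way such a comparison could be computed; it is an illustration, not the evaluation's actual method. The GLM-5 figures (composite 3.93, $73/mo) come from the table, while `hypothetical-cheaper-model`, its score, and its cost are invented placeholders, and the `Candidate` type and helper names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    composite: float     # composite quality score, higher is better
    monthly_cost: float  # USD per month, lower is better


def quality_per_dollar(c: Candidate) -> float:
    """Composite score divided by monthly cost."""
    return c.composite / c.monthly_cost


def dominated_on_value(a: Candidate, b: Candidate) -> bool:
    """True if b delivers strictly more composite score per dollar than a."""
    return quality_per_dollar(b) > quality_per_dollar(a)


# GLM-5 values are taken from the table above.
glm5 = Candidate("GLM-5", composite=3.93, monthly_cost=73.0)

# Placeholder values for illustration only; not measured results.
cheaper = Candidate("hypothetical-cheaper-model", composite=3.90, monthly_cost=20.0)

print(dominated_on_value(glm5, cheaper))
```

A pure ratio can favor very cheap, low-quality models, so a real selection would likely pair it with a minimum composite threshold.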