Phipps Model Evaluation Scorecard

PE-Domain Quality Assessment — 17 Models, 12 Tests, 63 Criteria
Last run: 2026-02-25  |  v3 harness (12 tests)  |  Bessemer-graded

Leaderboard

Value Frontier — Quality vs. Cost (log scale)

PE Skills — v3 Head-to-Head

Test Breakdown — Full-Suite Models

Test Details