Utility Over Benchmarks

Tom Spencer · Category: points_of_view

Evaluate LLMs primarily on real-world task completion and cost efficiency, not on benchmark scores alone.
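
One way to make this concrete is to track, for each model, how often it finishes real tasks and what each finished task costs. The sketch below is only an illustration: the `RunResult` record, its fields, and the dollar unit are hypothetical names chosen for this example, not part of any existing evaluation tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunResult:
    """One attempt at a real-world task (hypothetical record format)."""
    completed: bool   # did the model finish the task to the user's satisfaction?
    cost_usd: float   # total spend for this attempt, assumed to be in US dollars


def completion_rate(runs: Sequence[RunResult]) -> float:
    """Fraction of attempts that completed the task; 0.0 for an empty list."""
    if not runs:
        return 0.0
    return sum(r.completed for r in runs) / len(runs)


def cost_per_completed_task(runs: Sequence[RunResult]) -> float:
    """Total spend divided by completed tasks; infinity if nothing completed.

    Failed attempts still count toward the total spend, so a cheap model
    that rarely finishes can come out more expensive per useful result.
    """
    done = sum(r.completed for r in runs)
    total = sum(r.cost_usd for r in runs)
    return total / done if done else float("inf")
```

Comparing two models on these two numbers, alongside their benchmark scores rather than instead of them, shows whether a higher-scoring model actually delivers more completed work per dollar.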