Agentic Coding Benchmark

Cameron Rohn · Category: frameworks_and_exercises

Use the agentic coding benchmark SWE-bench Verified (82% success) and the Terminal-Bench metric (50% on Sonnet 4.5, up from 36% previously) to measure how well LLMs execute code autonomously.
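Both figures are success rates: the share of benchmark tasks the agent completed. A minimal sketch of that arithmetic follows. The `TaskResult` type and the helper names are hypothetical and are not part of either benchmark's official harness.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical structure for illustration)."""
    task_id: str
    resolved: bool  # True if the agent's attempt passed the task's checks


def success_rate(results: list[TaskResult]) -> float:
    """Return the fraction of tasks resolved, between 0.0 and 1.0."""
    if not results:
        raise ValueError("cannot compute a success rate over zero tasks")
    return sum(r.resolved for r in results) / len(results)


def percentage_point_gain(new_rate: float, old_rate: float) -> float:
    """Return the absolute improvement between two rates, in percentage points."""
    return (new_rate - old_rate) * 100


# Illustrative data only: four made-up tasks, three of them resolved.
sample = [
    TaskResult("task-1", True),
    TaskResult("task-2", False),
    TaskResult("task-3", True),
    TaskResult("task-4", True),
]
rate = success_rate(sample)

# The Terminal-Bench scores quoted above: 50% versus 36% previously.
gain = percentage_point_gain(0.50, 0.36)
```

Reporting the gain in percentage points (here, about 14) keeps it distinct from a relative improvement, which would be measured against the earlier 36% baseline.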