SD Bench Evaluation Methodology

Cameron Rohn · Category: frameworks_and_exercises

Use the SD bench dataset as the evaluation set for a controlled study that benchmarks agent performance against physician performance on the same cases.
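As a rough illustration of what such a comparison could look like, the sketch below scores both arms on one shared set of cases and reports per-arm accuracy plus the discordant cases. Everything here is an assumption made for illustration: the `CaseResult` record, the `compare_arms` helper, the use of diagnostic accuracy as the metric, and the toy data, none of which come from the SD bench dataset or from the note itself.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CaseResult:
    """One evaluation case scored for both arms (hypothetical schema)."""
    case_id: str
    agent_correct: bool
    physician_correct: bool


def compare_arms(results: Iterable[CaseResult]) -> dict:
    """Summarize agent vs physician outcomes on the same set of cases."""
    cases = list(results)
    if not cases:
        raise ValueError("no cases to score")

    n = len(cases)
    agent_hits = sum(c.agent_correct for c in cases)
    physician_hits = sum(c.physician_correct for c in cases)

    # Discordant cases: exactly one arm got the case right.
    agent_only = sum(c.agent_correct and not c.physician_correct for c in cases)
    physician_only = sum(c.physician_correct and not c.agent_correct for c in cases)

    return {
        "n_cases": n,
        "agent_accuracy": agent_hits / n,
        "physician_accuracy": physician_hits / n,
        "agent_only_correct": agent_only,
        "physician_only_correct": physician_only,
    }


# Toy data for illustration only; these are NOT real SD bench results.
toy_results = [
    CaseResult("case-1", agent_correct=True, physician_correct=True),
    CaseResult("case-2", agent_correct=True, physician_correct=False),
    CaseResult("case-3", agent_correct=False, physician_correct=False),
    CaseResult("case-4", agent_correct=True, physician_correct=True),
]

summary = compare_arms(toy_results)
print(summary)
```

Scoring both arms on identical cases keeps the comparison paired, and the discordant counts are the inputs a paired test such as McNemar's would need if the study goes on to test for a significant difference.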