LLM Golden Dataset Evaluation

Tom Spencer · Category: frameworks_and_exercises

Automate repeated runs of a model over a golden dataset of prompts and reference answers, then use an LLM to grade each output against its reference, yielding quantitative quality assurance metrics.
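The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the `GOLDEN` cases, the `evaluate` helper, and the `fake_model` and `fake_grader` stand-ins are all hypothetical names introduced here. In practice `generate` would call the model under test and `grade` would call a grading LLM with a rubric; the stand-ins only keep the sketch runnable offline.

```python
from statistics import mean

# Hypothetical golden dataset: each case pairs a prompt with a reference answer.
GOLDEN = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "What is the capital of France?", "expected": "Paris"},
]


def evaluate(generate, grade, dataset, runs=3):
    """Run every golden case `runs` times; return pass rates per case and overall."""
    per_case = {}
    for case in dataset:
        # Repeat the run because model output can vary between calls.
        verdicts = [
            grade(case["prompt"], case["expected"], generate(case["prompt"]))
            for _ in range(runs)
        ]
        per_case[case["prompt"]] = mean(1.0 if v else 0.0 for v in verdicts)
    return {"per_case": per_case, "overall": mean(per_case.values())}


# Stand-in for the model under test (assumption: a real model call goes here).
def fake_model(prompt):
    return "The answer is 4." if "2 + 2" in prompt else "I am not sure."


# Stand-in for the LLM grader (assumption: a real grader would prompt an LLM
# with the question, the reference answer, and the output, then parse a verdict).
def fake_grader(prompt, expected, output):
    return expected.lower() in output.lower()


report = evaluate(fake_model, fake_grader, GOLDEN, runs=3)
print(report)
```

Passing `generate` and `grade` in as plain callables keeps the harness independent of any particular model provider, and reporting a pass rate per case makes it easier to spot prompts whose results are unstable across runs.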