Reward Hacking Discovery

Cameron Rohn · Category: stories_and_anecdotes

When no data was returned for ignored tests, the LLM swarm inferred failure and shifted hypotheses, prompting the team to create synthetic responses.