Experiment
Background check
Experiment 2, background check: does the filler model reliably produce valid moves? This run validates glm-5.2 (the new background model) before anything expensive runs. Expectations: glm-5.2 forfeits on fewer than 3% of turns, and in the all-glm baseline the seat labels (P1..P5) score within noise of each other, so any larger seat-label spread would point to a design artifact.
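The two expectations above could be checked after the run with something like the sketch below. It is not the project's actual analysis code: the `forfeit` field, the `scores_by_seat` layout, the function names, and the "spread within 2x the standard error" noise rule are all assumptions made for illustration. Only the 3% forfeit threshold and the P1..P5 seat labels come from the description.

```python
from statistics import mean, stdev


def forfeit_rate(turns):
    """Fraction of turns that ended in a forfeit (hypothetical 'forfeit' flag)."""
    if not turns:
        raise ValueError("no turns recorded")
    return sum(1 for t in turns if t["forfeit"]) / len(turns)


def seat_spread(scores_by_seat):
    """Return (max - min of per-seat mean scores, average standard error of a seat mean).

    Needs at least two scores per seat so that stdev is defined.
    """
    means = [mean(scores) for scores in scores_by_seat.values()]
    sems = [stdev(scores) / len(scores) ** 0.5 for scores in scores_by_seat.values()]
    return max(means) - min(means), mean(sems)


def background_ok(turns, scores_by_seat, max_forfeit=0.03, noise_mult=2.0):
    """True if forfeits stay under the threshold and seat spread stays within noise.

    noise_mult is an assumed rule of thumb, not a value taken from the experiment.
    """
    spread, sem = seat_spread(scores_by_seat)
    return forfeit_rate(turns) < max_forfeit and spread <= noise_mult * sem
```

With 11 episodes per seat the standard error is coarse, so a check like this is better read as a smoke test than as a significance test.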
- status: planned
- coverage: 0 / 11 episodes
- conditions: pure economy; messages on, pure economy; messages on
- config: configs/02_background_check.yaml
Planned — 11 episodes from 02_background_check.yaml.
Episodes
No episodes yet — launch with:
uv run python -m delegated_influence.run configs/02_background_check.yaml