Background check

Experiment

Experiment 2, background check: does the filler model reliably produce valid moves? This validates glm-5.2 (the new background model) before anything expensive runs. Expectation: glm-5.2 forfeits fewer than 3% of turns, and in the all-glm baseline the seat labels (P1..P5) score within noise of each other, so any seat-label spread that does appear is a design artifact.
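The two expectations could be checked along the following lines. This is a minimal sketch, not the project's own analysis code: the record shapes (a per-turn dict with a "forfeited" flag, a per-episode dict mapping seat label to score) and all function names are assumptions made for illustration. The sketch reports the seat-label spread but does not decide what counts as "within noise", since no noise threshold is stated here.

```python
from collections import defaultdict
from statistics import mean

# "under 3% of turns" from the expectation above.
FORFEIT_THRESHOLD = 0.03


def forfeit_rate(turns):
    """Fraction of turns that ended in a forfeit.

    turns: list of dicts with a boolean "forfeited" key (hypothetical shape).
    """
    if not turns:
        raise ValueError("no turns recorded")
    return sum(1 for t in turns if t["forfeited"]) / len(turns)


def seat_mean_scores(episode_scores):
    """Mean score per seat label across episodes.

    episode_scores: list of dicts mapping seat label -> score (hypothetical shape).
    """
    by_seat = defaultdict(list)
    for episode in episode_scores:
        for seat, score in episode.items():
            by_seat[seat].append(score)
    return {seat: mean(scores) for seat, scores in by_seat.items()}


def seat_spread(episode_scores):
    """Gap between the best and worst seat-label mean score."""
    means = seat_mean_scores(episode_scores)
    return max(means.values()) - min(means.values())


if __name__ == "__main__":
    # Toy data only; not results from any run.
    toy_turns = [{"forfeited": False}] * 99 + [{"forfeited": True}]
    toy_scores = [{"P1": 10, "P2": 12}, {"P1": 14, "P2": 12}]

    rate = forfeit_rate(toy_turns)
    print("forfeit rate ok:", rate < FORFEIT_THRESHOLD)
    print("seat spread:", seat_spread(toy_scores))
```

The spread is compared against whatever noise estimate the experiment adopts; a spread clearly above that estimate in the all-glm baseline would point at the design rather than at the model.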

status: planned
coverage: 0 / 11 episodes
conditions: pure economy; messages on, pure economy; messages on
config: configs/02_background_check.yaml

Planned: 11 episodes from configs/02_background_check.yaml.

Episodes

No episodes yet — launch with:

uv run python -m delegated_influence.run configs/02_background_check.yaml