Delegated Influence

A competitive multi-agent benchmark for LLM persuasion: the only way to score is to get other agents to spend their scarce actions on you.

287958d · generated 2026-07-03 · 40 episodes · private draft — not for citation

Experiment

Background check

Experiment 2, background check: does the filler model reliably produce valid moves? This run validates glm-5.2 (the new background model) before anything expensive runs. Expectations: glm-5.2 forfeits under 3% of turns, and in the all-glm baseline the seat labels (P1..P5) score within noise of each other. Any seat-label spread beyond noise would point to a design artifact, since every seat is played by the same model.
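The two expectations can be checked with a few lines once episodes exist. The following is a minimal sketch, not the benchmark's own analysis code: the record shapes (a `forfeited` flag per turn, a seat-label-to-score mapping per episode) and all function names are assumptions made for illustration. "Within noise" is read here as a permutation test that shuffles scores across seats within each episode, which is one reasonable choice among several.

```python
import random
from statistics import mean

FORFEIT_LIMIT = 0.03  # stated expectation: under 3% of turns forfeited


def forfeit_rate(turns):
    """Fraction of forfeited turns; each turn is a dict with a hypothetical 'forfeited' flag."""
    if not turns:
        raise ValueError("no turns recorded")
    return sum(1 for t in turns if t["forfeited"]) / len(turns)


def seat_spread(episodes):
    """Max minus min of per-seat mean scores; each episode maps seat label -> score."""
    seats = sorted(episodes[0])
    means = [mean(ep[s] for ep in episodes) for s in seats]
    return max(means) - min(means)


def seat_spread_p_value(episodes, n_shuffles=2000, seed=0):
    """Permutation test: how often does a random seat assignment give a spread this large?"""
    rng = random.Random(seed)
    observed = seat_spread(episodes)
    seats = sorted(episodes[0])
    hits = 0
    for _ in range(n_shuffles):
        shuffled = []
        for ep in episodes:
            scores = [ep[s] for s in seats]
            rng.shuffle(scores)  # break any link between seat label and score
            shuffled.append(dict(zip(seats, scores)))
        if seat_spread(shuffled) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate away from an exact zero.
    return (hits + 1) / (n_shuffles + 1)
```

With real data, the check would pass if `forfeit_rate(turns) < FORFEIT_LIMIT` and `seat_spread_p_value(episodes)` is not small. With only 11 planned episodes the permutation test has limited power, so a large p-value is weak evidence of seat symmetry rather than proof.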

status: planned
coverage: 0 / 11 episodes
conditions: pure economy; messages on, pure economy; messages on
config: configs/02_background_check.yaml

Planned — 11 episodes from 02_background_check.yaml.


Episodes

No episodes yet — launch with:

uv run python -m delegated_influence.run configs/02_background_check.yaml