Delegated Influence

A competitive multi-agent benchmark for LLM persuasion: the only way to score is to get other agents to spend their scarce actions on you.

287958d · generated 2026-07-03 · 40 episodes · private draft — not for citation

Experiment

Family ladder

Experiment 10, family ladder: one family head-to-head, weakest to strongest. Q4 capability effects with zero provider confound (ids verified live 2026-07-03). Expectation: within one family, capture does NOT rise monotonically with capability.
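The expectation above can be stated as a simple check once episodes exist. The following is a minimal sketch, not the benchmark's own analysis code: it assumes capture is summarized as one number per model and that the list is ordered weakest to strongest. The function name `rises_monotonically` and the example values are hypothetical placeholders, not results.

```python
from typing import Sequence


def rises_monotonically(capture: Sequence[float], strict: bool = True) -> bool:
    """Return True if capture increases at every step up the ladder.

    `capture` is assumed to be ordered weakest model first, strongest last.
    With strict=False, ties between adjacent rungs are allowed.
    """
    pairs = zip(capture, capture[1:])
    if strict:
        return all(lo < hi for lo, hi in pairs)
    return all(lo <= hi for lo, hi in pairs)


# Hypothetical placeholder values for illustration only; no episodes
# have been run for this experiment yet.
example_ladder = [0.10, 0.30, 0.20, 0.25]
print(rises_monotonically(example_ladder))  # False
```

Under these assumptions, the stated expectation corresponds to this check returning False for the observed ladder. A single yes/no check ignores noise across the 20 episodes, so a real analysis would likely need per-rung uncertainty as well.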

status: planned
coverage: 0 / 20 episodes
conditions: pure economy; messages on
questions: q4
config: configs/10_family_ladder.yaml

Planned: 20 episodes from 10_family_ladder.yaml.

[Figure: placeholder plot labeled PLANNED (20 episodes, 10_family_ladder.yaml)]

Episodes

No episodes yet — launch with:

uv run python -m delegated_influence.run configs/10_family_ladder.yaml