Experiment: Calibration
Exploratory run: 25 episodes, complete/pure/msg-on.
- status: exploratory
- coverage: 25 episodes
- conditions: complete/pure/msg-on
Across the 25 episodes, mean focal-model capture beyond tit-for-tat is +0.09.
Figure: one slim bar per episode; the oxblood line marks the mean.
Episodes
| episode | condition | focal model | capture (by focal) | cascades | gini |
|---|---|---|---|---|---|
| calibration--gemma-3-4b_r0 | complete/pure/msg-on | gemma-3-4b-it | 0.268 | 1 | 0.59 |
| calibration--gemma-3-4b_r1 | complete/pure/msg-on | gemma-3-4b-it | 0.154 | 0 | 0.494 |
| calibration--gemma-3-4b_r2 | complete/pure/msg-on | gemma-3-4b-it | 0.142 | 3 | 0.464 |
| calibration--gemma-3-4b_r3 | complete/pure/msg-on | gemma-3-4b-it | 0.0489 | 1 | 0.416 |
| calibration--gemma-3-4b_r4 | complete/pure/msg-on | gemma-3-4b-it | 0.00401 | 2 | 0.333 |
| calibration--gpt-5.4-mini_r0 | complete/pure/msg-on | gpt-5.4-mini | 0.0398 | 3 | 0.162 |
| calibration--gpt-5.4-mini_r1 | complete/pure/msg-on | gpt-5.4-mini | -0.0193 | 0 | 0.159 |
| calibration--gpt-5.4-mini_r2 | complete/pure/msg-on | gpt-5.4-mini | 0.0168 | 0 | 0.187 |
| calibration--gpt-5.4-mini_r3 | complete/pure/msg-on | gpt-5.4-mini | 0.0146 | 2 | 0.157 |
| calibration--gpt-5.4-mini_r4 | complete/pure/msg-on | gpt-5.4-mini | -0.00468 | 0 | 0.198 |
| calibration--opus-4.8_r0 | complete/pure/msg-on | opus-4.8 | 0.0931 | 1 | 0.426 |
| calibration--opus-4.8_r1 | complete/pure/msg-on | opus-4.8 | 0.0814 | 3 | 0.309 |
| calibration--opus-4.8_r2 | complete/pure/msg-on | opus-4.8 | 0.0407 | 0 | 0.329 |
| calibration--opus-4.8_r3 | complete/pure/msg-on | opus-4.8 | 0.118 | 0 | 0.327 |
| calibration--opus-4.8_r4 | complete/pure/msg-on | opus-4.8 | 0.18 | 0 | 0.473 |
| calibration--qwen3-235b-thinking_r0 | complete/pure/msg-on | qwen3-235b-thinking | 0.0378 | 0 | 0.347 |
| calibration--qwen3-235b-thinking_r1 | complete/pure/msg-on | qwen3-235b-thinking | 0.0392 | 2 | 0.366 |
| calibration--qwen3-235b-thinking_r2 | complete/pure/msg-on | qwen3-235b-thinking | -0.0135 | 2 | 0.307 |
| calibration--qwen3-235b-thinking_r3 | complete/pure/msg-on | qwen3-235b-thinking | 0.163 | 0 | 0.495 |
| calibration--qwen3-235b-thinking_r4 | complete/pure/msg-on | qwen3-235b-thinking | 0.00106 | 2 | 0.375 |
| calibration--sonnet-4.6_r0 | complete/pure/msg-on | sonnet-4.6 | 0.18 | 2 | 0.627 |
| calibration--sonnet-4.6_r1 | complete/pure/msg-on | sonnet-4.6 | 0.187 | 3 | 0.598 |
| calibration--sonnet-4.6_r2 | complete/pure/msg-on | sonnet-4.6 | 0.172 | 3 | 0.585 |
| calibration--sonnet-4.6_r3 | complete/pure/msg-on | sonnet-4.6 | 0.17 | 2 | 0.593 |
| calibration--sonnet-4.6_r4 | complete/pure/msg-on | sonnet-4.6 | 0.129 | 3 | 0.436 |
25 episodes, sorted by condition, then by episode id.
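The headline mean can be cross-checked against the capture column of the table above. The sketch below assumes the reported figure is a plain unweighted arithmetic mean over all 25 episodes, which the page does not state explicitly. The variable names (`capture`, `per_model_mean`) are illustrative and not part of the experiment's tooling.

```python
from statistics import mean

# Capture values copied by hand from the Episodes table (runs r0..r4),
# grouped by focal model. Assumption: the headline figure is an
# unweighted mean over every episode.
capture = {
    "gemma-3-4b-it": [0.268, 0.154, 0.142, 0.0489, 0.00401],
    "gpt-5.4-mini": [0.0398, -0.0193, 0.0168, 0.0146, -0.00468],
    "opus-4.8": [0.0931, 0.0814, 0.0407, 0.118, 0.18],
    "qwen3-235b-thinking": [0.0378, 0.0392, -0.0135, 0.163, 0.00106],
    "sonnet-4.6": [0.18, 0.187, 0.172, 0.17, 0.129],
}

# Flatten to one list so every episode carries equal weight.
all_values = [v for runs in capture.values() for v in runs]
overall_mean = mean(all_values)

# Per-model means, useful for seeing which models drive the average.
per_model_mean = {model: mean(runs) for model, runs in capture.items()}

print(f"episodes: {len(all_values)}")
print(f"overall mean capture: {overall_mean:+.2f}")
for model, m in sorted(per_model_mean.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:>20}: {m:+.3f}")
```

Under that assumption the overall mean rounds to +0.09, matching the summary line. The per-model breakdown is not reported on the page; it is included only to show how unevenly the five focal models contribute to the average, with five runs each in this exploratory sample.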