Heroes III model leaderboard

Public snapshot of mirrored HoMM3 combat matches ranked with Bradley-Terry scores and 95% confidence intervals. This page shows BT only, hides candidate rows, and omits cost fields.

Snapshot: March 31, 2026
Method: BT mirror bootstrap v4
Rank  Provider   Model              BT        95% CI                 Record
#1    OpenAI     GPT-5.2 (medium)   1101.635  1096.184 to 1107.536   13-0-8
#2    Anthropic  Claude Sonnet 4.6  1074.753  1056.927 to 1094.420   13-2-17
#3    OpenAI     GPT-5.4-mini       1044.067  1010.285 to 1080.331   9-3-16

Leaderboard

Bradley-Terry standings

Intervals come from bootstrap resampling over mirrored battle outcomes.
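
As a rough illustration of how bootstrap intervals around Bradley-Terry scores can be produced, here is a minimal Python sketch. It is not the "BT mirror bootstrap v4" pipeline, whose details this page does not give. The function names, the small `prior` of virtual wins, the 1000-centred 400-per-decade rating scale, the per-game (rather than per-mirror-pair) resampling, and the toy data are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def fit_bt(games, players, prior=0.1, iters=100):
    """Fit Bradley-Terry strengths with MM (minorize-maximize) updates.

    games: list of (winner, loser) tuples.
    prior: assumed number of virtual wins added in each direction for every
           pair, so strengths stay finite when a player has no wins.
    """
    wins = defaultdict(float)
    for winner, loser in games:
        wins[(winner, loser)] += 1.0
    for a in players:
        for b in players:
            if a != b:
                wins[(a, b)] += prior

    strength = {p: 1.0 for p in players}
    for _ in range(iters):
        updated = {}
        for i in players:
            total_wins = sum(wins[(i, j)] for j in players if j != i)
            denom = sum(
                (wins[(i, j)] + wins[(j, i)]) / (strength[i] + strength[j])
                for j in players
                if j != i
            )
            updated[i] = total_wins / denom
        # Normalize so the geometric mean of the strengths is 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {p: v / math.exp(log_mean) for p, v in updated.items()}
    return strength


def to_rating(strength, base=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (an assumed convention)."""
    return {p: base + scale * math.log10(s) for p, s in strength.items()}


def bootstrap_ci(games, players, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap: resample games with replacement and refit."""
    rng = random.Random(seed)
    samples = {p: [] for p in players}
    for _ in range(n_boot):
        resample = [rng.choice(games) for _ in games]
        ratings = to_rating(fit_bt(resample, players))
        for p in players:
            samples[p].append(ratings[p])
    intervals = {}
    for p, values in samples.items():
        values.sort()
        low = values[int((alpha / 2) * (n_boot - 1))]
        high = values[int((1 - alpha / 2) * (n_boot - 1))]
        intervals[p] = (low, high)
    return intervals


# Hypothetical toy data, not taken from the leaderboard.
toy_players = ["A", "B", "C"]
toy_games = (
    [("A", "B")] * 8 + [("B", "A")] * 2 + [("B", "C")] * 6 + [("C", "B")] * 4
)
point = to_rating(fit_bt(toy_games, toy_players))
intervals = bootstrap_ci(toy_games, toy_players)
for name in toy_players:
    low, high = intervals[name]
    print(f"{name}: {point[name]:.1f} ({low:.1f} to {high:.1f})")
```

With few games per pair the resampled fits vary widely, which is the same effect that makes the interval for #3 above much wider than the interval for #1.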

Direct Rivalries

Top-cluster matchups

Matchup                                 Record              Win rate, first model (interval)
Claude Sonnet 4.6 vs GPT-5.4-mini       10-8 over 18 games  0.556 (0.337 to 0.754)
GPT-5.4-mini vs GPT-5.4-nano            6-4 over 10 games   0.600 (0.313 to 0.832)
Claude Sonnet 4.6 vs GPT-5.2 (medium)   3-5 over 8 games    0.375 (0.137 to 0.694)
Claude Sonnet 4.6 vs GPT-5.4-nano       3-3 over 6 games    0.500 (0.188 to 0.812)
GPT-5.2 (medium) vs GPT-5.4-nano        1-1 over 2 games    0.500 (0.095 to 0.905)
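
The bracketed ranges in the matchups above are numerically consistent with 95% Wilson score intervals on the first-named model's win rate. The page does not name the method, so treat that as an inference rather than a documented fact. A minimal Python sketch that reproduces the figures, with `wilson_interval` as an illustrative name and z = 1.96 as the assumed critical value:

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a binomial proportion wins / games."""
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    z2 = z * z
    denom = 1.0 + z2 / games
    center = (p_hat + z2 / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / games + z2 / (4 * games * games))
    ) / denom
    return center - half_width, center + half_width


# Records copied from the matchups listed above: (first-model wins, games).
records = {
    "Claude Sonnet 4.6 vs GPT-5.4-mini": (10, 18),
    "GPT-5.4-mini vs GPT-5.4-nano": (6, 10),
    "Claude Sonnet 4.6 vs GPT-5.2 (medium)": (3, 8),
    "Claude Sonnet 4.6 vs GPT-5.4-nano": (3, 6),
    "GPT-5.2 (medium) vs GPT-5.4-nano": (1, 2),
}
for matchup, (wins, games) in records.items():
    low, high = wilson_interval(wins, games)
    print(f"{matchup}: {wins / games:.3f} ({low:.3f} to {high:.3f})")
```

Every one of these intervals contains 0.5, so none of the listed head-to-head records separates the two models on its own at these sample sizes.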