Anthropic

Claude Sonnet 4.6

Status: provisional

Summary stats

BT rank: #2
BT score: 1073.069
Mirror samples: 32
W-L-T: 13-2-17

95% confidence interval: 1055.682 to 1090.217.

Current status

Why this row is not ranked yet

samples_total<60, samples_holdout<20, conservative_signal_not_separated_adjacent
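
The reason codes above read like the output of a gating check that runs before a row is ranked. The sketch below shows one way such a check could be written. The thresholds 60 and 20 come from the codes themselves; the function name `ranking_blockers`, its parameters, and the reading of `conservative_signal_not_separated_adjacent` as "this row's confidence interval overlaps an adjacent row's interval" are assumptions, not the leaderboard's documented logic.

```python
def ranking_blockers(samples_total, samples_holdout, interval, adjacent_intervals,
                     min_total=60, min_holdout=20):
    """Return the list of reason codes that would keep a row unranked.

    Hypothetical helper: only the threshold values (60, 20) and the code
    strings are taken from the page; everything else is an assumption.
    """
    reasons = []
    if samples_total < min_total:
        reasons.append(f"samples_total<{min_total}")
    if samples_holdout < min_holdout:
        reasons.append(f"samples_holdout<{min_holdout}")

    low, high = interval
    # Assumed meaning of the third code: the row's interval overlaps the
    # interval of at least one neighbouring row, so the two are not separated.
    overlaps = any(adj_low <= high and low <= adj_high
                   for adj_low, adj_high in adjacent_intervals)
    if overlaps:
        reasons.append("conservative_signal_not_separated_adjacent")
    return reasons
```

The holdout sample count and the neighbouring rows' intervals are not shown on this page, so a caller would have to supply them from elsewhere.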

Head-to-head results

vs GPT-5.4-mini

W-L-T: 10-8-0

18 raw games

BT edge: 0.556 (0.337 to 0.754)

vs GPT-OSS-20B

W-L-T: 11-1-0

12 raw games

BT edge: 0.917 (0.646 to 0.985)

vs GPT-OSS-120B

W-L-T: 10-2-0

12 raw games

BT edge: 0.833 (0.552 to 0.953)

vs GLM-5

W-L-T: 6-2-0

8 raw games

BT edge: 0.750 (0.409 to 0.929)

vs GPT-5.4-nano

W-L-T: 3-3-0

6 raw games

BT edge: 0.500 (0.188 to 0.812)
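
Each BT edge listed above equals wins divided by raw games, and the bracketed ranges are consistent with a 95% Wilson score interval on that fraction. This is an observation from the numbers, not a method the page documents. A minimal sketch under that assumption; the function name `wilson_interval` and the choice of z = 1.96 are illustrative:

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Return (point, low, high) for a win fraction using the Wilson score interval."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    center = (p + z2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    return p, center - half, center + half

# Head-to-head records from this page: (opponent, wins, raw games).
records = [
    ("GPT-5.4-mini", 10, 18),
    ("GPT-OSS-20B", 11, 12),
    ("GPT-OSS-120B", 10, 12),
    ("GLM-5", 6, 8),
    ("GPT-5.4-nano", 3, 6),
]

for name, wins, games in records:
    p, low, high = wilson_interval(wins, games)
    print(f"{name}: {p:.3f} ({low:.3f} to {high:.3f})")
```

With these inputs the printed edges and ranges agree with the listed values to three decimals. The width of each range is driven mostly by the small game counts, which is why the 6-game record against GPT-5.4-nano spans 0.188 to 0.812.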