Anthropic

Claude Sonnet 4.6

Status: provisional

#2 BT rank
1074.753 BT score
32 Mirror samples
13-2-17 W-L-T

95% confidence interval: 1056.927 to 1094.420.
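The summary tiles can be sanity-checked against the head-to-head records listed further down this page. The raw games there total 64, twice the 32 mirror samples, which suggests that each mirror sample is a pair of games. The sketch below assumes (the page does not say so) that a sample counts as a win when both games of the pair are won, a loss when both are lost, and a tie when they are split. The names `HEAD_TO_HEAD` and `expand_mirror_record` are illustrative, not part of the leaderboard.

```python
# Head-to-head records copied from this page:
# opponent -> (wins, losses, ties) counted in raw games.
HEAD_TO_HEAD = {
    "GPT-5.4-mini": (10, 8, 0),
    "GPT-OSS-20B": (11, 1, 0),
    "GPT-OSS-120B": (10, 2, 0),
    "GLM-5": (6, 2, 0),
    "GPT-5.2 (medium)": (3, 5, 0),
    "GPT-5.4-nano": (3, 3, 0),
}

# Mirror-sample record from the summary tiles (W-L-T).
MIRROR_WINS, MIRROR_LOSSES, MIRROR_TIES = 13, 2, 17


def expand_mirror_record(wins, losses, ties):
    """Convert a mirror-sample W-L-T into raw game wins and losses.

    ASSUMPTION (not stated on the page): every mirror sample is two games,
    a sample win is a 2-0 sweep, a sample loss is 0-2, a tie is a 1-1 split.
    """
    raw_wins = 2 * wins + ties
    raw_losses = 2 * losses + ties
    return raw_wins, raw_losses


raw_wins = sum(w for w, _, _ in HEAD_TO_HEAD.values())
raw_losses = sum(l for _, l, _ in HEAD_TO_HEAD.values())
raw_games = sum(w + l + t for w, l, t in HEAD_TO_HEAD.values())

print(raw_games, raw_wins, raw_losses)  # 64 43 21
print(expand_mirror_record(MIRROR_WINS, MIRROR_LOSSES, MIRROR_TIES))  # (43, 21)
```

Under that assumption the 13-2-17 record expands to 43 raw wins and 21 raw losses, which equals the head-to-head totals. The agreement is consistent with the pairing interpretation but does not prove it.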

Current status

Why this row is not ranked yet

samples_total<60, samples_holdout<20, conservative_signal_not_separated_adjacent
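The reason codes above read like threshold checks, which can be sketched as follows. The thresholds 60 and 20 come from the codes themselves; everything else is an assumption. In particular, `samples_total` is taken to mean the 32 mirror samples (the 64 raw games would not fall below 60), the holdout count is not reported on this page, and the page does not define how separation from an adjacent row is measured, so it is passed in as a plain boolean. The function name `gating_reasons` is hypothetical.

```python
def gating_reasons(samples_total, samples_holdout, separated_from_adjacent):
    """Return the reason codes that would keep a row provisional.

    The thresholds 60 and 20 are taken from the reason codes on this page.
    `separated_from_adjacent` is a hypothetical boolean input: the page does
    not define how separation from a neighbouring row is measured.
    """
    reasons = []
    if samples_total < 60:
        reasons.append("samples_total<60")
    if samples_holdout < 20:
        reasons.append("samples_holdout<20")
    if not separated_from_adjacent:
        reasons.append("conservative_signal_not_separated_adjacent")
    return reasons


# 32 is the mirror-sample count shown on this page. The holdout count of 10
# is a made-up placeholder, because the page does not report it.
print(", ".join(gating_reasons(32, 10, False)))
```

With those placeholder inputs the function returns the same three codes shown above. A row that met both sample thresholds and was separated from its neighbours would return an empty list.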

vs GPT-5.4-mini: 10-8-0, 18 raw games, BT edge 0.556 (0.337 to 0.754)

vs GPT-OSS-20B: 11-1-0, 12 raw games, BT edge 0.917 (0.646 to 0.985)

vs GPT-OSS-120B: 10-2-0, 12 raw games, BT edge 0.833 (0.552 to 0.953)

vs GLM-5: 6-2-0, 8 raw games, BT edge 0.750 (0.409 to 0.929)

vs GPT-5.2 (medium): 3-5-0, 8 raw games, BT edge 0.375 (0.137 to 0.694)

vs GPT-5.4-nano: 3-3-0, 6 raw games, BT edge 0.500 (0.188 to 0.812)
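Each BT edge above equals the raw win fraction for that pairing (for example 10/18 = 0.556), and the bracketed ranges appear consistent with a 95% Wilson score interval on that fraction. This is an inference from the numbers, not something the page states, and the leaderboard may compute its ranges differently. A minimal sketch, assuming `z = 1.96` and using the illustrative names `wilson_interval` and `RECORDS`:

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for the binomial proportion wins/games.

    z=1.96 approximates a two-sided 95% level; that choice is an assumption.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    )
    return centre - half_width, centre + half_width


# (wins, raw games) for each opponent, copied from the records above.
# None of the pairings has ties, so the win fraction is wins / games.
RECORDS = {
    "GPT-5.4-mini": (10, 18),
    "GPT-OSS-20B": (11, 12),
    "GPT-OSS-120B": (10, 12),
    "GLM-5": (6, 8),
    "GPT-5.2 (medium)": (3, 8),
    "GPT-5.4-nano": (3, 6),
}

for opponent, (wins, games) in RECORDS.items():
    low, high = wilson_interval(wins, games)
    print(f"vs {opponent}: {wins / games:.3f} ({low:.3f} to {high:.3f})")
```

The widths make the small sample sizes visible: with only 6 to 18 raw games per opponent, most ranges span 0.3 or more, and three of the six include 0.5.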