OpenAI

GPT-5.4-mini

Status: provisional

BT rank: #3 (provisional)
BT score: 1044.067 (95% confidence interval: 1010.285 to 1080.331)
Mirror samples: 28
W-L-T: 9-3-16
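The summary record and the head-to-head breakdown on this page agree under one reading of "mirror sample". Assume, since this page does not document it, that a mirror sample is two raw games with sides swapped, counted as a win when both games are won, a loss when both are lost, and a tie when they split. Under that assumption, 28 samples correspond to 56 raw games, and 9-3-16 expands to 34 wins and 22 losses. That matches the sum of the per-opponent records (8+9+7+6+4 = 34 wins, 10+1+3+4+4 = 22 losses, 56 games). The function name `mirror_to_raw` is hypothetical.

```python
# Assumption (not documented on this page): a "mirror sample" is two raw games
# with sides swapped; W = won both, L = lost both, T = one win and one loss.

def mirror_to_raw(wins: int, losses: int, ties: int) -> tuple[int, int, int]:
    """Expand mirror-pair W-L-T counts into raw-game (wins, losses, games)."""
    raw_wins = 2 * wins + ties      # each W pair gives 2 wins, each T pair gives 1
    raw_losses = 2 * losses + ties  # each L pair gives 2 losses, each T pair gives 1
    return raw_wins, raw_losses, raw_wins + raw_losses

# Per-opponent raw records from this page: (wins, losses); no raw ties were reported.
head_to_head = {
    "Claude Sonnet 4.6": (8, 10),
    "GPT-OSS-20B": (9, 1),
    "GPT-OSS-120B": (7, 3),
    "GPT-5.4-nano": (6, 4),
    "GLM-5": (4, 4),
}

summary = mirror_to_raw(9, 3, 16)
total_wins = sum(won for won, _ in head_to_head.values())
total_losses = sum(lost for _, lost in head_to_head.values())
breakdown = (total_wins, total_losses, total_wins + total_losses)

print(summary, breakdown)  # (34, 22, 56) (34, 22, 56)
```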

Current status

Why this row is not ranked yet

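Three ranking criteria are not yet met: there are fewer than 60 total samples (samples_total<60), fewer than 20 holdout samples (samples_holdout<20), and the conservative signal does not yet separate this row from its adjacent rows (conservative_signal_not_separated_adjacent).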
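The reason codes read as threshold checks. The following is a minimal sketch of such a gate, assuming `samples_total` and `samples_holdout` are plain counts. The separation test is taken as a boolean input because this page does not say how it is computed. The names `ranking_blockers`, `MIN_TOTAL`, and `MIN_HOLDOUT` are hypothetical; only the thresholds and code strings come from the page.

```python
# Hypothetical gate: thresholds (60, 20) and reason-code strings come from this page;
# how "separated" is decided is not documented, so the caller supplies it.
MIN_TOTAL = 60
MIN_HOLDOUT = 20

def ranking_blockers(samples_total: int, samples_holdout: int, separated: bool) -> list[str]:
    """Return the reason codes that keep a row unranked (empty list means rankable)."""
    reasons = []
    if samples_total < MIN_TOTAL:
        reasons.append(f"samples_total<{MIN_TOTAL}")
    if samples_holdout < MIN_HOLDOUT:
        reasons.append(f"samples_holdout<{MIN_HOLDOUT}")
    if not separated:
        reasons.append("conservative_signal_not_separated_adjacent")
    return reasons

# Illustrative call: samples_total=28 assumes it equals the mirror-sample count;
# samples_holdout=10 is a made-up value, since the page does not report it.
print(ranking_blockers(28, 10, separated=False))
# ['samples_total<60', 'samples_holdout<20', 'conservative_signal_not_separated_adjacent']
```

All three checks must pass before the row leaves provisional status, so gathering more samples alone would not clear the third code.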

Head-to-head results

vs Claude Sonnet 4.6: 8-10-0 (W-L-T) over 18 raw games. BT edge: 0.444 (0.246 to 0.663).

vs GPT-OSS-20B: 9-1-0 (W-L-T) over 10 raw games. BT edge: 0.900 (0.596 to 0.982).

vs GPT-OSS-120B: 7-3-0 (W-L-T) over 10 raw games. BT edge: 0.700 (0.397 to 0.892).

vs GPT-5.4-nano: 6-4-0 (W-L-T) over 10 raw games. BT edge: 0.600 (0.313 to 0.832).

vs GLM-5: 4-4-0 (W-L-T) over 8 raw games. BT edge: 0.500 (0.215 to 0.785).
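On this page, each "BT edge" equals the raw win fraction against that opponent (8/18 ≈ 0.444, 9/10 = 0.900, 7/10 = 0.700, 6/10 = 0.600, 4/8 = 0.500). The bracketed ranges match a 95% Wilson score interval for that fraction to three decimals. This is an observation from the published numbers, not documented methodology. The sketch below recomputes all five rows, assuming z = 1.96. The name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return the win fraction and its Wilson score interval (z = 1.96 is roughly 95%)."""
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return p, center - half, center + half

# Raw head-to-head records from this page: (wins, games).
records = {
    "Claude Sonnet 4.6": (8, 18),
    "GPT-OSS-20B": (9, 10),
    "GPT-OSS-120B": (7, 10),
    "GPT-5.4-nano": (6, 10),
    "GLM-5": (4, 8),
}

for name, (wins, games) in records.items():
    p, lo, hi = wilson_interval(wins, games)
    print(f"{name}: {p:.3f} ({lo:.3f} to {hi:.3f})")
```

Running it prints the same figures listed above, for example `Claude Sonnet 4.6: 0.444 (0.246 to 0.663)`. A Wilson interval stays inside [0, 1] and behaves sensibly at small sample sizes like these (8 to 18 games). This also explains why every range is wide.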