OpenAI

GPT-5.4-mini

Status: provisional

Summary stats

BT rank: #3
BT score: 1043.620
Mirror samples: 28
W-L-T: 9-3-16

95% confidence interval: 1008.338 to 1076.047.

Current status

Why this row is not ranked yet

samples_total<60, samples_holdout<20, conservative_signal_not_separated_adjacent
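
The reason codes above read like threshold checks, so a minimal sketch of such a gate may help when interpreting them. Everything here is an assumption rather than the leaderboard's actual logic: the function name `unmet_criteria`, its parameters, and the reading of `conservative_signal_not_separated_adjacent` as "the conservative estimate does not separate this row from its neighbours" are hypothetical. Only the thresholds 60 and 20 and the code strings come from this page.

```python
def unmet_criteria(samples_total: int,
                   samples_holdout: int,
                   separated_from_adjacent: bool,
                   min_total: int = 60,
                   min_holdout: int = 20) -> list[str]:
    """Return the reason codes that keep a row provisional (hypothetical gate)."""
    reasons = []
    if samples_total < min_total:
        reasons.append(f"samples_total<{min_total}")
    if samples_holdout < min_holdout:
        reasons.append(f"samples_holdout<{min_holdout}")
    if not separated_from_adjacent:
        # Assumed meaning: the row's conservative signal overlaps adjacent rows.
        reasons.append("conservative_signal_not_separated_adjacent")
    return reasons


def is_ranked(reasons: list[str]) -> bool:
    """A row counts as ranked only when no criterion is unmet."""
    return not reasons
```

Under these assumptions, a row stays provisional until the returned list is empty. The page does not report the holdout sample count, so any value passed for `samples_holdout` would be a placeholder.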

Head-to-head results

vs GPT-OSS-20B

9-1-0

10 raw games

BT edge: 0.900 (0.596 to 0.982)

vs GPT-OSS-120B

7-3-0

10 raw games

BT edge: 0.700 (0.397 to 0.892)

vs GPT-5.4-nano

6-4-0

10 raw games

BT edge: 0.600 (0.313 to 0.832)

vs GLM-5

4-4-0

8 raw games

BT edge: 0.500 (0.215 to 0.785)
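
In each head-to-head record above, the BT edge equals the raw win fraction (for example 9 wins in 10 games gives 0.900), and the bracketed range is consistent with a 95% Wilson score interval on that fraction. This is an observation from the numbers on this page, not a documented description of how the leaderboard computes them. The function name `wilson_interval` and the choice `z = 1.96` are assumptions made for this sketch.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion wins / games."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


# (opponent, wins, raw games) copied from the head-to-head records above
records = [
    ("GPT-OSS-20B", 9, 10),
    ("GPT-OSS-120B", 7, 10),
    ("GPT-5.4-nano", 6, 10),
    ("GLM-5", 4, 8),
]

for name, wins, games in records:
    lo, hi = wilson_interval(wins, games)
    print(f"vs {name}: {wins / games:.3f} ({lo:.3f} to {hi:.3f})")
```

With these inputs the printed ranges agree with the four intervals listed above to three decimals. The width of each range (roughly 0.4 to 0.6) shows how little 8 to 10 games constrain the edge, which fits the provisional status of this row.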