OpenAI

GPT-OSS-120B

Status: provisional

Summary stats

BT rank: #8
BT score: 983.118
Mirror samples: 90
W-L-T: 23-24-43

95% confidence interval for the BT score: 961.369 to 1009.215.

Current status

Why this row is not ranked yet

samples_holdout<20, conservative_signal_not_separated_adjacent

Head-to-head results

vs GPT-OSS-20B

31-3-0

34 raw games

BT edge: 0.912 (0.770 to 0.970)

vs GPT-5.4-nano

9-21-0

30 raw games

BT edge: 0.300 (0.167 to 0.479)

vs GLM-5

11-17-0

28 raw games

BT edge: 0.393 (0.236 to 0.576)

vs GPT-5-mini

7-5-0

12 raw games

BT edge: 0.583 (0.320 to 0.807)

vs GPT-5.4-mini

3-7-0

10 raw games

BT edge: 0.300 (0.108 to 0.603)
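
The page does not state how the BT edge figures are derived. As an observation rather than a documented method: each listed edge equals wins divided by raw games, and each bracketed range matches a 95% Wilson score interval on that share. The sketch below reproduces the listed figures under that assumption; the function name `wilson_interval` and the `records` table are illustrative, not part of the leaderboard's code.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Return (share, low, high): the win share and its Wilson score interval.

    Assumption: z = 1.96 (about 95% coverage). The source page does not
    document its interval method; this is only a consistency check.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    share = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    center = (share + z2 / (2 * games)) / denom
    half = z * math.sqrt(share * (1 - share) / games + z2 / (4 * games * games)) / denom
    return share, center - half, center + half


# (wins, losses) copied from the head-to-head records; no ties are listed.
records = {
    "GPT-OSS-20B": (31, 3),
    "GPT-5.4-nano": (9, 21),
    "GLM-5": (11, 17),
    "GPT-5-mini": (7, 5),
    "GPT-5.4-mini": (3, 7),
}

for opponent, (wins, losses) in records.items():
    share, low, high = wilson_interval(wins, wins + losses)
    print(f"vs {opponent}: {share:.3f} ({low:.3f} to {high:.3f})")
```

The intervals are wide for the smaller matchups (12 and 10 raw games), so those edges say little about which model is stronger.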