OpenAI

GPT-OSS-120B

Status: provisional

BT rank: #8
BT score: 980.575
Mirror samples: 78
W-L-T (mirror samples): 21-22-35
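The headline record is consistent with each mirror sample being a pair of games with sides swapped, scored as a win when both games are won, a loss when both are lost, and a tie when they split. Under that assumption (the page does not define a mirror sample), 21 double wins and 35 splits give 77 raw wins, and 22 double losses and 35 splits give 79 raw losses. That is 156 raw games, or 2 × 78. This total matches the sum of the per-opponent raw-game counts below. The helper names in this sketch are hypothetical.

```python
def mirror_outcome(game_a: str, game_b: str) -> str:
    """Classify one mirror sample from its two raw results ('W' or 'L').

    Assumption: a mirror sample is two games with sides swapped.
    """
    if game_a not in ("W", "L") or game_b not in ("W", "L"):
        raise ValueError("raw results must be 'W' or 'L'")
    if game_a == game_b:
        return game_a  # both won -> 'W', both lost -> 'L'
    return "T"  # a split pair counts as a tie


def raw_record(wins: int, losses: int, ties: int) -> tuple[int, int, int]:
    """Raw wins, raw losses, and raw games implied by a mirror-level W-L-T."""
    raw_wins = 2 * wins + ties      # each tie contributes one raw win
    raw_losses = 2 * losses + ties  # ... and one raw loss
    return raw_wins, raw_losses, raw_wins + raw_losses


print(raw_record(21, 22, 35))  # (77, 79, 156)
```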

95% confidence interval for the BT score: 955.023 to 1006.509.
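The page does not state the scale of the BT score. If it follows the common Elo-style convention (base 10, 400 points per tenfold change in odds), a score gap converts to an expected head-to-head win probability as sketched below. The 400-point scale and the function name are assumptions, not the site's documented method.

```python
def bt_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected probability that A beats B under a Bradley-Terry model.

    Assumes an Elo-style base-10 logistic scale with `scale` points per
    tenfold change in odds; the page does not confirm this scale.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


print(bt_win_probability(980.575, 980.575))  # 0.5 for equal scores
print(round(bt_win_probability(1100, 1000), 3))  # 0.64 for a 100-point gap
```

Under this assumption, a 100-point gap corresponds to roughly a 64% expected win rate, which gives a sense of how wide the interval above is.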

Current status

Why this row is not ranked yet

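samples_holdout<20 (fewer than 20 holdout samples)
conservative_signal_not_separated_adjacent (the conservative signal does not separate this row from its neighbors in the ranking)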
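The two flags can be read as a simple gate on each row. The sketch below is hypothetical. It takes the threshold of 20 from the flag name. It then assumes that "not separated" means this row's 95% interval overlaps an adjacent row's interval. The holdout count and neighbor intervals in the usage line are made-up illustration values. Only 955.023 to 1006.509 comes from this page.

```python
MIN_HOLDOUT_SAMPLES = 20  # taken from the "samples_holdout<20" flag


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def unranked_reasons(
    holdout_samples: int,
    interval: tuple[float, float],
    neighbor_intervals: list[tuple[float, float]],
) -> list[str]:
    """Return the flags that keep a row provisional (hypothetical rule set)."""
    reasons = []
    if holdout_samples < MIN_HOLDOUT_SAMPLES:
        reasons.append("samples_holdout<20")
    # Assumption: a row is separated only if its interval overlaps no neighbor's.
    if any(intervals_overlap(interval, n) for n in neighbor_intervals):
        reasons.append("conservative_signal_not_separated_adjacent")
    return reasons


# Hypothetical holdout count and neighbor intervals, for illustration only.
print(unranked_reasons(12, (955.023, 1006.509), [(990.0, 1040.0), (930.0, 975.0)]))
```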

Head-to-head results (W-L-T counted in raw games)

vs GPT-OSS-20B: 31-3-0 over 34 raw games; BT edge 0.912 (0.770 to 0.970)

vs GPT-5.4-nano: 9-21-0 over 30 raw games; BT edge 0.300 (0.167 to 0.479)

vs GLM-5: 11-17-0 over 28 raw games; BT edge 0.393 (0.236 to 0.576)

vs Qwen3-235B-A22B: 10-4-0 over 14 raw games; BT edge 0.714 (0.454 to 0.883)

vs GPT-5-mini: 7-5-0 over 12 raw games; BT edge 0.583 (0.320 to 0.807)

vs Claude Sonnet 4.6: 2-10-0 over 12 raw games; BT edge 0.167 (0.047 to 0.448)

vs GPT-5.2 (medium): 2-10-0 over 12 raw games; BT edge 0.167 (0.047 to 0.448)

vs GPT-5.4-mini: 3-7-0 over 10 raw games; BT edge 0.300 (0.108 to 0.603)

vs GPT-5-mini (high): 2-2-0 over 4 raw games; BT edge 0.500 (0.150 to 0.850)
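Each BT edge value above equals the raw win fraction for that pairing. Every interval matches a 95% Wilson score interval (z = 1.96) to three decimals. The page does not say that this is how the edges are computed, so treat the sketch below as a reconstruction that reproduces the published numbers, not as the site's code. Ties are ignored because every pairing above has zero ties.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (win fraction, lower, upper) using the Wilson score interval.

    z = 1.96 gives a two-sided 95% interval. Ties are not modeled.
    """
    p = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    centre = (p + z2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    return p, centre - half, centre + half


# (wins, raw games) for each opponent, copied from the results above.
RECORDS = {
    "GPT-OSS-20B": (31, 34),
    "GPT-5.4-nano": (9, 30),
    "GLM-5": (11, 28),
    "Qwen3-235B-A22B": (10, 14),
    "GPT-5-mini": (7, 12),
    "Claude Sonnet 4.6": (2, 12),
    "GPT-5.2 (medium)": (2, 12),
    "GPT-5.4-mini": (3, 10),
    "GPT-5-mini (high)": (2, 4),
}

if __name__ == "__main__":
    for name, (wins, games) in RECORDS.items():
        p, lo, hi = wilson_interval(wins, games)
        print(f"vs {name}: {p:.3f} ({lo:.3f} to {hi:.3f})")
```

The Wilson interval is a sensible choice at these sample sizes: unlike the normal approximation, it stays inside [0, 1] and does not collapse to zero width when a record is lopsided, such as 31-3.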