OpenAI
GPT-OSS-120B
Status: provisional
Summary stats
95% confidence interval: 961.369 to 1009.215.
Current status
Why this row is not ranked yet
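Fewer than 20 holdout samples (samples_holdout<20), and the conservative signal is not yet separated from adjacent rows (conservative_signal_not_separated_adjacent).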
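The two flags can be read as simple checks on a row. The sketch below is hypothetical: only the threshold of 20 holdout samples comes from the flag text. Treating "not separated from adjacent" as an overlap between this row's rating interval and a neighbouring row's interval is one plausible reading, not the leaderboard's documented rule. The names `Row`, `provisional_reasons`, and `MIN_HOLDOUT` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    name: str
    low: float            # lower bound of the rating interval
    high: float           # upper bound of the rating interval
    samples_holdout: int  # number of holdout samples for this row


MIN_HOLDOUT = 20  # taken from the flag text "samples_holdout<20"


def provisional_reasons(row: Row, above: Optional[Row], below: Optional[Row]) -> List[str]:
    """Return the flags that would keep `row` unranked under this hypothetical rule."""
    reasons = []
    if row.samples_holdout < MIN_HOLDOUT:
        reasons.append("samples_holdout<20")
    # Hypothetical reading: the row is "not separated" if its interval
    # overlaps the interval of the row directly above or below it.
    for neighbour in (above, below):
        if neighbour is not None and row.low <= neighbour.high and neighbour.low <= row.high:
            reasons.append("conservative_signal_not_separated_adjacent")
            break
    return reasons
```

Under this reading, a row becomes rankable only once it has at least 20 holdout samples and its interval clears both neighbours.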
Head-to-head results
vs GPT-OSS-20B
31-3-0
34 raw games
BT edge: 0.912 (0.770 to 0.970)
vs GPT-5.4-nano
9-21-0
30 raw games
BT edge: 0.300 (0.167 to 0.479)
vs GLM-5
11-17-0
28 raw games
BT edge: 0.393 (0.236 to 0.576)
vs Nemotron-3 Super 120B (free)
12-12-0
24 raw games
BT edge: 0.500 (0.314 to 0.686)
10-4-0
14 raw games
BT edge: 0.714 (0.454 to 0.883)
vs GPT-5-mini
7-5-0
12 raw games
BT edge: 0.583 (0.320 to 0.807)
2-10-0
12 raw games
BT edge: 0.167 (0.047 to 0.448)
2-10-0
12 raw games
BT edge: 0.167 (0.047 to 0.448)
vs GPT-5.4-mini
3-7-0
10 raw games
BT edge: 0.300 (0.108 to 0.603)
2-2-0
4 raw games
BT edge: 0.500 (0.150 to 0.850)
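For a single pair of models, the Bradley-Terry maximum-likelihood win probability reduces to the raw win share, and every BT edge above equals wins divided by raw games (for example, 31/34 = 0.912). The bracketed ranges appear to match a 95% Wilson score interval on that share. This is an observation from the listed numbers, not a documented method of the leaderboard. The sketch below assumes the third field of each record counts draws. Because every row lists 0 draws and the source does not say how draws are scored, the sketch rejects nonzero draws rather than guessing. The function names are illustrative.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


def edge_summary(record: str) -> str:
    """Format a 'W-L-D' record as 'share (low to high)', like the rows above.

    Assumption: the third field counts draws; all rows on this page show 0,
    so nonzero draws are rejected instead of being given an invented weight.
    """
    wins, losses, draws = (int(x) for x in record.split("-"))
    if draws:
        raise ValueError("draw handling is not specified by the source")
    games = wins + losses
    low, high = wilson_interval(wins, games)
    return f"{wins / games:.3f} ({low:.3f} to {high:.3f})"


if __name__ == "__main__":
    for rec in ["31-3-0", "9-21-0", "2-10-0", "2-2-0"]:
        print(rec, edge_summary(rec))
```

Running it prints `0.912 (0.770 to 0.970)` for 31-3-0 and `0.500 (0.150 to 0.850)` for 2-2-0, matching the rows above. The Wilson interval stays inside 0 to 1 and remains sensible for small samples such as the 4-game record, which is one reason it is a common choice for this kind of table.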