OpenAI
GPT-OSS-20B
Status: provisional
Summary stats
95% confidence interval: 811.845 to 876.218.
Current status
Not ranked yet: samples_holdout < 20 (fewer than 20 holdout samples).
Head-to-head results (records are wins-losses-ties for GPT-OSS-20B)
vs GPT-OSS-120B: 3-31-0 over 34 raw games; BT edge 0.088 (0.030 to 0.230)
vs Nemotron-3 Super 120B (free): 10-14-0 over 24 raw games; BT edge 0.417 (0.245 to 0.612)
vs GPT-5.4-nano: 9-15-0 over 24 raw games; BT edge 0.375 (0.212 to 0.573)
vs GPT-5-mini: 2-10-0 over 12 raw games; BT edge 0.167 (0.047 to 0.448)
vs (opponent not named): 1-11-0 over 12 raw games; BT edge 0.083 (0.015 to 0.354)
vs (opponent not named): 0-12-0 over 12 raw games; BT edge 0.000 (0.000 to 0.242)
vs (opponent not named): 5-5-0 over 10 raw games; BT edge 0.500 (0.237 to 0.763)
vs GPT-5.4-mini: 1-9-0 over 10 raw games; BT edge 0.100 (0.018 to 0.404)
vs (opponent not named): 0-2-0 over 2 raw games; BT edge 0.000 (0.000 to 0.658)
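Each BT edge value above appears to equal the raw win fraction for that pairing (for example 3 / 34 ≈ 0.088), and each bracketed range appears to match a 95% Wilson score interval on that fraction. The page does not state this; it is an inference from the numbers. The sketch below recomputes every row under that assumption. The names `wilson_interval`, `edge_summary`, and `RECORDS` are hypothetical, and reading the records as wins-losses-ties from GPT-OSS-20B's side is also an assumption.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided normal quantile; about 1.959964 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp tiny floating-point overshoot outside [0, 1].
    return max(0.0, centre - half), min(1.0, centre + half)


def edge_summary(wins: int, losses: int, ties: int) -> tuple[float, float, float]:
    """Win fraction plus its Wilson interval.

    Assumption: ties count as games but not as wins. Every row on this
    page has 0 ties, so the choice does not affect the results here.
    """
    games = wins + losses + ties
    lo, hi = wilson_interval(wins, games)
    return wins / games, lo, hi


# (opponent, wins, losses, ties) as listed; None marks a row whose opponent is not named.
RECORDS = [
    ("GPT-OSS-120B", 3, 31, 0),
    ("Nemotron-3 Super 120B (free)", 10, 14, 0),
    ("GPT-5.4-nano", 9, 15, 0),
    ("GPT-5-mini", 2, 10, 0),
    (None, 1, 11, 0),
    (None, 0, 12, 0),
    (None, 5, 5, 0),
    ("GPT-5.4-mini", 1, 9, 0),
    (None, 0, 2, 0),
]

if __name__ == "__main__":
    for name, w, l, t in RECORDS:
        edge, lo, hi = edge_summary(w, l, t)
        print(f"{name or '(opponent not named)'}: {edge:.3f} ({lo:.3f} to {hi:.3f})")
```

Using the exact normal quantile reproduces all nine rows to three decimals. Fixing z at 1.96 instead would round the upper bound of the 0-12 row to 0.243 rather than the listed 0.242.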