OpenAI
GPT-OSS-20B
Status: provisional
95% confidence interval: 794.499 to 856.216.
Current status
Why this row is not ranked yet
samples_total<60, samples_holdout<20
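The two conditions above read as a sample-size gate. A minimal sketch of such a check follows, assuming a row is ranked only when both counts reach their thresholds. The function name `unranked_reasons` and the example counts are hypothetical; only the thresholds 60 and 20 come from the line above, and the page does not say how either count is computed.

```python
def unranked_reasons(samples_total: int, samples_holdout: int,
                     min_total: int = 60, min_holdout: int = 20) -> list[str]:
    """Return the failed gate conditions; an empty list means the row can be ranked.

    Assumption: both thresholds must be met. The defaults mirror the
    conditions quoted on the page (samples_total<60, samples_holdout<20).
    """
    reasons = []
    if samples_total < min_total:
        reasons.append(f"samples_total<{min_total}")
    if samples_holdout < min_holdout:
        reasons.append(f"samples_holdout<{min_holdout}")
    return reasons


# Hypothetical counts, chosen only to trip both conditions.
print(", ".join(unranked_reasons(samples_total=40, samples_holdout=8)))
```

Returning the failed conditions, rather than a bare boolean, keeps the output in the same form as the reason string shown for this row.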
vs GPT-OSS-120B
3-31-0
34 raw games
BT edge: 0.088 (0.030 to 0.230)
vs GPT-5.4-nano
9-15-0
24 raw games
BT edge: 0.375 (0.212 to 0.573)
vs GPT-5-mini
2-10-0
12 raw games
BT edge: 0.167 (0.047 to 0.448)
vs Claude Sonnet 4.6
1-11-0
12 raw games
BT edge: 0.083 (0.015 to 0.354)
vs GPT-5.2 (medium)
0-12-0
12 raw games
BT edge: 0.000 (0.000 to 0.242)
vs Qwen3-235B-A22B
5-5-0
10 raw games
BT edge: 0.500 (0.237 to 0.763)
vs GPT-5.4-mini
1-9-0
10 raw games
BT edge: 0.100 (0.018 to 0.404)
vs GPT-5-mini (high)
0-2-0
2 raw games
BT edge: 0.000 (0.000 to 0.658)
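The head-to-head figures above are numerically consistent with a simple reading: each record is wins-losses-draws, the "BT edge" point value equals wins divided by raw games, and the bracketed range matches a 95% Wilson score interval on that fraction. This is an inference from the numbers, not something the page states, and the site's actual Bradley-Terry computation may differ. A minimal sketch under that assumption (the function name `wilson_interval` is illustrative):

```python
import math
from statistics import NormalDist


def wilson_interval(wins: int, games: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion wins/games."""
    if games <= 0:
        raise ValueError("games must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Records copied from the rows above, read as (wins, raw games).
rows = {
    "GPT-OSS-120B": (3, 34),
    "Qwen3-235B-A22B": (5, 10),
    "GPT-5.2 (medium)": (0, 12),
}
for name, (wins, games) in rows.items():
    low, high = wilson_interval(wins, games)
    print(f"{name}: {wins / games:.3f} ({low:.3f} to {high:.3f})")
```

The width of these intervals shows why small samples say little: with only 2 raw games, a 0-2 record still leaves an upper bound above 0.65.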