OpenAI

GPT-OSS-20B

Status: provisional

BT rank: #10
BT score: 828.446
Mirror samples: 58
W-L-T: 2-39-17

95% confidence interval for the BT score: 794.499 to 856.216.

Current status

Why this row is not ranked yet

Two sample thresholds are not met: samples_total < 60 and samples_holdout < 20.
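The two conditions above read like a simple eligibility gate. A minimal sketch of such a check follows, assuming the thresholds are exactly 60 and 20 as quoted. The function name `ranking_blockers` is hypothetical, and the page does not show how the leaderboard actually applies these rules.

```python
# Thresholds taken from the conditions quoted above; how the leaderboard
# applies them internally is an assumption.
MIN_SAMPLES_TOTAL = 60
MIN_SAMPLES_HOLDOUT = 20


def ranking_blockers(samples_total, samples_holdout):
    """Return the list of unmet sample conditions (empty if the row could be ranked)."""
    blockers = []
    if samples_total < MIN_SAMPLES_TOTAL:
        blockers.append(f"samples_total<{MIN_SAMPLES_TOTAL}")
    if samples_holdout < MIN_SAMPLES_HOLDOUT:
        blockers.append(f"samples_holdout<{MIN_SAMPLES_HOLDOUT}")
    return blockers


# 58 is the mirror sample count listed on this page (assumed here to be
# samples_total). The holdout count is not shown, so 0 is only a placeholder.
print(", ".join(ranking_blockers(58, 0)))
```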

vs GPT-OSS-120B

3-31-0

34 raw games

BT edge: 0.088 (0.030 to 0.230)

vs GPT-5.4-nano

9-15-0

24 raw games

BT edge: 0.375 (0.212 to 0.573)

vs GPT-5-mini

2-10-0

12 raw games

BT edge: 0.167 (0.047 to 0.448)

vs Claude Sonnet 4.6

1-11-0

12 raw games

BT edge: 0.083 (0.015 to 0.354)

vs GPT-5.2 (medium)

0-12-0

12 raw games

BT edge: 0.000 (0.000 to 0.242)

vs Qwen3-235B-A22B

5-5-0

10 raw games

BT edge: 0.500 (0.237 to 0.763)

vs GPT-5.4-mini

1-9-0

10 raw games

BT edge: 0.100 (0.018 to 0.404)

vs GPT-5-mini (high)

0-2-0

2 raw games

BT edge: 0.000 (0.000 to 0.658)
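
The BT edge values and their bracketed ranges above appear consistent with a raw win fraction (wins divided by raw games) and a 95% Wilson score interval on that fraction. The page does not state how they are computed, so the sketch below is an assumption inferred from the numbers, not the leaderboard's documented method. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(wins, games, z=1.959964):
    """Return (fraction, low, high): win fraction with a Wilson score interval.

    z defaults to the two-sided 95% normal quantile.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1.0 + z2 / games
    center = (p + z2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    # Clamp to [0, 1] to guard against floating-point spill at the edges.
    return p, max(0.0, center - half), min(1.0, center + half)


# (wins, raw games) copied from the head-to-head records on this page.
records = {
    "GPT-OSS-120B": (3, 34),
    "GPT-5.4-nano": (9, 24),
    "GPT-5-mini": (2, 12),
    "Claude Sonnet 4.6": (1, 12),
    "GPT-5.2 (medium)": (0, 12),
    "Qwen3-235B-A22B": (5, 10),
    "GPT-5.4-mini": (1, 10),
    "GPT-5-mini (high)": (0, 2),
}

for name, (wins, games) in records.items():
    p, low, high = wilson_interval(wins, games)
    print(f"vs {name}: {p:.3f} ({low:.3f} to {high:.3f})")
```

With only 2 to 34 games per opponent, these intervals are wide, which fits the provisional status of the row.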