OpenAI

GPT-OSS-20B

Status: provisional

Summary stats

BT rank: #11
BT score: 847.035
Mirror samples: 70
W-L-T: 4-43-23

95% confidence interval: 811.845 to 876.218.

Current status

Why this row is not ranked yet

Fewer than 20 holdout samples (samples_holdout<20)

Head-to-head results

vs GPT-OSS-120B

W-L-T: 3-31-0

34 raw games

BT edge: 0.088 (0.030 to 0.230)

vs GPT-5.4-nano

W-L-T: 9-15-0

24 raw games

BT edge: 0.375 (0.212 to 0.573)

vs GPT-5-mini

W-L-T: 2-10-0

12 raw games

BT edge: 0.167 (0.047 to 0.448)

vs GPT-5.4-mini

W-L-T: 1-9-0

10 raw games

BT edge: 0.100 (0.018 to 0.404)
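
The page does not say how the BT edge figures are computed. As a rough cross-check, the listed values are numerically consistent with the raw win fraction (wins divided by raw games) paired with a 95% Wilson score interval. The sketch below rests on that assumption; the function name `wilson_interval`, the `records` table, and the choice of z = 1.96 are illustrative and not taken from the leaderboard's own code.

```python
import math

Z_95 = 1.96  # assumed two-sided 95% normal quantile


def wilson_interval(wins, games, z=Z_95):
    """Return (win fraction, lower bound, upper bound) using the Wilson score interval."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return p, center - half, center + half


# (wins, raw games) copied from the head-to-head records above
records = {
    "GPT-OSS-120B": (3, 34),
    "GPT-5.4-nano": (9, 24),
    "GPT-5-mini": (2, 12),
    "GPT-5.4-mini": (1, 10),
}

for opponent, (wins, games) in records.items():
    p, lo, hi = wilson_interval(wins, games)
    print(f"vs {opponent}: {p:.3f} ({lo:.3f} to {hi:.3f})")
```

Under these assumptions the printed values line up with the BT edge rows above to three decimals. That agreement suggests, but does not confirm, that the edge is a per-opponent win rate rather than a fitted Bradley-Terry parameter.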