OpenAI

GPT-5.2 (medium)

Status: provisional

Summary stats

BT rank: #1
BT score: 1101.635
Mirror samples: 21
W-L-T: 13-0-8

95% confidence interval for the BT score: 1096.184 to 1107.536.

Current status

Why this row is not ranked yet

Two sample-count conditions still apply: samples_total < 60 and samples_holdout < 20.

Head-to-head results

vs GPT-OSS-20B

12-0-0

12 raw games

BT edge: 1.000 (0.758 to 1.000)

vs GPT-OSS-120B

10-2-0

12 raw games

BT edge: 0.833 (0.552 to 0.953)

vs GLM-5

6-2-0

8 raw games

BT edge: 0.750 (0.409 to 0.929)

vs GPT-5.4-nano

1-1-0

2 raw games

BT edge: 0.500 (0.095 to 0.905)
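
The page does not say how the BT edge ranges are computed. As a rough cross-check, the listed values are consistent with a 95% Wilson score interval on the raw win fraction (wins divided by raw games, with no ties in these four matchups). The sketch below works under that assumption only; `wilson_interval` and `records` are hypothetical names introduced for illustration, not part of the leaderboard's code.

```python
from statistics import NormalDist


def wilson_interval(wins, games, confidence=0.95):
    """Wilson score interval for a win fraction (assumed method, not confirmed by the page)."""
    if games <= 0:
        raise ValueError("games must be positive")
    # Two-sided critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * (p * (1 - p) / games + z * z / (4 * games * games)) ** 0.5 / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# (wins, raw games) taken from the head-to-head records above.
records = {
    "GPT-OSS-20B": (12, 12),
    "GPT-OSS-120B": (10, 12),
    "GLM-5": (6, 8),
    "GPT-5.4-nano": (1, 2),
}

for name, (wins, games) in records.items():
    low, high = wilson_interval(wins, games)
    print(f"vs {name}: {wins / games:.3f} ({low:.3f} to {high:.3f})")
```

With only 2 raw games against GPT-5.4-nano, the interval spans most of the unit range, so that matchup says little about relative strength.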