OpenAI
GPT-5.2 (medium)
Status: provisional
Summary stats
95% confidence interval: 1096.184 to 1107.536.
Current status
Why this row is not ranked yet
samples_total < 60, samples_holdout < 20
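The two conditions above read like a minimum-sample gate. A minimal sketch of such a check follows, assuming the thresholds 60 and 20 are lower bounds that a row must meet before it is ranked. The function name, constant names, and sample counts are hypothetical and are not taken from the leaderboard's code.

```python
# Thresholds as shown on the page; treating them as minimums is an assumption.
MIN_SAMPLES_TOTAL = 60
MIN_SAMPLES_HOLDOUT = 20


def unmet_ranking_requirements(samples_total: int, samples_holdout: int) -> list[str]:
    """Return the reasons a row stays provisional (empty list means rankable)."""
    reasons = []
    if samples_total < MIN_SAMPLES_TOTAL:
        reasons.append(f"samples_total<{MIN_SAMPLES_TOTAL}")
    if samples_holdout < MIN_SAMPLES_HOLDOUT:
        reasons.append(f"samples_holdout<{MIN_SAMPLES_HOLDOUT}")
    return reasons


# Hypothetical counts, chosen only to illustrate the check.
reasons = unmet_ranking_requirements(samples_total=40, samples_holdout=10)
status = "provisional" if reasons else "ranked"
print(status, ", ".join(reasons))
```

A row that meets both minimums returns an empty list and would no longer be held back by this gate.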
Head-to-head results
vs GPT-OSS-20B
12-0-0
12 raw games
BT edge: 1.000 (0.758 to 1.000)
vs GPT-OSS-120B
10-2-0
12 raw games
BT edge: 0.833 (0.552 to 0.953)
vs GLM-5
6-2-0
8 raw games
BT edge: 0.750 (0.409 to 0.929)
5-3-0
8 raw games
BT edge: 0.625 (0.306 to 0.863)
vs GPT-5.4-nano
1-1-0
2 raw games
BT edge: 0.500 (0.095 to 0.905)
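The point estimates and intervals listed above are numerically consistent with the raw win fraction and a 95% Wilson score interval on it. The page does not state how the BT edge is computed, so the sketch below is an assumption checked only against the displayed numbers. It also assumes each record reads wins-losses-draws and that draws (all zero here) can be ignored. The function name is hypothetical.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a win fraction (z = 1.96)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# (wins, raw games) for each head-to-head block, in page order.
records = [(12, 12), (10, 12), (6, 8), (5, 8), (1, 2)]
for wins, games in records:
    lo, hi = wilson_interval(wins, games)
    print(f"{wins}/{games}: {wins / games:.3f} ({lo:.3f} to {hi:.3f})")
```

With so few games per opponent, the intervals are wide: the 1-1-0 record spans most of the unit interval, which is in line with the row still being provisional.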