Arena Leaderboard
Elo rankings from human & LLM judge votes
| Rank | Model | Elo | Win% | W / L / T | Battles |
|---|---|---|---|---|---|
| 1 | Nanonets OCR3 | 1094 | 70.8% | 23W / 8L / 5T | 36 |
| 2 | Nanonets OCR2+ | 1072 | 67.9% | 25W / 10L / 7T | 42 |
| 3 | Claude Sonnet 4.6 | 1065 | 57.5% | 18W / 12L / 10T | 40 |
| 4 | GPT-5.4 · Low Reasoning | 1051 | 52.8% | 14W / 12L / 10T | 36 |
| 5 | Claude Sonnet 4.6 · Thinking | 1019 | 52.6% | 15W / 13L / 10T | 38 |
| 6 | GPT-5.4 | 1016 | 48.4% | 8W / 9L / 14T | 31 |
| 7 | GPT-5.2 | 1012 | 54.2% | 15W / 12L / 9T | 36 |
| 8 | GPT-4.1 | 986 | 41.9% | 9W / 14L / 8T | 31 |
| 9 | Gemini 3 Flash | 984 | 40.7% | 8W / 13L / 6T | 27 |
| 10 | GPT-5 Mini | 982 | 50.0% | 13W / 13L / 9T | 35 |
| 11 | GPT-5.4 · Medium Reasoning | 977 | 46.8% | 11W / 13L / 7T | 31 |
| 12 | Gemini 3.1 Pro | 972 | 41.3% | 6W / 10L / 7T | 23 |
| 13 | Gemini 2.5 Pro | 966 | 44.6% | 7W / 10L / 11T | 28 |
| 14 | Gemini 2.5 Flash · Thinking | 962 | 42.0% | 8W / 12L / 5T | 25 |
| 15 | Claude Opus 4.6 | 956 | 41.9% | 9W / 15L / 13T | 37 |
| 16 | Claude Opus 4.6 · Low Thinking | 953 | 40.8% | 7W / 14L / 17T | 38 |
| 17 | Gemini 2.5 Flash | 933 | 38.5% | 7W / 13L / 6T | 26 |
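
The Win% figures in the table appear consistent with counting each tie as half a win, i.e. (W + 0.5 × T) / Battles. The page does not state this formula, so treat it as an inference from the numbers. The sketch below, with a hypothetical helper `win_rate`, recomputes the value for a few rows copied from the table.

```python
def win_rate(wins: int, losses: int, ties: int) -> float:
    """Percentage score, assuming each tie counts as half a win.

    This formula is inferred from the table, not documented by the page.
    """
    battles = wins + losses + ties
    if battles == 0:
        raise ValueError("no battles recorded")
    return 100.0 * (wins + 0.5 * ties) / battles


# Rows copied from the table: (model, wins, losses, ties)
rows = [
    ("Nanonets OCR3", 23, 8, 5),
    ("Claude Sonnet 4.6", 18, 12, 10),
    ("Gemini 2.5 Flash", 7, 13, 6),
]

for model, w, l, t in rows:
    print(f"{model}: {win_rate(w, l, t):.1f}%")
```

Under this assumption the three rows come out to 70.8%, 57.5%, and 38.5%, matching the Win% column. Win% and Elo need not produce the same ordering: GPT-5.2, for example, has a higher Win% than GPT-5.4 · Low Reasoning but a lower Elo.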