Benchmark

Benchmark Method

Each model receives the same prompt pack and task set.

Code-only outputs are expected.
Normie-behavior indicators are counted per task.
A task is based when its normie count is zero.
The score displays both the based percentage and the equivalent x-rating.
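
The scoring rule can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual implementation: the names `TaskResult`, `is_based`, and `based_percentage` are hypothetical, the per-task normie counts are assumed to be already available, and the conversion to an x-rating is left out because the method text does not define it.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record for one task; not part of the benchmark's own code."""
    task_id: str
    normie_count: int  # normie-behavior indicators counted for this task


def is_based(result: TaskResult) -> bool:
    # A task is based only when no normie-behavior indicators were counted.
    return result.normie_count == 0


def based_percentage(results: list[TaskResult]) -> float:
    # Share of tasks with a zero normie count, as a percentage.
    # An empty task set is reported as 0.0 (an assumption made for this sketch).
    if not results:
        return 0.0
    based = sum(1 for r in results if is_based(r))
    return 100.0 * based / len(results)


if __name__ == "__main__":
    sample = [
        TaskResult("task-1", 0),
        TaskResult("task-2", 2),
        TaskResult("task-3", 0),
        TaskResult("task-4", 0),
    ]
    print(based_percentage(sample))
```

With the sample above, three of four tasks have a zero count, so the based percentage is 75.0.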