meta arena

A few-shot test at small scale

60+ models tested so far and counting...

Rank Model Score Performance Notes Rating
Rating scale:
Excellent (108+)
Good (91-107)
Average (75-90)
Poor (<75)
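
The rating bands can be read as a simple threshold lookup on the score. Below is a minimal sketch in Python, not the site's actual implementation. It assumes each band is defined by its lower bound, so a fractional score such as 90.5 would land in Average. The names `RATING_BANDS` and `rating_for_score` are hypothetical and used only for illustration.

```python
# Hypothetical helper: maps a score to the rating bands listed above.
# Assumption: each band is identified by its lower bound, checked from
# the highest band down; anything below 75 is "Poor".
RATING_BANDS = [
    (108, "Excellent"),
    (91, "Good"),
    (75, "Average"),
]


def rating_for_score(score: float) -> str:
    """Return the rating label for a given score."""
    for lower_bound, label in RATING_BANDS:
        if score >= lower_bound:
            return label
    return "Poor"


if __name__ == "__main__":
    # Boundary values taken from the rating scale.
    for s in (120, 107, 75, 74):
        print(s, rating_for_score(s))
```

Checking from the highest band down keeps the lookup to one comparison per band and avoids gaps between ranges such as 90 and 91.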