RE: LeoThread 2025-12-17 02-10


Important new eval!




A new evaluation called FrontierScience has been released to assess expert-level scientific reasoning.
The benchmark tests PhD-level reasoning across physics, chemistry, and biology.


It includes difficult, expert-authored questions, both olympiad-style problems and longer research-style tasks, designed to show where models succeed and where they fall short.
