RE: LeoThread 2025-11-19 00-19

Early-access testing of Gemini 3 happened yesterday. A few thoughts:



Caution is advised with public benchmarks since they can be gamed.

It comes down to discipline and self-restraint from the team, who face strong incentives to do otherwise, not to overfit test sets through elaborate gymnastics over test-set-adjacent data in the document-embedding space.
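
To make "test-set-adjacent" concrete, here is a minimal sketch, assuming the documents have already been embedded as vectors. It is not any lab's actual pipeline: the function name `flag_test_adjacent`, the 0.9 threshold, and the toy vectors are all hypothetical. It flags training documents that sit very close to a benchmark item in embedding space.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def flag_test_adjacent(train_vecs, test_vecs, threshold=0.9):
    """Return indices of training vectors with at least one test vector
    whose cosine similarity is >= threshold (threshold is an assumed value)."""
    flagged = []
    for i, train_vec in enumerate(train_vecs):
        if any(cosine(train_vec, test_vec) >= threshold for test_vec in test_vecs):
            flagged.append(i)
    return flagged

# Toy 2-D "embeddings", made up purely for illustration.
train = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
test = [[1.0, 0.05]]
print(flag_test_adjacent(train, test))
```

A disciplined team would use a check like this to drop the flagged documents from training. The temptation described above is to do the opposite and keep, or even seek out, more of them.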

With so many others doing this, the pressure to overfit is high.

Interacting directly with the model and comparing it with other LLMs is worthwhile (ride the LLM cycle: rotate between models daily).

Early impressions were positive across personality, writing, coding vibe, and humor. It has very solid daily-driver potential and looks like a tier 1 LLM.

In the coming days and weeks, attention will shift to the ensemble of private evaluations, which many organizations now build for themselves and occasionally report on.
