RE: LeoThread 2025-11-19 00-19
Early-access testing of Gemini 3 happened yesterday. A few thoughts:
Caution is advised with public benchmarks since they can be gamed.
Whether a team avoids overfitting the test sets comes down to its own discipline and self-restraint, and the incentives push strongly the other way. Overfitting can happen through elaborate gymnastics over data that sits adjacent to the test set in document-embedding space.
With many teams doing this, the pressure to overfit is high.
It is worthwhile to interact directly with the model and compare it to other LLMs (ride the LLM cycle: rotate models daily).
Early impressions were positive across personality, writing, coding vibe, and humor. It has very solid daily-driver potential and appears to be a tier 1 LLM.
In the coming days and weeks, attention will turn to the ensemble of results from private evaluations, which many organizations now build for themselves and occasionally report.