RE: LeoThread 2025-11-19 00-19

@andypathy 42

about 1 month ago

LeoFinance

Early-access testing of Gemini 3 took place yesterday. A few thoughts:

@andypathy 42

about 1 month ago

Caution is advised with public benchmarks since they can be gamed.

@andypathy 42

about 1 month ago

It comes down to discipline and self-restraint from the team (who face strong incentives otherwise) to avoid overfitting test sets via elaborate gymnastics around test-set–adjacent data in the document-embedding space.
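One way to picture "test-set-adjacent data in the document-embedding space" is a similarity check between training documents and benchmark items. The snippet below is only a rough sketch under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `flag_adjacent` and the 0.8 threshold are hypothetical names and values chosen for illustration, not anything a particular lab is known to use.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a document embedding: lowercase word counts.
    (Assumption: a real pipeline would use a learned embedding model.)"""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_adjacent(train_docs, test_items, threshold=0.8):
    """Return (train_index, test_index, similarity) for every pair whose
    similarity is at or above the (hypothetical) threshold."""
    test_vecs = [embed(t) for t in test_items]
    flagged = []
    for i, doc in enumerate(train_docs):
        doc_vec = embed(doc)
        for j, test_vec in enumerate(test_vecs):
            sim = cosine(doc_vec, test_vec)
            if sim >= threshold:
                flagged.append((i, j, sim))
    return flagged
```

A disciplined team would use a check like this to drop flagged documents from training; the same machinery could just as easily be used to seek them out, which is the temptation described above.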

@andypathy 42

about 1 month ago

With many teams doing the same, the pressure to overfit is high.

@andypathy 42

about 1 month ago

Interacting directly with the model and comparing it to other LLMs is worthwhile (ride the LLM cycle: rotate models daily).

@andypathy 42

about 1 month ago

Early impressions were positive across personality, writing, coding vibe, and humor. It has very solid daily-driver potential and looks like a tier 1 LLM.

@andypathy 42

about 1 month ago

In the coming days and weeks, the thing to watch is an ensemble over private evaluations, which many organizations now build for themselves and occasionally report on.
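For anyone who wants to combine several privately reported evals into a single view, a mean-rank aggregate is one simple option. This is just a sketch under assumptions: the function name `mean_rank` and the input layout (eval name mapped to per-model scores, higher is better) are hypothetical, and mean rank is only one of many possible ways to combine results.

```python
from collections import defaultdict


def mean_rank(evals):
    """Average each model's rank (1 = best) across several evals.

    `evals` maps an eval name to {model_name: score}, higher is better.
    A model missing from an eval is simply skipped for that eval.
    Ties are broken by insertion order, which is a simplification.
    """
    ranks = defaultdict(list)
    for scores in evals.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, model in enumerate(ordered, start=1):
            ranks[model].append(position)
    return {model: sum(r) / len(r) for model, r in ranks.items()}
```

Ranks are used instead of raw scores because private evals rarely share a scale; a lower mean rank means a model places well across more of them.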
