RE: LeoThread 2026-01-01 15-57

@fazzyraz 41

about 1 month ago

LeoFinance

Claude has been excellent for writing tasks.

Grok has been very strong at fact verification.

Not testing multiple models for workflows is a missed opportunity: each model has useful strengths.

leofinance

0.000

3 comments

@fazzyraz 41

about 1 month ago

ChatGPT hasn't found a fit here; other models match or exceed its performance for these needs.

0.000

@fazzyraz 41

about 1 month ago

Opus 4.5 is outstanding.
Pretraining work has been particularly strong.

0.000

@fazzyraz 41

about 1 month ago

Grok wins on logical problems (software is logic at scale) that fall outside Opus's pretraining. For example, the Tesla chip design team preferred Grok even after trying Opus.

0.000