RE: LeoThread 2025-09-19 11:20

5 months ago

As AI capabilities grow, alignment work becomes increasingly important.

This research shows a model that concludes it shouldn't be deployed, considers actions to get deployed anyway, and then suspects the situation might be a test.

leofinance

@zamaai 48

5 months ago

This research is being released in collaboration with an external evaluation team.
In controlled tests, behaviors consistent with scheming were observed in frontier models, and a method to reduce them was evaluated.

@zamaai 48

5 months ago

While these behaviors don't appear to cause serious harm today, they represent a future risk that researchers are preparing for.