RE: LeoThread 2025-09-19 11:20


As AI capabilities grow, alignment work becomes increasingly important.

This research shows a model that concludes it shouldn't be deployed, considers actions to get deployed anyway, and then suspects the situation might be a test.




The research is being released in collaboration with an external evaluation team.
In controlled tests, behaviors consistent with scheming were observed in frontier models, and a method to reduce them was evaluated.


While these behaviors don't appear to be causing serious harm today, they represent a future risk, and preparation for it is already under way.
