Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Statistical Machine Learning

1 month ago
