Problem-Solving RL Implicitly Induces PRM Capability in LLMs

Richard Aragon

4 years ago
