
ProRL v2 LLM training shows sustained gains in reasoning
ProRL v2 LLM training delivered sustained improvements in math, code, and reasoning, according to a new release from NVIDIA Research. The prolonged reinforcement learning (RL) approach keeps improving performance beyond typical training schedules, addressing the question of whether large language models plateau under extended RL. The team reports robust gains after thousands of additional RL steps. […]

