Jan Leike, a leading AI researcher who resigned from OpenAI earlier this month and then publicly criticized the company's approach to AI safety, has joined OpenAI rival Anthropic to lead a new “superalignment” team.
In a post on X, Leike said that his team at Anthropic will focus on various aspects of AI safety and security, specifically “scalable oversight,” “weak-to-strong generalization,” and automated alignment research.
I'm excited to join @AnthropicAI to continue the superalignment mission!
My new team will work on scalable oversight, weak-to-strong generalization, and automated alignment research.
If you're interested in joining, my DMs are open.
— Jan Leike (@janleike) May 28, 2024
A source familiar with the matter told TechCrunch that Leike will report directly to Jared Kaplan, Anthropic's chief science officer, and that Anthropic researchers currently working on scalable oversight (techniques for controlling the behavior of large-scale AI in predictable and desirable ways) will move to report to Leike as his team gets up and running.
✨🪩 Wow!🪩✨
Jan is leading very important research into technical AI safety and I'm excited to be working with him. We will be leading two teams focused on different parts of the problem of aligning AI systems at human level and beyond. https://t.co/aqSFTnOEG0
— Sam Bowman (@sleepinyourhat) May 28, 2024
In many ways, the mission of Leike's team resembles that of OpenAI's recently disbanded Superalignment team. That team, which Leike co-led, had the ambitious goal of solving the core technical challenges of controlling superintelligent AI within four years, but it was often stymied by OpenAI's leadership.
Anthropic has sought to position itself as a more safety-conscious company than OpenAI.
Anthropic CEO Dario Amodei, formerly OpenAI's vice president of research, reportedly parted ways with OpenAI after a disagreement over the company's direction, namely its growing commercial focus. Amodei brought several former OpenAI employees with him to found Anthropic, including former OpenAI policy lead Jack Clark.