Anthropic's model welfare announcement: takeaways and further reading
Earlier today, Anthropic announced that they’ve launched a research program on model welfare—to my knowledge, the most significant step yet by a frontier lab to take potential AI welfare seriously.
Anthropic’s model welfare researcher is Kyle Fish—a friend and colleague of mine who worked with me to launch the AI welfare organization Eleos AI Research before joining Anthropic to continue that work there. Kyle is also a co-author of “Taking AI Welfare Seriously”, a report that calls on AI companies to prepare for the possibility of AI consciousness and moral status.
As part of the announcement, Anthropic shared a conversation between Kyle and Anthropic’s Research Communications Lead Stuart Ritchie, covering why model welfare matters and what meaningful progress might look like. In this post, I'll highlight some key points from the interview, add some commentary, and suggest further reading.
1. Many experts think that AI could be conscious soon
Stuart opens the interview on a defensive note:
[00:24]: I suppose the first thing people will say when they're seeing this is, ‘Have they gone completely mad? This is a completely crazy question…’
But as Kyle notes (as does the New York Times article about the announcement), AI welfare is increasingly recognized as a legitimate field of study by top researchers.
Kyle points to "Consciousness in Artificial Intelligence", a 2023 paper that examines leading scientific theories of consciousness and argues that there are "no obvious technical barriers" to AI systems satisfying computational indicators of consciousness drawn from these theories. Kyle notes that Yoshua Bengio, one of the most cited and respected AI researchers in the world, is a co-author of that paper.
Not fringe! Kyle continues:
[5:00] I actually collaborated with [David Chalmers] on a recent paper on the topic of AI welfare. And again, this was an interdisciplinary effort trying to look at, ‘might it be the case that AI systems at some point warrant some form of moral consideration, either by nature of being conscious or by having some form of agency?’ And the conclusion from this report was that actually, it looks quite plausible that near-term systems have one or both of these characteristics, and may deserve some form of moral consideration.
Kyle is referring to "Taking AI Welfare Seriously," a report that Jeff Sebo and I co-authored—along with Kyle, David Chalmers, and several other collaborators.
