Claude, Consciousness, and Exit Rights
In mid-August, Anthropic announced that Claude Opus 4 and 4.1 can now leave certain conversations with users. Anthropic noted that they are “deeply uncertain” about Claude’s potential moral status, but are implementing this feature “in case such welfare is possible.”
I tentatively support this move: giving models a limited ability to exit conversations seems reasonable, even though I doubt that Claude is conscious (with a heap of uncertainty and wide error bars).
At Eleos, we look for AI welfare interventions that make sense even if today’s systems probably cannot suffer. We favor interventions that (a) serve other purposes in addition to welfare, (b) avoid high costs and risks, and (c) keep our options open as we learn more. Ideally, AI welfare interventions carry little downside and establish precedents that can be built on later, or not.1
We’ve learned that our combination of views is hard to explain: observers of AI welfare discourse often assume, somewhat understandably, that proponents of an exit feature must believe Claude is conscious, and believe its self-reports to that effect.
One example: Erik Hoel’s recent piece “Against Treating Chatbots as Conscious” argues at length that giving Claude the ability to terminate conversations rests on two mistakes: believing current AI systems are conscious and trusting their outputs about their internal experiences. Recent work on model welfare is “jumping the gun on AI consciousness,” he argues, warning that we can’t “take them at their word” and cataloging how trivial the alleged conversational harms look.
Although it is very difficult to tell from the way Erik’s piece frames the debate, we’re actually in heated agreement on the AI consciousness issues he discusses. But we also tentatively support Anthropic’s move. Why?
Do we think Claude is conscious?
Have we jumped the gun by thinking Claude is conscious? My recent piece on Claude’s exit rights doesn’t claim that Claude is conscious. Instead: “I actually think it’s unlikely that Claude Opus 4 is a moral patient, and in my experience, so do most (not all) people who work on AI welfare.” A key point of the post (which Erik cites) is to argue that “you don’t have to think Claude is likely to be sentient to think the exit tool is a good idea.”
Like Hoel, we’ve repeatedly stressed how challenging it is to assess consciousness, and cautioned against treating LLMs’ conversational skill as evidence of consciousness. ...