Understand, align, cooperate: AI welfare and AI safety are allies
In conversations about AI safety and AI welfare, people sometimes frame them as inherently opposing goals: either we protect humans (AI safety), or we protect AI systems (AI welfare). This framing implies a tragic, necessary choice: whose side are you on? Or it suggests, at least, “shouldn’t we solve safety first, and then worry about AI welfare?” But the truth is that this is a false choice: fortunately, many of the best interventions for protecting AI systems will also protect humans, and vice versa.
To be sure, there are genuine tensions between AI safety and AI welfare.1 But if we only focus on these tensions, we risk creating a self-fulfilling prophecy: by obsessing over potential tradeoffs, we may overlook opportunities for mutually beneficial solutions. As a result, fewer such opportunities will exist.
Fortunately, it doesn’t have to be this way. There is clear common cause between the fields. Specifically, we can better protect both humans and potential AI moral patients if we work to:
Understand AI systems better.
Align their goals more effectively with our own.
Cooperate with them when possible.
We have a lot more work to do on all of these, and getting our act together will be good for everyone.
Understand: If we can’t understand what models think or how they work, we can't know whether they're plotting against us (a safety risk), and we can't know whether they're suffering (a welfare risk).
Align: If we can’t align AI systems so that they want to do what we ask them to do, they pose a danger to us (a safety risk) and potentially (if they are moral patients) are having their desires frustrated (a welfare risk).
Cooperate: If we have no way to cooperate with AI systems that are not fully aligned, such AI systems may rightly believe that they have no choice but to hide, escape, or fight. That’s dangerous for everyone involved—it’s more likely that AIs fight us (a safety risk), and more likely that we have to resort to harsher methods of controlling them (a welfare risk).
Note: I am claiming that these are welfare risks. I’m not claiming that by, e.g., controlling AI systems, we are definitely harming them. At no point in this article am I claiming that current AI systems are definitely conscious or being harmed. But as AI continues to advance, the risk that we are actually ...
This excerpt is provided for preview purposes. Full article content is available on the original publication.