A safe harbor for AI evaluation and red teaming
This blog post is authored by Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Arvind Narayanan, Percy Liang, and Peter Henderson. The paper has 23 authors and is available here.
Today, we are releasing an open letter encouraging AI companies to provide legal and technical protections for good-faith research on their AI models. The letter focuses on the importance of independent evaluations of proprietary generative AI models, particularly those with millions of users. In an accompanying paper, we discuss existing challenges to independent research and how a more equitable, transparent, and accountable researcher ecosystem could be developed.
The letter has been signed by hundreds of researchers, practitioners, and advocates across disciplines, and is open for signatures.
Read and sign the open letter here. Read the paper here.
Independent evaluation of AI is crucial for uncovering vulnerabilities
AI companies, academic researchers, and civil society agree that generative AI models pose acute risks, and independent risk assessment is an essential mechanism for accountability. Nevertheless, barriers inhibit the independent evaluation of many AI models.
Independent researchers often evaluate and “red team” AI models to measure a variety of risks. In this work, we focus on post-release evaluation of models (or their APIs) by external researchers outside the model developer, also referred to as third-party algorithmic auditing. Some companies also conduct red teaming before their models are released, both internally and with experts they select.
While many types of testing are critical, independent evaluation of AI models that are already deployed is widely regarded as essential for ensuring safety, security, and trust. Independent red-teaming research on AI models has uncovered vulnerabilities related to low-resource languages, bypassing safety measures, and a wide range of jailbreaks. These evaluations investigate a broad set of often unanticipated model flaws related to misuse, bias, copyright, and other issues.
Terms of service can discourage community-led evaluations
Despite the need for independent evaluation, conducting research related to these vulnerabilities is often legally prohibited by the terms of service for popular AI models, including those of OpenAI, Google, Anthropic, Inflection, Meta, and Midjourney.
While these terms are intended to deter malicious actors, they also inadvertently restrict AI safety and trustworthiness research: the terms forbid such research, and companies may enforce their policies with account suspensions (as an example, see Anthropic’s acceptable use ...