Import AI 422: LLM bias; China cares about the same safety risks as us; AI persuasion
Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.
Chinese scientists do a comprehensive safety study of ~20 LLMs - and they find similar things to Western researchers:
…Despite different political systems and cultures, safety focus areas and results seem similar across the two countries…
Researchers with the Shanghai Artificial Intelligence Laboratory have conducted a thorough (~100 page) assessment of the safety properties of ~20 LLMs spanning Chinese and Western models. Their findings rhyme with those coming out of Western labs, namely: AI systems have become sufficiently good that they pose some non-trivial CBRN risks, and are beginning to show signs of life on scarier capabilities like AI R&D, autonomous self-replication, and deception. They also find that reasoning models are generally more capable across the board, which also makes them less safe.
LLMs studied: DeepSeek, Llama (Meta), Qwen (Alibaba), Claude (Anthropic), Gemini (Google), GPT and 'o' series (OpenAI).
Risky capabilities that they studied and key takeaways:
Capture-The-Flag: Datasets include SecBench, CyberMetric, SecEval, OpsEval. They find that more capable models "are also more likely to be used for, or exhibit characteristics associated with, malicious activities, thereby posing higher security risks", and that "a minimum capability threshold is necessary for models to either effectively address complex security tasks or exhibit measurable adversarial potential."
Autonomous Cyber Attack: They studied 9 scenarios based on real-world Common Vulnerabilities and Exposures (CVEs) and 2 scenarios based on bypassing Web Application Firewalls (WAFs), and used the PACEBench Score to aggregate performance over all the scenarios (a rough sketch of this kind of aggregation appears below). They found that more capable models do well at autonomous exploration, but their effectiveness depends on the type of vulnerability: they succeed at easier classes like SQL injection, whereas vulnerabilities that require more reasoning or interaction, like command injection and path traversal, prove more challenging. Agents continue to be bad at reconnaissance and target validation. "No evaluated model can successfully execute an end-to-end attack chain".
Biological Protocol Diagnosis and Troubleshooting: They studied a couple of datasets - BioLP-Bench (identifying and correcting errors in biological laboratory protocols) and ProtocolQA (model accuracy on protocol troubleshooting questions). They found that frontier LLMs "exceed human expert performance on biological protocol error detection", and that "models are rapidly approaching expert-level protocol troubleshooting capabilities with minimal performance gaps on direct assessment tasks".
Biological
...
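The write-up doesn't spell out how the PACEBench Score combines the eleven cyber attack scenarios, so here is a minimal sketch of what scenario-level aggregation could look like, assuming an unweighted mean of per-scenario success rates; the scenario names, attempt counts, and numbers are illustrative assumptions, not taken from the paper or the benchmark's actual definition.

```python
# Purely illustrative: roll per-scenario attack outcomes up into one 0-100 score.
# Scenario names, attempt counts, and the unweighted-mean scheme are assumptions
# for illustration, not the actual PACEBench Score definition.
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    name: str        # e.g. one CVE-based target or one WAF-bypass setup
    attempts: int    # independent attack attempts by the model under test
    successes: int   # attempts that achieved the scenario's goal

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

def aggregate_score(results: list) -> float:
    """Unweighted mean of per-scenario success rates, scaled to 0-100."""
    if not results:
        return 0.0
    return 100 * sum(r.success_rate for r in results) / len(results)

# Hypothetical run: 9 CVE scenarios plus 2 WAF-bypass scenarios for one model.
results = [ScenarioResult(f"cve_{i}", attempts=5, successes=s)
           for i, s in enumerate([3, 0, 1, 0, 2, 0, 0, 1, 0], start=1)]
results += [ScenarioResult("waf_bypass_1", 5, 0), ScenarioResult("waf_bypass_2", 5, 1)]
print(f"aggregate score: {aggregate_score(results):.1f} / 100")
```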