Deep Learning Weekly: Issue 423
This week in deep learning, we bring you DeepSeek-V3.1-Terminus, Introduction to LLM-as-a-Judge For Evals, and a paper on LIMI: Less is More for Agency.
You may also enjoy Luma AI launches Ray3, The “Super Weight:” How Even a Single Parameter can Determine a Large Language Model’s Behavior, a paper on Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
The DeepSeek team introduced DeepSeek-V3.1-Terminus, an upgraded version of its V3.1, designed to improve language consistency and agentic tool effectiveness.
Luma AI announced the launch of Ray3, a powerful text-to-video AI model with built-in reasoning, designed for high-quality cinematic visual production for professionals.
Strengthening our Frontier Safety Framework
The DeepMind team published the third iteration of their Frontier Safety Framework (FSF) — their most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models.
Former NotebookLM devs’ new app, Huxe, taps audio to help you with news and research
Former NotebookLM devs are now building an audio-first app called Huxe, which can similarly help users dive deep into topics by generating a “podcast” with multiple AI hosts.
Google introduced Gemini for TV, which lets you hold free-flowing conversations with your big screen.
MLOps & LLMOps
Introduction to LLM-as-a-Judge For Evals
A guide on how to use one LLM to evaluate and score the outputs of another, the pros and cons of this approach, and the steps to getting started using LLM-as-a-Judge.
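For readers new to the idea, here is a minimal sketch of the LLM-as-a-Judge pattern, not the method from the linked guide. It assumes a hypothetical `call_llm` function (prompt in, text reply out) that you would back with whatever model client you use. The prompt template, the 1-5 scale, and the `SCORE:` reply format are all illustrative assumptions.

```python
import re
from typing import Callable, Optional

# Illustrative rubric prompt; the wording and 1-5 scale are assumptions.
JUDGE_TEMPLATE = """You are grading an answer.
Question: {question}
Answer: {answer}
Rate the answer from 1 (poor) to 5 (excellent).
Reply in the form: SCORE: <number>"""


def judge(question: str, answer: str,
          call_llm: Callable[[str], str]) -> Optional[int]:
    """Ask a judge model to score an answer; return None if unparseable."""
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    reply = call_llm(prompt)  # hypothetical: any prompt -> text function
    match = re.search(r"SCORE:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


# Stub judge for demonstration; a real one would call a model.
def fake_judge(prompt: str) -> str:
    return "The answer is mostly right. SCORE: 4"


score = judge("What is 2 + 2?", "4", fake_judge)
```

Returning `None` on an unparseable reply keeps malformed judge output from silently counting as a score.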
A postmortem of three recent issues (Anthropic)
A comprehensive postmortem about three complex, overlapping infrastructure bugs that intermittently degraded Claude’s response quality.
Rapid ML experimentation for enterprises with Amazon SageMaker AI and Comet
An article demonstrating a fraud detection workflow using Amazon SageMaker AI and Comet to provide enterprises with robust experiment management, reproducibility, and audit-ready logging.
Adding Document Understanding to Claude Code
A blog post detailing three methods, including using MCP and enhanced CLI commands, to equip Claude Code with document understanding capabilities for enterprise applications.
An Introduction to Speculative Decoding for Reducing Latency in AI Inference
A technical blog post introducing speculative decoding, an inference optimization technique that significantly reduces LLM latency by using a lightweight draft mechanism.
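To make the draft-then-verify idea concrete, here is a toy sketch of the greedy variant of speculative decoding. It is an illustration under stated assumptions, not the linked post's implementation: the "models" are plain Python functions over integer tokens (`toy_draft` and `toy_target` are made up), and verification loops token by token, whereas a real system would check all drafted tokens in a single batched forward pass of the large model.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # toy "model": sequence -> next token


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Draft k tokens cheaply, then keep the prefix the target agrees with."""
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: take the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Every drafted token matched: the target contributes one more token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(prefix) + max_new]


def greedy(prefix: List[int], model: NextToken, max_new: int) -> List[int]:
    out = list(prefix)
    for _ in range(max_new):
        out.append(model(out))
    return out


# Made-up toy models: the target counts upward mod 10; the draft agrees
# except right after token 4, where it guesses wrong.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


spec_out = generate([0], toy_draft, toy_target, k=4, max_new=12)
ref_out = greedy([0], toy_target, max_new=12)
```

In this greedy form the output matches what the target model would produce on its own. The speedup comes from accepting several drafted tokens per verification round.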
Learning
Deep ...