Deep Learning Weekly: Issue 423

By Various · Deep Learning Weekly · Sep 24, 2025 · 5 min read

This week in deep learning, we bring you DeepSeek-V3.1-Terminus, Introduction to LLM-as-a-Judge For Evals, and a paper on LIMI: Less is More for Agency.

You may also enjoy Luma AI launches Ray3, The “Super Weight:” How Even a Single Parameter can Determine a Large Language Model’s Behavior, a paper on Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

DeepSeek-V3.1-Terminus

The DeepSeek team introduced DeepSeek-V3.1-Terminus, an upgraded version of its V3.1 model, designed to improve language consistency and agentic tool effectiveness.

Luma AI launches Ray3

Luma AI announced the launch of Ray3, a powerful text-to-video AI model with built-in reasoning, designed for high-quality cinematic visual production for professionals.

Strengthening our Frontier Safety Framework

The DeepMind team published the third iteration of their Frontier Safety Framework (FSF) — their most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models.

Former NotebookLM devs’ new app, Huxe, taps audio to help you with news and research

Former NotebookLM devs are now building an audio-first app called Huxe, which, like NotebookLM, helps users dive deep into topics by generating a “podcast” with multiple AI hosts.

Gemini comes to Google TV

Google introduced Gemini for TV, which lets you hold free-flowing conversations with your big screen.

MLOps & LLMOps

Introduction to LLM-as-a-Judge For Evals

A guide on how to use one LLM to evaluate and score the outputs of another, the pros and cons of this approach, and the steps to get started with LLM-as-a-Judge.
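
For readers new to the idea, here is a minimal sketch of the pattern, not the linked guide's own code. The prompt template, the 1-to-5 scale, the `judge_answer` helper, and the `fake_judge` stand-in are all assumptions made for illustration; in practice `call_judge` would wrap whatever LLM client you use.

```python
import re
from typing import Callable

# Hypothetical grading prompt; real rubrics are usually more detailed.
JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate the answer from 1 (poor) to 5 (excellent).\n"
    'Reply in the form "Score: <number>" followed by a one-sentence reason.'
)


def judge_answer(question: str, answer: str, call_judge: Callable[[str], str]) -> int:
    """Ask a judge model to score an answer and parse the numeric score."""
    reply = call_judge(JUDGE_TEMPLATE.format(question=question, answer=answer))
    match = re.search(r"Score:\s*([1-5])\b", reply)
    if match is None:
        # Judges do not always follow the format, so fail loudly.
        raise ValueError(f"Could not parse a score from: {reply!r}")
    return int(match.group(1))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: returns plain text)."""
    return "Score: 4. The answer is correct but terse."


score = judge_answer("What is 2 + 2?", "4", fake_judge)
print(score)
```

Passing the judge in as a callable keeps the scoring logic independent of any particular model provider and makes it easy to test with a stub.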

A postmortem of three recent issues (Anthropic)

A comprehensive postmortem about three complex, overlapping infrastructure bugs that intermittently degraded Claude’s response quality.

Rapid ML experimentation for enterprises with Amazon SageMaker AI and Comet

An article demonstrating a fraud detection workflow using Amazon SageMaker AI and Comet to provide enterprises with robust experiment management, reproducibility, and audit-ready logging.

Adding Document Understanding to Claude Code

A blog post detailing three methods, including using MCP and enhanced CLI commands, to equip Claude Code with document understanding capabilities for enterprise applications.

An Introduction to Speculative Decoding for Reducing Latency in AI Inference

A technical blog post introducing speculative decoding, an inference optimization technique that significantly reduces LLM latency by using a lightweight draft mechanism.
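
As a rough illustration of the idea, and not the linked post's implementation, the sketch below shows a greedy variant: a cheap draft model proposes a few tokens, and the target model keeps the longest prefix it agrees with, then substitutes its own token at the first mismatch. The toy `target_next` and `draft_next` functions are assumptions that stand in for real models; production systems verify all drafted positions in one batched forward pass and typically use a probabilistic acceptance rule.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def target_next(context: List[int]) -> int:
    """Toy 'large' model: next token is the sum of the last two, mod 10."""
    return (context[-1] + context[-2]) % 10


def draft_next(context: List[int]) -> int:
    """Toy draft model: matches the target except after a 7, where it guesses 0."""
    return 0 if context[-1] == 7 else (context[-1] + context[-2]) % 10


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output matches greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token and stop.
                accepted.append(expected)
                break
        out.extend(accepted)
    return out


result = speculative_decode(target_next, draft_next, [1, 2], n_new=20)
print(result == greedy_decode(target_next, [1, 2], n_new=20))
```

The latency benefit comes from the verification step being parallelizable in a real model; this sequential toy only demonstrates that the accepted output is unchanged.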

Learning

Deep ...
