Deep Learning Weekly: Issue 430
This week in deep learning, we bring you Kimi K2 Thinking, Nested Learning: A new ML paradigm for continual learning, and a paper on SPICE: Self-Play In Corpus Environments Improves Reasoning.
You may also enjoy Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages, TabPFN-2.5 Model Report, a paper on Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Kimi K2 Thinking
The Moonshot team introduced Kimi K2 Thinking, an open-source thinking model that sets new records across benchmarks that assess reasoning, coding, and agent capabilities.
Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages
Meta introduced Omnilingual Automatic Speech Recognition (ASR), a suite of models providing automatic speech recognition capabilities for more than 1,600 languages.
TabPFN-2.5 Model Report
Prior Labs released TabPFN-2.5, a tabular foundation model that matches complex AutoGluon ensembles while scaling to 50,000 samples and 2,000 features.
Anthropic and Iceland announce one of the world’s first national AI education pilots
Anthropic and Iceland’s Ministry of Education and Children announced a partnership to bring Claude to teachers across the nation, launching one of the world’s first comprehensive national AI education pilots.
AI-powered visual presentation platform Gamma raises $68M at $2.1B valuation
Gamma announced that it has raised $68 million, led by Andreessen Horowitz, at a valuation of $2.1 billion.
MLOps & LLMOps
Human-in-the-Loop Review Workflows for LLM Applications & Agents
A blog post explaining Human-in-the-Loop review workflows, including systematic tracing and structured rubric design.
Building powerful RAG pipelines with Docling and OpenSearch
A technical blog post detailing how to build RAG pipelines by integrating the Docling document processing toolkit with OpenSearch for high-performance, metadata-aware vector retrieval.
Where to use sub-agents versus agents as tools
A blog post explaining the key difference between sub-agents and agents as tools in multi-agent systems.
Learning
Best LLM Observability Tools of 2025: Top Platforms & Features
Learn about the top LLM observability tools of 2025, including Opik, Langfuse, and Datadog, to monitor, evaluate, and optimize model performance.
Nested Learning: A new ML paradigm for continual learning
A foundational research blog introducing the Nested Learning paradigm, which unifies model architecture and optimization as interconnected problems to create continuum memory systems.