
Deep Learning Weekly: Issue 430

Deep Dives

Explore related topics with these Wikipedia articles, rewritten for enjoyable reading:

  • Reinforcement learning from human feedback 13 min read

    The SPICE paper introduces a self-play reinforcement learning framework for improving reasoning, which builds upon and contrasts with RLHF techniques. Understanding RLHF provides essential context for appreciating how self-play methods like SPICE represent an evolution in training paradigms.

  • Socratic method 13 min read

    The paper on 'Synthetic Socratic Debates' uses AI agents engaging in structured dialectical exchanges over moral dilemmas. Understanding the classical Socratic method of inquiry through questioning illuminates why researchers chose this framework for studying AI moral reasoning and persuasion.

This week in deep learning, we bring you Kimi K2 Thinking, Nested Learning: A new ML paradigm for continual learning, and a paper on SPICE: Self-Play In Corpus Environments Improves Reasoning.

You may also enjoy Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages, TabPFN-2.5 Model Report, a paper on Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics, and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

Kimi K2 Thinking

The Moonshot team introduced Kimi K2 Thinking, an open-source thinking model that sets new records across benchmarks that assess reasoning, coding, and agent capabilities.

Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages

Meta introduced Omnilingual Automatic Speech Recognition (ASR), a suite of models providing automatic speech recognition capabilities for more than 1,600 languages.

TabPFN-2.5 Model Report

Prior Labs released TabPFN-2.5, a tabular foundation model that matches complex AutoGluon ensembles while scaling to 50,000 samples and 2,000 features.

Anthropic and Iceland announce one of the world’s first national AI education pilots

Anthropic and Iceland’s Ministry of Education and Children announced a partnership to bring Claude to teachers across the nation, launching one of the world’s first comprehensive national AI education pilots.

AI-powered visual presentation platform Gamma raises $68M at $2.1B valuation

Gamma announced that it has raised $68 million, led by Andreessen Horowitz, at a valuation of $2.1 billion.

MLOps & LLMOps

Human-in-the-Loop Review Workflows for LLM Applications & Agents

A blog post explaining Human-in-the-Loop review workflows, including systematic tracing and structured rubric design.

Building powerful RAG pipelines with Docling and OpenSearch

A technical blog post detailing how to build RAG pipelines by integrating the Docling document processing toolkit with OpenSearch for high-performance, metadata-aware vector retrieval.

Where to use sub-agents versus agents as tools

A blog post explaining the key difference between sub-agents and agents as tools in multi-agent systems.

Learning

Best LLM Observability Tools of 2025: Top Platforms & Features

Learn about the top LLM observability tools of 2025, including Opik, Langfuse, and Datadog, to monitor, evaluate, and optimize model performance.

Nested Learning: A new ML paradigm for continual learning

A foundational research blog introducing the Nested Learning paradigm, which unifies model architecture and optimization as interconnected problems to create continuum memory systems.

5 Thoughts on Kimi ...
