
Deep Learning Weekly: Issue 429

Deep Dives

Explore related topics with these Wikipedia articles, rewritten for enjoyable reading:

  • Mixture of experts 12 min read

    The article discusses scaling large MoE (Mixture-of-Experts) models and Kimi Linear's hybrid architecture. Understanding the foundational concept of mixture of experts - how multiple specialized neural networks are combined with a gating mechanism - provides essential context for grasping why this architectural approach enables efficient scaling and why expert parallelism matters (a minimal gating sketch follows this list).

  • Recurrent neural network 14 min read

    The Kimi Linear paper describes using 'finite-state RNN memory' and the classical delta rule in its attention architecture. Readers would benefit from understanding RNN fundamentals to appreciate how linear attention mechanisms draw from and improve upon recurrent approaches while maintaining computational efficiency (see the delta-rule sketch after this list).

  • Reinforcement learning from human feedback 13 min read

    The article mentions Emu3.5 being 'post-trained with large-scale reinforcement learning' and discusses RL scaling regimes for Kimi Linear. RLHF is the dominant technique for aligning language models, and understanding this specific methodology provides crucial context for how modern models are trained beyond initial pretraining.
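
To make the gating idea in the first Deep Dive concrete, here is a minimal sparse-MoE sketch in PyTorch. The class name TinyMoE and the 4-expert / top-2 configuration are illustrative choices, not taken from any of the linked papers: a learned router scores each token, only the top-k expert MLPs run on it, and their outputs are mixed with the normalized routing weights.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMoE(nn.Module):
        # Minimal sparse Mixture-of-Experts layer: a learned gate scores the
        # experts per token, only the top-k experts run, and their outputs
        # are combined using the normalized gate weights.
        def __init__(self, d_model=64, d_hidden=128, n_experts=4, k=2):
            super().__init__()
            self.k = k
            self.gate = nn.Linear(d_model, n_experts)  # routing scores
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            ])

        def forward(self, x):                            # x: (tokens, d_model)
            scores = self.gate(x)                         # (tokens, n_experts)
            weights, idx = scores.topk(self.k, dim=-1)    # keep top-k experts per token
            weights = F.softmax(weights, dim=-1)          # normalize over chosen experts
            out = torch.zeros_like(x)
            for slot in range(self.k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e              # tokens whose slot-th pick is expert e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out

    x = torch.randn(8, 64)
    print(TinyMoE()(x).shape)   # torch.Size([8, 64])

Only the selected experts do work for a given token, which is what makes expert parallelism (spreading experts across devices) pay off at scale.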
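The second Deep Dive touches on linear attention keeping a 'finite-state RNN memory' updated with the classical delta rule. Below is a rough, unoptimized reference of that recurrence in PyTorch; the function name and shapes are illustrative, and Kimi Linear's actual kernel is more elaborate, so treat this only as the textbook delta-rule form. The memory is a single d_k x d_v matrix that each step corrects toward the new value stored under the current key, rather than only accumulating as vanilla linear attention does.

    import torch

    def delta_rule_attention(q, k, v, beta):
        # q, k: (seq_len, d_k); v: (seq_len, d_v); beta: (seq_len,) step sizes in [0, 1].
        # S is the finite-state "RNN memory": one d_k x d_v matrix carried across time.
        d_k, d_v = k.shape[-1], v.shape[-1]
        S = torch.zeros(d_k, d_v)
        outputs = []
        for t in range(q.shape[0]):
            pred = S.T @ k[t]                                 # value currently stored under key k_t
            S = S + beta[t] * torch.outer(k[t], v[t] - pred)  # delta-rule correction toward v_t
            outputs.append(S.T @ q[t])                        # read the memory with the query
        return torch.stack(outputs)

    q = k = torch.randn(5, 8)
    v = torch.randn(5, 16)
    print(delta_rule_attention(q, k, v, torch.full((5,), 0.5)).shape)  # torch.Size([5, 16])

Because the state has fixed size regardless of sequence length, decoding cost stays constant per token, which is the efficiency argument behind linear-attention hybrids.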

This week in deep learning, we bring you Introducing Aardvark: OpenAI’s agentic security researcher, Stress-testing model specs reveals character differences among language models, and a paper on Kimi Linear: An Expressive, Efficient Attention Architecture.

You may also enjoy State of AI Ethics Report Volume 7, Beyond Standard LLMs, a paper on LinEAS: End-to-end Learning of Activation Steering with a Distributional Loss, and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

Introducing Aardvark: OpenAI’s agentic security researcher

OpenAI announced Aardvark, an agentic security researcher powered by GPT‑5.

State of AI Ethics Report (SAIER) Volume 7

An overview of the special-edition report on the State of AI Ethics in 2025, which analyzes geopolitical conflicts, societal impacts, and community-centered solutions.

Real-Time Text-to-SQL Behind Snowflake Intelligence

Snowflake introduced Arctic-Text2SQL-R1.5, a model purpose-built for Snowflake SQL that delivers superior accuracy and up to 3x lower latency compared to general LLMs for real-time analytics.

In a First, AI Models Analyze Language As Well As a Human Expert

Research shows that OpenAI’s o1 exhibited metalinguistic abilities by successfully analyzing complex recursion and inferring rules of newly invented phonological systems.

OpenAI inks $38B AI infrastructure deal with AWS

OpenAI will rent $38 billion worth of cloud infrastructure from AWS as part of a seven-year partnership.

MLOps & LLMOps

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

A technical blog post explaining how NVIDIA TensorRT-LLM’s Wide Expert Parallelism efficiently scales large Mixture-of-Experts models on GB200 NVL72 systems, achieving significant performance and cost improvements.

Streamlining clinical trial software configurations using Amazon Bedrock

A blog post about how Clario automates and streamlines complex clinical trial software configurations using Claude and Amazon Bedrock.

Build your first AI Agent with Gemini, n8n and Google Cloud Run

A tutorial detailing the steps required to deploy the open-source n8n workflow automation tool on Google Cloud Run and configure a basic AI Agent.

Learning

Stress-testing model specs reveals character differences among language models

A blog post detailing a methodology that stress-tested LLM specifications across 300,000 scenarios, uncovering hidden contradictions and distinct behavioral patterns among frontier models.

How a 7-Million Parameter Model Beats GPT, Gemini, and Claude at Reasoning

A comprehensive breakdown of the Tiny Recursive Model (TRM), a 7-million parameter architecture that achieves superior ...
