Deep Learning Weekly: Issue 431
This week in deep learning, we bring you Gemini 3, The Definitive Guide to Agentic AI, and a paper on Depth Anything 3: Recovering the Visual Space from Any Views.
You may also enjoy GPT-5.1, Code execution with MCP: building more efficient AI agents, a paper on MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
A new era of intelligence with Gemini 3
Google releases Gemini 3 Pro with breakthrough reasoning scores, PhD-level performance on benchmarks, and enhanced multimodal and agentic coding capabilities.
xAI introduced Grok 4.1, which brings significant improvements to the real-world usability of Grok.
GPT-5.1: A smarter, more conversational ChatGPT
OpenAI releases GPT-5.1 with adaptive reasoning, improved conversational style, and enhanced customization options for ChatGPT users.
MLOps & LLMOps
The Definitive Guide to Agentic AI: What AI Agents Actually Are and How to Build Them for Production
Discover the core principles behind truly agentic AI systems, how to build them for production, and the reasons they often fail at scale.
Qdrant 1.16 - Tiered Multitenancy & Disk-Efficient Vector Search
A technical update announcing Qdrant 1.16, which introduces Tiered Multitenancy, the ACORN search algorithm, and Inline Storage for disk-efficient, high-performance vector search.
Building an Interactive AI Agent for Lightning-Fast Machine Learning Tasks
A technical blog post about building a data science agent using Nemotron Nano-9B-v2 and CUDA-X libraries, delivering 3x to 43x speedups for ML experimentation.
Code execution with MCP: building more efficient AI agents
An article detailing how adopting code execution with the Model Context Protocol (MCP) reduces token consumption and increases efficiency for AI agents managing hundreds of tools.
Real-time streaming for AI models and agents in OpenSearch
A blog post announcing experimental real-time streaming capabilities in OpenSearch 3.3 via the Predict Stream and Execute Stream Agent APIs.
Learning
Gemini 3 Prompting: Best Practices for General Usage
An instructional guide providing best practices for prompting Gemini 3 Pro, focusing on core principles like precise instructions, structured XML/Markdown tagging, and more.
Mapping LLMs with Sparse Autoencoders
An explainer describing Sparse Autoencoders (SAEs) as a technique to map LLM activations into monosemantic, interpretable features, allowing researchers to ...