Deep Learning Weekly: Issue 418

This week in deep learning, we bring you Gemma 3 270M, Best Practices for Building Agentic AI Systems: What Actually Works in Production, and a paper on GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.

You may also enjoy NVIDIA Nemotron Nano 2, Beyond billion-parameter burdens: Unlocking data synthesis with a conditional generator, a paper on Technical Report: Evaluating Goal Drift in Language Model Agents, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

Google introduced Gemma 3 270M, a compact, 270-million parameter model designed for task-specific fine-tuning.
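Since the model is positioned for task-specific fine-tuning, a minimal sketch of what that looks like with Hugging Face transformers is shown below; the Hub model ID, dataset, and hyperparameters are placeholders and assumptions, not details from Google's announcement.

```python
# Minimal sketch: task-specific fine-tuning of a small causal LM with Hugging Face
# transformers. The model ID and dataset are assumptions/placeholders; check the
# official Gemma 3 270M model card for the exact checkpoint name and license terms.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "google/gemma-3-270m"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder dataset: any small text dataset formatted for causal LM training works.
dataset = load_dataset("yelp_review_full", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-270m-task", per_device_train_batch_size=8,
                           num_train_epochs=1, learning_rate=5e-5, logging_steps=50),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```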

NVIDIA Nemotron Nano 2

The NVIDIA team released the NVIDIA Nemotron Nano 2 family of accurate and efficient hybrid Mamba-Transformer reasoning models.

DINOv3: Self-supervised learning for vision at unprecedented scale

Meta released DINOv3, a generalist, state-of-the-art computer vision model trained with self-supervised learning that produces superior high-resolution visual features.

LFM2-VL: Efficient Vision-Language Models

Liquid AI released LFM2-VL, their first series of vision-language foundation models.

Researchers glimpse the inner workings of protein language models

In a new study, MIT researchers use sparse autoencoders to determine what features a protein language model takes into account when making predictions.
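The underlying interpretability tool, a sparse autoencoder trained on a model's hidden activations to pull out human-inspectable features, can be sketched generically in PyTorch; the dimensions, sparsity penalty, and training loop below are assumptions for illustration, not the architecture used in the MIT study.

```python
# Generic sparse autoencoder over transformer hidden states (illustrative only).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 1024, d_hidden: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, acts: torch.Tensor):
        # ReLU keeps feature activations non-negative; the L1 term below pushes them sparse.
        features = torch.relu(self.encoder(acts))
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3  # assumed sparsity weight

# `activations` would be hidden-state vectors collected from the protein language
# model; random data stands in here.
activations = torch.randn(256, 1024)
recon, features = sae(activations)
loss = nn.functional.mse_loss(recon, activations) + l1_coeff * features.abs().mean()
loss.backward()
opt.step()
```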

MLOps & LLMOps

Best Practices for Building Agentic AI Systems: What Actually Works in Production

A practical article on best practices for building agentic AI systems in production, covering architectural patterns, inter-agent communication, error handling, and performance optimization.
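As a rough illustration of the kind of pattern the article discusses, an orchestrator that delegates tasks to specialist workers with retries and backoff can be sketched in plain Python; the class names and retry policy here are invented for illustration and are not taken from the article.

```python
# Illustrative orchestrator/worker pattern with basic error handling and retries.
# All names are hypothetical; the linked article describes patterns, not this code.
import time
from typing import Callable, Dict

class Worker:
    def __init__(self, name: str, handler: Callable[[str], str]):
        self.name, self.handler = name, handler

    def run(self, task: str) -> str:
        return self.handler(task)

class Orchestrator:
    def __init__(self, workers: Dict[str, Worker], max_retries: int = 2):
        self.workers, self.max_retries = workers, max_retries

    def dispatch(self, worker_name: str, task: str) -> str:
        worker = self.workers[worker_name]
        for attempt in range(self.max_retries + 1):
            try:
                return worker.run(task)
            except Exception:
                if attempt == self.max_retries:
                    raise
                time.sleep(2 ** attempt)  # simple exponential backoff before retrying

orchestrator = Orchestrator({
    "research": Worker("research", lambda t: f"notes about {t}"),
    "summarize": Worker("summarize", lambda t: t[:100]),
})
notes = orchestrator.dispatch("research", "vector database benchmarks")
print(orchestrator.dispatch("summarize", notes))
```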

Hands-On with VDBBench: Benchmarking Vector Databases for POCs That Match Production

A practical tutorial on using the VDBBench tool with custom, real-world datasets to evaluate vector databases in proofs of concept that match production workloads.

Learning

Beyond billion-parameter burdens: Unlocking data synthesis with a conditional generator

A blog post from Google detailing CTCL, a novel framework for generating privacy-preserving synthetic data using a lightweight 140M-parameter model, bypassing the need for billion-scale LLM fine-tuning and domain-specific prompt engineering.

From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels

A comprehensive guide on building and scaling production-ready custom CUDA kernels for PyTorch.
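For a taste of the territory the guide covers, here is a minimal example of compiling a custom elementwise CUDA kernel into a PyTorch extension with torch.utils.cpp_extension.load_inline; this is a generic, hedged illustration rather than code from the guide, and it needs nvcc plus a CUDA-enabled PyTorch build.

```python
# Minimal custom CUDA kernel compiled inline as a PyTorch extension (illustrative;
# not taken from the linked guide).
import torch
from torch.utils.cpp_extension import load_inline

cuda_source = r"""
#include <torch/extension.h>

__global__ void square_kernel(const float* in, float* out, int64_t n) {
    int64_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

torch::Tensor square(torch::Tensor x) {
    TORCH_CHECK(x.is_cuda() && x.dtype() == torch::kFloat32, "expects a float32 CUDA tensor");
    auto out = torch::empty_like(x);
    int64_t n = x.numel();
    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);
    square_kernel<<<blocks, threads>>>(x.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
"""

# Declarations go in cpp_sources, definitions in cuda_sources.
cpp_source = "torch::Tensor square(torch::Tensor x);"

ext = load_inline(name="square_ext", cpp_sources=cpp_source,
                  cuda_sources=cuda_source, functions=["square"])

x = torch.randn(1024, device="cuda")
print(torch.allclose(ext.square(x), x * x))
```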

Accelerating MoE’s with a Triton Persistent Cache-Aware Grouped GEMM Kernel

A post on optimizing a Triton BF16 grouped GEMM kernel for training and inference on Mixture-of-Experts (MoE) models such as DeepSeek-V3.
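The persistent, cache-aware grouped GEMM itself is involved, but the basic shape of a Triton kernel written in Python (which the post builds on) is easy to show. The sketch below follows the standard Triton tutorial pattern for an elementwise add and is not the grouped GEMM kernel from the post.

```python
# Basic structure of a Triton kernel in Python: a simple elementwise add following
# the standard Triton tutorial pattern (not the grouped GEMM kernel from the post).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # each program handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda", dtype=torch.bfloat16)
y = torch.randn(4096, device="cuda", dtype=torch.bfloat16)
print(torch.allclose(add(x, y), x + y))
```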

Libraries & Code

huggingface/aisheets

Build, enrich, and transform datasets using ...