Deep Learning Weekly: Issue 425
This week in deep learning, we bring you RTEB: A New Standard for Retrieval Evaluation, Building Multi-Agent Systems with Crew AI and Weaviate, and a paper on MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use.
You may also enjoy Introducing the Gemini 2.5 Computer Use model, Petri: An open-source auditing tool to accelerate AI safety research, a paper on TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Introducing the Gemini 2.5 Computer Use model
The DeepMind team released Gemini 2.5 Computer Use, a specialized model built on Gemini 2.5 Pro that powers agents that interact directly with user interfaces by clicking, typing, and scrolling.
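To make the idea concrete, here is a minimal sketch of the screenshot-to-action loop such computer-use models drive. The `generate_content` call follows the public google-genai SDK; the model id, stop condition, and the two stub helpers are illustrative assumptions, not the verified Computer Use API.

```python
# Conceptual perceive-act loop for a computer-use agent.
# The google-genai client calls are real; the model id, stop condition,
# and both helpers below are illustrative assumptions.
from google import genai

client = genai.Client()  # expects GEMINI_API_KEY in the environment

def capture_screenshot() -> bytes:
    """Hypothetical stub: grab the current browser viewport as PNG bytes."""
    raise NotImplementedError("wire this to Playwright/Selenium")

def execute_action(action: str) -> None:
    """Hypothetical stub: perform the click/type/scroll the model chose."""
    raise NotImplementedError("wire this to your automation layer")

def run_ui_task(goal: str, max_steps: int = 10) -> None:
    for _ in range(max_steps):
        shot = capture_screenshot()
        # Ask the model for the next UI action given the goal and the screen.
        response = client.models.generate_content(
            model="gemini-2.5-computer-use-preview",  # illustrative id
            contents=[goal, genai.types.Part.from_bytes(
                data=shot, mime_type="image/png")],
        )
        text = response.text or ""
        if "DONE" in text:  # illustrative completion signal
            break
        execute_action(text)
```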
IBM releases Granite 4 series of Mamba-Transformer language models
IBM open-sourced Granite 4, a series of language models whose hybrid architecture combines Mamba state-space layers with Transformer attention blocks.
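Since the Granite models ship as standard Hugging Face checkpoints, loading one looks like any other causal LM; the model id below is an assumption based on IBM's naming, so check the ibm-granite org on the Hub for the exact released names.

```python
# Loading a Granite 4 checkpoint with Hugging Face transformers.
# The model id is an assumption; the transformers API is the public one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-h-tiny"  # illustrative id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Briefly explain hybrid Mamba-Transformer models.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```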
Google launches its AI vibe-coding app Opal in 15 more countries
Google is expanding access to Opal, an AI vibe-coding app that lets users create mini web apps from text prompts, to 15 more countries.
MLOps & LLMOps
Give Your AI Agents Deep Understanding — Creating the LLMS.txt with a Multi-Agent ADK Solution
An article about designing and building a multi-agent system using Google’s ADK that automatically generates llms.txt files to give AI agents a structured understanding of code repositories.
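As a much-simplified, single-agent version of that idea, here is a sketch in Google's ADK; the article builds a full multi-agent pipeline, and the agent name, model choice, and instruction text here are my own, while the `Agent` constructor follows the ADK quickstart.

```python
# A single-agent sketch of drafting llms.txt with Google's ADK.
# Agent name, model, and instruction are illustrative assumptions.
from google.adk.agents import Agent

llms_txt_agent = Agent(
    name="llms_txt_writer",
    model="gemini-2.0-flash",  # illustrative model choice
    description="Drafts an llms.txt summary for a code repository.",
    instruction=(
        "Given a repository's README and file tree, produce an llms.txt "
        "file: an H1 title, a one-paragraph blockquote summary, then "
        "markdown sections listing the most important files as links "
        "with one-line descriptions."
    ),
)
# Run with the ADK CLI (`adk run`) or a Runner; see the ADK docs.
```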
Scaling Pinterest ML Infrastructure with Ray: From Training to End-to-End ML Pipelines
An article about how Pinterest extended Ray across their ML infrastructure with native data transformations, Iceberg bucket joins, and data persistence to accelerate feature development and reduce costs.
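A generic Ray Data sketch of the "native data transformation" pattern the article describes: read features, transform them on the cluster, and stream batches straight into training. The paths and column names are hypothetical; `read_parquet`, `map_batches`, and `iter_torch_batches` are Ray's public API.

```python
# Generic Ray Data pipeline: read, transform on-cluster, feed a trainer.
# Paths and column names are hypothetical.
import ray

ray.init()

def normalize(batch):
    # Example per-batch transform executed in parallel across the cluster.
    batch["impressions"] = batch["impressions"] / batch["impressions"].max()
    return batch

ds = (
    ray.data.read_parquet("s3://example-bucket/features/")  # hypothetical path
    .map_batches(normalize, batch_format="numpy")
)

for batch in ds.iter_torch_batches(batch_size=1024):
    ...  # feed each batch into the training step
```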
Building Multi-Agent Systems with Crew AI and Weaviate
A technical blog post about building complex multi-agent systems, using CrewAI for orchestration and Weaviate for knowledge retrieval and agent collaboration.
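A hedged sketch of the core pattern: wrap a Weaviate search as a CrewAI tool and hand it to an agent. The collection name and property keys are assumptions, the `@tool` import path can vary by CrewAI version, and the Weaviate v4 client calls are the public API.

```python
# Wrapping Weaviate vector search as a CrewAI tool (a sketch, not the
# post's exact code). Collection name and properties are assumptions.
import weaviate
from crewai import Agent, Crew, Task
from crewai.tools import tool  # import path may vary by version

client = weaviate.connect_to_local()

@tool("Weaviate search")
def search_docs(query: str) -> str:
    """Return the top matching document chunks for a query."""
    docs = client.collections.get("Document")  # hypothetical collection
    res = docs.query.near_text(query=query, limit=3)
    return "\n".join(str(o.properties) for o in res.objects)

researcher = Agent(
    role="Researcher",
    goal="Answer questions using the document store",
    backstory="A diligent analyst with vector-search access.",
    tools=[search_docs],
)
task = Task(
    description="Summarize what our docs say about retrieval evaluation.",
    expected_output="A short grounded summary.",
    agent=researcher,
)
print(Crew(agents=[researcher], tasks=[task]).kickoff())
```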
Learning
Petri: An open-source auditing tool to accelerate AI safety research
A blog post about Petri (Parallel Exploration Tool for Risky Interactions), an open-source auditing framework that uses AI agents to accelerate safety research by probing models for misaligned behaviors.
Practical LLM Security Advice from the NVIDIA AI Red Team
A security blog post sharing practical advice to mitigate common LLM vulnerabilities, including remote code execution, RAG access control issues, and active content rendering.
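To illustrate the remote-code-execution class the post covers: never `exec()` or `eval()` raw LLM output. A common mitigation is to parse the output into a constrained action and dispatch through an allowlist; the sketch below is my own illustration, not NVIDIA's code.

```python
# Allowlist dispatch instead of exec() on LLM output (own illustration).
import json

ALLOWED_ACTIONS = {
    "get_weather": lambda city: f"weather for {city}",
    "get_time": lambda city: f"time in {city}",
}

def dispatch(llm_output: str) -> str:
    # UNSAFE alternative: exec(llm_output)  -- arbitrary code execution.
    call = json.loads(llm_output)  # expect {"action": ..., "arg": ...}
    fn = ALLOWED_ACTIONS.get(call.get("action"))
    if fn is None:
        raise ValueError("action not in allowlist")
    return fn(str(call.get("arg", "")))

print(dispatch('{"action": "get_weather", "arg": "Lisbon"}'))
```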
From Word2Vec to LLM2Vec: How to Choose the Right Embedding Model for RAG
An in-depth article on the evolution of text embedding models, from Word2Vec to LLM2Vec, and how to choose the right one for RAG pipelines.
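For a feel of what evaluating an embedding model on your own data looks like, here is a minimal retrieval sketch with sentence-transformers; the model name is one common default, not necessarily the article's recommendation.

```python
# Minimal retrieval check with sentence-transformers; the model name is
# one common default, chosen for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "RTEB is a new benchmark for retrieval evaluation.",
    "Granite 4 mixes Mamba and Transformer layers.",
]
query_emb = model.encode("retrieval benchmarks", convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)
scores = util.cos_sim(query_emb, corpus_emb)[0]
print(corpus[int(scores.argmax())])
```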