Deep Learning Weekly: Issue 425

This week in deep learning, we bring you RTEB: A New Standard for Retrieval Evaluation, Building Multi-Agent Systems with Crew AI and Weaviate, and a paper on MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use.

You may also enjoy Introducing the Gemini 2.5 Computer Use model, Petri: An open-source auditing tool to accelerate AI safety research, a paper on TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

Introducing the Gemini 2.5 Computer Use model

The DeepMind team released Gemini 2.5 Computer Use, a new specialized model built on Gemini 2.5 Pro, capable of interacting with user interfaces.

IBM releases Granite 4 series of Mamba-Transformer language models

IBM open-sourced Granite 4, a series of language models that combines the Mamba state-space architecture with Transformer components.

Google launches its AI vibe-coding app Opal in 15 more countries

Google is expanding access to Opal, an AI vibe-coding app which lets you create mini web apps using text prompts, to 15 more countries.

MLOps & LLMOps

Give Your AI Agents Deep Understanding — Creating the LLMS.txt with a Multi-Agent ADK Solution

An article about designing and building a multi-agent system using Google’s ADK that automatically generates llms.txt files to give AI agents a structured understanding of code repositories.

Scaling Pinterest ML Infrastructure with Ray: From Training to End-to-End ML Pipelines

An article about how Pinterest extended Ray across their ML infrastructure with native data transformations, Iceberg bucket joins, and data persistence to accelerate feature development and reduce costs.

Building Multi-Agent Systems with Crew AI and Weaviate

A technical blog post about building complex multi-agent systems using CrewAI for orchestration, leveraging Weaviate for enhanced knowledge retrieval and collaboration.

Learning

Petri: An open-source auditing tool to accelerate AI safety research

A blog post about Petri (Parallel Exploration Tool for Risky Interactions), an open-source auditing framework that uses AI agents to accelerate safety research by probing models for misaligned behaviors.

Practical LLM Security Advice from the NVIDIA AI Red Team

A security blog post sharing practical advice to mitigate common LLM vulnerabilities, including remote code execution, RAG access control issues, and active content rendering.

From Word2Vec to LLM2Vec: How to Choose the Right Embedding Model for RAG

An in-depth ...
