
Deep Learning Weekly: Issue 422

This week in deep learning, we bring you The Ultimate Guide to LLM Evaluation: Metrics, Methods & Best Practices, How to Train an LLM-RecSys Hybrid for Steerable Recs with Semantic IDs, and a paper on Hierarchical Reasoning Model.

You may also enjoy OpenAI's GPT-5-Codex, Writing effective tools for AI agents—using AI agents, a paper on From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

Announcing Agent Payments Protocol (AP2)

Google announced the Agent Payments Protocol (AP2), an open protocol developed to securely initiate and transact agent-led payments across platforms.

Introducing upgrades to Codex | OpenAI

OpenAI released GPT‑5-Codex, a version of GPT‑5 further optimized for agentic coding in Codex.

Cohere opens Paris office as EMEA hub

Cohere expands its presence in Europe with the official launch of a Paris office, which will serve as the central hub for operations across Europe, the Middle East, and Africa (EMEA).

What will AI look like in 2030?

A report from Epoch AI that examines infrastructure implications and future AI capabilities if AI scaling persists to 2030.

Workday acquires Sana Labs for $1.1B to upgrade agentic AI work experiences

Workday announced the acquisition of Sana Labs, an AI company offering enterprise knowledge and employee training tools, for about $1.1 billion.

MLOps & LLMOps

Writing effective tools for AI agents—using AI agents \ Anthropic

An article outlining techniques and principles for writing effective tools for AI agents, emphasizing evaluation and agent-assisted optimization to boost performance in real-world tasks.

How to turn Claude Code into a domain specific coding agent

A blog post exploring experimental configurations to transform Claude Code into a domain-specific coding agent.

The Rise of Subagents

An explanatory blog post discussing the increasing adoption and architecture of subagents.

Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer

A technical article introducing the NVIDIA Run:ai Model Streamer, an open-source SDK designed to significantly reduce cold start latency for LLM inference.

Build and scale adoption of AI agents for education with Strands Agents, Amazon Bedrock AgentCore, and LibreChat

A practical blog post demonstrating how to build and scale sophisticated AI agents for education using Strands Agents, Amazon Bedrock AgentCore, and LibreChat.

Learning

The Ultimate ...
