Deep Learning Weekly: Issue 415

This week in deep learning, we bring you Evaluating Grok 4’s Math Capabilities, Introducing Letta Filesystem, and a paper on ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution.

You may also enjoy DeepMind's AlphaEarth Foundations, Building and evaluating alignment auditing agents, a paper on SensorLM: Learning the Language of Wearable Sensors, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

AlphaEarth Foundations helps map our planet in unprecedented detail

DeepMind’s new AI model integrates petabytes of Earth observation data to generate a unified data representation that revolutionizes global mapping and monitoring.

Introducing Letta Filesystem

The Letta team announced Letta Filesystem, which provides an interface for agents to organize and reference content from documents such as PDFs, transcripts, and documentation.

New algorithms enable efficient machine learning with symmetric data

A new study by MIT researchers presents the first method for machine learning with symmetric data that is provably efficient in both the computation and the data it requires.

Thunderforge Brings AI Agents to Wargames

The US Department of Defense is leading an experimental project, Thunderforge, to build a custom agentic system for critiquing war plans across different military domains.

Chinese startup Z.ai releases cost-efficient GLM-4.5 reasoning model

Z.ai, a Chinese startup, open-sourced GLM-4.5, a reasoning model that it claims is more cost-efficient than DeepSeek’s R1.

AI-native clinical documentation startup Ambience Healthcare raises $243M

Ambience Healthcare, a provider of AI-powered clinical documentation tools, closed a $243 million Series C funding round.

MLOps & LLMOps

Build workflows with LangChain and Weaviate v3

A practical blog post explaining how to build scalable AI workflows by combining LangChain's orchestration layer with Weaviate's vector search.

Traditional RAG vs. Agentic RAG—Why AI Agents Need Dynamic Knowledge to Get Smarter

A technical blog post comparing traditional RAG with agentic RAG for AI agents, highlighting the need for dynamic knowledge.

Learning

Building and evaluating alignment auditing agents

A comprehensive blog post from Anthropic about developing and evaluating LLM-based auditing agents that autonomously assess alignment issues in frontier models like Claude 4.

Evaluating Grok 4’s Math Capabilities

A report about Grok 4's mathematical strengths and weaknesses, including its state-of-the-art performance in medium-hard high school math competitions and its utility for literature search.

Arc Virtual Cell Challenge: A Primer

A ...
