Efficient LLMs at Scale: My NeurIPS Week in KV Caches, Spec Decoding, and FP4
Notes on RNJ-1, K2-V2, Devstral 2, and GLM-4.6V
Eagle 3 Speculators: When To Use Them?
Mistral Large 3: Not a Reasoning Model
Quantizing Olmo 3: Most Efficient and Accurate Formats
Scaling RL and Self-Verifiable Reasoning: INTELLECT-3 and DeepSeekMath-V2
Accelerate Models with Quantization: Recipes for NVFP4, GPTQ, AWQ, SmoothQuant, AutoRound, and FP8
Olmo 3 Is Here!
Best GPUs Under $1,500 for AI: Should You Upgrade?
The Limits of GRPO-like Methods for Reinforcement Learning
Unsloth’s Quantization-Aware Training (QAT) vs Post-Training Quantization (PTQ) for Small Models
BF16 vs FP16 for Reinforcement Learning: Where Are We?
Advanced LoRA Fine-Tuning: How to Pick LoRA, QLoRA, DoRA, PiSSA, OLoRA, EVA, and LoftQ for LLMs
MiniMax M2 and Kimi-Linear: Why Full Attention Still Wins
Generate Better Synthetic Datasets with a "User" LLM
The Weekly Kaitchup #115
Qwen3-VL Fine-Tuning on Your Computer
DGX Spark: Use It for Fine-Tuning
Choosing a GGUF Model: K-Quants, I-Quants, and Legacy Formats
Tiny Recursive Models for Very Specific Problems
Why Increasing Batch Size Doesn’t Always Speed Up Training