
New LLM Pre-training and Post-training Paradigms

The development of large language models (LLMs) has come a long way, from the early GPT models to the sophisticated open-weight LLMs we have today. Initially, the LLM training process focused solely on pre-training, but it has since expanded to include both pre-training and post-training. Post-training typically encompasses supervised instruction fine-tuning and alignment, an approach popularized by ChatGPT.
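To make the two-stage structure concrete, here is a minimal, purely illustrative sketch of the pipeline: pre-training on raw text via next-token prediction, followed by post-training, which consists of supervised instruction fine-tuning and then alignment on preference data. All class and method names below are hypothetical placeholders for illustration, not any specific model's or framework's API.

```python
# Hypothetical sketch of the modern LLM training pipeline; the class
# and method names are placeholders, not a real framework's API.

class ToyLLM:
    """Stand-in for a transformer model plus its optimizer."""
    def step(self, loss):                     # one optimizer update
        pass
    def next_token_loss(self, text):          # pre-training objective
        return 0.0
    def sft_loss(self, prompt, response):     # instruction fine-tuning objective
        return 0.0
    def preference_loss(self, prompt, chosen, rejected):  # alignment objective
        return 0.0

def train_pipeline(model, corpus, instruction_data, preference_data):
    # Stage 1: pre-training = next-token prediction on a large raw-text corpus
    for text in corpus:
        model.step(model.next_token_loss(text))
    # Stage 2a: supervised instruction fine-tuning (SFT) on prompt-response pairs
    for prompt, response in instruction_data:
        model.step(model.sft_loss(prompt, response))
    # Stage 2b: alignment on human preference data (e.g., via RLHF or DPO)
    for prompt, chosen, rejected in preference_data:
        model.step(model.preference_loss(prompt, chosen, rejected))
    return model
```

The models discussed below each instantiate these stages differently, which is exactly what makes comparing their pipelines informative.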

Training methodologies have evolved since ChatGPT was first released. In this article, I review the latest advancements in both pre-training and post-training methodologies, particularly those made in recent months.

[Figure: An overview of the LLM development and training pipeline, with a focus on the new pre-training and post-training methodologies discussed in this article.]

Hundreds of LLM papers proposing new techniques and approaches are published each month. However, one of the best ways to see what actually works well in practice is to look at the pre-training and post-training pipelines of the most recent state-of-the-art models. Luckily, four major new LLMs have been released in recent months, each accompanied by a relatively detailed technical report.

In this article, I focus on the pre-training and post-training pipelines of the following models:

  1. Alibaba's Qwen 2

  2. Apple Intelligence Foundation Language Models

  3. Google's Gemma 2

  4. Meta AI's Llama 3.1

These models are presented in order based on the publication dates of their respective technical papers on arXiv.org, which also happens to align with their alphabetical order.

This article is a passion project that I created in my free time, mostly on weekends. If you find it valuable and would like to support my work, please consider purchasing a copy of my books and recommending them to your colleagues. Your review on Amazon would also be greatly appreciated!

  • Build a Large Language Model (from Scratch) is a highly focused book dedicated to coding LLMs from the ground up in PyTorch, covering everything from pre-training to post-training—arguably the best way to truly understand LLMs.

  • Machine Learning Q and AI is a great book for those who are already familiar with the basics; it dives into intermediate and advanced concepts covering deep neural networks, vision transformers, multi-GPU training paradigms, LLMs, and much more.

  • Machine Learning with PyTorch and Scikit-Learn is a comprehensive guide to machine learning, deep learning, and AI, offering a well-balanced mix of theory and practical code. It's the

...