
The Anatomy of the Least Squares Method, Part Two

Hey! It’s Tivadar from The Palindrome.

is here to continue our deep dive into the least squares method, the bread and butter of data science and machine learning.

Without further ado, I’ll pass the mic to him.

Enjoy!

Cheers,
Tivadar


By the end of this post series, you will be confident in understanding, applying, and interpreting regression models (general linear models) that are fit using the famous least-squares method. Here's a breakdown of the post series:

Post 1 (the previous post): Theory and math. If you haven’t read this post yet, please do so before reading this one!

Post 2 (this post): Explorations in simulations. You’ll learn how to simulate data to supercharge your intuition for least-squares, how to visualize the results, and how to run experiments. You’ll also learn about residuals and overfitting.

Post 3: Real-data examples. Simulated data are great because you have full control over the data characteristics and noise, but there's no substitute for real data. And that's what you'll experience in this post. I'll also teach you how to use the Python statsmodels library.

Post 4: Modeling GPT activations. This post will be fun and fascinating. We'll dissect GPT-2, OpenAI's large language model (LLM) and a precursor to its state-of-the-art ChatGPT. You'll learn more about least squares and also about how LLMs work internally.
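To give a flavor of the simulation workflow described for Post 2, here is a minimal sketch (not the code from the accompanying notebooks). It assumes a simple one-predictor model with arbitrarily chosen "true" coefficients and Gaussian noise, fits it with NumPy's `np.linalg.lstsq`, and computes the residuals. Because you set the true parameters yourself, you can check how close the least-squares estimates come to them.

```python
import numpy as np

# Reproducible random number generator
rng = np.random.default_rng(seed=42)

# Simulation parameters (illustrative values chosen for this sketch)
n = 100
true_intercept = 1.5
true_slope = -0.8
noise_sd = 0.5

# Simulate data: y = intercept + slope * x + Gaussian noise
x = rng.uniform(-3, 3, size=n)
y = true_intercept + true_slope * x + rng.normal(0, noise_sd, size=n)

# Design matrix: a column of ones (intercept) next to the predictor
X = np.column_stack([np.ones(n), x])

# Least-squares solution of X @ beta ≈ y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals: observed values minus model predictions
residuals = y - X @ beta

print("estimated coefficients:", beta)
# ddof=2 accounts for the two estimated parameters
print("residual std:", residuals.std(ddof=2))
```

Rerunning with a larger `noise_sd` or a smaller `n` is an easy way to see how the estimates drift away from the true values.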

Following along with code

I’m a huge fan of learning math through coding. You can learn a lot of math with a bit of code.

That’s why I have Python notebook files that accompany my posts. The essential code bits are pasted directly into this post, but the complete code files, including all the code for visualization and additional explorations, are here on my GitHub.

If you’re more interested in the theory/concepts, then it’s completely fine to ignore the code and just read the post. But if you want a deeper level of understanding and intuition — and the tools to continue exploring and applying the analyses to your own projects — then I strongly encourage following along with the code while reading this post.

Here’s a video where I explain how to get my code from GitHub and follow along using Google Colab. It’s free (you need a Google account, but who doesn’t have one??) and runs in your browser, so you don’t need to install anything.

Why you should use simulated data when learning machine learning

Here’s why I love teaching data ...
