A short, straight-to-the-point, visualized notebook course to build your own GPT. Yes, from scratch!
I hadn't expected 2023 to bring the future we had been dreaming of for decades. I grew up with "toys" that gave a glimpse of what AI could look like, tools that generated realistic facial images, if you've heard of thispersondoesnotexist.com and a couple of others.
But now there are a hell, let me repeat, a hell of a lot of generative tools out there popping up nearly every day. As a CS/DL student, I have always been curious about how the heck these things work.
It started with some noob questions:
- Can ChatGPT remember things?
- Oh, those are the weights, but how does it manage to create a perfect, grammatically correct sentence!?
- It also takes care of CaPitaliZation, proper indentation when asked for code, etc...
It had to be explored in great depth. And that's where I found Andrej Karpathy.
This notebook course takes you on the journey to GPT, starting from the very first baby steps.
Karpathy's series is fantastic, but I love things to be spelled out even more and, as a cherry on top, visualized. I also want to refer back to specific topics in the future, and if I can't find them by seeking through a video, I get frustrated.
This notebook course builds up all the code and the math understanding with informal explanations that we actually get: the simplest language possible, you know, the bro's language, yo!? And we keep building one step at a time.
I would encourage you to use this course as material for practising your understanding and as a place to refer back to specifics in the future, but use Andrej's series as the main guide.
- Each notebook is carefully crafted for to-the-point reading.
- Derivations and expressions are explored, experimented with, and visualized in the dude's words. So, don't sweat the maths.
- Notebooks have a lot of visuals to understand the flow, be it backpropagation, bug hunting, or just visualizing the neural net.
- Code is there, code is there... and it's reproducible.
- Clips to the specific portion of the lecture to refer to.
I mean, there's a lot there... just start exploring!!
Start with the very first lecture, Micrograd.
- Micrograd: Our first steps toward a neural net. Here we explore expressions at the atomic level and how the slope, or derivative, is calculated (a minimal autograd sketch follows this list). We will build a little classifier too.
- Makemore: The first version of our GPT. Not exactly a GPT, because it doesn't use a transformer, but Makemore is the name given to the model. It is capable of generating human names. We start with a simple bigram model, no ML/DL yet, just, you know, the first stepping stone (a count-based sketch follows this list).
- Makemore with NN: Here we show how a neural net can learn the same relationships as the manual bigram model built in the previous lecture.
- Improve on NN: This is the core. Here we put the model under scrutiny: how to diagnose it, what went wrong, batchnorm, and other exciting stuff for when the model isn't learning or... is learning the wrong thing.
- Backprop Ninja: This will boost your backpropagation confidence like hell. We backpropagate through each layer, manually, spelled out, visualized, codified.
- WaveNet: A new architecture to train the model.
- GPT: Truly spelled-out GPT, from the embeddings to masked multi-headed self-attention (a tiny masked-attention sketch follows this list). Visualized. Demystified. Together, we end up creating a GPT that actually completes text!
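
To give a taste of the Micrograd lecture, here is a minimal sketch of a micrograd-style `Value` object. This is my own simplified illustration (addition and multiplication only), not the lecture's actual code; it shows how the derivative of an expression is assembled with the chain rule:

```python
# Minimal, hypothetical sketch of a micrograd-style autograd Value
# (supports only + and *; the real micrograd handles many more ops).
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad            # d(out)/d(self)  = 1
            other.grad += out.grad           # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(out)/d(self)  = other
            other.grad += self.data * out.grad   # d(out)/d(other) = self
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the expression graph, then apply the chain rule
        # from the output back to the leaves.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# L = a*b + a, so dL/da = b + 1 = 4.0 and dL/db = a = 2.0
a, b = Value(2.0), Value(3.0)
L = a * b + a
L.backward()
print(a.grad, b.grad)  # 4.0 2.0
```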
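
And a flavour of the bigram idea from the Makemore lecture: no neural net, just counting which character tends to follow which. The five names below are a stand-in I picked for illustration; the lecture works from a file with tens of thousands of real names.

```python
# Count-based bigram name generator (illustrative sketch, tiny placeholder dataset).
import random
from collections import defaultdict

names = ["emma", "olivia", "ava", "isabella", "sophia"]  # stand-in data

# Count how often each character follows another; '.' marks start and end of a name.
counts = defaultdict(lambda: defaultdict(int))
for name in names:
    chars = ["."] + list(name) + ["."]
    for ch1, ch2 in zip(chars, chars[1:]):
        counts[ch1][ch2] += 1

def sample(rng=random.Random(42)):
    """Walk the bigram table, picking each next character in proportion to its count."""
    out, ch = [], "."
    while True:
        nxt = counts[ch]
        ch = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)

print(sample())  # prints something vaguely name-like
```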
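
Finally, since "masked multi-headed self-attention" sounds scarier than it is, here is a single masked attention head in PyTorch, roughly in the spirit of the GPT lecture (the sizes and variable names here are mine, not the lecture's exact code):

```python
# One masked (causal) self-attention head; illustrative sizes.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C, head_size = 1, 8, 32, 16          # batch, time, channels, head size
x = torch.randn(B, T, C)                   # token embeddings

key   = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)               # each (B, T, head_size)
wei = q @ k.transpose(-2, -1) * head_size**-0.5    # (B, T, T) attention scores
mask = torch.tril(torch.ones(T, T))                # causal mask: no peeking ahead
wei = wei.masked_fill(mask == 0, float("-inf"))
wei = F.softmax(wei, dim=-1)                       # each row sums to 1
out = wei @ v                                      # (B, T, head_size)
print(out.shape)                                   # torch.Size([1, 8, 16])
```

Stack a few of these heads side by side, add the feed-forward and residual bits, and you are most of the way to the transformer block the final lecture builds.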
Andrej's last lecture is on GPT, but there is a lot to explore. So to be continued...
How OpenAI's Andrej Karpathy Made One of the Best Tutorials in Deep Learning by Usama Ahmed: he does a phenomenal job of reviewing Andrej's entire course, module by module. Must check it out!