Code Your Own Llama 4 LLM from Scratch – Full Course

Chapters (55)

This course is a guide to understanding and implementing Llama 4. @vukrosic will teach you how to code Llama 4 from scratch.

Code and presentations: https://github.com/vukrosic/courses

Code DeepSeek V3 From Scratch: https://youtu.be/5avSMc79V-w

⭐️ Contents ⭐️
- 0:00:00 Introduction to the course
- 0:00:15 Llama 4 Overview and Ranking
- 0:00:26 Course Prerequisites
- 0:00:43 Course Approach for Beginners
- 0:01:27 Why Code Llama from Scratch?
- 0:02:20 Understanding LLMs and Text Generation
- 0:03:11 How LLMs Predict the Next Word
- 0:04:13 Probability Distribution of Next Words
- 0:05:11 The Role of Data in Prediction
- 0:05:51 Probability Distribution and Word Prediction
- 0:08:01 Sampling Techniques
- 0:08:22 Greedy Sampling
- 0:09:09 Random Sampling
- 0:09:52 Top K Sampling
- 0:11:02 Temperature Sampling for Controlling Randomness
- 0:12:56 What are Tokens?
- 0:13:52 Tokenization Example: "Hello world"
- 0:14:30 How LLMs Learn Semantic Meaning
- 0:15:23 Token Relationships and Context
- 0:17:17 The Concept of Embeddings
- 0:21:37 Tokenization Challenges
- 0:22:15 Large Vocabulary Size
- 0:23:28 Handling Misspellings and New Words
- 0:28:42 Introducing Subword Tokens
- 0:30:16 Byte Pair Encoding (BPE) Overview
- 0:34:11 Understanding Vector Embeddings
- 0:36:59 Visualizing Embeddings
- 0:40:50 The Embedding Layer
- 0:45:31 Token Indexing and Swapping Embeddings
- 0:48:10 Coding Your Own Tokenizer
- 0:49:41 Implementing Byte Pair Encoding
- 0:52:13 Initializing Vocabulary and Pre-tokenization
- 0:55:12 Splitting Text into Words
- 1:01:57 Calculating Pair Frequencies
- 1:06:35 Merging Frequent Pairs
- 1:10:04 Updating Vocabulary and Tokenization Rules
- 1:13:30 Implementing the Merges
- 1:19:52 Encoding Text with the Tokenizer
- 1:26:07 Decoding Tokens Back to Text
- 1:33:05 Self-Attention Mechanism
- 1:37:07 Query, Key, and Value Vectors
- 1:40:13 Calculating Attention Scores
- 1:41:50 Applying Softmax
- 1:43:09 Weighted Sum of Values
- 1:45:18 Self-Attention Matrix Operations
- 1:53:11 Multi-Head Attention
- 1:57:55 Implementing Self-Attention
- 2:10:40 Masked Self-Attention
- 2:37:09 Rotary Positional Embeddings (RoPE)
- 2:38:08 Understanding Positional Information
- 2:40:58 How RoPE Works
- 2:49:03 Implementing RoPE
- 2:56:47 Feed-Forward Networks (FFN)
- 2:58:50 Linear Layers and Activations
- 3:02:19 Implementing FFN

And if you want to code DeepSeek V3 from scratch, here's the Full Course: https://youtu.be/5avSMc79V-w

❤️ Support for this channel comes from our friends at Scrimba – the coding platform that's reinvented interactive learning: https://scrimba.com/freecodecamp

🎉 Thanks to our Champion and Sponsor supporters:
👾 Drake Milly
👾 Ulises Moralez
👾 Goddard Tan
👾 David MG
👾 Matthew Springman
👾 Claudio
👾 Oscar R.
👾 jedi-or-sith
👾 Nattira Maneerat
👾 Justin Hual

--

Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://freecodecamp.org/news

Description and video by freeCodeCamp.org. This page is an independent companion view; the video is embedded from YouTube.