This course is a guide to understanding and implementing Llama 4. @vukrosic will teach you how to code Llama 4 from scratch.
Code and presentations: https://github.com/vukrosic/courses
Code DeepSeek V3 From Scratch: https://youtu.be/5avSMc79V-w
⭐️ Contents ⭐️
- 0:00:00 Introduction to the course
- 0:00:15 Llama 4 Overview and Ranking
- 0:00:26 Course Prerequisites
- 0:00:43 Course Approach for Beginners
- 0:01:27 Why Code Llama from Scratch?
- 0:02:20 Understanding LLMs and Text Generation
- 0:03:11 How LLMs Predict the Next Word
- 0:04:13 Probability Distribution of Next Words
- 0:05:11 The Role of Data in Prediction
- 0:05:51 Probability Distribution and Word Prediction
- 0:08:01 Sampling Techniques
- 0:08:22 Greedy Sampling
- 0:09:09 Random Sampling
- 0:09:52 Top K Sampling
- 0:11:02 Temperature Sampling for Controlling Randomness
- 0:12:56 What are Tokens?
- 0:13:52 Tokenization Example: "Hello world"
- 0:14:30 How LLMs Learn Semantic Meaning
- 0:15:23 Token Relationships and Context
- 0:17:17 The Concept of Embeddings
- 0:21:37 Tokenization Challenges
- 0:22:15 Large Vocabulary Size
- 0:23:28 Handling Misspellings and New Words
- 0:28:42 Introducing Subword Tokens
- 0:30:16 Byte Pair Encoding (BPE) Overview
- 0:34:11 Understanding Vector Embeddings
- 0:36:59 Visualizing Embeddings
- 0:40:50 The Embedding Layer
- 0:45:31 Token Indexing and Swapping Embeddings
- 0:48:10 Coding Your Own Tokenizer
- 0:49:41 Implementing Byte Pair Encoding
- 0:52:13 Initializing Vocabulary and Pre-tokenization
- 0:55:12 Splitting Text into Words
- 1:01:57 Calculating Pair Frequencies
- 1:06:35 Merging Frequent Pairs
- 1:10:04 Updating Vocabulary and Tokenization Rules
- 1:13:30 Implementing the Merges
- 1:19:52 Encoding Text with the Tokenizer
- 1:26:07 Decoding Tokens Back to Text
- 1:33:05 Self-Attention Mechanism
- 1:37:07 Query, Key, and Value Vectors
- 1:40:13 Calculating Attention Scores
- 1:41:50 Applying Softmax
- 1:43:09 Weighted Sum of Values
- 1:45:18 Self-Attention Matrix Operations
- 1:53:11 Multi-Head Attention
- 1:57:55 Implementing Self-Attention
- 2:10:40 Masked Self-Attention
- 2:37:09 Rotary Positional Embeddings (RoPE)
- 2:38:08 Understanding Positional Information
- 2:40:58 How RoPE Works
- 2:49:03 Implementing RoPE
- 2:56:47 Feed-Forward Networks (FFN)
- 2:58:50 Linear Layers and Activations
- 3:02:19 Implementing FFN
Description and video by freeCodeCamp.org. This page is an independent companion view; the video is embedded from YouTube.