Feed Atlas
OPML directory + server-side RSS reader

gilesthomas.com


Latest posts

  • Writing an LLM from scratch, part 32h -- Interventions: full fat float32
    Apr 03, 2026

    This is the last of the interventions I'm trying out to see if I can improve the test loss for a from-scratch GPT-2 small base model, trained on code based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". Back when I did my first training run for a base model, on my local RTX 3090, I used two optimisations: Setting the 32-bit floating point matrix multiplication precision... (a sketch of this setting appears after this list)

  • Writing an LLM from scratch, part 32i -- Interventions: what is in the noise?
    Apr 07, 2026

    Towards the end of last year, I trained a 163M-parameter GPT-2-style model from scratch on my local RTX 3090, using code based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". The result was a pretty decent little model, but it wasn't as good as the original GPT-2-small, despite having more parameters (because it wasn't using weight-tying; see the sketch after this list). Specifically: on a particular t...

  • Writing an LLM from scratch, part 32j -- Interventions: trying to train a better model in the cloud
    Apr 09, 2026

    Since early February, I've been trying various interventions on a 163M-parameter GPT-2-style model that I trained from scratch on my local RTX 3090, using code based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". My original model got a loss of 3.944 on my test set, while the original GPT-2 weights got 3.500 on the same dataset. I wanted to see if I could close that ga...

  • Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation
    Apr 15, 2026

    I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". I've trained various versions of it in the cloud to work out which interventions to the model and training code had the best effects on the loss it gets on a specific test dataset, and now I wanted to do a training run locally to match the best of those. For that, I wanted... (a gradient-accumulation sketch follows this list)

  • How an LLM becomes more coherent as we train it
    Apr 17, 2026

    I remember finding it interesting when, back in 2015, Andrej Karpathy posted about RNNs and gave an example of how their output improves over the course of a training run. What might that look like for a (relatively) modern transformers-based LLM? I recently trained a GPT-2-small-style LLM, with 163 million parameters, on about 3.2 billion tokens (that's about 12.8 GiB of text) from the Hugging Fa...

  • Writing an LLM from scratch, part 32l -- Interventions: updated instruction fine-tuning results
    Apr 20, 2026

    I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)", and have tried a bunch of different things to see if I could get it to approach the quality of the original OpenAI GPT-2-small, measured in terms of loss on a held-back test dataset. After working through them, in my last post, I managed to train one that was almost (if no...

  • Writing an LLM from scratch, part 32m -- Interventions: conclusion
    Apr 21, 2026

    Last November, when I finished the main body of "Build a Large Language Model (from Scratch)", I set myself a number of follow-on goals. One was "training the full GPT-2 base model myself". I've reached the end of that journey, with a model that is almost -- if not quite -- as good as GPT-2 small, trained in 44 hours on my own machine, so I thought it would be worth summarising how it went. In Dec...

  • Writing an LLM from scratch, part 33 -- what I learned from finally getting round to the appendices
    Apr 22, 2026

    After finishing the main body of "Build a Large Language Model (from Scratch)", I set myself three follow-on goals. The first was training a full GPT-2-small-style base model myself. That was reasonably easy to do but unlocked a bunch of irresistible side quests; having finally got to the end of those, it's time to move on to the others: reading through the book's appendices, and building my own...

  • 10Gb/s Ethernet: what I had to (re)learn
    Apr 28, 2026

    My ISP recently started offering a 10Gb option, and my "shiny new thing!" Pavlovian response immediately kicked in. So of course, I had to upgrade the wired networking in my home -- which meant I had to learn a few things to get it all working, and relearn a bunch of stuff I'd forgotten over the years. Wired networking for home and small offices hasn't really moved forward that much in the last 2...

  • 10Gb/s Ethernet: what I actually did to get it working in my home
    Apr 29, 2026

    Having learned enough about 10Gb/s Ethernet to be comfortable about setting it up in my house, it was time to bite the bullet: order it from the ISP, buy some kit, and get started. I already had 2.5Gb/s working. The apartment has structured cabling -- each room has one or more RJ45 sockets in the wall, and there's a patch panel downstairs by our front door that has a matching patch socket for eac...
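
The "part 32h" entry above mentions setting the 32-bit floating point matrix multiplication precision as one of the two training optimisations. Assuming the training code is PyTorch, as in Raschka's book, a minimal sketch of that setting (not taken from the post itself) is:

    import torch

    # Let float32 matmuls use TensorFloat-32 kernels on Ampere GPUs such as
    # the RTX 3090: "high" trades a little precision for a large speed-up,
    # while the default "highest" keeps full-fat float32 accuracy.
    torch.set_float32_matmul_precision("high")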
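
The "part 32i" entry notes that the from-scratch model had more parameters than GPT-2 small because it wasn't using weight-tying. A minimal sketch of weight-tying for a GPT-2-style model, with illustrative layer names rather than the post's own code:

    import torch.nn as nn

    vocab_size, emb_dim = 50257, 768                        # GPT-2-small sizes
    tok_emb = nn.Embedding(vocab_size, emb_dim)             # token embedding
    out_head = nn.Linear(emb_dim, vocab_size, bias=False)   # output projection

    # Share a single weight matrix between the embedding and the output head;
    # at GPT-2-small size this removes roughly 50257 * 768 (about 38.6M)
    # parameters, roughly the gap between a 163M-parameter untied model and
    # the 124M-parameter GPT-2 small.
    out_head.weight = tok_emb.weight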
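
The "part 32k" entry is about training locally with gradient accumulation: running several small micro-batches per optimiser step so that a large effective batch size fits in one GPU's memory. A minimal, self-contained sketch of the technique; the tiny model and data here are placeholders, not the post's code:

    import torch
    import torch.nn as nn

    # Placeholder model, optimiser and data; real training code would use the
    # GPT-2-style model and its token batches instead.
    model = nn.Linear(16, 4)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    batches = [(torch.randn(2, 16), torch.randint(0, 4, (2,))) for _ in range(16)]

    accum_steps = 8                        # micro-batches per optimiser step
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(batches):
        loss = loss_fn(model(inputs), targets) / accum_steps  # scale so grads average
        loss.backward()                    # gradients add up in .grad across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()               # one weight update per accum_steps micro-batches
            optimizer.zero_grad()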