
You May Also Enjoy
The Unreasonable Effectiveness of Scale
5 minute read
Scaling laws describe the relationship between a model’s performance and the scale of three key ingredients: the number of model parameters, the size of the dataset, and the amount of computational power used for training. The core finding is that as you increase these resources, the model’s performance improves in a predictable, power-law fashion. Read more
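As a quick illustration of what that means, here is a minimal sketch (not code from the post, with constants chosen only for illustration) of a parameter-count scaling law of the form L(N) = (N_c / N) ** alpha, showing the predictable power-law drop in loss as models grow:

```python
# Minimal sketch of a parameter-count scaling law, L(N) = (N_c / N) ** alpha.
# N_c and alpha are illustrative constants, not fitted values.

def scaling_law_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Predicted loss for a model with n_params parameters."""
    return (n_c / n_params) ** alpha

for n in [1e6, 1e8, 1e10, 1e12]:
    print(f"{n:.0e} params -> predicted loss {scaling_law_loss(n):.3f}")
```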
A Technical Deep Dive into Exploding Gradients
5 minute read
I remember one of the experiences I had during my MS in Computer Science at Georgia Tech while working on a CNN for protein data. I was feeding raw protein data, encoded as images with pixel values in the standard 0-255 range, directly into the network. My model’s accuracy was stuck below 20%, and the loss was oscillating wildly. After hours of debugging, I traced the issue to its source: I had neglected to normalize my input data, leading to a classic case of “exploding gradients.” Read more
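For illustration, here is a minimal sketch of the normalization step that fixes this kind of input-scale problem (not the original project code; the array shapes are hypothetical):

```python
import numpy as np

# Hypothetical batch of image-encoded data with raw 0-255 pixel values.
raw_batch = np.random.randint(0, 256, size=(32, 64, 64, 1)).astype(np.float32)

# Feeding raw_batch directly into a CNN lets early activations (and hence
# gradients) grow very large. Rescaling to [0, 1] keeps input magnitudes small.
normalized_batch = raw_batch / 255.0

# A common alternative: standardize to zero mean and unit variance.
standardized_batch = (raw_batch - raw_batch.mean()) / (raw_batch.std() + 1e-8)
```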
Why Randomized Optimization Needs Quantum Computing
5 minute read
Randomized optimization algorithms like Genetic Algorithms (GA), Simulated Annealing (SA), and Randomized Hill Climbing (RHC) are powerful tools for solving problems where traditional gradient-based methods fail. These “black-box” problems are common in fields like logistics, engineering design, and machine learning, where the optimization landscape is complex, non-differentiable, or riddled with local minima. Read more
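As a toy sketch of that black-box setting (illustrative only, using a simple bitstring objective rather than anything from the post), randomized hill climbing needs nothing more than the ability to score a candidate solution:

```python
import random

def randomized_hill_climbing(fitness, n_bits=20, max_iters=1000, seed=0):
    """Maximize a black-box fitness over bitstrings by flipping one bit at a time."""
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n_bits)]
    best = fitness(state)
    for _ in range(max_iters):
        candidate = state[:]
        candidate[rng.randrange(n_bits)] ^= 1  # flip one randomly chosen bit
        score = fitness(candidate)
        if score >= best:                      # accept improvements (and ties)
            state, best = candidate, score
    return state, best

# Toy objective: "one-max", the count of 1s; no gradient information required.
solution, value = randomized_hill_climbing(lambda bits: sum(bits))
print(value)
```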
Why Backprop Isn’t Magic: The Challenge of Local Minima
7 minute read
Backpropagation is the cornerstone algorithm powering much of the deep learning revolution. Coupled with gradient descent, it allows us to train incredibly complex neural networks on vast datasets. However, it’s not a silver bullet. One of the fundamental challenges that can prevent backpropagation from finding the best possible solution is the presence of local minima in the optimization landscape. Read more
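As a minimal illustration (a toy example, not taken from the post), plain gradient descent on a simple non-convex function ends up in whichever basin its starting point happens to fall into:

```python
# Gradient descent on f(x) = x**4 - 3*x**2 + x, which has two basins:
# a shallow local minimum near x ≈ 1.13 and a deeper global one near x ≈ -1.30.

def grad(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(gradient_descent(x0=2.0))   # settles in the shallow local minimum (~1.13)
print(gradient_descent(x0=-2.0))  # settles in the deeper global minimum (~-1.30)
```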