mixed-precision-from-scratch is an educational repository that implements mixed precision training from scratch. It demonstrates, on a simple 2-layer MLP (multilayer perceptron) trained with CUDA-accelerated matrix multiplication, how mixed precision speeds up training, and along the way gives you a look at the inner workings of tensors and CUDA. You can compare single precision against mixed precision with easy-to-run commands and detailed memory and timing metrics.
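To make the setup concrete, a 2-layer MLP of the kind trained here could be defined as follows. This is a minimal sketch: the class name, layer sizes, and activation are illustrative and not taken from the repository's actual code.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """A simple 2-layer MLP: linear -> ReLU -> linear (sizes are illustrative)."""
    def __init__(self, in_features=1024, hidden=4096, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # The two matrix multiplications below dominate the runtime;
        # they are what CUDA executes faster in half precision.
        return self.fc2(torch.relu(self.fc1(x)))
```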
In this repository, you'll find a straightforward implementation to compare training in single precision versus mixed precision. Just run the following commands to observe the differences:
python train.py false # Single precision training
python train.py true  # Mixed precision training
Here’s an example of what to expect:
$ python train.py false
device: cuda, mixed precision training: False (torch.float32)
model memory: 26.05 MB
act/grad memory: 1100.45 MB
total memory: 1126.50 MB
1: loss 2.327, time: 139.196ms
...
avg: 16.280ms
$ python train.py true
device: cuda, mixed precision training: True (torch.float16)
model memory: 39.08 MB
act/grad memory: 563.25 MB
total memory: 602.33 MB
1: loss 2.328, time: 170.039ms
...
avg: 8.369ms
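The numbers follow the expected mixed precision pattern: model memory grows by roughly 1.5x (26.05 MB to 39.08 MB), consistent with keeping FP32 master weights alongside an FP16 working copy, while activation/gradient memory roughly halves and the average step time drops by about 2x. The sketch below illustrates that pattern for one training step: FP16 compute, FP32 master weights, and static loss scaling. It is a simplified illustration under assumed names and hyperparameters (mixed_precision_step, lr, loss_scale), not the repository's actual implementation, and it also leaves out details that full solutions such as torch.cuda.amp handle (e.g. dynamic loss scaling and overflow checks).

```python
import torch

def mixed_precision_step(model_fp16, master_params, x, y, loss_fn,
                         lr=1e-3, loss_scale=1024.0):
    """One mixed precision step: FP16 compute, FP32 master weights, static loss scaling.
    Simplified sketch: plain SGD, no gradient-overflow handling."""
    # 1. Refresh the FP16 working copy from the FP32 master weights.
    with torch.no_grad():
        for p16, p32 in zip(model_fp16.parameters(), master_params):
            p16.copy_(p32.half())

    # 2. Forward/backward in FP16; scale the loss so small gradients
    #    do not underflow to zero in half precision.
    logits = model_fp16(x.half())
    loss = loss_fn(logits.float(), y)   # compute the loss itself in FP32
    (loss * loss_scale).backward()

    # 3. Unscale the FP16 gradients and update the FP32 master weights.
    with torch.no_grad():
        for p16, p32 in zip(model_fp16.parameters(), master_params):
            p32 -= lr * (p16.grad.float() / loss_scale)
            p16.grad = None
    return loss.item()

# Typical setup (illustrative): an FP16 model for compute plus FP32 masters.
# model_fp16 = MLP().cuda().half()
# master_params = [p.detach().float().clone() for p in model_fp16.parameters()]
```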
This project is about more than code: the goal is to understand the principles behind mixed precision training. For a deeper look at the concepts, check out my blog, where I walk through them in more detail.