mixed-precision-from-scratch
Unlocking the Power of Mixed Precision Training with CUDA.
Pitch

Explore mixed precision training from scratch in this educational repo. Dive into the inner workings of Tensors and CUDA, and see the performance boost firsthand on a 2-layer MLP. Compare single-precision and mixed-precision runs with a single command and detailed memory and timing metrics, deepening your understanding of modern deep learning techniques.

Description

Boost your deep learning capabilities with mixed-precision-from-scratch, an insightful educational repository designed for exploring the nuances of mixed precision training. This project provides a clear, hands-on demonstration of how mixed precision can accelerate the training process of a simple 2-layer MLP (multilayer perceptron) by utilizing CUDA for optimized matrix multiplication.
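For intuition, here is a minimal sketch of one mixed precision training step written in PyTorch's standard torch.amp style. This is illustrative only, not the repo's actual code (the repo builds its Tensor machinery and CUDA matmul kernels from scratch), and the layer sizes, batch size, and learning rate are made-up assumptions:

import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(                 # simple 2-layer MLP; sizes are assumptions
    nn.Linear(784, 4096), nn.ReLU(), nn.Linear(4096, 10)
).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()   # scales the loss so fp16 grads don't underflow

x = torch.randn(512, 784, device=device)
y = torch.randint(0, 10, (512,), device=device)

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = nn.functional.cross_entropy(model(x), y)  # matmuls run in fp16
scaler.scale(loss).backward()          # backward pass on the scaled loss
scaler.step(opt)                       # unscale grads, update fp32 master weights
scaler.update()                        # adjust the scale factor for the next step
opt.zero_grad(set_to_none=True)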

In this repository, you'll find a straightforward implementation to compare training in single precision versus mixed precision. Just run the following commands to observe the differences:

python train.py false  # Single precision training
python train.py true   # Mixed precision training

Here’s an example of what to expect:

$ python train.py false
device: cuda, mixed precision training: False (torch.float32)
model memory: 26.05 MB
act/grad memory: 1100.45 MB
total memory: 1126.50 MB
1: loss 2.327, time: 139.196ms
...
avg: 16.280ms
$ python train.py true
device: cuda, mixed precision training: True (torch.float16)
model memory: 39.08 MB
act/grad memory: 563.25 MB
total memory: 602.33 MB
1: loss 2.328, time: 170.039ms
...
avg: 8.369ms
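These numbers line up with what mixed precision predicts: the optimizer keeps fp32 master weights alongside an fp16 working copy (6 bytes per parameter instead of 4), so model memory grows by roughly 1.5x (26.05 MB × 1.5 ≈ 39.08 MB), while storing activations and gradients in fp16 roughly halves the act/grad memory (1100.45 MB → 563.25 MB) and fp16 matrix multiplies cut the average step time by about half (16.280 ms → 8.369 ms).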

This project is not just about code; it's also about understanding the principles behind mixed precision training. For deeper insights, check out my blog, where I unpack the concepts further and cover how to train models more efficiently.