Quantization
Quantization is like shrinking a high‑resolution photo into a smaller file by reducing the number of colors, but for neural nets. It drops the precision of the numbers the model uses (e.g., from 32‑bit floats to 8‑bit integers) so the model runs faster and needs less memory. The trade‑off is a tiny dip in accuracy, but for many apps the speed gain is worth it.
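The idea can be sketched in a few lines of NumPy. This is a minimal illustration of symmetric linear quantization, not a production scheme; the function names and the 8‑bit choice are just for the example. Each float is divided by a scale factor, rounded to the nearest integer on the int8 grid, and later multiplied back; the rounding error is the "tiny dip in accuracy."

```python
import numpy as np

def quantize(weights, num_bits=8):
    # Symmetric linear quantization: map floats onto a signed integer grid.
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax    # float width of one integer step
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; each value is off by at most half a step.
    return q.astype(np.float32) * scale

weights = np.random.randn(1000).astype(np.float32)
q, scale = quantize(weights)
restored = dequantize(q, scale)
# int8 storage is 4x smaller than float32, and the round-trip
# error is bounded by the quantization step.
```

Storing int8 instead of float32 cuts memory four‑fold, and integer arithmetic is what enables the speedup on most hardware.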