To deploy models on devices such as the iPhone, you often need to optimize them so that they take up less storage space, consume less power, and run inference with lower latency. For an overview, see Optimizing Models Post-Training (Compressing ML Program Weights and Compressing Neural Network Weights).
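
To make the storage argument concrete, here is a minimal sketch of 8-bit linear weight quantization written with the Python standard library only. It is an illustration of the general idea, not the Core ML Tools implementation; the function names `quantize_linear_8bit` and `dequantize` and the sample weights are hypothetical and chosen only for this example.

```python
from array import array


def quantize_linear_8bit(weights):
    """Map float weights to signed 8-bit integers using one shared scale.

    Hypothetical helper for illustration; not part of any library API.
    """
    max_abs = max(abs(w) for w in weights)
    # Symmetric quantization: the largest magnitude maps to +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = array("b", (round(w / scale) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit representation."""
    return [q * scale for q in quantized]


# Sample weights stored as 32-bit floats (made-up values).
weights = array("f", [0.5, -1.0, 0.25, 0.75, -0.125, 1.0, 0.0, -0.6])

quantized, scale = quantize_linear_8bit(weights)
restored = dequantize(quantized, scale)

# Storage before and after: 4 bytes per weight versus 1 byte per weight.
float_bytes = weights.itemsize * len(weights)
int8_bytes = quantized.itemsize * len(quantized)

# Worst-case rounding error introduced by quantization.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32 storage: {float_bytes} bytes")
print(f"int8 storage:    {int8_bytes} bytes (plus one scale value)")
print(f"max abs error:   {max_error:.6f}")
```

The sketch trades a small, bounded rounding error (at most half of `scale` per weight) for a fourfold reduction in weight storage. Real compression workflows apply the same trade-off per layer or per channel and measure the effect on model accuracy.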


Compression for PyTorch models:

Compression for Core ML models: