Optimizers
To deploy models on devices such as the iPhone, you often need to optimize them to use less storage space, consume less power, and reduce latency during inference. For an overview, see Optimizing Models Post-Training (Compressing ML Program Weights and Compressing Neural Network Weights).
PyTorch
Compression for PyTorch models is provided by the coremltools.optimize.torch APIs, which compress weights during training or fine-tuning; see the sketch below.
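A minimal sketch of training-time compression, assuming the MagnitudePruner workflow in coremltools.optimize.torch; the model, data, loss, and schedule are placeholders, and exact class and method names can vary between coremltools versions:

```python
import torch
from coremltools.optimize.torch.pruning import (
    MagnitudePruner,
    MagnitudePrunerConfig,
    ModuleMagnitudePrunerConfig,
)

# Placeholder model standing in for the network you actually fine-tune.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(16, 8, 3, padding=1),
)

# Ask for 75% sparsity in every supported layer.
config = MagnitudePrunerConfig(
    global_config=ModuleMagnitudePrunerConfig(target_sparsity=0.75)
)

pruner = MagnitudePruner(model, config)
model = pruner.prepare()  # attach pruning masks to the weights

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(100):  # placeholder fine-tuning loop
    x = torch.rand(4, 3, 32, 32)
    loss = model(x).abs().mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    pruner.step()  # advance the pruning schedule

model = pruner.finalize()  # fold the masks into the weights
```

The finalized model can then be converted with coremltools.convert in the usual way.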
Core ML
Compression for Core ML models is provided by the coremltools.optimize.coreml APIs, which compress the weights of an already converted ML program model post-training, without requiring data; see the sketch below.
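A minimal sketch of post-training compression, assuming the weight-palettization utilities in coremltools.optimize.coreml; the model paths are placeholders:

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

# Placeholder path to an already converted ML program model.
mlmodel = ct.models.MLModel("model.mlpackage")

# Cluster each weight tensor into a 6-bit lookup table using k-means.
op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=6)
config = cto.OptimizationConfig(global_config=op_config)

compressed_mlmodel = cto.palettize_weights(mlmodel, config)
compressed_mlmodel.save("model_palettized.mlpackage")
```

coremltools.optimize.coreml also provides analogous one-shot utilities for linear weight quantization and weight pruning; the choice of nbits trades model size against accuracy.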