To deploy models on devices such as the iPhone, you often need to optimize the models to use less storage space, reduce power consumption, and reduce latency during inference. For an overview, see Optimizing Models Post-Training (Compressing ML Program Weights and Compressing Neural Network Weights).

Post-Training Compression

Post-training compression for Core ML models:

Training-Time Compression

Training-time compression for PyTorch models: