Optimizers
To deploy models on devices such as the iPhone, you often need to optimize the models to use less storage space, reduce power consumption, and reduce latency during inference. For an overview, see Optimizing Models Post-Training (Compressing ML Program Weights and Compressing Neural Network Weights).
Post-Training Compression
Post-training compression for Core ML models is provided by the coremltools.optimize.coreml module.
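Post-training compression rewrites the weights of an already-trained model without retraining it. The plain-Python sketch below illustrates the arithmetic of one such technique, symmetric linear quantization. It maps float weights to signed 8-bit integers using a single per-tensor scale, then maps them back. The function names are hypothetical, and this is not the coremltools implementation, which operates directly on the weights of a Core ML model.

```python
from typing import List, Tuple


def linear_symmetric_quantize(weights: List[float], nbits: int = 8) -> Tuple[List[int], float]:
    """Map float weights to signed integers that share one per-tensor scale (illustrative only)."""
    qmax = 2 ** (nbits - 1) - 1  # 127 for 8 bits; symmetric range [-qmax, qmax]
    max_abs = max(abs(w) for w in weights)
    # Choose the scale so the largest-magnitude weight lands exactly on +/-qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp into range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights as q * scale."""
    return [v * scale for v in q]
```

For example, `linear_symmetric_quantize([0.5, -1.27, 0.0, 1.27])` uses a scale of 1.27 / 127 and yields the integers `[50, -127, 0, 127]`. Storing each weight as an 8-bit integer instead of a 32-bit float cuts weight storage by roughly 4x, plus one scale per tensor. The trade-off is a rounding error of at most `scale / 2` per weight.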
Training-Time Compression
Training-time compression for PyTorch models is provided by the coremltools.optimize.torch module.
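Training-time compression applies compression while the model is still being fine-tuned, so the remaining weights can adapt to it. The plain-Python sketch below illustrates one such technique, gradual magnitude pruning. The target sparsity ramps up along a cubic polynomial schedule, which is a common choice for gradual pruning, and the smallest-magnitude weights are zeroed at each step. The function names, schedule parameters, and toy weights are illustrative assumptions. This is not the coremltools.optimize.torch API, which works on PyTorch modules.

```python
from typing import List


def polynomial_sparsity(step: int, begin_step: int, end_step: int,
                        initial_sparsity: float, final_sparsity: float,
                        power: int = 3) -> float:
    """Target sparsity at a training step under a polynomial-decay schedule (illustrative)."""
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    # Sparsity rises quickly at first, then levels off toward final_sparsity.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** power


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    n_prune = int(sparsity * len(weights))
    if n_prune == 0:
        return list(weights)
    # Sort indices by weight magnitude and mark the n_prune smallest for removal.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Toy "training loop": the target sparsity grows from 0% to 75% over 100 steps.
layer_weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08]
for step in range(0, 101, 25):
    target = polynomial_sparsity(step, begin_step=0, end_step=100,
                                 initial_sparsity=0.0, final_sparsity=0.75)
    # A real training step would update layer_weights with an optimizer here.
    layer_weights = magnitude_prune(layer_weights, target)
```

Ramping sparsity up gradually, rather than pruning everything at once, gives the optimizer steps in between to compensate for the removed weights. This is the main advantage of compressing during training over compressing after it.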