Performance#

Since palettization reduces the size of each weight value, the amount of data that needs to be moved during prediction decreases. This can translate into latency benefits for memory-bottlenecked models. Note that this advantage is available only when the palettized weights are loaded and decompressed "just in time" for computation. Starting with iOS17/macOS14, this is more likely to happen for models running primarily on the Neural Engine backend.

For the per_grouped_channel palettization mode added in iOS18/macOS15, runtime performance may drop as the number of look-up tables (LUTs) used to represent a weight tensor increases. Typically, group sizes of 8 or 16 have been observed to give good accuracy while still providing a speed-up over the uncompressed model.
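The following is a minimal sketch of applying per_grouped_channel palettization to an existing Core ML model with coremltools.optimize.coreml. The model path, 4-bit setting, and group size of 16 are illustrative choices, not values tied to the benchmarks below:

```python
import coremltools as ct
from coremltools import optimize as cto

# Load an existing Core ML model (the path is a placeholder for illustration).
mlmodel = ct.models.MLModel("model.mlpackage")

# 4-bit k-means palettization with one LUT per group of 16 channels.
# The per_grouped_channel granularity requires an iOS18/macOS15 deployment target.
op_config = cto.coreml.OpPalettizerConfig(
    mode="kmeans",
    nbits=4,
    granularity="per_grouped_channel",
    group_size=16,
)
config = cto.coreml.OptimizationConfig(global_config=op_config)

compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
compressed_mlmodel.save("model_palettized.mlpackage")
```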

Performance Benchmarks#

In the tables below, we provide runtime performance benchmarks for several models palettized using the coremltools.optimize APIs.

Methodology#

The training-time compressed models were obtained by fine-tuning the float32 PyTorch models, with weights initialized from the checkpoints linked in the Model Info table, and using methods from coremltools.optimize.torch to perform compression. The datasets used for fine-tuning, along with the accuracy metrics reported, are also linked in the same table. We used fine-tuning recipes commonly found in the literature for each task, along with standard data augmentations.
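As a rough sketch of what such a fine-tuning run looks like, the snippet below uses the Differentiable K-Means (DKM) palettizer from coremltools.optimize.torch. The model, data loader, loss function, optimizer settings, and the 4-bit configuration are placeholders; the exact per-model settings are in the linked config files:

```python
import torch
from coremltools.optimize.torch.palettization import (
    DKMPalettizer,
    DKMPalettizerConfig,
)

# `model`, `train_dataloader`, and `loss_fn` stand in for the task-specific
# PyTorch model, fine-tuning data, and loss.
config = DKMPalettizerConfig.from_dict({"global_config": {"n_bits": 4}})
palettizer = DKMPalettizer(model, config)

# Insert the differentiable palettization layers into the model.
prepared_model = palettizer.prepare()

optimizer = torch.optim.SGD(prepared_model.parameters(), lr=1e-3)
for inputs, targets in train_dataloader:
    optimizer.zero_grad()
    loss = loss_fn(prepared_model(inputs), targets)
    loss.backward()
    optimizer.step()
    palettizer.step()  # advance the palettizer's internal schedule

# Fold the learned look-up tables back into the model weights.
finalized_model = palettizer.finalize()
```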

Similarly, the post-training compressed models were obtained by compressing the converted float16 Core ML models, with pre-trained weights, using methods from the coremltools.optimize.coreml module.
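A minimal sketch of this post-training path is shown below, using per-tensor k-means palettization on an already converted float16 model. The input path and the 6-bit setting are illustrative; the configurations actually used for each model are in the linked config files:

```python
import coremltools as ct
from coremltools import optimize as cto

# Load a converted float16 Core ML model (placeholder path).
mlmodel = ct.models.MLModel("model_fp16.mlpackage")

# Post-training 6-bit k-means palettization applied to all supported ops.
op_config = cto.coreml.OpPalettizerConfig(mode="kmeans", nbits=6)
config = cto.coreml.OptimizationConfig(global_config=op_config)

compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
```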

The trained and compressed models and the coremltools.optimize.torch config files used for compression can be downloaded by clicking the respective links embedded in the model and config names.

The latency numbers were captured using the Xcode Performance tab, reporting the median statistic. The compute unit selection is all unless otherwise noted. Latency is sensitive to device state and may vary across devices and build versions.

  • Device: iPhone 14 Pro (A16), unless otherwise mentioned

  • iOS build: iOS17

  • Xcode: Xcode 15

Model Info#

| Model Name | Task | Pre-trained Weights | Dataset | Accuracy Metric |
|---|---|---|---|---|
| MobileNetv2-1.0 | Image Classification | Torchvision | ImageNet | Top-1 Accuracy (%) |
| MobileNetv3-small | Image Classification | Torchvision | ImageNet | Top-1 Accuracy (%) |
| ResNet50 | Image Classification | Torchvision | ImageNet | Top-1 Accuracy (%) |
| MobileViTv2-1.0 | Image Classification | cvnets | ImageNet | Top-1 Accuracy (%) |
| CenterNet (ResNet34 backbone) | Object Detection | Torchvision backbone | MS-COCO | mAP |

Results#

| Model Name | Config | Optimization Algorithm | Compression Ratio | Latency in ms (per batch) |
|---|---|---|---|---|
| MobileNetv2-1.0 | Float16 | n/a | 1.0 | 0.48 |
| MobileNetv2-1.0 | 2 bit | Differentiable K-Means | 5.92 | 0.47 |
| MobileNetv2-1.0 | 4 bit | Differentiable K-Means | 3.38 | 0.45 |
| MobileNetv2-1.0 | 6 bit | K-Means | 2.54 | 0.48 |
| MobileNetv2-1.0 | 8 bit | K-Means | 1.97 | 0.45 |
| MobileNetv3-small | Float16 | n/a | 1.0 | 0.13 |
| MobileNetv3-small | 2 bit | Differentiable K-Means | 5.82 | 0.13 |
| MobileNetv3-small | 4 bit | Differentiable K-Means | 3.47 | 0.13 |
| MobileNetv3-small | 6 bit | K-Means | 2.6 | 0.13 |
| MobileNetv3-small | 8 bit | K-Means | 1.93 | 0.13 |
| ResNet50 | Float16 | n/a | 1.0 | 1.52 |
| ResNet50 | 2 bit | Differentiable K-Means | 7.63 | 1.43 |
| ResNet50 | 4 bit | Differentiable K-Means | 3.9 | 1.41 |
| ResNet50 | 6 bit | K-Means | 2.65 | 1.37 |
| ResNet50 | 8 bit | K-Means | 1.99 | 1.4 |
| CenterNet (ResNet34 backbone) | Float16 | n/a | 1.0 | 6.85 |
| CenterNet (ResNet34 backbone) | 2 bit | Differentiable K-Means | 7.71 | 6.37 |
| CenterNet (ResNet34 backbone) | 4 bit | Differentiable K-Means | 3.94 | 6.67 |
| CenterNet (ResNet34 backbone) | 6 bit | K-Means | 2.65 | 6.71 |
| CenterNet (ResNet34 backbone) | 8 bit | K-Means | 2.0 | 6.85 |