Mixed-precision palettization with ResNet50

In this article we walk through the Mixed-Precision Compression workflow applied to ResNet50 weight palettization. We use 2/4/6-bit per-tensor palettization as our candidate configs, PSNR on logits as the sensitivity metric, and generate the recipe with the greedy approach targeting 4 bits-per-weight (BPW). See the linked page for definitions of each term and the three-stage workflow.

Model and dataset

We apply palettization to a pretrained ResNet50 (IMAGENET1K_V1 weights) from torchvision. The model has 54 conv and linear layers — these are the weights we palettize. We draw data samples from the ImageNet validation set: 2560 images for sensitivity computation and the full 50,000-image validation set for top-1 evaluation. All accuracy numbers reported below are measured using PyTorch with mps backend.

Baseline

The pretrained model at fp16 gives us a top-1 eval accuracy of 75.02%.

Uniform 4-bit palettization

The simplest compression strategy is to apply the same per-tensor 4-bit lookup table (LUT) to every layer. This gives us roughly a 4x reduction in model size going from 16-bit precision to 4-bit precision, plus a small overhead for storing the lookup table itself.

We build a uniform config by setting a single global_config that applies to every palettizable layer, then run k-means palettization and use the prepared model directly for evaluation.

import torch

from coreai_opt.palettization import (
    KMeansPalettizer,
    KMeansPalettizerConfig,
    ModuleKMeansPalettizerConfig,
    PalettizationSpec,
)
from coreai_opt.palettization.spec import PerTensorGranularity

cfg = KMeansPalettizerConfig(
    global_config=ModuleKMeansPalettizerConfig(
        op_state_spec={
            "weight": PalettizationSpec(n_bits=4, granularity=PerTensorGranularity()),
        },
    ),
)
palettizer = KMeansPalettizer(model, cfg)
palettized_model = palettizer.prepare(example_inputs=(torch.randn(1, 3, 224, 224),))

Top-1 accuracy on the eval set: 65.87%.

Mixed-precision compression

For this example we use 2/4/6-bit per-tensor palettization as our candidate configs for every layer.

Layer-wise sensitivity computation

To compress a single layer in isolation, we set global_config=None and add a module_name_configs entry that targets only that layer’s fully qualified name. For example, palettizing only conv1 at 2 bits:

cfg = KMeansPalettizerConfig(
    global_config=None,
    module_name_configs={
        "conv1": ModuleKMeansPalettizerConfig(
            op_state_spec={
                "weight": PalettizationSpec(
                    n_bits=2, granularity=PerTensorGranularity()
                ),
            },
        ),
    },
)

The fully qualified names used as module_name_configs keys (e.g. "conv1", "layer1.0.conv1", "fc") come from iterating model.named_modules() and keeping the modules supported for palettization.

For ResNet50 this produces 54 entries — 53 conv layers plus the final fc linear.

We run this for every (layer, candidate config) pair and score each candidate against the baseline with PSNR to yield a sensitivity table. The first five layers look like this:

Layer

Config

Size (KB)

Sensitivity (PSNR)

conv1

2-bit

2.31

22.79

conv1

4-bit

4.66

31.64

conv1

6-bit

7.14

44.27

layer1.0.conv1

2-bit

1.02

31.89

layer1.0.conv1

4-bit

2.06

44.98

layer1.0.conv1

6-bit

3.25

55.60

layer1.0.conv2

2-bit

9.02

36.42

layer1.0.conv2

4-bit

18.06

47.21

layer1.0.conv2

6-bit

27.25

57.06

layer1.0.conv3

2-bit

4.02

35.17

layer1.0.conv3

4-bit

8.06

48.50

layer1.0.conv3

6-bit

12.25

59.20

layer1.0.downsample.0

2-bit

4.02

28.24

layer1.0.downsample.0

4-bit

8.06

40.45

layer1.0.downsample.0

6-bit

12.25

52.55

Higher PSNR means lower sensitivity — that bitwidth distorts the layer’s output less.

Recipe generation

We run the greedy recipe generation on the sensitivity table with a target BPW of 4, which assigns all 54 layers and realizes a BPW of 3.95. The bitwidth distribution is:

  • 2 layers at 6-bit (most sensitive): conv1, layer1.0.downsample.0

  • 50 layers at 4-bit

  • 2 layers at 2-bit (least sensitive): layer1.1.conv1, layer3.4.conv2

Results

Comparing against the FP16 baseline and uniform 4-bit:

Configuration

BPW

Size (MB)

Top-1 accuracy

FP16 baseline

16

48.64

75.02%

uniform 4-bit

4

12.16

65.87%

mixed precision (target 4)

3.95

12.03

70.27%

At a slightly lower BPW than uniform 4-bit, mixed precision lifts top-1 accuracy from 65.87% to 70.27% — recovering more than four percentage points of the gap to the FP16 baseline — by spending its bit budget on the layers that are more sensitive.

Accuracy vs BPW graph

We sweep the target BPW from 2 to 6 to trace out the curve below.

Accuracy vs realized BPW for greedy mixed-precision recipes on ResNet50

The inflection sits at around 4.0 realized BPW: below it, every additional 0.5 BPW buys us 15-35 percentage points of accuracy; above it, gains drop to 1-2 points per 0.5 BPW as the curve flattens toward the FP16 baseline.

Summary

At the same model size, mixed-precision palettization significantly narrowed the gap to the FP16 baseline compared to uniform 4-bit palettization.