API Overview

Vanilla K-means API

import coreai_opt as opt
from coreai_opt.palettization import KMeansPalettizer, KMeansPalettizerConfig
import torch

model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# define config
# here we use a config that applies 4-bit per-grouped-channel palettization to all supported layers.
# this can be done by using one of the several available "pre-defined" configs, accessible via the "presets" namespace.
config = KMeansPalettizerConfig.presets.w4()

# palettize weights in the model with the config
palettizer = KMeansPalettizer(model, config)
prepared_model = palettizer.prepare(example_inputs)

# ---------- validate --------------------
# use prepared_model to check accuracy on validation data.
# forward pass will include the effect of weight compression.
val_metric = validate(prepared_model, val_dataset)

# ----------- deployment ------------------
# same as with Quantizer:
# invoke the 'finalize' API to update the PyTorch model and make it compatible for conversion
# with either coreai or coremltools

finalized_model_for_coreai = palettizer.finalize(backend=opt.ExportBackend.CoreAI)
# OR
finalized_model_for_coreml = palettizer.finalize(backend=opt.ExportBackend.CoreML)
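To make the effect of the config above concrete, here is a minimal pure-Python sketch of what k-means palettization does to a set of weights: the weights are clustered into 2**n_bits centroids (the lookup table, or LUT), and each weight is replaced by the index of its nearest centroid. With 4 bits the LUT has 16 entries. The function names `nearest` and `kmeans_palettize` are hypothetical and this is not the coreai_opt implementation; in particular, it uses a single LUT for all weights, whereas per-grouped-channel palettization would presumably keep one LUT per group of channels.

```python
import random

def nearest(lut, w):
    """Return the index of the LUT entry closest to the scalar weight w."""
    return min(range(len(lut)), key=lambda j: (w - lut[j]) ** 2)

def kmeans_palettize(weights, n_bits=4, n_iters=20, seed=0):
    """Cluster scalar weights into at most 2**n_bits centroids.

    Hypothetical helper for illustration only. Returns (lut, indices),
    where lut[indices[i]] is the palettized value of weights[i].
    """
    k = 2 ** n_bits
    distinct = sorted(set(weights))
    # initialize the centroids from distinct weight values
    lut = random.Random(seed).sample(distinct, min(k, len(distinct)))
    for _ in range(n_iters):
        # assignment step: nearest centroid for every weight
        indices = [nearest(lut, w) for w in weights]
        # update step: each centroid becomes the mean of its members
        for j in range(len(lut)):
            members = [w for w, i in zip(weights, indices) if i == j]
            if members:
                lut[j] = sum(members) / len(members)
    # final assignment against the updated centroids
    indices = [nearest(lut, w) for w in weights]
    return lut, indices

# toy "layer" of 256 Gaussian weights
rng = random.Random(1)
weights = [rng.gauss(0.0, 1.0) for _ in range(256)]
lut, indices = kmeans_palettize(weights, n_bits=4)

# the forward pass of a palettized layer would use these reconstructed values
dequantized = [lut[i] for i in indices]
```

Only the 16-entry LUT and the 4-bit indices need to be stored, which is where the compression comes from.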

Sensitive K-means API

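Sensitivity-based palettization uses calibration data to compute a per-weight importance score (following the SqueezeLLM method) and then clusters the weights with weighted k-means, so that more important weights are approximated more closely.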

from coreai_opt.palettization import KMeansPalettizer, KMeansPalettizerConfig
import torch
import torch.nn.functional as F

model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

config = KMeansPalettizerConfig()  # defaults to 4-bit palettization for all weights
palettizer = KMeansPalettizer(model, config)
prepared_model = palettizer.prepare(example_inputs)

# compute the clusters/LUTs with weighted-kmeans
# weights of the prepared_model will get updated
with palettizer.calibration_mode(loss_fn=F.cross_entropy) as skm:
    for batch, target in calibration_dataloader:
        output = prepared_model(batch)
        skm.step(output, target)

# ---------- validate --------------------
# use prepared_model to check accuracy on validation data.
# forward pass will include the effect of weight compression.
val_metric = validate(prepared_model, val_dataset)

# ----------- deployment ------------------
# same as before
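The weighted k-means step used during calibration can be illustrated with a small pure-Python sketch. In the SqueezeLLM paper the importance scores come from an approximation to the Fisher information (squared gradients of the loss), which would be consistent with passing a `loss_fn` to `calibration_mode`; how coreai_opt computes them internally is not stated here, so the scores below are made-up numbers. The function `weighted_kmeans_1d` is a hypothetical helper, not part of the library.

```python
def weighted_kmeans_1d(weights, importance, init_lut, n_iters=20):
    """Weighted Lloyd iterations on scalar weights (illustration only).

    Each centroid is the importance-weighted mean of its members, so
    weights with a high importance score pull the centroid toward them.
    """
    lut = list(init_lut)

    def assign():
        return [min(range(len(lut)), key=lambda j: (w - lut[j]) ** 2)
                for w in weights]

    for _ in range(n_iters):
        indices = assign()
        for j in range(len(lut)):
            num = sum(s * w for w, s, i in zip(weights, importance, indices) if i == j)
            den = sum(s for s, i in zip(importance, indices) if i == j)
            if den > 0:
                lut[j] = num / den
    return lut, assign()

weights = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
init_lut = [0.0, 1.0]  # 1-bit palette: two centroids

# uniform importance reduces to ordinary k-means
lut_uniform, idx_uniform = weighted_kmeans_1d(weights, [1.0] * 6, init_lut)

# made-up scores: the weight 0.2 is marked as far more sensitive than the rest
scores = [1.0, 1.0, 100.0, 1.0, 1.0, 1.0]
lut_sens, idx_sens = weighted_kmeans_1d(weights, scores, init_lut)

err_uniform = abs(lut_uniform[idx_uniform[2]] - weights[2])
err_sens = abs(lut_sens[idx_sens[2]] - weights[2])
```

In this toy case the first centroid moves from the plain mean of its cluster toward 0.2, so the sensitive weight is reconstructed with a smaller error (`err_sens` is below `err_uniform`) at the cost of a larger error on its less important neighbors.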

To save the importance scores (also called sensitivities) of the weights for later reuse:

# provide path to save weight sensitivities
with palettizer.calibration_mode(
    loss_fn=F.cross_entropy, sensitivity_path="sensitivities.pt"
) as skm:
    for batch, target in calibration_dataloader:
        output = prepared_model(batch)
        skm.step(output, target)

Then, in a later run, load the precomputed sensitivities during prepare:

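# in the new run, construct the palettizer as before, then pass the saved file to prepare
palettizer = KMeansPalettizer(model, config)
prepared_model = palettizer.prepare(example_inputs, sensitivity_path="sensitivities.pt")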

# ---------- validate --------------------
val_metric = validate(prepared_model, val_dataset)

# ----------- deployment ------------------
# same as before
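The save and load steps above can be combined into a single helper that calibrates only when no sensitivity file exists yet. The sketch below uses only the calls shown on this page (`prepare`, `calibration_mode`, `step`); the helper name `prepare_with_cached_sensitivities` is hypothetical, and it assumes that the sensitivity file has been written by the time the `calibration_mode` block exits, which this page does not state explicitly.

```python
import os

def prepare_with_cached_sensitivities(
    palettizer,
    example_inputs,
    calibration_dataloader,
    loss_fn,
    sensitivity_path="sensitivities.pt",
):
    """Hypothetical helper: reuse saved sensitivities when they exist."""
    if os.path.exists(sensitivity_path):
        # later run: load the precomputed sensitivities and skip calibration
        return palettizer.prepare(example_inputs, sensitivity_path=sensitivity_path)

    # first run: calibrate and ask the palettizer to save the sensitivities
    prepared_model = palettizer.prepare(example_inputs)
    with palettizer.calibration_mode(
        loss_fn=loss_fn, sensitivity_path=sensitivity_path
    ) as skm:
        for batch, target in calibration_dataloader:
            output = prepared_model(batch)
            skm.step(output, target)
    return prepared_model
```

It would be called as `prepare_with_cached_sensitivities(palettizer, example_inputs, calibration_dataloader, F.cross_entropy)` in place of the separate prepare and calibration steps.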

For more details on how to use KMeansPalettizerConfig and ModuleKMeansPalettizerConfig to apply different settings to different weights in the model, see Palettization Config.