metrics package
Submodules
metrics.average_precision module
- class metrics.average_precision.AveragePrecisionMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None, force_cpu: bool = True)[source]
Bases:
EpochMetric
- compute_with_aggregates(y_pred: Tensor, y_true: Tensor) Number | Dict[str, Number] [source]
Computes the metrics given aggregated predictions and targets.
It gets called by self.compute. This happens at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
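For reference, average precision over aggregated predictions reduces to averaging the precision observed at each true-positive position in the score ranking. A minimal self-contained sketch of that computation (not this class's actual implementation, which also handles distributed aggregation):

    import torch

    def average_precision(y_pred: torch.Tensor, y_true: torch.Tensor) -> float:
        # Rank samples by descending score, then average the precision
        # observed at every rank that holds a true positive.
        order = torch.argsort(y_pred, descending=True)
        hits = y_true[order].float()
        cum_hits = torch.cumsum(hits, dim=0)
        ranks = torch.arange(1, hits.numel() + 1, dtype=torch.float32)
        return (cum_hits / ranks)[hits.bool()].mean().item()

    # Two positives ranked 1st and 3rd -> AP = (1/1 + 2/3) / 2 = 0.833...
    ap = average_precision(torch.tensor([0.9, 0.1, 0.8, 0.4, 0.7]),
                           torch.tensor([1, 0, 0, 0, 1]))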
metrics.coco_map module
- class metrics.coco_map.COCOEvaluator(opts, split: str | None = 'val', year: int | None = 2017, is_distributed: bool | None = False)[source]
Bases:
BaseMetric
- __init__(opts, split: str | None = 'val', year: int | None = 2017, is_distributed: bool | None = False)[source]
- update(prediction: Tensor | Dict, target: Tensor | Dict, extras: Dict[str, Any] = {}, batch_size: int | None = 1)[source]
Processes a new batch of predictions and targets for computing the metric.
- Parameters:
prediction – model outputs for the current batch
target – labels for the current batch
extras – dict containing extra information. During training this includes the “loss” and “grad_norm” keys; during validation it only includes “loss”.
batch_size – optionally used to correctly compute the averages when the batch size varies across batches.
- prepare_cache_results(detection_results: List[DetectionPredTuple], image_ids, image_widths, image_heights) None [source]
- compute() Dict[str, float] [source]
Computes the metrics with the existing data.
It gets called at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- Returns:
Depending on the metric, can return a scalar metric or a dictionary of metrics. Lists (or dicts of lists) are also generally accepted but not encouraged.
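A sketch of the intended update/compute cycle. Here opts is the project-wide argparse namespace, while model, val_loader, and the batch keys are hypothetical stand-ins, so this illustrates the calling convention rather than running as-is:

    from metrics.coco_map import COCOEvaluator

    evaluator = COCOEvaluator(opts, split="val", year=2017)
    for batch in val_loader:                   # hypothetical COCO val DataLoader
        prediction = model(batch["image"])     # detection head outputs
        evaluator.update(
            prediction=prediction,
            target=batch["label"],
            batch_size=batch["image"].shape[0],
        )
    coco_metrics = evaluator.compute()         # Dict[str, float] of COCO stats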
metrics.confusion_mat module
- class metrics.confusion_mat.ConfusionMatrix(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
BaseMetric
Computes the confusion matrix; based on the FCN reference implementation.
- update(prediction: Tensor | Dict, target: Tensor | Dict, extras: Dict[str, Any] = {}, batch_size: int | None = 1)[source]
Processes a new batch of predictions and targets for computing the metric.
- Parameters:
prediction – model outputs for the current batch
target – labels for the current batch
extras – dict containing extra information. During training this includes the “loss” and “grad_norm” keys; during validation it only includes “loss”.
batch_size – optionally used to correctly compute the averages when the batch size varies across batches.
- compute() Number | Dict[str, Number | List[Number]] [source]
Computes the metrics with the existing data.
It gets called at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- Returns:
Depending on the metric, can return a scalar metric or a dictionary of metrics. Lists (or dicts of lists) are also generally accepted but not encouraged.
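The FCN-style trick behind confusion-matrix accumulation is to encode every (target, prediction) pair as a single integer and histogram it; a minimal sketch of that technique (the class's own update() may differ in details such as ignore labels):

    import torch

    def confusion_matrix(pred: torch.Tensor, target: torch.Tensor,
                         num_classes: int) -> torch.Tensor:
        # Encode each (target, pred) pair as target * num_classes + pred,
        # then histogram the codes with a single bincount.
        valid = (target >= 0) & (target < num_classes)
        idx = target[valid] * num_classes + pred[valid]
        return torch.bincount(idx, minlength=num_classes ** 2).reshape(
            num_classes, num_classes)

    # Rows index the true class, columns the predicted class.
    mat = confusion_matrix(torch.tensor([0, 1, 1, 2]),
                           torch.tensor([0, 1, 2, 2]), num_classes=3)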
metrics.image_text_retrieval module
- class metrics.image_text_retrieval.ImageTextRetrievalMetric(image: str = 'image', text: str = 'text', opts: Dict[str, Any] | None = None, is_distributed: bool = False)[source]
Bases:
BaseMetric
Computes the image-text retrieval metrics for a list of images and their captions using the distance between their embeddings.
- Expects predictions to contain two keys:
image (Tensor): [batch, hidden_dim]
text (Tensor): [batch * num_captions, hidden_dim]
- Computes the following metrics:
NOTE: each image MUST have the same number of captions.
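As an illustration of how such metrics fall out of the two embedding tensors, here is a hypothetical image-to-text recall@1. It assumes captions are stored contiguously, so that captions i*num_captions through (i+1)*num_captions - 1 belong to image i; that layout, and the metric itself, are assumptions rather than the class's actual code:

    import torch
    import torch.nn.functional as F

    def image_to_text_recall_at_1(image: torch.Tensor, text: torch.Tensor,
                                  num_captions: int) -> float:
        # Cosine similarity between every image [batch, d] and every
        # caption [batch * num_captions, d].
        sim = F.normalize(image, dim=-1) @ F.normalize(text, dim=-1).t()
        best = sim.argmax(dim=1)         # closest caption per image
        owner = best // num_captions     # image that owns that caption
        return (owner == torch.arange(image.shape[0])).float().mean().item()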
- __init__(image: str = 'image', text: str = 'text', opts: Dict[str, Any] | None = None, is_distributed: bool = False) None [source]
- classmethod add_arguments(parser: ArgumentParser) ArgumentParser [source]
Add metric specific arguments
- update(prediction: Tensor | Dict, target: Tensor | Dict, extras: Dict[str, Any], batch_size: int = 1) None [source]
Processes a new batch of predictions and targets for computing the metric.
- Parameters:
prediction – model outputs for the current batch
target – labels for the current batch
extras – dict containing extra information. During training this includes the “loss” and “grad_norm” keys; during validation it only includes “loss”.
batch_size – optionally used to correctly compute the averages when the batch size varies across batches.
- compute() Number | Dict[str, Number] [source]
Computes the metrics with the existing data.
It gets called at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- Returns:
Depending on the metric, can return a scalar metric or a dictionary of metrics. Lists (or dicts of lists) are also generally accepted but not encouraged.
metrics.intersection_over_union module
- metrics.intersection_over_union.compute_miou_batch(prediction: Tuple[Tensor, Tensor] | Tensor, target: Tensor, epsilon: float | None = 1e-07)[source]
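A sketch of the per-batch mIoU computation, assuming pred and target are integer label maps of shape [batch, H, W]; the eps term mirrors the function's epsilon argument and keeps classes absent from both maps from dividing by zero (the real function also accepts a (Tensor, Tensor) tuple prediction, which is not handled here):

    import torch
    import torch.nn.functional as F

    def miou_batch_sketch(pred: torch.Tensor, target: torch.Tensor,
                          eps: float = 1e-7) -> torch.Tensor:
        num_classes = int(torch.max(torch.stack([pred.max(), target.max()]))) + 1
        pred_1h = F.one_hot(pred, num_classes).bool()
        tgt_1h = F.one_hot(target, num_classes).bool()
        inter = (pred_1h & tgt_1h).sum(dim=(0, 1, 2)).float()  # per-class intersection
        union = (pred_1h | tgt_1h).sum(dim=(0, 1, 2)).float()  # per-class union
        return ((inter + eps) / (union + eps)).mean()          # mean IoU over classes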
- class metrics.intersection_over_union.IOUMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
AverageMetric
- gather_metrics(prediction: Tensor | Dict, target: Tensor | Dict, extras: Dict[str, Any]) Tensor | Dict[str, Tensor] [source]
This function gathers intersection and union values from the different processes and converts them to float.
- compute() Number | Dict[str, Number] [source]
Computes the metrics with the existing data.
It gets called at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- Returns:
Depending on the metric, can return a scalar metric or a dictionary of metrics. Lists (or dicts of lists) are also generally accepted but not encouraged.
metrics.metric_base module
- class metrics.metric_base.BaseMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
ABC
- __init__(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
- classmethod add_arguments(parser: ArgumentParser) ArgumentParser [source]
Add metric specific arguments
- abstract update(prediction: Tensor | Dict, target: Tensor | Dict, extras: Dict[str, Any], batch_size: int | None = 1) None [source]
Processes a new batch of predictions and targets for computing the metric.
- Parameters:
prediction – model outputs for the current batch
target – labels for the current batch
extras – dict containing extra information. During training this includes the “loss” and “grad_norm” keys; during validation it only includes “loss”.
batch_size – optionally used to correctly compute the averages when the batch size varies across batches.
- abstract compute() Number | Dict[str, Number] [source]
Computes the metrics with the existing data.
It gets called at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- Returns:
Depending on the metric, can return a scalar metric or a dictionary of metrics. Lists (or dicts of lists) are also generally accepted but not encouraged.
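To add a new metric, subclass BaseMetric and implement the two abstract methods. A hypothetical minimal subclass (the name and logic are illustrative, not part of the package):

    from numbers import Number
    from typing import Any, Dict, Union

    from torch import Tensor

    from metrics.metric_base import BaseMetric


    class ExactMatchMetric(BaseMetric):
        """Percentage of samples whose argmax prediction equals the target."""

        def __init__(self, *args, **kwargs) -> None:
            super().__init__(*args, **kwargs)
            self.correct = 0
            self.total = 0

        def update(self, prediction: Union[Tensor, Dict],
                   target: Union[Tensor, Dict],
                   extras: Dict[str, Any], batch_size: int = 1) -> None:
            self.correct += (prediction.argmax(dim=-1) == target).sum().item()
            self.total += target.numel()

        def compute(self) -> Number:
            return 100.0 * self.correct / max(self.total, 1)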
- class metrics.metric_base.AverageMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
BaseMetric
- abstract gather_metrics(prediction: Tensor | Dict, target: Tensor | Dict, extras: Dict[str, Any]) Tensor | Dict[str, Tensor] [source]
- update(prediction: Tensor | Dict, target: Tensor | Dict, extras: Dict[str, Any] | None = {}, batch_size: int | None = 1) None [source]
Processes a new batch of predictions and targets for computing the metric.
- Parameters:
prediction – model outputs for the current batch
target – labels for the current batch
extras – dict containing extra information. During training this includes the “loss” and “grad_norm” keys; during validation it only includes “loss”.
batch_size – optionally used to correctly compute the averages when the batch size varies across batches.
- compute() Number | Dict[str, Number] [source]
Computes the metrics with the existing data.
It gets called at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- Returns:
Depending on the metric, can return a scalar metric or a dictionary of metrics. Lists (or dicts of lists) are also generally accepted but not encouraged.
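Per the methods above, AverageMetric already maintains the batch_size-weighted running average in update() and reports it from compute(), so a concrete subclass typically only supplies gather_metrics(). A hypothetical example (illustrative, not part of the package):

    from typing import Any, Dict

    from torch import Tensor

    from metrics.metric_base import AverageMetric


    class MeanAbsoluteError(AverageMetric):
        def gather_metrics(self, prediction: Tensor, target: Tensor,
                           extras: Dict[str, Any]) -> Tensor:
            # Return this batch's value; the base class averages it across
            # batches, weighted by batch_size.
            return (prediction - target).abs().mean()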
- class metrics.metric_base.EpochMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None, force_cpu: bool = True)[source]
Bases:
BaseMetric
- __init__(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None, force_cpu: bool = True)[source]
- update(prediction: Tensor | Dict, target: Tensor | Dict, extras: Dict[str, Any] | None = None, batch_size: int | None = 1) None [source]
Processes a new batch of predictions and targets for computing the metric.
- Parameters:
prediction – model outputs for the current batch
target – labels for the current batch
extras – dict containing extra information. During training this includes the “loss” and “grad_norm” keys; during validation it only includes “loss”.
batch_size – optionally used to correctly compute the averages when the batch size varies across batches.
- get_aggregates() Tuple[Tensor, Tensor] [source]
Aggregates predictions and targets.
This function gets called every time self.compute is called, which is at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- compute_with_aggregates(predictions: Tensor, targets: Tensor)[source]
Computes the metrics given aggregated predictions and targets.
It gets called by self.compute. This happens at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- compute()[source]
Computes the metrics with the existing data.
It gets called at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
- Returns:
Depending on the metric, can return a scalar metric or a dictionary of metrics. Lists (or dicts of lists) are also generally accepted but not encouraged.
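EpochMetric buffers every batch in update() and concatenates them via get_aggregates(), so subclasses implement only compute_with_aggregates() over the epoch-level tensors. A hypothetical example (illustrative, not part of the package):

    import torch
    from torch import Tensor

    from metrics.metric_base import EpochMetric


    class PearsonCorrelation(EpochMetric):
        def compute_with_aggregates(self, predictions: Tensor,
                                    targets: Tensor) -> float:
            # predictions/targets hold the whole epoch, so metrics that are
            # not batch-decomposable (correlations, AUCs, ...) fit here.
            stacked = torch.stack([predictions.flatten().float(),
                                   targets.flatten().float()])
            return torch.corrcoef(stacked)[0, 1].item()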
metrics.metric_base_test module
- class metrics.metric_base_test.DummyMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
AverageMetric
metrics.misc module
- class metrics.misc.LossMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
AverageMetric
- class metrics.misc.GradNormMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
AverageMetric
metrics.probability_histograms module
- class metrics.probability_histograms.ProbabilityHistogramMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
EpochMetric
- __init__(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
- classmethod add_arguments(parser: ArgumentParser) ArgumentParser [source]
Add metric specific arguments
- compute_with_aggregates(y_pred: Tensor, y_true: Tensor) Number | Dict[str, Number] [source]
Computes the metrics given aggregated predictions and targets.
It gets called by self.compute. This happens at every log iteration as well as the end of each epoch, e.g. train, val, valEMA. Logging happens at iteration 1 and every common.log_freq thereafter.
Note: for computationally heavy metrics, you may want to increase common.log_freq.
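As a rough illustration of what a probability-histogram metric might aggregate (the binning, normalization, and exact quantities here are assumptions; consult the source for the real outputs):

    import torch

    def max_probability_histogram(y_pred: torch.Tensor,
                                  bins: int = 10) -> torch.Tensor:
        # Histogram of the winning-class probability over the epoch, a
        # quick way to eyeball over- or under-confidence.
        probs = torch.softmax(y_pred, dim=-1).max(dim=-1).values
        return torch.histc(probs, bins=bins, min=0.0, max=1.0) / probs.numel()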
metrics.psnr module
- metrics.psnr.compute_psnr(prediction: Tensor, target: Tensor, no_uint8_conversion: bool | None = False) Tensor [source]
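PSNR is 10 * log10(MAX^2 / MSE). A minimal sketch assuming a fixed dynamic range; the real function's no_uint8_conversion flag suggests it quantizes inputs to uint8 by default, which this sketch does not reproduce:

    import torch

    def psnr_sketch(prediction: torch.Tensor, target: torch.Tensor,
                    max_val: float = 255.0) -> torch.Tensor:
        # max_val is the peak signal value: 255 for uint8 images,
        # 1.0 for floats in [0, 1].
        mse = torch.mean((prediction.float() - target.float()) ** 2)
        return 10.0 * torch.log10(max_val ** 2 / mse)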
- class metrics.psnr.PSNRMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
AverageMetric
metrics.retrieval_cmc module
- metrics.retrieval_cmc.cosine_distance_matrix(x: Tensor, y: Tensor) Tensor [source]
Get pair-wise cosine distances.
- Parameters:
x – A feature tensor with shape (n, d).
y – A feature tensor with shape (m, d).
Returns: Distance tensor between features x and y with shape (n, m).
- metrics.retrieval_cmc.l2_distance_matrix(x: Tensor, y: Tensor) Tensor [source]
Get pair-wise l2 distances.
- Parameters:
x – A torch feature tensor with shape (n, d).
y – A torch feature tensor with shape (m, d).
Returns: Distance tensor between features x and y with shape (n, m).
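Both distance matrices have short closed forms; a sketch of the semantics documented above (the module's own implementations may differ in details such as clamping):

    import torch
    import torch.nn.functional as F

    def cosine_distance_sketch(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # (n, d) x (m, d) -> (n, m): one minus the cosine similarity of each pair.
        return 1.0 - F.normalize(x, dim=1) @ F.normalize(y, dim=1).t()

    def l2_distance_sketch(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # (n, d) x (m, d) -> (n, m) pairwise Euclidean distances.
        return torch.cdist(x, y, p=2)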
- class metrics.retrieval_cmc.RetrievalCMC(opts: Namespace | None = None, is_distributed: bool = False, pred: str = 'embedding', target: str | None = None, compute_map: bool = True)[source]
Bases:
EpochMetric
Compute CMC-top-k and mAP metrics in retrieval setup.
- __init__(opts: Namespace | None = None, is_distributed: bool = False, pred: str = 'embedding', target: str | None = None, compute_map: bool = True) None [source]
- classmethod add_arguments(parser: ArgumentParser) ArgumentParser [source]
Add metric specific arguments
- compute_with_aggregates(embedding: Tensor, labels: Tensor) Dict[str, float] [source]
Compute retrieval metrics over full epoch.
- Parameters:
embedding – tensor of m embeddings with shape (m, d), where d is embedding dimension.
labels – tensor of m labels.
Returns: A dictionary of top1, top-{k} and mAP.
- metrics.retrieval_cmc.cmc_calculation(distance_matrix: Tensor, query_ids: Tensor, k: int = 5) Tuple[float, float] [source]
Compute Cumulative Matching Characteristics metric.
- Parameters:
distance_matrix – pairwise distance matrix between embeddings of gallery and query sets
query_ids – labels for the query data (assuming the same as gallery)
k – parameter for top k retrieval
Returns: cmc-top1, cmc-top5
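A sketch of the CMC top-k idea, assuming query and gallery are the same set (as the docstring states) and that self-matches should be excluded, which is an assumption:

    import torch

    def cmc_top_k_sketch(distance_matrix: torch.Tensor,
                         query_ids: torch.Tensor, k: int = 5) -> float:
        dist = distance_matrix.clone()
        dist.fill_diagonal_(float("inf"))  # drop self-matches (assumption)
        # A query counts as a hit if any of its k nearest neighbours
        # shares its label.
        knn = dist.topk(k, dim=1, largest=False).indices
        hits = (query_ids[knn] == query_ids.unsqueeze(1)).any(dim=1)
        return hits.float().mean().item()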
- metrics.retrieval_cmc.mean_ap(distance_matrix: Tensor, labels: Tensor) float [source]
Compute Mean Average Precision.
- Parameters:
distance_matrix – pairwise distance matrix between embeddings of gallery and query sets, shape = (m,m)
labels – labels for the query data (assuming the same as gallery), shape = (m,)
Returns: mean average precision (float)
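And a sketch of the mAP computation over the same (m, m) setup: rank the gallery per query, drop the query itself, and average the precision at every matching rank (again an illustration, not the module's exact code):

    import torch

    def mean_ap_sketch(distance_matrix: torch.Tensor,
                       labels: torch.Tensor) -> float:
        aps = []
        for i in range(distance_matrix.shape[0]):
            order = distance_matrix[i].argsort()
            order = order[order != i]                   # exclude the query itself
            hits = (labels[order] == labels[i]).float()
            if hits.sum() == 0:
                continue                                # query has no positives
            cum = torch.cumsum(hits, dim=0)
            ranks = torch.arange(1, hits.numel() + 1, dtype=torch.float32)
            aps.append(((cum / ranks) * hits).sum() / hits.sum())
        return torch.stack(aps).mean().item()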
metrics.stats module
- class metrics.stats.Statistics(opts: Namespace, metric_names: list = ['loss'], is_master_node: bool | None = False, is_distributed: bool | None = False, log_writers: List | None = [])[source]
Bases:
object
- __init__(opts: Namespace, metric_names: list = ['loss'], is_master_node: bool | None = False, is_distributed: bool | None = False, log_writers: List | None = []) None [source]
- update(pred_label: Tensor | Dict, target_label: Tensor | Dict, extras: Dict[str, Any] | None = None, batch_time: float | None = 0.0, batch_size: int | None = 1) None [source]
Updates all the metrics after a batch.
- Parameters:
pred_label – predictions coming from a model (must be a Tensor or a Dict of Tensors)
target_label – GT labels (Tensor or a Dict of Tensors)
extras – Optional Dict containing extra info, usually Loss and GradNorm e.g. {“loss”: loss_value, “grad_norm”: gradient_norm}
batch_time – Optional time it took to run through the batch
batch_size – batch size (used to correctly average the numbers when it varies across batches)
- avg_statistics(metric_name: str, sub_metric_name: str | None = None, *args, **kwargs) float [source]
This function computes the average statistics of a given metric.
The statistics are stored as a dictionary; each value is either a number or a nested dictionary mapping sub-metric names to numbers.
Examples
{"loss": 10.0, "top-1": 50.0}
{"loss": {"total_loss": 10.0, "cls_loss": 2.0, "reg_loss": 8.0}, "mAP": 5.0}
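A sketch of the intended usage. Here opts, model, criterion, and train_loader are hypothetical stand-ins, and metric names must match names known to the metrics registry ("loss" is the documented default):

    from metrics.stats import Statistics

    stats = Statistics(opts, metric_names=["loss"], is_master_node=True)
    for batch in train_loader:                 # hypothetical training loop
        pred = model(batch["image"])
        loss = criterion(pred, batch["label"])
        stats.update(pred_label=pred, target_label=batch["label"],
                     extras={"loss": loss},
                     batch_size=batch["image"].shape[0])
    print(stats.avg_statistics("loss"))        # running average so far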
metrics.topk_accuracy module
- metrics.topk_accuracy.top_k_accuracy(output: Tensor, target: Tensor, top_k: tuple | None = (1,)) list [source]
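The standard top-k accuracy computation this function presumably performs, as a self-contained sketch:

    import torch

    def top_k_accuracy_sketch(output: torch.Tensor, target: torch.Tensor,
                              top_k=(1,)) -> list:
        # output: [batch, num_classes] logits; target: [batch] class indices.
        _, pred = output.topk(max(top_k), dim=1)   # [batch, max_k] predictions
        correct = pred.eq(target.unsqueeze(1))     # [batch, max_k] hit mask
        return [correct[:, :k].any(dim=1).float().mean().item() * 100.0
                for k in top_k]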
- class metrics.topk_accuracy.TopKMetric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
AverageMetric
- K = 1
- class metrics.topk_accuracy.Top1Metric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
TopKMetric
- K = 1
- class metrics.topk_accuracy.Top5Metric(opts: Namespace | None = None, is_distributed: bool = False, pred: str | None = None, target: str | None = None)[source]
Bases:
TopKMetric
- K = 5