turicreate.evaluation.fbeta_score

turicreate.evaluation.fbeta_score(targets, predictions, beta=1.0, average='macro')

Compute the F-beta score. The F-beta score is the weighted harmonic mean of precision and recall. The score lies in the range [0,1] with 1 being ideal and 0 being the worst.

The beta value controls the weight given to precision vs. recall in the combined score. beta = 0 considers only precision; as beta increases, more weight is given to recall, with beta > 1 favoring recall over precision.

The F-beta score is defined as:

\[f_{\beta} = (1 + \beta^2) \times \frac{(p \times r)}{(\beta^2 p + r)}\]

Where \(p\) is the precision and \(r\) is the recall.
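
For a concrete feel for how beta shifts the balance, the following pure-Python sketch evaluates the formula directly; the precision and recall values (0.6 and 0.3) are made up for illustration and are not library output.

>>> def fbeta(p, r, beta):
...     return (1 + beta ** 2) * (p * r) / (beta ** 2 * p + r)
...
>>> round(fbeta(0.6, 0.3, beta=0.5), 4)   # beta < 1: score pulled toward precision (0.6)
0.5
>>> round(fbeta(0.6, 0.3, beta=2.0), 4)   # beta > 1: score pulled toward recall (0.3)
0.3333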

Parameters:
targets : SArray

An SArray of ground truth class labels. Can be of any type except float.

predictions : SArray

The prediction that corresponds to each target value. This SArray must have the same length as targets and must be of the same type as the targets SArray.

beta : float

Weight of the precision term in the harmonic mean.

average : string, [None, ‘macro’ (default), ‘micro’]

Metric averaging strategies for multiclass classification. Averaging strategies can be one of the following:

  • None: No averaging is performed and a single metric is returned for each class.
  • ‘micro’: Calculate metrics globally by counting the total true positives, false negatives and false positives.
  • ‘macro’: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

For a more precise definition of micro and macro averaging refer to [1] below.
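
The following pure-Python sketch (illustrative only, not the library's implementation) makes the two strategies concrete: it tallies per-class true positives, false positives, and false negatives for the data used in the Examples section below, then applies the 'macro' and 'micro' strategies.

>>> from collections import Counter
>>> targets     = [0, 1, 2, 3, 0, 1, 2, 3]
>>> predictions = [1, 0, 2, 1, 3, 1, 0, 1]
>>> beta = 2.0
>>> tp, fp, fn = Counter(), Counter(), Counter()
>>> for t, p in zip(targets, predictions):
...     if t == p:
...         tp[t] += 1
...     else:
...         fp[p] += 1
...         fn[t] += 1
...
>>> def fbeta(p, r):
...     return (1 + beta ** 2) * p * r / (beta ** 2 * p + r) if p + r else 0.0
...
>>> def prec(c):
...     return tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
...
>>> def rec(c):
...     return tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
...
>>> labels = set(targets)

# 'macro': unweighted mean of the per-class scores
>>> macro = sum(fbeta(prec(c), rec(c)) for c in labels) / len(labels)

# 'micro': pool the counts over all classes, then compute a single score
>>> TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
>>> micro = fbeta(TP / (TP + FP), TP / (TP + FN))

>>> round(macro, 4), round(micro, 4)
(0.2431, 0.25)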

Returns:
out : float (for binary classification, or multi-class with average='micro' or 'macro') or dict[float] (for multi-class with average=None)

Score for the positive class (for binary classification) or an averaged score across classes (for multi-class classification). If average=None, a dictionary is returned where each key is a class label and the value is the score for that class.

Notes

  • For binary classification, if the target label is of type “string”, then the labels are sorted alphanumerically and the largest label is chosen as the “positive” label. For example, if the classifier labels are {“cat”, “dog”}, then “dog” is chosen as the positive label for the binary classification case.
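
As a quick illustration of that ordering rule (a sketch of the label sorting only, not a library call):

>>> sorted({"cat", "dog"})[-1]   # the alphanumerically largest label becomes the positive class
'dog'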

References

  • [1] Sokolova, Marina, and Guy Lapalme. “A systematic analysis of performance measures for classification tasks.” Information Processing & Management 45.4 (2009): 427-437.

Examples

# Targets and Predictions
>>> targets = turicreate.SArray([0, 1, 2, 3, 0, 1, 2, 3])
>>> predictions = turicreate.SArray([1, 0, 2, 1, 3, 1, 0, 1])

# Micro average of the F-Beta score
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                 beta=2.0, average = 'micro')
0.25

# Macro average of the F-Beta score
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                 beta=2.0, average = 'macro')
0.24305555555555558

# F-Beta score for each class.
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                 beta=2.0, average = None)
{0: 0.0, 1: 0.4166666666666667, 2: 0.5555555555555556, 3: 0.0}

This metric also works when the targets are of type str.

# Targets and Predictions
>>> targets = turicreate.SArray(
...      ["cat", "dog", "foosa", "snake", "cat", "dog", "foosa", "snake"])
>>> predictions = turicreate.SArray(
...      ["dog", "cat", "foosa", "dog", "snake", "dog", "cat", "dog"])

# Micro average of the F-Beta score
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                 beta=2.0, average = 'micro')
0.25

# Macro average of the F-Beta score
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                 beta=2.0, average = 'macro')
0.24305555555555558

# F-Beta score for each class.
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                 beta=2.0, average = None)
{'cat': 0.0, 'dog': 0.4166666666666667, 'foosa': 0.5555555555555556, 'snake': 0.0}