turicreate.evaluation.fbeta_score

turicreate.evaluation.fbeta_score(targets, predictions, beta=1.0, average='macro')
Compute the Fbeta score. The Fbeta score is the weighted harmonic mean of precision and recall. The score lies in the range [0,1] with 1 being ideal and 0 being the worst.
The beta value controls how precision and recall are weighted in the combined score. With beta=0 only precision is considered, and with beta=1 precision and recall are weighted equally. As beta increases, more weight is given to recall, so beta > 1 favors recall over precision.
The Fbeta score is defined as:
\[f_{\beta} = (1 + \beta^2) \times \frac{p \times r}{\beta^2 p + r}\]
where \(p\) is the precision and \(r\) is the recall.
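As a quick illustration of the formula above, here is a minimal plain-Python sketch that evaluates it for a single precision/recall pair. The helper name `fbeta` and the choice to return 0.0 when the denominator is zero are assumptions made for this example. They are not taken from the Turi Create implementation.

```python
def fbeta(precision, recall, beta=1.0):
    """Evaluate the Fbeta formula for one precision/recall pair."""
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    # Assumption: treat the undefined 0/0 case as a score of 0.0.
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


p, r = 0.25, 0.5
print(fbeta(p, r, beta=0.0))  # reduces to the precision
print(fbeta(p, r, beta=1.0))  # equal weighting (the F1 score)
print(fbeta(p, r, beta=2.0))  # pulled toward the recall
```

With these inputs the score moves from the precision (0.25) toward the recall (0.5) as beta grows, which matches the description of beta given earlier.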
Parameters:  targets : SArray
An SArray of ground truth class labels. Can be of any type except float.
 predictions : SArray
The prediction that corresponds to each target value. This SArray must have the same length as targets and must be of the same type as the targets SArray.
 beta : float
Weight applied (as beta squared) to the precision term in the denominator of the formula above. Larger values give recall more influence on the score.
 average : string, [None, ‘macro’ (default), ‘micro’]
Metric averaging strategies for multiclass classification. Averaging strategies can be one of the following:
 None: No averaging is performed and a single metric is returned for each class.
 ‘micro’: Calculate metrics globally by counting the total true positives, false negatives and false positives.
 ‘macro’: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
For a more precise definition of micro and macro averaging refer to [1] below.
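The averaging strategies described above can be sketched in plain Python as follows. This is an illustrative reimplementation operating on ordinary lists, not the library's code. The names `fbeta_by_average`, `fbeta` and `ratio` are hypothetical, and two choices are assumptions: undefined ratios are treated as 0.0, and the label set is taken from both targets and predictions.

```python
from collections import Counter


def ratio(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) counts as 0.0.
    return numerator / denominator if denominator else 0.0


def fbeta(precision, recall, beta):
    beta_sq = beta ** 2
    return ratio((1 + beta_sq) * precision * recall, beta_sq * precision + recall)


def fbeta_by_average(targets, predictions, beta=1.0, average="macro"):
    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for target, prediction in zip(targets, predictions):
        if target == prediction:
            tp[target] += 1
        else:
            fp[prediction] += 1
            fn[target] += 1

    # Assumption: score every label seen in either input.
    labels = sorted(set(targets) | set(predictions))
    per_class = {
        c: fbeta(ratio(tp[c], tp[c] + fp[c]), ratio(tp[c], tp[c] + fn[c]), beta)
        for c in labels
    }

    if average is None:
        # One score per class, no averaging.
        return per_class
    if average == "macro":
        # Unweighted mean of the per-class scores.
        return sum(per_class.values()) / len(per_class)
    if average == "micro":
        # Pool the counts over all classes, then compute a single score.
        all_tp, all_fp, all_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
        return fbeta(ratio(all_tp, all_tp + all_fp), ratio(all_tp, all_tp + all_fn), beta)
    raise ValueError("average must be None, 'macro' or 'micro'")


targets = [0, 1, 2, 3, 0, 1, 2, 3]
predictions = [1, 0, 2, 1, 3, 1, 0, 1]
print(fbeta_by_average(targets, predictions, beta=2.0, average="micro"))
print(fbeta_by_average(targets, predictions, beta=2.0, average="macro"))
print(fbeta_by_average(targets, predictions, beta=2.0, average=None))
```

On this data the sketch gives the same values as the Examples section of this page. Macro averaging treats the two zero-scoring classes the same as the others, while micro averaging pools the 2 true positives, 6 false positives and 6 false negatives before scoring.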
Returns:  out : float (for binary classification) or dict[float] (for multiclass, average=None)
Score for the positive class (for binary classification) or the averaged score across classes (for multiclass classification). If average=None, a dictionary is returned in which each key is a class label and each value is the score for that class.
Notes
 For binary classification, if the target label is of type “string”, then the labels are sorted alphanumerically and the largest label is chosen as the “positive” label. For example, if the classifier labels are {“cat”, “dog”}, then “dog” is chosen as the positive label for the binary classification case.
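The rule in the note above can be illustrated with a short plain-Python sketch. The helper `positive_label` is hypothetical and only mimics the described behavior using Python's default string ordering, which may differ from the library's ordering for labels with mixed case or unusual characters.

```python
def positive_label(labels):
    """Return the label treated as "positive": the largest one after sorting."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two distinct labels")
    return classes[-1]


print(positive_label(["cat", "dog", "dog", "cat"]))  # dog
```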
References
 [1] Sokolova, Marina, and Guy Lapalme. “A systematic analysis of performance measures for classification tasks.” Information Processing & Management 45.4 (2009): 427-437.
Examples
>>> import turicreate

# Targets and Predictions
>>> targets = turicreate.SArray([0, 1, 2, 3, 0, 1, 2, 3])
>>> predictions = turicreate.SArray([1, 0, 2, 1, 3, 1, 0, 1])

# Micro average of the FBeta score
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                   beta=2.0, average='micro')
0.25

# Macro average of the FBeta score
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                   beta=2.0, average='macro')
0.24305555555555558

# FBeta score for each class.
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                   beta=2.0, average=None)
{0: 0.0, 1: 0.4166666666666667, 2: 0.5555555555555556, 3: 0.0}
This metric also works when the targets are of type str:

# Targets and Predictions
>>> targets = turicreate.SArray(
...     ["cat", "dog", "foosa", "snake", "cat", "dog", "foosa", "snake"])
>>> predictions = turicreate.SArray(
...     ["dog", "cat", "foosa", "dog", "snake", "dog", "cat", "dog"])

# Micro average of the FBeta score
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                   beta=2.0, average='micro')
0.25

# Macro average of the FBeta score
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                   beta=2.0, average='macro')
0.24305555555555558

# FBeta score for each class.
>>> turicreate.evaluation.fbeta_score(targets, predictions,
...                                   beta=2.0, average=None)
{'cat': 0.0, 'dog': 0.4166666666666667, 'foosa': 0.5555555555555556, 'snake': 0.0}