turicreate.label_propagation.create

turicreate.label_propagation.create(graph, label_field, threshold=0.001, weight_field='', self_weight=1.0, undirected=False, max_iterations=None, _single_precision=False, _distributed='auto', verbose=True)

Given a weighted graph with observed class labels of a subset of vertices, infer the label probability for the unobserved vertices using the “label propagation” algorithm.

The algorithm iteratively updates the label probability of each vertex as a weighted sum of its own label probability and those of its neighboring vertices, until convergence. See turicreate.label_propagation.LabelPropagationModel for the details of the algorithm.
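
To make the update rule concrete, here is a minimal pure-Python sketch of what a single iteration could look like. It is not Turi Create's implementation: the helper name `propagate_once`, the dictionary-based graph representation, the normalization step, the uniform starting vector for the unobserved vertex, and the choice to keep observed vertices fixed are all assumptions made for illustration. See LabelPropagationModel for the actual definition.

```python
def propagate_once(probs, in_edges, observed, self_weight=1.0):
    """Run one illustrative update step (hypothetical helper, not the library API).

    probs:    dict mapping vertex -> list of class probabilities
    in_edges: dict mapping vertex -> list of (neighbor, weight) pairs
    observed: set of vertices whose labels are known and kept fixed (assumption)
    """
    new_probs = {}
    for v, p in probs.items():
        if v in observed:
            # Observed vertices keep their label probabilities (assumption).
            new_probs[v] = list(p)
            continue
        # Weighted sum of the vertex's own probabilities and its neighbors'.
        total = [self_weight * x for x in p]
        for u, w in in_edges.get(v, []):
            for k, x in enumerate(probs[u]):
                total[k] += w * x
        # Rescale so the result is again a probability vector (assumption).
        norm = sum(total)
        new_probs[v] = [x / norm for x in total] if norm > 0 else list(p)
    return new_probs


# Tiny example: "a" and "b" are observed, "c" is unobserved.
probs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
in_edges = {"c": [("a", 3.0), ("b", 1.0)]}
updated = propagate_once(probs, in_edges, observed={"a", "b"})
# updated["c"] is approximately [0.7, 0.3]
```

In this toy graph the unobserved vertex shifts toward class 0 because its edge from the class-0 vertex carries three times the weight of its edge from the class-1 vertex.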

Notes: label propagation works well with a small number of labels, e.g. binary labels, or fewer than 1000 classes. The toolkit raises an error if the number of classes exceeds the maximum value (1000).

Parameters:
graph : SGraph

The graph on which to compute the label propagation.

label_field : str

Vertex field storing the initial vertex labels. The values must be in [0, num_classes). None values indicate unobserved vertex labels.

threshold : float, optional

Threshold for convergence, measured as the average squared L2 norm (the sum of squared values) of the change in each vertex’s label probability vector.

max_iterations : int, optional

The maximum number of iterations to run. Default is unlimited. If set, the algorithm terminates when either max_iterations or the convergence threshold is reached.

weight_field : str, optional

Edge field storing the edge weights. If empty, all edges are assumed to have unit weight.

self_weight : float, optional

The weight of the self edge, i.e. the weight given to a vertex’s own label probability in the update.

undirected : bool, optional

If True, treat each edge as undirected and propagate labels in both directions.

_single_precision : bool, optional

If True, run label propagation in single precision. The resulting probability values may be less accurate, but the computation should run faster and use less memory.

_distributed : distributed environment, internal

verbose : bool, optional

If True, print progress updates.
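
The stopping rule described for threshold and max_iterations can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the toolkit's code: the helper names `average_squared_delta` and `should_stop` are hypothetical, and the strict `<` comparison against the threshold is a guess.

```python
def average_squared_delta(old, new):
    """Average, over vertices, of the sum of squared changes in each
    vertex's label probability vector (hypothetical helper)."""
    if not old:
        return 0.0
    total = 0.0
    for v in old:
        total += sum((a - b) ** 2 for a, b in zip(old[v], new[v]))
    return total / len(old)


def should_stop(old, new, iteration, threshold=0.001, max_iterations=None):
    """Stop on convergence, or once max_iterations is reached if it is set."""
    if max_iterations is not None and iteration >= max_iterations:
        return True
    # Strict comparison is an assumption; the library may use <= instead.
    return average_squared_delta(old, new) < threshold


old = {"a": [0.5, 0.5], "b": [1.0, 0.0]}
new = {"a": [0.7, 0.3], "b": [1.0, 0.0]}

still_moving = should_stop(old, new, iteration=1)                # change is too large
capped = should_stop(old, new, iteration=5, max_iterations=5)    # iteration cap reached
settled = should_stop(new, new, iteration=2)                     # no change at all
```

With max_iterations left at None, only the threshold test can end the loop, which matches the "unlimited" default described above.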

Returns:
out : LabelPropagationModel

Examples

If given an SGraph g, we can create a LabelPropagationModel as follows:

>>> import random
>>> import turicreate
>>> g = turicreate.load_sgraph('http://snap.stanford.edu/data/email-Enron.txt.gz',
...                            format='snap')
>>> # Initialize random classes for a subset of vertices.
>>> # Leave the unobserved vertices with a None label.
>>> def init_label(vid):
...     x = random.random()
...     if x < 0.2:
...         return 0
...     elif x > 0.9:
...         return 1
...     else:
...         return None
>>> g.vertices['label'] = g.vertices['__id'].apply(init_label, int)
>>> m = turicreate.label_propagation.create(g, label_field='label')

We can obtain for each vertex the predicted label and the probability of each label in the graph g using:

>>> labels = m['labels']     # SFrame
>>> labels
+------+-------+-----------------+-------------------+----------------+
| __id | label | predicted_label |         P0        |       P1       |
+------+-------+-----------------+-------------------+----------------+
|  5   |   1   |        1        |        0.0        |      1.0       |
|  7   |  None |        0        |    0.8213214997   |  0.1786785003  |
|  8   |  None |        1        | 5.96046447754e-08 | 0.999999940395 |
|  10  |  None |        0        |   0.534984718273  | 0.465015281727 |
|  27  |  None |        0        |   0.752801638549  | 0.247198361451 |
|  29  |  None |        1        | 5.96046447754e-08 | 0.999999940395 |
|  33  |  None |        1        | 5.96046447754e-08 | 0.999999940395 |
|  47  |   0   |        0        |        1.0        |      0.0       |
|  50  |  None |        0        |   0.788279032657  | 0.211720967343 |
|  52  |  None |        0        |   0.666666666667  | 0.333333333333 |
+------+-------+-----------------+-------------------+----------------+
[36692 rows x 5 columns]
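
In the rows shown above, predicted_label coincides with the class whose probability column (P0 or P1) is largest. The sketch below reproduces that relationship on three rows copied from the table, using plain dictionaries rather than an SFrame. The helper name `most_probable_class` is hypothetical, and how the library itself breaks ties is not covered here.

```python
def most_probable_class(row, num_classes=2):
    """Return the index k whose column 'P<k>' holds the largest probability."""
    return max(range(num_classes), key=lambda k: row["P%d" % k])


# Rows copied from the printed output above (vertex ids 7, 8 and 10).
rows = [
    {"__id": 7, "P0": 0.8213214997, "P1": 0.1786785003},
    {"__id": 8, "P0": 5.96046447754e-08, "P1": 0.999999940395},
    {"__id": 10, "P0": 0.534984718273, "P1": 0.465015281727},
]
predicted = [most_probable_class(r) for r in rows]
# predicted == [0, 1, 0], matching the predicted_label column for these rows
```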