graph_analytics
¶
The graph analytics toolkit contains methods for analyzing a
SGraph
. Each method takes an input graph and returns a model
object, which contains the training time, an SFrame
with the
desired output for each vertex, and a new graph whose vertices contain the
output as an attribute. Note that all of the methods in the graph analytics
toolkit are available from the top level turicreate import.
For this example we download a dataset of James Bond characters to an SFrame, then build the SGraph.
>>> from turicreate import SFrame, SGraph
>>> url = 'https://static.turi.com/datasets/bond/bond_edges.csv'
>>> data = SFrame.read_csv(url)
>>> g = SGraph().add_edges(data, src_field='src', dst_field='dst')
The degree_counting.create()
method computes the inbound, outbound,
and total degree for each vertex.
>>> from turicreate import degree_counting
>>> deg = degree_counting.create(g)
>>> deg_graph = deg['graph'] # a new SGraph with degree data attached to each vertex
>>> in_degree = deg_graph.vertices[['__id', 'in_degree']]
>>> out_degree = deg_graph.vertices[['__id', 'out_degree']]
The connected_components.create()
method finds all weakly connected
components in the graph and returns a
ConnectedComponentsModel
.
A connected component is a group of vertices such that
there is a path between each vertex in the component and all other vertices in
the group. If two vertices are in different connected components there is no
path between them.
>>> from turicreate import connected_components
>>> cc = connected_components.create(g)
>>> print cc.summary()
>>> cc_ids = cc.get('component_id') # an SFrame
>>> cc_ids = cc['component_id'] # equivalent to the above line
>>> cc_graph = cc['graph']
The shortest_path.create()
method finds the shortest directed path distance from a single source vertex to
all other vertices and returns a
ShortestPathModel
.
The output model contains this information and a method to
retrieve the actual path to a particular vertex.
>>> from turicreate import shortest_path
>>> sp = shortest_path.create(g, source_vid=123)
>>> sp_sframe = sp['distance']
>>> sp_graph = sp['graph']
>>> path = sp.get_path('98')
The triangle_counting.create()
counts the number of triangles in the graph and for each vertex and returns
a TriangleCountingModel
.
A graph triangle is a complete subgraph of three vertices. The number of
triangles to which a vertex belongs is an indication of the connectivity of the
graph around that vertex.
>>> from turicreate import triangle_counting
>>> tc = triangle_counting.create(g)
>>> tc_out = tc['triangle_count']
The pagerank.create()
method computes
the pagerank for each vertex and returns a PagerankModel
.
The pagerank value indicates the centrality of each node in the graph.
>>> from turicreate import pagerank
>>> pr = pagerank.create(g)
>>> pr_out = pr['pagerank']
The label_propagation.create()
method computes the label probability for the vertices with unobserved labels
by propagating information from the vertices with observed labels along the edges.
The method returns a LabelPropagationModel
,
which contains the probability that a vertex belongs to each of the label classes.
>>> from turicreate import label_propagation
>>> import random
>>> def init_label(vid):
... x = random.random()
... if x > 0.9:
... return 0
... elif x < 0.1:
... return 1
... else:
... return None
>>> g.vertices['labels'] = g.vertices['__id'].apply(init_label, int)
>>> m = label_propagation.create(g)
>>> labels = m['labels']
The K-core decomposition recursively removes vertices from the graph with degree
less than k. The value of k where a vertex is removed is its core ID; the
kcore.create()
method returns
a KcoreModel
which contains the core ID for
every vertex in the graph.
>>> from turicreate import kcore
>>> kc = kcore.create(g)
>>> kcore_id = kc['core_id']
Graph coloring assigns each vertex in the graph to a group in such a way that no
two adjacent vertices share the same group.
graph_coloring.create()
method returns
a GraphColoringModel
which contains
the color group ID for every vertex in the graph.
>>> from turicreate import graph_coloring
>>> color = graph_coloring.create(g)
>>> color_id = color['color_id']
>>> num_colors = color['num_colors']
For more information on the models in the graph analytics toolkit, plus extended examples, please see the model definitions and create methods in the API documentation, as well as the graph analytics chapter of the User Guide.
connected components¶
connected_components.create |
Compute the number of weakly connected components in the graph. |
connected_components.ConnectedComponentsModel |
A ConnectedComponentsModel object contains the component ID for each vertex and the total number of weakly connected components in the graph. |
degree counting¶
degree_counting.create |
Compute the in degree, out degree and total degree of each vertex. |
degree_counting.DegreeCountingModel |
Model object containing the in degree, out degree and total degree for each vertex, |
graph coloring¶
graph_coloring.create |
Compute the graph coloring. |
graph_coloring.GraphColoringModel |
A GraphColoringModel object contains color ID assignments for each vertex and the total number of colors used in coloring the entire graph. |
k-core¶
kcore.create |
Compute the K-core decomposition of the graph. |
kcore.KcoreModel |
A KcoreModel object contains a core ID for each vertex, and the total number of cores in the graph. |
label propagation¶
label_propagation.create |
Given a weighted graph with observed class labels of a subset of vertices, infer the label probability for the unobserved vertices using the “label propagation” algorithm. |
label_propagation.LabelPropagationModel |
A LabelPropagationModel computes the probability of each class label for each unlabeled vertex. |
pagerank¶
pagerank.create |
Compute the PageRank for each vertex in the graph. |
pagerank.PagerankModel |
A PageRankModel object contains the pagerank value for each vertex. |
shortest path¶
shortest_path.create |
Compute the single source shortest path distance from the source vertex to all vertices in the graph. |
shortest_path.ShortestPathModel |
Model object containing the distance for each vertex in the graph to a single source vertex, which is specified during turicreate.shortest_path.create() . |
triangle counting¶
triangle_counting.create |
Compute the number of triangles each vertex belongs to, ignoring edge directions. |
triangle_counting.TriangleCountingModel |
Model object containing the triangle count for each vertex, and the total number of triangles. |