This module provides utilities for text processing.

Note that standard utilities in the text_analytics package can be used for transforming text data into “bag of words” format, where a document is represented as a dictionary mapping each unique word to the number of times it occurs in the document. See count_words() for more details. Also, see pack_columns() and unstack() for ways of creating SArrays containing dictionary types.
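
To make the “bag of words” format concrete, here is a minimal sketch in plain Python. It does not call count_words(); the helper name bag_of_words and the lowercase, whitespace-based tokenization are assumptions made for illustration, and the library's own tokenization may differ.

```python
from collections import Counter


def bag_of_words(document):
    """Hypothetical helper (not part of this package).

    Lowercases the text, splits it on whitespace, and returns a
    dictionary mapping each unique word to its number of occurrences.
    """
    return dict(Counter(document.lower().split()))


docs = [
    "the quick brown fox",
    "the fox jumps over the lazy dog",
]

# One dictionary per document, analogous to one dictionary-typed row each.
bags = [bag_of_words(d) for d in docs]
```

Each entry of bags is a dictionary of word counts, which is the shape of data the paragraph above describes.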

We provide methods for learning topic models, which can be useful for modeling large document collections. See create() for more details, as well as the text analysis chapter of the User Guide.

topic model

topic_model.create Create a topic model from the given data set.
topic_model.perplexity Compute the perplexity of a set of test documents given a set of predicted topics.
topic_model.TopicModel TopicModel objects can be used to predict the underlying topic of a document.