turicreate.text_analytics.tf_idf

turicreate.text_analytics.tf_idf(text)

Compute the TF-IDF scores for each word in each document. The collection of documents must be in bag-of-words format.

TF-IDF(w, d) = tf(w, d) * log(N / f(w))

where tf(w, d) is the number of times word w appears in document d, f(w) is the number of documents that contain word w, N is the total number of documents, and log is the natural logarithm.
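
To make the formula concrete, the following is a minimal pure-Python sketch of the same computation on a tiny bag-of-words corpus. It does not use turicreate and is not the library's implementation; the helper name `tf_idf_sketch` and the sample documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf_sketch(docs):
    """Apply TF-IDF(w, d) = tf(w, d) * log(N / f(w)) to bag-of-words dicts.

    `docs` is a list of dicts mapping each word to its count in that document.
    This is an illustrative helper, not part of turicreate.
    """
    n_docs = len(docs)

    # f(w): the number of documents in which each word appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    # Replace every count with its TF-IDF score (natural logarithm).
    return [
        {word: tf * math.log(n_docs / doc_freq[word]) for word, tf in doc.items()}
        for doc in docs
    ]


docs = [
    {"cat": 2, "dog": 1},
    {"dog": 3},
]
scores = tf_idf_sketch(docs)
```

In this sketch "dog" appears in both documents, so log(2 / 2) = 0 and its score is zero everywhere, while "cat" appears in only one document and receives 2 * log(2). A word that occurs in every document therefore carries no weight under this definition.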

Parameters:
text : SArray[str | dict | list]

Input text data.

Returns:
out : SArray[dict]

The same document corpus, with each word's count replaced by its TF-IDF score.

Examples

>>> import turicreate

>>> docs = turicreate.SArray('https://static.turi.com/datasets/nips-text')
>>> docs_tfidf = turicreate.text_analytics.tf_idf(docs)