turicreate.text_analytics.tf_idf

turicreate.text_analytics.tf_idf(text)

Compute the TF-IDF scores for each word in each document. The collection of documents must be in bag-of-words format.

\[\mbox{TF-IDF}(w, d) = tf(w, d) \cdot \log\left(\frac{N}{f(w)}\right)\]

where \(tf(w, d)\) is the number of times word \(w\) appears in document \(d\), \(f(w)\) is the number of documents in which word \(w\) appears, \(N\) is the total number of documents, and the logarithm is the natural logarithm.
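
The formula can be illustrated with a minimal pure-Python sketch over a toy bag-of-words corpus. This is not Turi Create's implementation: the function name `tf_idf_sketch` and the two sample documents are hypothetical and exist only to make the arithmetic concrete.

```python
import math
from collections import Counter


def tf_idf_sketch(docs):
    """Apply the formula above to a list of bag-of-words dicts.

    Hypothetical helper for illustration only; `docs` is a list of
    dicts mapping each word to its count in that document.
    """
    n_docs = len(docs)

    # f(w): the number of documents in which word w appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    # TF-IDF(w, d) = tf(w, d) * ln(N / f(w))
    return [
        {word: count * math.log(n_docs / doc_freq[word])
         for word, count in doc.items()}
        for doc in docs
    ]


# Toy corpus (made up for this example).
docs = [
    {"the": 2, "cat": 1},
    {"the": 1, "dog": 3},
]
scores = tf_idf_sketch(docs)

# "the" appears in both documents, so ln(2 / 2) = 0 and its score is 0.
# "dog" appears in one document, so its score is 3 * ln(2).
print(scores)
```

A word that occurs in every document always scores zero under this formula, which is why very common words carry no weight after the transformation.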

Parameters:
text : SArray[str | dict | list]

Input text data.

Returns:
out : SArray[dict]

The same document corpus, with each word's count replaced by its TF-IDF score.

Examples

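>>> import turicreate
>>> # Load the example NIPS text corpus.
>>> docs = turicreate.SArray('https://static.turi.com/datasets/nips-text')
>>> # Compute the TF-IDF score of every word in every document.
>>> docs_tfidf = turicreate.text_analytics.tf_idf(docs)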