# turicreate.text_analytics.tf_idf

turicreate.text_analytics.tf_idf(text)

Compute the TF-IDF scores for each word in each document. The collection of documents must be in bag-of-words format.

$\mbox{TF-IDF}(w, d) = tf(w, d) \cdot \log(N / f(w))$

where $tf(w, d)$ is the number of times word $w$ appears in document $d$, $f(w)$ is the number of documents that contain word $w$, $N$ is the total number of documents, and $\log$ is the natural logarithm.
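
To make the formula concrete, here is a minimal pure-Python sketch that applies it to a small bag-of-words corpus. It is an illustration of the definition above, not Turi Create's implementation; the function name `tf_idf_sketch` and the toy documents are hypothetical, and every count is assumed to be positive.

```python
import math
from collections import Counter


def tf_idf_sketch(docs):
    """Apply TF-IDF(w, d) = tf(w, d) * ln(N / f(w)) to a list of dicts.

    Hypothetical helper for illustration only. Each element of `docs`
    maps a word to its (positive) count in that document.
    """
    n_docs = len(docs)

    # f(w): the number of documents in which word w appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    # Replace each count tf(w, d) with tf(w, d) * ln(N / f(w)).
    return [
        {word: tf * math.log(n_docs / doc_freq[word]) for word, tf in doc.items()}
        for doc in docs
    ]


# Toy corpus (made up for this example), already in bag-of-words format.
docs = [
    {"the": 2, "cat": 1},
    {"the": 1, "dog": 3},
]
scores = tf_idf_sketch(docs)
```

In this toy corpus, "the" occurs in both documents, so $N / f(w) = 1$ and its score is zero in each; "cat" and "dog" each occur in only one document and are weighted by $\log 2$.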

Parameters: text : SArray[str | dict | list] Input text data.

Returns: out : SArray[dict] The same document corpus, with each word count replaced by its TF-IDF score.

Examples

>>> import turicreate

>>> docs = turicreate.SArray('https://static.turi.com/datasets/nips-text')
>>> docs_tfidf = turicreate.text_analytics.tf_idf(docs)