TF-IDF

From Algolit

 
[https://gitlab.constantvzw.org/algolit/mundaneum/tree/master/exhibition/5-Readers/tf-idf Sources on Gitlab]
  

Latest revision as of 18:01, 4 June 2019

by Algolit

TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting method used in text search. This statistical measure evaluates the importance of a term in a document relative to a collection, or corpus, of documents. The weight increases in proportion to the number of occurrences of the word in the document, and is offset by the frequency of the word across the corpus: a word that appears in many documents weighs less than one that appears in only a few. TF-IDF is used in particular to classify spam in email software.
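
As an illustration, one common textbook formulation of the weight is given below. This is an assumption for the sake of the example: the text here does not say which variant the Algolit code uses, and several exist. In this notation, f_{t,d} is the number of times term t occurs in document d, D is the corpus, and N is the number of documents in D.

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t, D) = \log \frac{N}{\lvert \{ d' \in D : t \in d' \} \rvert}, \qquad
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
```

Under this variant, a word that occurs in every document of the corpus has an inverse document frequency of log 1 = 0, so its weight is zero however often it is repeated.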

A web-based interface presents this algorithm through animations that make it possible to follow the different steps of text classification. How does a TF-IDF-based programme read a text? How does it transform words into numbers?
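
To give a rough idea of how words can become numbers, here is a minimal Python sketch. It is not the code behind the Algolit animation: the function names `tokenize` and `tf_idf`, the toy `corpus`, and the choice of weighting (relative term frequency multiplied by the natural logarithm of the number of documents divided by the number of documents containing the term) are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"\w+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dictionary per document (illustrative variant)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, invented for this example.
corpus = [
    "the reader counts the words",
    "the reader reads the text",
    "the spam filter reads spam",
]

for doc, scores in zip(corpus, tf_idf(corpus)):
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    print(doc, "->", [(term, round(weight, 3)) for term, weight in ranked[:3]])
```

In this toy corpus, "the" occurs in all three documents and therefore receives a weight of zero, while "spam", repeated in a single document, ranks highest there. Each text ends up as a list of numbers, one per word, which a classifier can then compare.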


Concept, code, animation: Sarah Garcin