A Bag of Words

From Algolit

Latest revision as of 14:10, 30 October 2017

Type: Algoliterary exploration
Technique: Frequency counts
Developed by: Python, nltk, Algolit

This interactive installation guides you through the different steps in the process of a bag-of-words model.

The bag-of-words model is a text representation, often used as input for classification, which reads a text as a collection of words. While processing a text, the model discards word order, punctuation and possibly conjugations. The model transforms the text into a list of the unique words used in the text, or quite literally a bag of words.
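
The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of the installation: it uses only the standard library rather than nltk, the function name `bag_of_words` is hypothetical, and it assumes that lowercasing and keeping runs of the letters a-z is an acceptable way to drop punctuation. It does not reduce conjugations.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Assumption: a "word" is a run of the letters a-z after lowercasing.
    # This drops punctuation; word order is lost once tokens are counted.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


bag = bag_of_words("The cat sat. The cat slept!")

# The unique words of the text, sorted for readability.
vocabulary = sorted(bag)
```

Here `vocabulary` holds each word once, while `bag` also keeps how often each word occurred.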

This model is often used to understand the subject of a text by recognizing the most frequent or important words, or to measure the similarities of texts by comparing their bags of words.
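
As a rough sketch of both uses, the example below lists the most frequent word of a text and compares texts by their bags. The text above does not name a similarity measure; cosine similarity is assumed here as one common choice, and the helper names `bag_of_words` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercased words; punctuation and word order are discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared = set(bag_a) & set(bag_b)
    dot = sum(bag_a[word] * bag_b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


a = bag_of_words("the cat sat on the mat")
b = bag_of_words("the mat sat on the cat")
c = bag_of_words("stock prices fell sharply")

top_words = a.most_common(1)        # most frequent word with its count
same_words = cosine_similarity(a, b)
no_overlap = cosine_similarity(a, c)
```

Texts `a` and `b` contain the same words in a different order, so their bags are identical and the model cannot tell them apart. This is a direct consequence of discarding word order.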

To understand the importance of less common but significant words, often related to the topic of the text, the TF-IDF measure (Term Frequency-Inverse Document Frequency) can be used. The frequency of a word in a single text is multiplied by the logarithm of the inverse of the fraction of texts in the collection that contain the word, so that words appearing in nearly every text score low.
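
A minimal sketch of TF-IDF follows, assuming one common variant: relative term frequency multiplied by the natural logarithm of (number of texts / number of texts containing the word). Other variants add smoothing or use raw counts. The function names `tokenize` and `tf_idf` and the example texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {word: score} dict per document."""
    bags = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(bags)

    # Document frequency: in how many texts does each word appear?
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            # relative term frequency * log inverse document frequency
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return scores


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
scores = tf_idf(docs)
```

In this small collection "the" appears in every text, so its score is zero in all of them, while "mat" appears in only one text and receives the highest score there.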