The Book of Tomorrow in a Bag of Words

From Algolit

Revision as of 11:20, 18 March 2019 by An

by Algolit

The bag-of-words model is a simplifying representation of text used in Natural Language Processing (NLP). In this model, a text is represented as a collection of its unique words, disregarding grammar, punctuation and even word order. The model transforms the text into a list of unique words, each paired with the number of times it is used in the text: quite literally, a bag of words.
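The reduction described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the work: the function name `bag_of_words`, the tokenisation rule (lowercase, letters only) and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to its unique words and their counts (hypothetical helper)."""
    # Assumption: a "word" is any run of letters; case, punctuation,
    # digits and word order are all thrown away.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


# Illustrative sentence, not taken from the source article.
bag = bag_of_words("The book of tomorrow is the book of today.")

for word, count in sorted(bag.items()):
    print(word, count)
```

Once the text is in the bag, the original sentence can no longer be reconstructed: only the words and their frequencies remain.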

This heavy reduction of language was the big shock when we began working with machine learning. Bag of words is often used as a baseline that a new model has to outperform. It can give an indication of the subject of a text by picking out its most frequent or important words. It is also often used to measure the similarity of texts by comparing their bags of words.
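One common way to compare two bags of words is cosine similarity over the word counts. The sketch below is an assumption about how such a comparison might look, not a description of any particular tool; the helper names and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Hypothetical helper: count runs of letters, ignoring case and order."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two word-count vectors, from 0.0 to 1.0."""
    # Only words present in both bags contribute to the dot product.
    shared = set(bag_a) & set(bag_b)
    dot = sum(bag_a[w] * bag_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty bag is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Illustrative sentences, not taken from the source article.
a = bag_of_words("the book of tomorrow")
b = bag_of_words("tomorrow of book the")
c = bag_of_words("a bag at reception")

print(cosine_similarity(a, b))
print(cosine_similarity(a, c))
```

Because word order is discarded, the first two sentences end up with identical bags and score as maximally similar, while the third shares no words with the first and scores zero.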

For this work, the article 'Le Livre de Demain' by engineer G. Vander Haeghen, published in 1907 in the Bulletin de l'Institut International de Bibliographie of the Mundaneum, has been literally reduced to a bag of words. You can buy a bag at the reception of the Mundaneum.


Concept & realisation: An Mertens