The GloVe Reader

From Algolit

Revision as of 17:14, 24 October 2017


Type: Algolit extension
Datasets: GloVe
Technique: calculating semantic similarity with word-embeddings
Developed by: Algolit, Google


The GloVe Reader displays one of the pre-trained word-embedding datasets used to build machine learning models such as We are a Sentiment Thermometer. GloVe is an algorithm that looks for co-occurrences of words in large text files. From these it creates a semantic map of words, in which similar words come together as little islands. This map is packaged as a 5 GB text file of 1,917,494 lines, one line per word, each holding the word followed by 300 numbers.
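
To illustrate the file format and the technique named above (calculating semantic similarity with word embeddings), here is a minimal Python sketch. It assumes each line is a word followed by space-separated numbers, as described. The helpers parse_glove_line, cosine_similarity and nearest are hypothetical names, not taken from GloVe or from the Reader's own script. Cosine similarity is one common way to measure how close two word vectors are, and is used here as an illustration only.

```python
import math

# Assumption: each line holds a word followed by DIM space-separated
# numbers; DIM is 300 for the file described above.
DIM = 300

def parse_glove_line(line: str, dim: int = DIM) -> tuple[str, list[float]]:
    """Split one line of a GloVe text file into (word, vector)."""
    parts = line.rstrip("\n").split(" ")
    # Take the numbers from the right, so a token that itself
    # contains spaces is still kept whole.
    word = " ".join(parts[:-dim])
    vector = [float(x) for x in parts[-dim:]]
    return word, vector

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word: str, vectors: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k other words whose vectors point closest to `word`'s."""
    target = vectors[word]
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine_similarity(target, vectors[w]), reverse=True)
    return others[:k]

# Toy example with made-up 3-dimensional vectors (not real GloVe data).
toy_lines = [
    "cat 0.9 0.1 0.0",
    "dog 0.8 0.2 0.0",
    "car 0.0 0.1 0.9",
]
vectors = dict(parse_glove_line(line, dim=3) for line in toy_lines)
print(nearest("cat", vectors, k=2))  # → ['dog', 'car']
```

In this toy map "cat" and "dog" point in almost the same direction and form a small island, while "car" lies far away. The real file works the same way, only with 300 numbers per word.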

The GloVe file lists words in order of frequency. For the exhibition, we rearranged them in alphabetical order. Even at a rate of 60 words per second, the Reader needs more than 8 hours to show the entire file. We launch the script at the beginning of the day. The alphabetical order lets you see at a glance where the Reader is in the file.
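
The reordering and the reading time can also be sketched in Python. This is a minimal sketch under assumptions: the exhibition's script is not published here, so read_words, alphabetical, hours_to_show and paced are hypothetical names. The sketch reads only the words (not the vectors), sorts them and shows them at a fixed pace. Python's sorted() uses code-point order, which places digits and punctuation before letters and may differ from the order used in the exhibition.

```python
import time
from typing import Iterable, Iterator

WORDS_PER_SECOND = 60  # display rate mentioned in the text
DIM = 300              # assumption: numbers per line in the GloVe file

def read_words(path: str, dim: int = DIM) -> list[str]:
    """Read only the word from each line, skipping the vector."""
    with open(path, encoding="utf-8") as f:
        # rsplit from the right keeps tokens that contain spaces whole.
        return [line.rstrip("\n").rsplit(" ", dim)[0] for line in f]

def alphabetical(words: Iterable[str]) -> list[str]:
    """Reorder a frequency-ordered word list alphabetically (code-point order)."""
    return sorted(words)

def hours_to_show(n_words: int, rate: float = WORDS_PER_SECOND) -> float:
    """Hours needed to display n_words at `rate` words per second."""
    return n_words / rate / 3600

def paced(words: Iterable[str], rate: float = WORDS_PER_SECOND) -> Iterator[str]:
    """Yield one word at a time, waiting 1/rate seconds after each word."""
    delay = 1.0 / rate
    for word in words:
        yield word
        time.sleep(delay)

print(round(hours_to_show(1_917_494), 2))  # → 8.88

# Toy run on a few frequency-ordered words (not read from the real file).
for word in paced(alphabetical(["the", "of", "and"])):
    print(word)
```

The arithmetic behind the estimate: 1,917,494 words divided by 60 words per second is about 31,958 seconds, or roughly 8.9 hours.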

GloVe was developed in 2014 by Jeffrey Pennington, Richard Socher and Christopher D. Manning, researchers at the Computer Science Department of Stanford University in California.

The GloVe Reader draws on text from 75% of the existing web pages on the Internet. The text was scraped by Common Crawl, a non-profit organisation based in California. The people at Common Crawl believe the Internet should be available for anyone to download.

Download GloVe datasets: https://nlp.stanford.edu/projects/glove/