You shall know a word by the company it keeps
From Algolit
Type: Algolit extension
Datasets: Frankenstein, AnarchFem, WikiHarass, Learning from Deep Learning, Tristes Tropiques
Technique: calculating semantic similarity with word embeddings
Collectively developed by: the people behind Google TensorFlow's word2vec, Algolit
You shall know a word by the company it keeps is a series of five landscapes, each based on a different dataset. Each landscape shows the words 'collective', 'being' and 'social' in the company of different semantic clusters. The belief that distances in the graph reflect the semantic similarity of words was one of the basic ideas behind word2vec.
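To make the link between distance and similarity concrete, here is a minimal sketch that ranks words by cosine similarity. As we read it, word2vec_basic.py also compares normalized embeddings with a dot product, which amounts to cosine similarity. The four-dimensional vectors below are invented for illustration and were not learned from any of the five datasets; `EMBEDDINGS`, `cosine_similarity` and `nearest` are hypothetical names.

```python
import math

# Hypothetical toy embeddings: invented 4-dimensional vectors, NOT vectors
# learned from any of the five datasets.
EMBEDDINGS = {
    "collective": [0.9, 0.1, 0.3, 0.0],
    "social": [0.8, 0.2, 0.4, 0.1],
    "being": [0.2, 0.9, 0.1, 0.3],
    "monster": [0.1, 0.8, 0.0, 0.5],
    "network": [0.7, 0.0, 0.6, 0.2],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings, top_k=3):
    """Return the top_k other words ranked by cosine similarity to `word`."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# The three words that appear in every landscape.
for word in ("collective", "being", "social"):
    neighbours = ", ".join(f"{w} ({s:.2f})" for w, s in nearest(word, EMBEDDINGS))
    print(f"Nearest to {word}: {neighbours}")
```

The printed lines loosely echo the "Nearest to ..." lines the tutorial script prints while training. With these made-up vectors, 'collective' ends up closest to 'social' and 'being' closest to 'monster'; with real embeddings the neighbours depend entirely on the dataset.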
The graphs are the product of a code study of an existing word-embedding tutorial script, word2vec_basic.py. In machine learning practice, graphs like these serve as one of the validation tools for checking whether a model is starting to make sense. It is interesting how this validation process is fueled by an individual's semantic understanding of the clusters and the words.
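The landscapes place each word at a point in two dimensions. As far as we know, word2vec_basic.py does this with t-SNE from scikit-learn; the sketch below swaps in a much simpler linear projection (PCA computed by power iteration, standard library only) to stay short. The vectors are invented for illustration, and `top_component` and `project_2d` are hypothetical helper names, not functions from the tutorial script.

```python
import math

# Hypothetical toy embeddings: invented 4-dimensional vectors, not learned
# from any of the five datasets.
EMBEDDINGS = {
    "collective": [0.9, 0.1, 0.3, 0.0],
    "social": [0.8, 0.2, 0.4, 0.1],
    "being": [0.2, 0.9, 0.1, 0.3],
    "monster": [0.1, 0.8, 0.0, 0.5],
    "network": [0.7, 0.0, 0.6, 0.2],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_component(rows, iterations=200):
    """Power iteration: approximate the direction of greatest variance of rows."""
    dim = len(rows[0])
    # Arbitrary, deterministic, non-uniform start vector.
    direction = [1.0 / (i + 1) for i in range(dim)]
    for _ in range(iterations):
        # Apply the (unscaled) covariance matrix implicitly: sum over rows of (r . d) * r.
        new = [0.0] * dim
        for r in rows:
            weight = dot(r, direction)
            for i in range(dim):
                new[i] += weight * r[i]
        length = math.sqrt(dot(new, new))
        direction = [x / length for x in new]
    return direction


def project_2d(embeddings):
    """PCA-style projection of every word vector onto two axes."""
    words = list(embeddings)
    dim = len(embeddings[words[0]])
    # Centre the vectors so the components describe spread around the mean.
    mean = [sum(embeddings[w][i] for w in words) / len(words) for i in range(dim)]
    centred = [[embeddings[w][i] - mean[i] for i in range(dim)] for w in words]
    first = top_component(centred)
    # Deflate: remove the first component before looking for the second.
    deflated = []
    for r in centred:
        p = dot(r, first)
        deflated.append([x - p * f for x, f in zip(r, first)])
    second = top_component(deflated)
    return {w: (dot(r, first), dot(r, second)) for w, r in zip(words, centred)}


for word, (x, y) in project_2d(EMBEDDINGS).items():
    print(f"{word:>10}: ({x:+.2f}, {y:+.2f})")
```

Words with similar vectors land near each other, which is what a reader checks when validating a model by eye. PCA keeps the global directions of spread, while t-SNE tries to preserve local neighbourhoods, so a landscape drawn this way would not match the published graphs.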
How are these semantic landscapes reading tools?
graph 1: Frankenstein dataset
Includes the book Frankenstein.
graph 2: Anarcha-Feminist dataset
Includes 3 books (...).
graph 3: Claude Lévi-Strauss dataset
Includes the book Tristes Tropiques.
graph 4: Deep Learning textbooks dataset
Includes the books (...).
graph 5: Harassing comments dataset
Includes examples of harassment in comments on Wikipedia Talk pages.