You shall know a word by the company it keeps
From Algolit
Type: | Algolit extension |
Datasets: | Frankenstein, AnarchFem, WikiHarass, Learning from Deep Learning, Tristes Tropiques |
Technique: | calculating semantic similarity with word-embeddings |
Developed by: | Google TensorFlow's word2vec, Algolit |
You shall know a word by the company it keeps is a series of five landscapes, each based on a different dataset. Each landscape shows the words 'collective', 'being' and 'social' in the company of different semantic clusters. The belief that distances in the graph reflect the semantic similarity of words was one of the basic ideas behind word2vec.
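To make the link between distance and similarity concrete, here is a minimal sketch of how closeness between word vectors is commonly measured with cosine similarity. The three-dimensional vectors and the helper names `cosine_similarity` and `nearest` are invented for illustration; they are not taken from the Algolit code, and real embeddings come from a trained model with many more dimensions.

```python
import math

# Toy vectors, invented for illustration only. In practice these would be
# rows of the embedding matrix learned by a word2vec model.
embeddings = {
    "collective": [0.9, 0.1, 0.0],
    "social": [0.8, 0.2, 0.1],
    "being": [0.1, 0.9, 0.2],
    "creature": [0.0, 0.8, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


for target in ("collective", "being"):
    print(target, "->", [w for w, _ in nearest(target, embeddings)])
```

With these toy values, 'social' ends up as the closest neighbour of 'collective'. In a trained model the neighbours depend entirely on the dataset, which is what makes the five landscapes differ.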
The graphs are the result of a code study based on an existing word-embedding tutorial script, word2vec_basic.py. In machine learning practice, graphs like these function as one of the validation tools for checking whether a model starts to make sense. It is interesting how this validation process is fuelled by an individual's semantic understanding of the clusters and the words.
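Such a graph flattens high-dimensional word vectors onto two axes so that clusters can be judged by eye. The sketch below shows one simple way to do this, assuming NumPy is available. It swaps in a plain PCA projection for whatever dimensionality reduction the tutorial script applies, and the vectors and the `project_2d` helper are hypothetical, so it illustrates the general idea and does not reproduce the Algolit graphs.

```python
import numpy as np


def project_2d(matrix):
    """Project row vectors onto their first two principal components (PCA)."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Toy vectors, invented for illustration only.
words = ["collective", "social", "being", "creature"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
])

coords = project_2d(vectors)
for word, (x, y) in zip(words, coords):
    # These coordinates are what a plotting library would place labels at.
    print(f"{word}: ({x:.2f}, {y:.2f})")
```

Reading the result is the manual validation step described above: a person looks at which words land near each other and decides whether the grouping makes sense.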
How can we use these semantic landscapes as reading tools?
graph 1: Frankenstein dataset
Includes the book Frankenstein; or, The Modern Prometheus by Mary Shelley.
graph 2: Anarch Feminist dataset
Includes 3 books (...)
graph 3: Claude Lévi-Strauss dataset
Includes the book Tristes Tropiques by Claude Lévi-Strauss.
graph 4: Deep Learning textbooks dataset
Includes the books (...).
graph 5: Harassing comments dataset
Includes examples of harassment on Talk page comments from Wikipedia.