
You shall know a word by the company it keeps

From Algolit

Revision as of 13:34, 25 October 2017 by Cristina


Type: Algolit extension
Datasets: Frankenstein, AnarchFem, WikiHarass, Learning from Deep Learning, Tristes Tropiques
Technique: calculating semantic similarity with word-embeddings
Collectively developed by: The people behind Google TensorFlow's word2vec, Algolit

You shall know a word by the company it keeps is a series of five landscapes, each based on a different dataset. Each landscape includes the words 'collective', 'being' and 'social' in the company of different semantic clusters. The belief that distances in the graph correspond to the semantic similarity of words was one of the basic ideas behind word2vec.
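
The "company" a word keeps is, in word2vec's skip-gram setup, the words that appear within a small window around it. The sketch below is a minimal illustration, assuming whitespace tokenisation and a symmetric window; the function name `skipgram_pairs` is hypothetical. As far as we know, the tutorial script samples such (target, context) pairs in batches rather than listing them all, so this is a simplification.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Pair each word with every neighbour up to `window` positions away.

    Hypothetical helper for illustration; not taken from word2vec_basic.py.
    """
    pairs = []
    for i, target in enumerate(tokens):
        # Clip the context window at the start and end of the text.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


sentence = "the collective being is social".split()
for target, context in skipgram_pairs(sentence, window=1):
    print(target, "->", context)
```

A model trained on many such pairs places words that share contexts close together, which is what the landscapes try to make visible.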

The graphs are the product of a code study of an existing word-embedding tutorial script, word2vec_basic.py. In machine learning practice, graphs like these serve as one of the validation tools for checking whether a model is starting to make sense. It is interesting how this validation process is fueled by each reader's individual semantic understanding of the clusters and the words.
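
Alongside the graphs, a common validation step is to list the nearest neighbours of a few chosen words by cosine similarity and judge whether they "make sense". The sketch below assumes plain Python lists and uses four hypothetical 3-dimensional vectors invented for illustration; they are not trained on any of the datasets. The tutorial script performs a comparable computation on normalised embedding matrices.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word: str, embeddings: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k words most similar to `word`, highest similarity first."""
    query = embeddings[word]
    scores = [(other, cosine(query, vec)) for other, vec in embeddings.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical vectors, invented for illustration only (not from a trained model).
embeddings = {
    "collective": [0.9, 0.1, 0.0],
    "social": [0.8, 0.3, 0.1],
    "being": [0.2, 0.9, 0.1],
    "monster": [0.0, 0.2, 0.9],
}
print(nearest("collective", embeddings))
```

Whether "social" being closest to "collective" counts as meaningful is exactly the kind of judgement the reader brings to the validation.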

How are these semantic landscapes reading tools?

graph 1: Frankenstein dataset

Includes the book Frankenstein.

graph 2: Anarcha-Feminist dataset

Includes 3 books (...)

graph 3: Claude Lévi-Strauss dataset

Includes the book Tristes Tropiques.

graph 4: Deep Learning textbooks dataset

Includes the books (...).

graph 5: Harassing comments dataset

Includes examples of harassment in comments on Wikipedia Talk pages.

Error creating thumbnail: Unable to save thumbnail to destination