A One Hot Vector

From Algolit

Revision as of 15:16, 25 October 2017

Type: Algoliterary exploration
Technique: word-embeddings
Developed by: Algolit

one-hot-vectors


"Meaning is this illusive thing that were trying to capture" (Richard Socher in CS224D Lecture 2 - 31st Mar 2016 (Youtube))

If we work with the example sentence ...


"The algoliterary explorers discovered a multidimensional landscape made of words disguised as numbers."


... these are the 14 unique tokens (13 lowercased words and the full stop) we work with, in alphabetical order ...


a
algoliterary
as
discovered
disguised
explorers
landscape
made
multidimensional
numbers
of
the
words
.
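
As a minimal sketch of how such a list could be derived from the example sentence (this is not taken from the Algolit scripts; the regular expression, the lowercasing and the variable names are assumptions made for illustration):

```python
import re

sentence = ("The algoliterary explorers discovered a multidimensional "
            "landscape made of words disguised as numbers.")

# Assumption: lowercase everything, then keep runs of letters and the full stop.
tokens = re.findall(r"[a-z]+|\.", sentence.lower())

# Sort alphabetically, but move the full stop to the end of the list.
vocabulary = sorted(set(tokens), key=lambda t: (t == ".", t))

print(len(vocabulary))
for token in vocabulary:
    print(token)
```

Each token's position in `vocabulary` is the index that later identifies its row and its column in the matrix.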


... a single, still empty vector of the one-hot-vector matrix looks like this ...


[0 0 0 0 0 0 0 0 0 0 0 0 0 0] 


... and the full matrix of fourteen 14-dimensional vectors like this ...


[[0 0 0 0 0 0 0 0 0 0 0 0 0 0]  a
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  algoliterary
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  as
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  discovered
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  disguised
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  explorers
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  landscape
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  made
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  multidimensional
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  numbers
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  of
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  the
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]  words
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]] .


... with one 0 for each unique word in the vocabulary, and a row for each unique word.
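
The empty matrix can be sketched in a few lines of plain Python. The vocabulary is written out by hand here, and `format_row` and `one_hot` are hypothetical helper names, not part of the Algolit scripts. For comparison, `one_hot` shows the usual sense of the term: a vector with a single 1 at the index of the word it stands for.

```python
vocabulary = ["a", "algoliterary", "as", "discovered", "disguised",
              "explorers", "landscape", "made", "multidimensional",
              "numbers", "of", "the", "words", "."]

size = len(vocabulary)

# One row per unique word, and one 0 per unique word in each row.
matrix = [[0] * size for _ in range(size)]


def format_row(row):
    """Render a row in the bracketed, space-separated style used on this page."""
    return "[" + " ".join(str(value) for value in row) + "]"


def one_hot(word):
    """Return a vector with a single 1 at the index of `word` (hypothetical helper)."""
    vector = [0] * size
    vector[vocabulary.index(word)] = 1
    return vector


for word, row in zip(vocabulary, matrix):
    print(format_row(row), word)

print(format_row(one_hot("as")))  # [0 0 1 0 0 0 0 0 0 0 0 0 0 0]
```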

The next step is to count how often each word appears directly next to another ...


"The algoliterary explorers discovered a multidimensional landscape made of words disguised as numbers."



[[0 0 0 1 0 0 0 0 1 0 0 0 0 0]  a
 [0 0 0 0 0 1 0 0 0 0 0 1 0 0]  algoliterary
 [0 0 0 0 1 0 0 0 0 1 0 0 0 0]  as
 [1 0 0 0 0 1 0 0 0 0 0 0 0 0]  discovered
 [0 0 1 0 0 0 0 0 0 0 0 0 1 0]  disguised
 [0 1 0 1 0 0 0 0 0 0 0 0 0 0]  explorers
 [0 0 0 0 0 0 0 1 1 0 0 0 0 0]  landscape
 [0 0 0 0 0 0 1 0 0 0 1 0 0 0]  made
 [1 0 0 0 0 0 1 0 0 0 0 0 0 0]  multidimensional
 [0 0 1 0 0 0 0 0 0 0 0 0 0 1]  numbers
 [0 0 0 0 0 0 0 1 0 0 0 0 1 0]  of
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0]  the
 [0 0 0 0 1 0 0 0 0 0 1 0 0 0]  words
 [0 0 0 0 0 0 0 0 0 1 0 0 0 0]] .
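
The counting step can be sketched as follows. This is one possible way to arrive at the matrix above, not the code of either Algolit script; it assumes lowercased tokens, a window of one word to the left and one to the right, and illustrative variable names.

```python
import re

sentence = ("The algoliterary explorers discovered a multidimensional "
            "landscape made of words disguised as numbers.")

# Assumption: lowercase everything, then keep runs of letters and the full stop.
tokens = re.findall(r"[a-z]+|\.", sentence.lower())

# Alphabetical vocabulary with the full stop last, and a lookup from token to index.
vocabulary = sorted(set(tokens), key=lambda t: (t == ".", t))
index = {token: i for i, token in enumerate(vocabulary)}

size = len(vocabulary)
matrix = [[0] * size for _ in range(size)]

# Walk over every pair of adjacent tokens and count it in both directions.
for left, right in zip(tokens, tokens[1:]):
    matrix[index[left]][index[right]] += 1
    matrix[index[right]][index[left]] += 1

for token, row in zip(vocabulary, matrix):
    print("[" + " ".join(map(str, row)) + "]", token)
```

Because every pair is counted in both directions, the matrix is symmetric: the row for "a" has a 1 in the column for "discovered", and the row for "discovered" has a 1 in the column for "a".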


Algolit one-hot-vector scripts

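Two one-hot-vector scripts were created during one of the Algolit sessions. Both create the same matrix, but in different ways. To download and run them, use the following links:

one-hot-vector_gijs.py: https://gitlab.constantvzw.org/algolit/algolit/blob/master/algoliterary_encounter/one-hot-vector/one-hot-vector_gijs.py
one-hot-vector_hans.py: https://gitlab.constantvzw.org/algolit/algolit/blob/master/algoliterary_encounter/one-hot-vector/one-hot-vector_hans.py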

Note that

"Words are represented once in a vector. So words with multiple meanings, like "bank", are more difficult to represent. There is research to multivectors for one word, so that it does not end up in the middle." (Richard Socher, idem.)]

For more notes on this lecture visit http://pad.constantvzw.org/public_pad/neural_networks_3