A One Hot Vector
From Algolit
Revision as of 15:05, 25 October 2017
Type: Algoliterary exploration
Technique: word-embeddings
Developed by: Algolit
one-hot-vectors
"Meaning is this elusive thing that we're trying to capture" (Richard Socher in CS224D Lecture 2 - 31st Mar 2016 (YouTube))
If we work with the example sentence ...
"The algoliterary explorers discovered a multidimensional landscape made of words disguised as numbers."
... these are the 14 words we work with ...
a
algoliterary
as
discovered
disguised
explorers
landscape
made
multidimensional
numbers
of
the
words
.
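The list above can be derived from the sentence with a few lines of Python. This is a minimal sketch under two assumptions that the page does not spell out: the text is lowercased, and the final full stop is split off as its own token. The sort key is chosen only to reproduce the order used above (alphabetical words, punctuation last); the helper names `tokenize` and `build_vocabulary` are hypothetical.

```python
import re

sentence = ("The algoliterary explorers discovered a multidimensional "
            "landscape made of words disguised as numbers.")

def tokenize(text):
    # Assumption: lowercase everything, then split into runs of word
    # characters and single punctuation marks ("numbers." -> "numbers", ".").
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocabulary(tokens):
    # Unique tokens in alphabetical order, with punctuation placed after
    # the words (False sorts before True, so alphabetic tokens come first).
    return sorted(set(tokens), key=lambda t: (not t.isalpha(), t))

tokens = tokenize(sentence)
vocabulary = build_vocabulary(tokens)
print(len(vocabulary))
print(vocabulary)
```

Every token in this sentence occurs only once, so the vocabulary has the same 14 entries as the token list; in a longer text the `set` would collapse repeated words.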
... a single vector, with one position per word and nothing marked yet, looks like this ...
[0 0 0 0 0 0 0 0 0 0 0 0 0 0]
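The all-zero vector above is the starting point. In the usual definition of one-hot encoding, a word is represented by this vector with exactly one position switched to 1: the position of that word in the vocabulary. The sketch below shows both; `one_hot` is a hypothetical helper, and the tokenization (lowercase, full stop as its own token) is an assumption.

```python
import re

sentence = ("The algoliterary explorers discovered a multidimensional "
            "landscape made of words disguised as numbers.")
tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
vocabulary = sorted(set(tokens), key=lambda t: (not t.isalpha(), t))

# The empty vector: one 0 for each word in the vocabulary.
empty = [0] * len(vocabulary)

def one_hot(word, vocabulary):
    # Start from a fresh all-zero vector and mark the word's position with a 1.
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

print(empty)
print(one_hot("algoliterary", vocabulary))
```

`vocabulary.index` is a linear search, which is fine for 14 words; for a large vocabulary a dictionary from word to position would be the more common choice.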
... and the full matrix, made of fourteen such fourteen-dimensional vectors, like this ...
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0] a
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] algoliterary
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] as
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] discovered
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] disguised
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] explorers
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] landscape
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] made
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] multidimensional
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] numbers
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] of
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] the
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0] words
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]] .
... with one 0 for each unique word in the vocabulary, and one row for each unique word.
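The empty matrix can be built the same way: one row per unique word, one column per unique word. This sketch assumes the same tokenization as before (lowercase, full stop as a separate token) and prints the rows with their word labels in the layout used above.

```python
import re

sentence = ("The algoliterary explorers discovered a multidimensional "
            "landscape made of words disguised as numbers.")
tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
vocabulary = sorted(set(tokens), key=lambda t: (not t.isalpha(), t))

# A square matrix of zeros: rows and columns both follow the vocabulary order.
# Each row is created separately so that the rows are independent lists.
matrix = [[0] * len(vocabulary) for _ in vocabulary]

for word, row in zip(vocabulary, matrix):
    print("[" + " ".join(str(n) for n in row) + "]", word)
```

Writing `[[0] * n] * n` instead would look equivalent but would make every row the same list object, so changing one cell would change that column in all rows.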
The next step is to count how often each word appears directly next to each other word ...
"The algoliterary explorers discovered a multidimensional landscape made of words disguised as numbers."
[[0 0 0 1 0 0 0 0 1 0 0 0 0 0] a
 [0 0 0 0 0 1 0 0 0 0 0 1 0 0] algoliterary
 [0 0 0 0 1 0 0 0 0 1 0 0 0 0] as
 [1 0 0 0 0 1 0 0 0 0 0 0 0 0] discovered
 [0 0 1 0 0 0 0 0 0 0 0 0 1 0] disguised
 [0 1 0 1 0 0 0 0 0 0 0 0 0 0] explorers
 [0 0 0 0 0 0 0 1 1 0 0 0 0 0] landscape
 [0 0 0 0 0 0 1 0 0 0 1 0 0 0] made
 [1 0 0 0 0 0 1 0 0 0 0 0 0 0] multidimensional
 [0 0 1 0 0 0 0 0 0 0 0 0 0 1] numbers
 [0 0 0 0 0 0 0 1 0 0 0 0 1 0] of
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0] the
 [0 0 0 0 1 0 0 0 0 0 1 0 0 0] words
 [0 0 0 0 0 0 0 0 0 1 0 0 0 0]] .
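The counting step can be sketched as follows. For every pair of neighbouring tokens, both cells (row of the first word, column of the second, and the reverse) are increased by one, which is why the matrix above is symmetric. The `window` parameter and the `cooccurrence_matrix` name are hypothetical; the matrix above corresponds to `window=1`, i.e. only direct neighbours. Tokenization is assumed to be the same as before.

```python
import re

sentence = ("The algoliterary explorers discovered a multidimensional "
            "landscape made of words disguised as numbers.")
tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
vocabulary = sorted(set(tokens), key=lambda t: (not t.isalpha(), t))
index = {word: i for i, word in enumerate(vocabulary)}

def cooccurrence_matrix(tokens, index, window=1):
    """Count how often each word appears within `window` positions of another."""
    size = len(index)
    matrix = [[0] * size for _ in range(size)]
    for i, word in enumerate(tokens):
        # Look only to the right of each token; recording the pair in both
        # directions covers the left-hand neighbours and keeps the matrix symmetric.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = index[word], index[tokens[j]]
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix

matrix = cooccurrence_matrix(tokens, index)
for word, row in zip(vocabulary, matrix):
    print("[" + " ".join(str(n) for n in row) + "]", word)
```

With 14 tokens there are 13 neighbouring pairs, each counted twice, so the cells add up to 26. A larger `window` would also count words two or more positions apart.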
Note that
"Words are represented once in a vector. So words with multiple meanings, like 'bank', are more difficult to represent. There is research into multiple vectors for one word, so that it does not end up in the middle." (Richard Socher, idem.)
For more notes on this lecture visit http://pad.constantvzw.org/public_pad/neural_networks_3