CHARNN text generator

From Algolit

 

Type: Algoliterary exploration
Dataset(s): Complete Works by Shakespeare & Jules Verne, Enron Email Archive
Technique: Torch, Cuda, Recurrent Neural Network, LSTM
Developed by: Justin Johnson (original version: Andrej Karpathy)

The CharRNN text generator produces text using the CharRNN model. This is a recurrent neural network that reads a text character by character. In the training phase the model analyzes which characters follow each other and learns the probability of the next character based on the characters it has seen before. The model has a memory that varies in size. Because the network is constructed using Long Short-Term Memory (LSTM) modules, it can forget certain information it has seen during the learning process.
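
To make this concrete, here is a minimal sketch of such a character-level LSTM, written in PyTorch rather than the Torch/Lua of the scripts named below, and not the Algolit code itself; the model, the training text and all hyperparameters are illustrative:

 import torch
 import torch.nn as nn

 class CharLSTM(nn.Module):
     def __init__(self, vocab_size, embed_size=64, hidden_size=128):
         super().__init__()
         self.embed = nn.Embedding(vocab_size, embed_size)
         self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
         # One score per possible next character
         self.head = nn.Linear(hidden_size, vocab_size)

     def forward(self, x, state=None):
         # x: (batch, sequence_length) of character indices
         emb = self.embed(x)
         out, state = self.lstm(emb, state)  # the LSTM state is the model's "memory"
         return self.head(out), state        # logits: (batch, seq_len, vocab_size)

 text = "the quick brown fox jumps over the lazy dog. "
 chars = sorted(set(text))
 idx = {c: i for i, c in enumerate(chars)}
 data = torch.tensor([idx[c] for c in text]).unsqueeze(0)

 model = CharLSTM(len(chars))
 optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
 loss_fn = nn.CrossEntropyLoss()

 # Training: predict character t+1 from the characters up to t
 for step in range(100):
     logits, _ = model(data[:, :-1])
     loss = loss_fn(logits.reshape(-1, len(chars)), data[:, 1:].reshape(-1))
     optimizer.zero_grad()
     loss.backward()
     optimizer.step()

Each training step asks the network to play exactly the guessing game described above: given everything read so far, how likely is each character to come next?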

One of the first things the model learns is that words are separated by spaces and sentences are separated by a period, a space, followed by an uppercase letter. Although it might seem that the model has learned that a text is constructed out of multiple words and sentences, it has actually learned that after a small number of characters the chance is high that a space will occur, and that after a longer series of characters and spaces the chances grow that a period, a space and an uppercase letter will follow.
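
This kind of regularity can be shown with a far simpler statistical model than an LSTM. The hypothetical snippet below only counts, for each character in a sample sentence, how often it is followed by a space; the LSTM learns richer, longer-range versions of such probabilities:

 from collections import Counter, defaultdict

 text = "To be, or not to be, that is the question."

 # Count which character follows which
 follows = defaultdict(Counter)
 for prev, nxt in zip(text, text[1:]):
     follows[prev][nxt] += 1

 # Probability that each character is followed by a space
 for char, counter in sorted(follows.items()):
     p_space = counter[" "] / sum(counter.values())
     if p_space > 0:
         print(f"P(space | {char!r}) = {p_space:.2f}")

Running this shows, for instance, that a comma or period is almost always followed by a space, which is one of the patterns the generator reproduces first.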

The generator interface is trained on various datasets and can be explored. The model is based on a script by Justin Johnson (https://github.com/jcjohnson/torch-rnn), which is an improved version of the original script by Andrej Karpathy (https://github.com/karpathy/char-rnn).
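
To generate text, a trained model is sampled one character at a time, feeding each sampled character back in as the next input. The sketch below assumes the illustrative CharLSTM model, idx and chars from the earlier training sketch; the temperature parameter, as in Karpathy's original script, sharpens or flattens the predicted probabilities:

 import torch

 def generate(model, idx, chars, seed="t", length=100, temperature=1.0):
     model.eval()
     out, state = list(seed), None
     x = torch.tensor([[idx[c] for c in seed]])
     with torch.no_grad():
         for _ in range(length):
             logits, state = model(x, state)
             probs = torch.softmax(logits[0, -1] / temperature, dim=0)
             choice = torch.multinomial(probs, 1).item()  # sample the next character
             out.append(chars[choice])
             x = torch.tensor([[choice]])  # feed the choice back in
     return "".join(out)

 print(generate(model, idx, chars))

A low temperature makes the output conservative and repetitive, while a high temperature produces more surprising, and more misspelled, text.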