The data (e)speaks: Difference between revisions

Revision as of 17:17, 25 October 2017

Type:	Algoliterary exploration
Datasets:
Technique:	espeak
Developed by:	& Algolit

In making the Algolit datasets, we selected the source texts carefully. We aimed for a variety of tones of voice, so that the combined datasets highlight how heterogeneous the texts are.

The texts were gathered from aaaaarg.fail, gen.lib.rus.ec, archive.org and gutenberg.org. They were converted to .txt files with terminal commands such as pdftotext, then stripped of punctuation marks with the help of a Python code snippet.
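The punctuation clean-up step can be sketched as follows. This is a minimal illustration, not the Algolit snippet itself: the function name `strip_punctuation` is hypothetical, and it assumes that "punctuation marks" means the ASCII set in Python's `string.punctuation` and that the input is a .txt file already produced by a tool such as pdftotext.

```python
import string


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation marks and tidy the whitespace left behind.

    Hypothetical helper for illustration; the original clean-up script
    may treat punctuation and whitespace differently.
    """
    # Translation table that deletes every character in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    cleaned = text.translate(table)
    # Deleting marks such as " - " leaves double spaces, so collapse
    # runs of whitespace within each line while keeping line breaks.
    return "\n".join(" ".join(line.split()) for line in cleaned.splitlines())


sample = "Hello, world! The data (e)speaks."
print(strip_punctuation(sample))  # prints: Hello world The data espeaks
```

Typographic marks outside ASCII, such as curly quotes, are left untouched by this sketch and would need to be added to the set of characters to delete.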

The ensuing datasets were:

Revision as of 17:16, 25 October 2017 (Cristina)		Revision as of 17:17, 25 October 2017 (Cristina)
Line 14:

−	The texts were gathered from aaaaarg.fail, gen.lib.rus.ec, archive.org and gutenberg.org, run through terminal commands such as '''pdftotext''' in order to generate .txt files and stripped of punctuation marks with the help of [https://gitlab.constantvzw.org/algolit/algolit/blob/master/algoliterary_encounter/algoliterary-toolkit/text-punctuation-clean-up.py a Python code snippet].
+	The texts were gathered from aaaaarg.fail, gen.lib.rus.ec, archive.org and gutenberg.org, run through terminal commands such as [https://en.wikipedia.org/wiki/Pdftotext pdftotext] in order to generate .txt files and stripped of punctuation marks with the help of [https://gitlab.constantvzw.org/algolit/algolit/blob/master/algoliterary_encounter/algoliterary-toolkit/text-punctuation-clean-up.py a Python code snippet].
+

	The ensuing datasets were:


From Algolit
