Cleaning for Poems

From Algolit


Latest revision as of 17:52, 4 June 2019

by Algolit

Sources on Gitlab: https://gitlab.constantvzw.org/algolit/mundaneum/tree/master/exhibition/3-Cleaners/the-cleaner

For this exhibition we worked with 3 per cent of the Mundaneum's archive. These documents were first scanned or photographed. To make them searchable, they were transformed into text using Optical Character Recognition (OCR) software. OCR software relies on algorithmic models that are trained on other texts; these models have learned to identify characters, words, sentences and paragraphs. The software often makes 'mistakes'. It might recognize the wrong character, or it might get confused by a stain, an unusual font or text from the reverse side of the page showing through.

While these mistakes are often considered noise that confuses the training, they can also be seen as poetic interpretations by the algorithm. They show us the limits of the machine. They also reveal how the algorithm might work: what material it saw in training and what is new to it. They say something about the standards of its makers. In this installation we ask for your help in verifying our dataset. As a reward, we'll present you with a personal algorithmic improvisation.
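To make the idea of an OCR 'misreading' concrete, here is a minimal sketch that lists the spans where an OCR reading differs from a human-verified transcription. It is not the installation's own code. The function name `misreadings` and the example strings are hypothetical, and the choice of Python's standard `difflib.SequenceMatcher` for character-level comparison is an assumption.

```python
import difflib


def misreadings(ocr_text: str, corrected_text: str) -> list[tuple[str, str]]:
    """Return (ocr, corrected) pairs for every span where the two texts differ.

    Hypothetical helper for illustration; not taken from the Algolit sources.
    """
    # autojunk=False keeps difflib from treating frequent characters as junk
    # in long texts, so every character is compared.
    matcher = difflib.SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' or 'insert': the OCR reading diverges here.
            pairs.append((ocr_text[i1:i2], corrected_text[j1:j2]))
    return pairs


# Hypothetical example: the OCR read an 'e' as a 'c'.
ocr = "Le Mundancum rassemble"
verified = "Le Mundaneum rassemble"
print(misreadings(ocr, verified))  # [('c', 'e')]
```

Each pair records what the machine saw next to what a human reader saw, which is the kind of material that can either be cleaned away or kept as a poetic trace.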


Concept, code, interface: Gijs de Heij