Contextual stories for Cleaners

From Algolit

Revision as of 17:05, 27 February 2019 by An

Project Gutenberg and Distributed Proofreaders

Project Gutenberg is our cave of Ali Baba. It offers over 58,000 free eBooks to download or read online. Works are accepted on Gutenberg when their U.S. copyright has expired. Thousands of volunteers digitize and proofread books to help the project. An essential part of the work is done through the Distributed Proofreaders project, a web-based interface that helps convert public-domain books into e-books: think of text files, EPUBs and Kindle formats. By dividing the workload into individual pages, many volunteers can work on a book at the same time, which speeds up the cleaning process.

During proofreading, volunteers are presented with a scanned image of the page and a version of the text, as it is read by an OCR algorithm trained to recognize letters in images. This allows the text to be easily compared to the image, proofread, and sent back to the site. A second volunteer is then presented with the first volunteer's work. She verifies and corrects the work as necessary, and submits it back to the site. The book then similarly goes through a third proofreading round, plus two more formatting rounds using the same web interface. Once all the pages have completed these steps, a post-processor carefully assembles them into an e-book and submits it to the Project Gutenberg archive.
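The round-by-round workflow described above can be sketched in a few lines of Python. This is only an illustration: the round names, the `Page` class and the `assemble` function are hypothetical and are not taken from the Distributed Proofreaders code.

```python
from dataclasses import dataclass
from typing import List

# Illustrative labels for the five rounds described above
# (three proofreading rounds, two formatting rounds).
ROUNDS = ["proof-1", "proof-2", "proof-3", "format-1", "format-2"]


@dataclass
class Page:
    number: int
    text: str             # starts as raw OCR output
    rounds_done: int = 0  # how many rounds this page has completed

    def submit(self, corrected_text: str) -> None:
        """A volunteer sends back a corrected version; the page moves on one round."""
        if self.rounds_done >= len(ROUNDS):
            raise ValueError(f"page {self.number} has already finished all rounds")
        self.text = corrected_text
        self.rounds_done += 1

    @property
    def finished(self) -> bool:
        return self.rounds_done == len(ROUNDS)


def assemble(pages: List[Page]) -> str:
    """Post-processing: join the pages in order once every page is finished."""
    unfinished = [p.number for p in pages if not p.finished]
    if unfinished:
        raise ValueError(f"pages still in progress: {unfinished}")
    return "\n".join(p.text for p in sorted(pages, key=lambda p: p.number))
```

Because each page keeps its own round counter, different volunteers can work on different pages at the same time, and the book is only assembled when the slowest page has caught up.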

We collaborated with the Distributed Proofreaders Project to clean up the digitized files we received from the Mundaneum collection. From November 2018 until the first upload of the cleaned-up book 'L'Afrique aux Noirs' in February 2019, An Mertens exchanged about 50 emails with Linda Hamilton, Sharon Joiner and Susan Hanlon, all volunteers from the Distributed Proofreaders Project. The conversation might inspire you to share unavailable books online.

Full email conversation

2. An algoliterary version of the Maintenance Manifesto

https://www.arnolfini.org.uk/blog/manifesto-for-maintenance-art-1969

In 1969, one year after the birth of her first child, the NY artist Mierle Laderman Ukeles wrote a Manifesto for Maintenance. Ukeles' Manifesto calls for a readdressing of the status of maintenance work both in the private, domestic space, and in public. What follows is an altered version of her text inspired by the work of the Cleaners.

IDEAS A. The Death Instinct and the Life Instinct: The Death Instinct: separation; categorisation; Avant-Garde par excellence; to follow the predicted path to death—run your own code; dynamic change. The Life Instinct: unification; the eternal return; the perpetuation and MAINTENANCE of the material; survival systems and operations; equilibrium.

B. Two basic systems: Development and Maintenance. The sourball of every revolution: after the revolution, who’s going to try to spot the bias in the output? Development: pure individual creation; the new; change; progress; advance; excitement; flight or fleeing. Maintenance: keep the dust off the pure individual creation; preserve the new; sustain the change; protect progress; defend and prolong the advance; renew the excitement; repeat the flight; show your work—show it again, keep the git repository groovy, keep the data analysis revealing. Development systems are partial feedback systems with major room for change. Maintenance systems are direct feedback systems with little room for alteration.

C. Maintenance is a drag; it takes all the fucking time (lit.) The mind boggles and chafes at the boredom. The culture assigns lousy status to maintenance jobs = minimum wages, Amazon Mechanical Turks = virtually no pay.

clean the set, tag the training data, correct the typos, modify the parameters, finish the report, keep the requester happy, upload the new version, attach words that were wrongly separated by OCR back together, complete those Human Intelligence Tasks, try to guess the meaning of the requester's formatting, you must accept the HIT before you can submit the results, summarize the image, add the bounding box, what's the semantic similarity of this text, check the translation quality, collect your micro-payments, become a hit Mechanical Turk.

https://requester.mturk.com/create/projects/new

3. A bot panic at Amazon Mechanical Turk

https://www.wired.com/story/amazon-mechanical-turk-bot-panic/
https://www.maxhuibai.com/blog/evidence-that-responses-from-repeating-gps-are-random
http://timryan.web.unc.edu/2018/08/12/data-contamination-on-mturk/

Amazon's Mechanical Turk takes its name from an 18th-century chess-playing automaton. In fact, the Turk wasn't a machine at all. It was a mechanical illusion that allowed a human chess master to hide inside the box and operate it manually. For nearly 84 years, the Turk won most of the games played during its demonstrations around Europe and the Americas. Napoleon Bonaparte is said to have been fooled by this trick too.

The Amazon Mechanical Turk is an online platform for humans to execute tasks that algorithms cannot do. Examples include annotating sentences as positive or negative, spotting number plates, and distinguishing faces from non-faces. The jobs posted on this platform are often paid less than a cent per task. Tasks that are more complex or require more knowledge can be paid up to several cents. To earn a living, Turkers need to finish as many tasks as quickly as possible, which inevitably leads to mistakes. The makers of datasets have to incorporate quality checks when they post a job on the platform. They need to test whether the Turker actually has the ability to complete the task, and they also need to verify the results. Many academic researchers use Mechanical Turk for tasks that would previously have been carried out by students.
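One common way to combine the two quality checks mentioned above is to mix a few tasks with known answers into the job, and then take a majority vote among the workers who answered those correctly. The sketch below is a minimal illustration under that assumption; the function names, the tuple layout and the 0.8 threshold are hypothetical and do not come from the Mechanical Turk API.

```python
from collections import Counter, defaultdict


def worker_accuracy(answers, gold):
    """Share of known-answer ("gold") tasks each worker labelled correctly.

    answers: list of (worker_id, task_id, label) tuples
    gold:    dict mapping task_id -> correct label for the test tasks
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, task, label in answers:
        if task in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[task])
    return {worker: correct[worker] / seen[worker] for worker in seen}


def majority_labels(answers, gold, min_accuracy=0.8):
    """Majority vote per task, counting only workers who passed the gold check."""
    accuracy = worker_accuracy(answers, gold)
    trusted = {w for w, acc in accuracy.items() if acc >= min_accuracy}
    votes = defaultdict(Counter)
    for worker, task, label in answers:
        # Gold tasks are only used for testing, not for the final dataset.
        if worker in trusted and task not in gold:
            votes[task][label] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}
```

Asking several workers to label the same item multiplies the cost per label, which is one reason the individual payments stay so low.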

In August 2018 Max Hui Bai, a psychology student at the University of Minnesota, discovered that the surveys he conducted with Mechanical Turk were full of nonsense answers to open-ended questions. He traced the wrong answers back and found that they had been submitted by respondents with duplicate GPS locations. This raised suspicions. Though Amazon explicitly prohibits robots from completing jobs on Mechanical Turk, the company is not dealing with the problems they cause on its platform. Forums for Turkers are full of conversations about automating the work, with Turkers sharing practices for building robots that violate Amazon’s terms. You can also find YouTube videos showing Turkers how to write a bot to fill in answers for you.
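The kind of check that exposed the problem, looking for responses that share a GPS location, can be sketched as follows. This is not Bai's own analysis: the field names `id`, `lat` and `lon`, the rounding to four decimals and the threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def flag_repeated_locations(responses, decimals=4, max_per_location=1):
    """Return the ids of responses whose rounded GPS position occurs too often.

    responses: list of dicts with the hypothetical keys "id", "lat" and "lon"
    """
    by_location = defaultdict(list)
    for response in responses:
        key = (round(response["lat"], decimals), round(response["lon"], decimals))
        by_location[key].append(response["id"])

    flagged = []
    for ids in by_location.values():
        if len(ids) > max_per_location:
            flagged.extend(ids)
    return sorted(flagged)
```

A repeated location is only a reason to look more closely at the answers themselves; several people can legitimately share one location, so a flag here is a starting point for manual review and not proof of a bot.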

Kristy Milland, an MTurk activist, says: “Mechanical Turk workers have been treated really, really badly for 12 years, and so in some ways I see this as a point of resistance. If we were paid fairly on the platform, nobody would be risking their account this way.”

Bai created a questionnaire for researchers, hosted outside of Mechanical Turk. He is now leading research among social scientists to figure out how much bad data is in use, how large the problem is, and how to stop it. For the moment, though, it is impossible to estimate how many datasets have become unreliable in this way.