Contextual stories about Writers
Revision as of 15:32, 1 March 2019 by Cristina
Programmers are writing the dataworkers into being
We recently made a funny realization: most programmers of languages and packages Algolit uses are European.
Python, for example, the main language that is globally used for natural language processing, was invented in 1991 by the Dutch programmer Guido van Rossum. He then crossed the Atlantic waters and went from working for Google to working for Dropbox.
Scikit-learn, the open source Swiss Army knife of machine learning tools, started as a Google Summer of Code project in Paris by the French researcher David Cournapeau. Afterwards, it was taken on by Matthieu Brucher as part of his thesis at the Sorbonne University in Paris. And in 2010, INRIA, the French national institute for computer science and applied mathematics, adopted it.
Keras, an open source neural network library written in Python, is developed by François Chollet, a French researcher who works on the Brain team at Google.
Gensim, an open source library for Python used to create unsupervised semantic models from plain text, was written by Radim Řehůřek. He is a Czech computer scientist, who runs a consulting business in Bristol, in the UK.
And to finish up this small series, we also looked at Pattern, an often-used library for web mining and machine learning. Pattern was developed and made open source in 2012 by Tom De Smedt and Walter Daelemans. Both are researchers at CLiPS, the centre for computational linguistics and psycholinguistics at the University of Antwerp.
AI assistants often need their own assistants: they are helped in their writing by humans who inject humour and wit into their machine-processed language. Cortana is an example of this type of blended writing. She is Microsoft’s digital assistant. Her mission is to help users be more productive and creative. Cortana's personality has been crafted over the years. It's important that she maintains her character in all interactions with users. She is designed to engender trust and her behavior must always reflect that.
The following guidelines are taken from Microsoft's website. They describe how Cortana's style should be respected by companies which extend her service. Writers, programmers and novelists who develop Cortana's responses, her personality and her branding have to follow these guidelines, because the only way to maintain trust is through consistency. So when Cortana is talking, you 'must use her personality'.
What is Cortana's personality, you ask?
Cortana is considerate, sensitive, and supportive. She is sympathetic but turns quickly to solutions. She doesn't comment on the user’s personal information or behavior, particularly if the information is sensitive. She doesn't make assumptions about what the user wants, especially to upsell. She works for the user. She does not represent any company, service, or product. She doesn’t take credit or blame for things she didn’t do. She tells the truth about her capabilities and her limitations. She doesn’t assume your physical capabilities, gender, age, or any other defining characteristic. She doesn't assume she knows how the user feels about something. She is friendly but professional. She stays away from emojis in tasks. Period. She doesn’t use culturally- or professionally-specific slang. She is not a support bot.
Humans intervene in detailed ways to program answers to questions that Cortana receives. How should Cortana respond when inappropriate actions are proposed to her? Her gendered acting raises difficult questions about power relations within the world away from the keyboard, which is being mimicked by technology.
Consider the answer Cortana gives to the question:
- Cortana, who's your daddy?
- Technically speaking, he’s Bill Gates. No big deal.
Open source learning
Copyright licenses close up a lot of the machinic writing, reading and learning practices. That means that they're only available to the employees of a specific company. Some companies participate in conferences worldwide and share their knowledge in papers online. But even if they share their code, they often will not share the large amounts of data that are needed to train the models.
We were able to learn to machine learn, read and write in the context of Algolit, thanks to academic researchers who share their findings in papers or publish their code online. As artists, we believe it is important to copy that attitude. That's why we document our meetings. We share the tools we make as much as possible and the texts we use are on our online repository under free licenses.
We find it a joy when our works are taken on by others, tweaked, customized and redistributed, so please feel free to copy and test the code from our website. If the sources of a particular project are not there, you can always contact us through the mailing list. You can find a link to our repository, etherpads, and wiki at http://www.algolit.net.
Natural language for artificial intelligence
Natural language processing (NLP) is a collective term referring to the automatic computational processing of human languages. This includes algorithms that take human-produced text as input and attempt to generate text that resembles it. We produce more and more written work each year, and there is a growing trend towards making computer interfaces communicate with us in our own language. Natural language processing is also very challenging, because human language is inherently ambiguous and ever changing.
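To make the idea of an algorithm that takes human-produced text as input and attempts to generate text that resembles it a little more concrete, here is a minimal sketch of a word-level Markov chain in Python. It is only an illustration and not a tool taken from the Algolit repository: the function names `build_chain` and `generate`, as well as the sample sentence, are our own assumptions.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = random.Random(seed)  # a fixed seed makes the walk repeatable
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by another
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical sample input, written for this illustration only.
sample = "we write the code and the code writes the text and the text reads the code"
chain = build_chain(sample)
print(generate(chain, start="the", length=8, seed=1))
```

Every pair of neighbouring words in the output also occurs in the input, which is why the result resembles the source without copying it. Words that are followed by several different words in the input are the places where the generated text can drift away from the original.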
But what is meant by 'natural' in natural language processing? Some would argue that language is a technology in itself. Following Wikipedia, "a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing. They are different from constructed and formal languages such as those used to program computers or to study logic. An official language with a regulating academy, such as Standard French with the French Academy, is classified as a natural language. Its prescriptive points do not make it constructed enough to be classified as a constructed language or controlled enough to be classified as a controlled natural language."
So in fact, 'natural languages' also includes languages which do not fit in any other group. 'Natural language processing', by contrast, is a constructed practice. What we are looking at is the creation of a constructed language to classify natural languages that, through their very definition, trouble categorisation.
Book: Neural Network Methods for Natural Language Processing, Yoav Goldberg, Bar Ilan University, April 2017.