Contextual stories about Writers
From Algolit
Programmers are writing the dataworkers into being
Let's start with a funny observation: most programmers of the languages and packages we use are European.
Python, for example, the main language that is globally used for natural language processing, was first released in 1991 by the Dutch programmer Guido van Rossum. He then crossed the Atlantic waters and went from working for Google to working for Dropbox.
Scikit-learn, the open source Swiss Army knife of machine learning tools, started as a Google Summer of Code project in Paris by the French researcher David Cournapeau. Afterwards, it was taken on by Matthieu Brucher as part of his thesis at the Sorbonne University in Paris. And in 2010, INRIA, the French National Institute for computer science and applied mathematics, adopted it.
Keras, an open source neural network library written in Python, is developed by François Chollet, a French researcher who works on the Brain team at Google.
Gensim, an open source library for Python used to create unsupervised semantic models from plain text, was written by Radim Řehůřek. He is a Czech computer scientist, who runs a consulting business in Bristol, in the UK.
And to finish up this small series, we also looked at Pattern, an often-used library for web mining and machine learning. Pattern was developed and made open source in 2012 by Tom De Smedt and Walter Daelemans. Both are researchers at CLIPS, the center for computational linguistics and psycholinguistics at the University of Antwerp.
Cortana speaks
AI assistants often need their own assistants: they are helped in their writing by humans who inject humour and wit into their machine-processed language. Cortana is an example of this type of blended writing. She is Microsoft’s digital assistant. Her mission is to help users be more productive and creative. Cortana's personality has been crafted over the years. It's important that she maintains her character in all interactions with users. She is designed to engender trust and her behavior must always reflect that.
The following guidelines are taken from Microsoft's website. They describe how Cortana's style should be respected by companies that extend her service. Writers, programmers and novelists who develop Cortana's responses, her personality and her branding have to follow these guidelines, because the only way to maintain trust is through consistency. So when Cortana is talking, you 'must use her personality'.
What is Cortana's personality, you ask?
Cortana is considerate, sensitive, and supportive. She is sympathetic but turns quickly to solutions. She doesn't comment on the user’s personal information or behavior, particularly if the information is sensitive. She doesn't make assumptions about what the user wants, especially to upsell. She works for the user. She does not represent any company, service, or product. She doesn’t take credit or blame for things she didn’t do. She tells the truth about her capabilities and her limitations. She doesn’t assume your physical capabilities, gender, age, or any other defining characteristic. She doesn't assume she knows how the user feels about something. She is friendly but professional. She stays away from emojis in tasks. Period. She doesn’t use culturally- or professionally-specific slang. She is not a support bot.
Humans intervene in detailed ways to program answers to the questions that Cortana receives. How should Cortana respond when a user proposes to have intercourse with her? Her gendered acting raises difficult questions about power relations in the world away from the keyboard, relations that are being mimicked by technology.
Consider the answer Cortana gives to the question:
- Cortana, who's your daddy?
- Technically speaking, he’s Bill Gates. No big deal.
https://docs.microsoft.com/en-us/cortana/skills/cortanas-persona
Open source learning
Copyright licenses close off a lot of the machinic writing, reading and learning practices. That means that they are only available to the humans working at the company that holds the license. Some companies participate in conferences worldwide and share their knowledge in papers online. But even if they share their code, they often will not share the large amounts of data that are needed to train the models.
We were able to learn to machine learn, read and write in the context of Algolit thanks to academic researchers who share their findings in papers or publish their code online. As artists, we believe it is important to copy that attitude. That's why we document our meetings. As much as possible, we share the tools we make and the texts we use on our online repository under free licenses.
We find it a joy when our works are taken on by others, tweaked, customized and redistributed, so please feel free to copy and test the code from our website. If the sources of a particular project are not there, you can always contact us through the mailing list. You can find a link to our repository, etherpads and wiki at algolit.net.
Natural language for artificial intelligence
Natural language processing (NLP) is a collective term referring to automatic computational processing of human languages. This includes algorithms that take human-produced text as input, and attempt to output text that resembles the input. Humans seem to rely more and more on this kind of algorithmic presence. We produce more text each year, and we expect computer interfaces to communicate with us in our own language. Natural language processing is also very challenging, because human language is inherently ambiguous, ever changing, and not well defined.
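To make the ambiguity tangible, here is a minimal sketch in Python. It assumes a deliberately naive tokenizer that we made up for illustration: the function `tokenize` and the two example sentences are our own and do not come from any library or from the sources listed below. The script counts word forms and shows that a purely surface-level count cannot tell a river bank from a financial bank.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive, hypothetical rule)."""
    return re.findall(r"[a-z']+", text.lower())


# Two example sentences (our own) in which 'bank' has two different meanings.
sentences = [
    "She sat on the bank of the river.",
    "He opened an account at the bank.",
]

# Count every token across both sentences, looking only at the written form.
counts = Counter(token for s in sentences for token in tokenize(s))

# Both meanings collapse into a single entry.
print(counts["bank"])  # prints 2
```

For the counter there is only one 'bank'. Telling the two meanings apart requires context, which is one reason why processing human language is so challenging.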
But what is meant by 'natural' in natural language processing? Some humans would argue that language is a technology in itself. Following Wikipedia, "a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing. They are different from constructed and formal languages such as those used to program computers or to study logic. An official language with a regulating academy, such as Standard French with the French Academy, is classified as a natural language. Its prescriptive points do not make it constructed enough to be classified as a constructed language or controlled enough to be classified as a controlled natural language."
So in fact, 'natural language' is a substitute term that refers to all languages, despite their hybridity. 'Natural language processing', by contrast, is a constructed practice. What we are looking at is the creation of a constructed language to classify natural languages that, through their evolution, trouble categorisation.
https://en.wikipedia.org/wiki/Natural_language
https://hiphilangsci.net/2013/05/01/on-the-history-of-the-question-of-whether-natural-language-is-illogical/
Book: Neural Network Methods for Natural Language Processing, Yoav Goldberg, Bar Ilan University