From Alexa to Siri, all modern AI is based on an old idea that all humans feel the same six emotions

The world is being flooded with technology designed to monitor our emotions. Amazon’s Alexa is one of many virtual assistants that detect tone and timbre of voice in order to better understand commands. CCTV cameras can track faces through public space, and supposedly detect criminals before they commit crimes. Autonomous cars will one day be able to spot when drivers get road rage, and take control of the wheel.

But there’s a problem. While the technology is cutting-edge, it’s using an outdated scientific concept stating that all humans, everywhere, experience six basic emotions, and that we each express those emotions in the same way. By building a world filled with gadgets and surveillance systems that take this model as gospel, this obsolete view of emotion could end up becoming a self-fulfilling prophecy, as a vast range of human expressions around the world is forced into a narrow set of definable, machine-readable boxes.

Searching for emotions

The science used to underpin most contemporary emotion-detecting technologies began with a grieving teenage boy. Paul Ekman was born in 1934, the child of a pediatrician father and an attorney mother. He spent his youth dreaming of emulating his hero, Ferdinand Magellan, hoping to someday make discoveries that would change the world.

When Ekman was 14 years old, his mother’s depression resulted in her suicide. At a 2008 talk at San Francisco’s Exploratorium Museum, he discussed how, even at such a young age, he felt he “had to do something to make up for the fact that [he] wasn’t able to rescue her.” His dream of discovery shifted from geography to the uncharted regions of the mind.

Just one year later, in 1948, he dropped out of high school – he was highly intelligent, but frequently clashed with his teachers – to become an undergraduate at the University of Chicago. (At the time, students only needed two completed years of high school to apply to some colleges.) Heavily influenced by Freud, Ekman went on to complete a PhD in psychotherapy, studying the depressed. He was fascinated by nonverbal communication, studying patients’ body language and hand movements. Before long, he realised that his patients represented a biased sample: he was studying the survivors of depression, not those who had succumbed to the worst of their illness. He mused that “the road to understanding human behaviour and getting back to help people like [his mother] was not by looking at abnormal behavior but at normal behavior.”

Depression was an emotional disorder, so the man who had idolised Magellan finally found his own quest: to discover if all humans experienced a set of common emotions.

By this time, the 1960s, Ekman wasn’t the only person to have gone searching. The acclaimed anthropologist Margaret Mead had already spent years traveling the world, demonstrating that cultures express emotions differently. Most famously, Mead had lived during the 1920s on the small island of Ta’ū, in American Samoa, in an effort to discover if the emotional upheaval experienced by American and European adolescents was universal. She found that young Samoan women had none of the strong morality-linked feelings, like anxiety and disgust, that their contemporaries experienced in the United States. For example, it was quite normal for Samoan women in their late teens to engage in guilt-free casual sex before getting married and beginning a family. In 1928, when Mead’s Coming of Age in Samoa was published, her findings shocked American readers, and provided strong evidence that fundamental human experiences – including emotions – varied from culture to culture.

Mead’s work – finding evidence that emotions, and other social phenomena, were culturally constructed – had a huge influence on 20th-century feminist thought and activism. She promoted the idea that free love was a way to break free of male dominance, and that upbringing, not genetics, played a central role in the way people behaved. After Coming of Age in Samoa, Mead found more and more examples of how the Western way of thinking about emotion didn’t translate to the experience of non-Western, indigenous peoples. Her 1932 book The Changing Culture of an Indian Tribe, for example, documented cultural conflicts that had beset a Native American “Plains tribe” (she did not specify which one) as its members moved away, often with difficulty, from traditional practices and toward Western behaviors and emotions.

By the late 1960s, Mead’s views were all but scientific consensus in the West, and emotions were considered far from universal. Ekman had his doubts.

On the face of it

To understand why Ekman had problems with Mead’s research, we can look to Charles Darwin. In 1872, he wrote The Expression of the Emotions in Man and Animals, which points out that some instinctual actions – like raising an eyebrow in surprise – are shared by animals and humans. For Darwin, this was further evidence that humans and other animals had some kind of common evolutionary ancestor, as well as that emotions had some kind of biological, innate source. It was also a way to avoid outraging pious Victorians by claiming that they acted like animals; he thought that it would be much more persuasive to do the opposite, and point to cute, humanlike behavior in pets.

In 1955, Mead wrote the foreword to a reprint of Darwin’s essay, but she critiqued it as a historical curiosity. In her opinion, it wasn’t a work that held up in light of more modern research. Darwin’s Expression had a huge influence on Ekman, however – and by the time Expression was republished again in 1998, it was Ekman’s turn to write a foreword. He defended Darwin’s initial hunch, because the consensus had flipped. Innate emotions were in again, and it was Ekman’s research that was responsible.

It should be noted that Darwin was far from the first to suggest that emotions were innate. More than two millennia ago, Aristotle wrote about how “some men, who are in no sense alike, have the same facial expressions.” Nor was Aristotle the only ancient philosopher who thought this way. It was received wisdom throughout antiquity, persisting well into the late 17th century. Artist Charles Le Brun, influenced by Descartes’ The Passions of the Soul, wrote a treatise arguing that high art should make more use of exaggerated facial expressions – and he included sketches of what he considered to be some of the more fundamental expressions. He died before his Méthode Pour Apprendre à Dessiner les Passions was published in 1698, but his sketches had a huge influence on European art theory for centuries afterward.

Sketches of emotional facial expressions, Charles Le Brun, 1698. Photo credit: Public Domain.

Le Brun was drawing on the practice of physiognomy, which stated that not only were faces the gateway to passions but they were also a way to access a person’s soul. Ugliness betrayed sin, and if a person looked like an animal then that individual would have similar attributes to the beast. Physiognomy and its offshoots, like phrenology, remained popular into the 20th century, and provided justification for many popular prejudices. For example, the contents page of U.S. physician James W. Redfield’s 1852 book Comparative Physiognomy gives a list of what would now be regarded as typically racist resemblances: “of Jews to goats,” “of Aztec children to mice,” and even “of Turks to turkeys.”

While the writings of Darwin and Ekman never go as far as justifying physiognomy, the idea that the face can give our hidden thoughts away is an old idea that has taken many forms through history. It continues to persist today.

By 1964, Ekman was struggling. He couldn’t study emotional behaviors without defining them precisely first, but nobody had yet been able to do that. This was when psychological theorist Silvan Tomkins, who would go on to become one of Ekman’s closest collaborators, introduced him to Darwin’s Expression. Ekman found Tomkins’ theoretical arguments in favor of innate human emotions to be more persuasive than Margaret Mead’s arguments in favor of culturally relative ones. He became convinced that, if he was going to test his hypothesis, he needed to first figure out a way to accurately quantify minute human facial expressions. Then he could see if there was a link between those facial expressions and inner, universal emotions.

Ekman spent the next eight years alongside Tomkins and another colleague, Wallace Friesen, developing their method. Ekman and Friesen tested their approach by asking students in the United States, Brazil, Chile, Argentina, and Japan to match photographs of facial expressions with words or stories related to emotions. It quickly became apparent that a basic set of six facial expressions was linked to the same six emotions in all of those places.

Those emotions were happiness, anger, sadness, disgust, surprise, and fear.

Examples of six of the basic emotional faces, as used in Ekman’s research. Clockwise, from top left: anger, fear, disgust, sadness, happiness, surprise. Photo credit: Paul Ekman

By chance, another researcher, Austrian ethnologist Irenäus Eibl-Eibesfeldt, independently published similar findings. The results seemed to vindicate Ekman’s belief that there was a set of basic, universal human emotions, and that facial expressions were how to identify them.

However, there was a loophole: all of the test subjects studied by both Ekman and Eibl-Eibesfeldt had seen Western media of some sort, whether photographs, films, or television programs. Ekman realized that, in order to truly test his hypothesis, he “needed to study people that had not seen the outside world.” Nearly 20 years after his mother’s death, Ekman could finally feel like Magellan, flying in an old Cessna to Papua New Guinea in search of an isolated tribe.

Ekman and Friesen entered a series of mountain ranges in the Southeast Highlands of Papua New Guinea, looking for a group of people as yet untouched by Western mass media. In the dense forest of the Okapa Valley, they were led to the Fore people, whom Western anthropologists had encountered for the first time only two decades earlier. The Fore lived on both the north and south sides of the Wanevinti Mountains, clustered in huts on the hillsides and almost cut off from the rest of the world. With a jeep and some patience, you could just about drive to the region via a rough track.

Once they reached the Fore, Ekman and Friesen screened their potential test participants. They needed people who had never seen movies or other Western media, and thus could not have been influenced by Western emotional responses; who did not speak any form of English; and who had never lived near or worked with an outsider. They found 189 adults and 130 children who fit the bill. The idea was to use the same photos and stories that the researchers had used everywhere else. Knowing that the Fore spoke three dialects, Ekman and Friesen put their translators through rigorous training in an attempt to make sure that varying translations of the stories would not influence the experiment.

Despite having never seen photos before, the Fore who took part in the experiment caught on quickly. The adults were shown three facial expressions, and the children two, along with a single-sentence story – for example, “This person is about to fight.” If the expressions were universal, the stories ought to be linked to just one of the pictures. They were. As often as 93% of the time, the Fore chose the same matching pairs of stories and expressions as the previous, less isolated research subjects had. Ekman and Friesen thought that they had nailed it, proving that all humans, everywhere, felt those six basic emotions: happiness, anger, sadness, disgust, surprise, and fear. The pair published their results in 1971. Margaret Mead was stunned.

Ekman could have left it there, but his curiosity wouldn’t let him. He wanted to know how Mead and the others could have been so wrong. He wondered if these expressions, while universal, were influenced by how each specific culture thought someone ought to behave. He ran another experiment. This time, he showed students in the U.S. and in Tokyo a series of American Navy medical training films of severe burns and limb amputations: “The worst stuff I could find,” said Ekman. One group was left to watch these films alone; the other was joined by an authority figure – a scientist in a white coat. Ekman secretly filmed the subjects’ facial expressions. The people who were joined by an authority figure reacted differently: the Japanese participants masked their reactions and remained stoic and stone-faced, while the Americans exaggerated their expressions. The people who were alone reacted in the same way, whether they were in Japan or the US. It was the presence of the authority figure in the room – the Margaret Mead in a white coat – that had triggered the differences between the groups. The anthropologists, Ekman suggested, were seeing what their subjects wanted them to see.

Curiously, regardless of who was in the room with the students, if the recordings were slowed down, you could still see slight traces of the six facial expressions. Ekman theorised that, despite cultural differences, these universal expressions couldn’t be completely suppressed. He gave them a name: “micro expressions.”

Ekman’s success has led to other discoveries. One of the more recent examples, in 2008, came when UCLA anthropologists Gregory Bryant and H. Clark Barrett ran a version of Ekman and Friesen’s experiment that used voices. Instead of the Fore, the Shuar people of Ecuador served as the group with which US subjects were compared. Both groups were asked to listen to sentences that translated easily between English and Shuar – plain phrases, like “The dog is in the house” and “She ate the fish,” that don’t reveal any emotional information on the part of the speaker. All that changed was the timbre of the voice. The participants were then asked to select one picture of a facial expression from five choices, to best represent the emotion being expressed by the voice they heard. Again, the results between the groups were similar, suggesting that, despite learned differences, universal basic emotions can also be spotted in speech.

Ekman’s studies, and to a lesser extent the work of Bryant and Barrett, are still considered definitive by many. Every week, dozens of peer-reviewed papers that build on the categories of basic emotions are published. Disney even made a movie using five of them as characters: Inside Out. Understandably, technology companies have put a similar amount of faith in the researchers’ work.

Modern echo

Where Ekman’s basic emotions and the digital age meet, emotion technology has risen. Without emotions, an artificial intelligence lacks a large part of what makes something sentient; and a machine that can’t understand emotions can’t react in a human way to commands. Examples of this science in action aren’t just confined to universities and Silicon Valley, either.

Nearly one-fifth of US adults have an Amazon Echo or equivalent smart speaker, like Google Home. Amazon wants people to trust their virtual assistant, Alexa, so they use whispers, shouts, and varied pitch and speed to indicate emotions, and to make her sound more human. She’s also been programmed with so-called “delighters”: randomised, humanizing responses, such as terrible jokes, beatboxing, and silly songs. Alexa also analyses our voices to work out our moods. When you get annoyed, Alexa will calm you down. When you are happy, she can join you in your joy. All of this works. It works so well, in fact, that many of you will not have found my description of a computer program as “she” or “her” unusual.

The teams behind Apple’s Siri, Microsoft’s Cortana, and Google’s Assistant are each developing emotion detection systems that use both voice and facial recognition – the same facial recognition technology that can already be used to access an iPhone X.

Emotion-detecting technology and artificial emotions are also being used for protection. Affectiva wants to monitor drivers, identifying the emotions in their voices, their body language, and their facial expressions. If you get a bad case of road rage, or slump at the wheel, their Automotive AI can take control of the car, drive you to the nearest safe place, and, if necessary, call first responders.

Artificial emotion technology is also being deployed as a crime-fighting tool. Since 1978, Ekman has been personally teaching people to detect micro expressions. He’s trained operatives and officers at the CIA, Scotland Yard, the Department of Homeland Security, and many others; he even taught teams at Pixar how to animate micro expressions into characters’ faces. His work also inspired a TV series, titled Lie to Me, on which he worked as an adviser. However, the show’s glossy production is misleading about how easy it is to “read” someone’s micro expressions. In 2007, the Transportation Security Administration launched a program called Screening Passengers by Observation Techniques, or SPOT – airport security officers were trained to read micro expressions in the faces of passengers waiting for planes, in an effort to identify terrorists. It was a complete failure. The stress of flying makes passengers look and act in atypical ways, and micro expressions, if they exist at all, are invisible to the naked eye. The Transportation Security Administration’s results were usually worse than guesswork.

Where humans fail, technology picks up the slack. The University of Rochester, in New York, has crowdsourced photos of more than a million faces, to build a database of micro expressions. It’s a way to train computers to assess whether someone waiting in line at the airport might be a terrorist. Gone are fallible human brains, replaced by emotion-detecting AI that watches humans in airports, through CCTV, and in police interview rooms. This isn’t science fiction – the sunglasses of some Chinese police officers already have face recognition technology built into them.

Developing emotion-detecting technologies would be much more difficult if it weren’t for Ekman’s discovery of basic emotions and micro expressions. Programming software is easier when emotions can be categorized and measured. But here’s the problem: every one of these systems seems to run into trouble of some kind when it progresses beyond small-scale trials. Once you try to apply the basic model of emotions at scale, instead of in a lab, it starts to look less infallible.

That may be because emotions aren’t as simple as Ekman thinks they are.

What’s emotion anyway?

There are three problems with the idea that there are only six basic emotions.

The first is that, despite everything, there still isn’t a definition of “emotions” that everyone agrees on. Almost every paper for the last 50 years has included its own version. Psychologist Robert Sternberg calls an emotion “a feeling comprising physiological and behavioral (and possibly cognitive) reactions to internal and external events”; neuroscientist Jaak Panksepp defines it as “intense arousal of brain systems that strongly encourage the organism to act impulsively”; and social psychologist Phoebe Ellsworth says emotions are a process that is “initiated when one’s attention is captured by some discrepancy or change.” (Even this partial list of different definitions shows how much variation there is.)

This might be because the concept of “emotions” is a relatively recent one. Historian Thomas Dixon claims that the English word “emotions” has only been used as it is now since the early 19th century. Before that, our feelings were subject to a more subtle categorisation. There were “passions,” felt first in the body, then the mind; “affects” in the body, but beginning with thought; and “sentiments” to guide moral choices and judgments in artistic taste. We can’t definitively categorise something when its definition has been in flux for so long.

The second, and bigger, problem lies with Ekman and Friesen’s Papua New Guinea experiment itself. There were three main issues with this study. First, they weren’t the first people to meet and document Fore customs – anthropologists Ronald and Catherine Berndt had studied the North Fore in 1953, and missionaries and government patrols had already visited the South Fore before then. By the time Ekman visited them, the Fore, once known for aggression toward outsiders and for ritual cannibalism, were growing coffee and using money. (Ekman has talked about the request made by his funders for receipts, joking about how he had to keep ledger entries for expenses such as a “blessing from the local witch doctor.”) There is only a low likelihood that any members of the Fore remained entirely isolated from the rest of the world by the late 1960s.

The second issue with the study lies in the use of translation. What were the single-sentence “stories” translated to, exactly? Any translator will tell you that translation is not a case of swapping the words in one language for the words in another. If you do that, you get Google Translate-style results. Even words in related languages can be difficult to match. Translating to a language like Fore, which is extremely distant in relation to English, amplifies this problem, regardless of how well-drilled the translators are.

The third issue with the experiment is the faces in the photos. In real life, facial expressions are rarely as explicit or exaggerated as in Ekman’s photographs. Recent studies by psychologist James Russell and his team have shown that when more realistic faces are used, children may not recognise the emotions until they are as old as eight. Younger kids don’t know if a “disgust” face is supposed to be disgust or anger, for example. More recently, a group led by psychologist Lisa Feldman Barrett has found that if you provide a wide range of facial expressions in photos, and allow participants to group them into categories of their choosing, those categories don’t match from one culture to the next.

This leads to the third of our larger problems – that there is now more than just one list of basic emotions, depending on whom you ask. Furthermore, there are ways of understanding emotions that don’t require them to be either universal or simplistic. For example, the Psychological Construction of Emotion Theory is gaining significant support in the emotions research community. It suggests that, while we do all feel similar things – called “core affects” – numerous factors are part of the “construction” of an emotion. Such factors include the way we were raised to understand these feelings, the language we use to describe them, the situations we’re in when we feel them, our memories of other situations in which we experienced those feelings, and many other elements. Emotions cannot be reduced to just a feeling and a face.

It now looks as though emotions are not universal, after all. Even if there are experiences that all humans share, like a “yuck” feeling that keeps us away from moldy food, these are expressed by different cultures in different ways, and not always with the same facial expressions or vocal cues. Sadly, this nuance seems not to have filtered down to developers and programmers.

People are already being monitored in airports; many Americans already have Alexa or her friends in their homes; self-driving cars already exist (even if you can’t buy one just yet). All of these systems have gone wrong. Alexa has broadcast private conversations to unsuspecting recipients, self-driving cars have crashed into pedestrians, and anyone who has tried to use the supposedly state-of-the-art facial recognition systems at passport security knows what a frustration they can be.

Do we really want these devices and systems to keep us calm, judge our road rage, or spot our criminal tendencies? How long before someone is wrongly accused and convicted for something that a pair of sunglasses reports they were going to do?

A new pseudoscience of physiognomy is developing, in which the faces of criminals are detected using digital versions of Charles Le Brun’s work. Deviations from expected, universal expressions of emotion are not to be tolerated. As a result, the many and diverse ways of emotional expression, which vary from culture to culture, might have to merge into one, reducing the rich tapestry of expression found across the world – and causing what emotion researcher William Reddy calls an “emotional regime.” If this regime spreads across the entire world in the near future, as it may well do, Ekman’s basic emotions might become universal, after all.

This article first appeared on Medium’s How We Get To Next.
