Wearable AI system can detect a conversation’s tone

Coupled with audio and vital-sign data, deep-learning system could someday serve as a “social coach” for people with anxiety or Asperger’s.


Adam Conner-Simons | Rachel Gordon | CSAIL

February 1, 2017


Press Contact:

Adam Conner-Simons

Email: aconner@csail.mit.edu

Phone: 617-324-9135

MIT Computer Science & Artificial Intelligence Lab

PhD candidate Mohammad Ghassemi (left) and graduate student Tuka Alhanai developed a system that can detect the tone of a conversation using a wearable device.

Photo: Jason Dorfman/MIT CSAIL

The team's system was implemented on a Samsung Simband, a research device that can measure metrics such as movement, heart rate, blood pressure and skin temperature.

Photo: Jason Dorfman/MIT CSAIL


It’s a fact of nature that a single conversation can be interpreted in very different ways. For people with anxiety or conditions such as Asperger’s, this can make social situations extremely stressful. But what if there were a more objective way to measure and understand our interactions?

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute of Medical Engineering and Science (IMES) say that they’ve gotten closer to a potential solution: an artificially intelligent, wearable system that can predict if a conversation is happy, sad, or neutral based on a person’s speech patterns and vitals.

“Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious,” says graduate student Tuka Alhanai, who co-authored a related paper with PhD candidate Mohammad Ghassemi that they will present at next week’s Association for the Advancement of Artificial Intelligence (AAAI) conference in San Francisco. “Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket.”

As a participant tells a story, the system can analyze audio, text transcriptions, and physiological signals to determine the overall tone of the story with 83 percent accuracy. Using deep-learning techniques, the system can also provide a “sentiment score” for specific five-second intervals within a conversation.
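Scoring “specific five-second intervals” implies that each recording is first cut into fixed-length windows. The sketch below shows only that segmentation step, assuming non-overlapping windows and discarding any incomplete trailing window; both choices are assumptions, not details from the study. The sampling rates in the example are illustrative placeholders, not the Simband’s actual rates.

```python
import numpy as np


def five_second_windows(signal: np.ndarray, sample_rate: int,
                        window_s: float = 5.0) -> np.ndarray:
    """Split a 1-D signal into non-overlapping windows of `window_s` seconds.

    Assumption: samples that do not fill a complete final window are dropped.
    Returns an array of shape (n_windows, samples_per_window).
    """
    samples_per_window = int(round(window_s * sample_rate))
    n_windows = len(signal) // samples_per_window
    trimmed = signal[: n_windows * samples_per_window]
    return trimmed.reshape(n_windows, samples_per_window)


# Hypothetical 23-second recording: audio at 16 kHz and a physiological
# waveform at 128 Hz. Windowing each stream at its own rate gives the same
# window boundaries in time, so window i of every stream lines up.
audio = np.zeros(23 * 16_000)
pulse = np.zeros(23 * 128)
audio_windows = five_second_windows(audio, 16_000)
pulse_windows = five_second_windows(pulse, 128)
print(audio_windows.shape, pulse_windows.shape)  # (4, 80000) (4, 640)
```

A per-window classifier could then assign each row a label or score; in this example the final three seconds do not fill a window and are ignored.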

“As far as we know, this is the first experiment that collects both physical data and speech data in a passive but robust way, even while subjects are having natural, unstructured interactions,” says Ghassemi. “Our results show that it’s possible to classify the emotional tone of conversations in real time.”

The researchers say that the system's performance would be further improved by having multiple people in a conversation use it on their smartwatches, creating more data to be analyzed by their algorithms. The team is keen to point out that they developed the system with privacy strongly in mind: The algorithm runs locally on a user’s device as a way of protecting personal information. (Alhanai says that a consumer version would obviously need clear protocols for getting consent from the people involved in the conversations.)

How it works

Many emotion-detection studies show participants “happy” and “sad” videos, or ask them to artificially act out specific emotive states. But in an effort to elicit more organic emotions, the team instead asked subjects to tell a happy or sad story of their own choosing.

Subjects wore a Samsung Simband, a research device that captures high-resolution physiological waveforms to measure features such as movement, heart rate, blood pressure, blood flow, and skin temperature. The system also captured audio data and text transcripts to analyze the speaker’s tone, pitch, energy, and vocabulary.
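To make the speech-energy side concrete, here is a hedged sketch of two simple per-window descriptors: the variability of frame-level energy (a rough proxy for monotonous versus varied delivery) and the share of near-silent frames (a rough proxy for pauses). The 25 ms frame length, the 10 percent silence threshold, and the feature names are assumptions for illustration only; the article does not describe the team’s actual feature set, and pitch and vocabulary features are not shown.

```python
import numpy as np


def frame_energies(window: np.ndarray, frame_len: int) -> np.ndarray:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    n_frames = len(window) // frame_len
    frames = window[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def prosody_features(window: np.ndarray, sample_rate: int,
                     frame_ms: float = 25.0,
                     silence_ratio: float = 0.1) -> dict:
    """Hypothetical energy-based descriptors for one audio window.

    - mean_energy: average frame RMS
    - energy_variability: coefficient of variation of frame RMS
      (near 0 for a flat, monotonous signal)
    - pause_fraction: share of frames quieter than silence_ratio * loudest frame
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_energies(window, frame_len)
    peak = energies.max()
    if peak == 0.0:  # an all-silent window
        return {"mean_energy": 0.0, "energy_variability": 0.0,
                "pause_fraction": 1.0}
    silent = energies < silence_ratio * peak
    return {
        "mean_energy": float(energies.mean()),
        "energy_variability": float(energies.std() / energies.mean()),
        "pause_fraction": float(silent.mean()),
    }


# One second of synthetic audio: half a 440 Hz tone, half silence.
sr = 16_000
t = np.arange(sr // 2) / sr
window = np.concatenate([np.sin(2 * np.pi * 440 * t), np.zeros(sr // 2)])
print(prosody_features(window, sr)["pause_fraction"])  # 0.5
```

In a real system, descriptors like these would be computed per five-second window and fed to a classifier alongside the physiological and text features.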

“The team’s usage of consumer market devices for collecting physiological data and speech data shows how close we are to having such tools in everyday devices,” says Björn Schuller, professor and chair of Complex and Intelligent Systems at the University of Passau in Germany, who was not involved in the research. “Technology could soon feel much more emotionally intelligent, or even ‘emotional’ itself.”

After capturing 31 different conversations of several minutes each, the team trained two algorithms on the data: One classified the overall nature of a conversation as either happy or sad, while the second classified each five-second block of every conversation as positive, negative, or neutral.

Alhanai notes that, in traditional neural networks, all features about the data are provided to the algorithm at the base of the network. In contrast, her team found that they could improve performance by organizing different features at the various layers of the network.
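One way to read that description is as a network in which each group of features joins at a different depth: low-level signals enter at the bottom, and more abstract features are concatenated with the hidden activations higher up. The sketch below is an untrained NumPy forward pass under that assumption. The three feature groups, their sizes, the layer widths, and the class name `LayeredFusionNet` are hypothetical; the article does not specify the team’s actual architecture.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class LayeredFusionNet:
    """Hypothetical net where each feature group enters at a different layer.

    low  (e.g. accelerometer statistics) -> input to layer 1
    mid  (e.g. audio energy features)    -> concatenated before layer 2
    high (e.g. text sentiment score)     -> concatenated before the output layer
    """

    def __init__(self, d_low: int, d_mid: int, d_high: int,
                 hidden: int, n_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (d_low, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden + d_mid, hidden))
        self.w3 = rng.normal(0.0, 0.1, (hidden + d_high, n_classes))
        self.b1 = np.zeros(hidden)
        self.b2 = np.zeros(hidden)
        self.b3 = np.zeros(n_classes)

    def forward(self, low, mid, high) -> np.ndarray:
        h1 = relu(low @ self.w1 + self.b1)
        h2 = relu(np.concatenate([h1, mid], axis=-1) @ self.w2 + self.b2)
        logits = np.concatenate([h2, high], axis=-1) @ self.w3 + self.b3
        return softmax(logits)  # class probabilities per window


# Four five-second windows, three classes (positive, negative, neutral).
net = LayeredFusionNet(d_low=6, d_mid=3, d_high=1, hidden=8, n_classes=3)
rng = np.random.default_rng(1)
probs = net.forward(rng.normal(size=(4, 6)),
                    rng.normal(size=(4, 3)),
                    rng.normal(size=(4, 1)))
print(probs.shape)  # (4, 3)
```

A conventional flat network would instead concatenate all three groups at the input. The layered version lets the most abstract feature act only on the final decision, which is one way to encode a prior that some features are more abstract than others.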

“The system picks up on how, for example, the sentiment in the text transcription was more abstract than the raw accelerometer data,” says Alhanai. “It’s quite remarkable that a machine could approximate how we humans perceive these interactions, without significant input from us as researchers.”

Results

Indeed, the algorithm’s findings align well with what we humans might expect to observe. For instance, long pauses and monotonous vocal tones were associated with sadder stories, while more energetic, varied speech patterns were associated with happier ones. In terms of body language, sadder stories were also strongly associated with increased fidgeting and cardiovascular activity, as well as certain postures like putting one’s hands on one’s face.

On average, the model could classify the mood of each five-second interval with an accuracy that was approximately 18 percent above chance, and a full 7.5 percent better than existing approaches.

The algorithm is not yet reliable enough to be deployed for social coaching, but Alhanai says that they are actively working toward that goal. For future work the team plans to collect data on a much larger scale, potentially using commercial devices such as the Apple Watch that would allow them to more easily implement the system out in the world.

“Our next step is to improve the algorithm’s emotional granularity so that it is more accurate at calling out boring, tense, and excited moments, rather than just labeling interactions as ‘positive’ or ‘negative,’” says Alhanai. “Developing technology that can take the pulse of human emotions has the potential to dramatically improve how we communicate with each other.”

This research was made possible in part by the Samsung Strategy and Innovation Center.

Press Mentions

The Wall Street Journal

Daniel Akst of The Wall Street Journal writes about the wearable device developed by CSAIL researchers that can detect the emotional tone of a conversation. The researchers “are pushing the boundaries by training a computer to take account of such a wide range of factors in making judgments about emotion,” writes Akst.


Science Friday

Science Friday reporter Ira Flatow and Motherboard reporter Daniel Oberhaus discuss a wearable device developed by CSAIL researchers that can detect the emotional tone of a conversation. Oberhaus explains that the researchers hope the device could one day be “applied with much finer emotional granularity, to the point where you can tell if the story was exciting or funny.”


BBC

CSAIL researchers Tuka Alhanai and Mohammad Ghassemi speak to the BBC’s Gareth Mitchell about their system that can detect the tone of a conversation. Ghassemi explains that this research will provide “the first steps toward feedback,” for people who struggle to read social cues.


Forbes

CSAIL researchers have developed a wearable AI system that allows users to detect the tone of a conversation in real time, reports Janet Burns for Forbes. Using two algorithms to analyze data, the researchers were able to “classify each five-second chunk of conversation as positive, neutral, or negative,” explains Burns.


CBC News

Dan Misener of CBC News writes that a wearable device developed by MIT researchers detects the tone of conversation by listening to the interaction and measuring the physiological responses of the user. “All of that data is fed into a neural network that's been trained to identify certain cues,” explains Misener.


Wired

CSAIL researchers have developed a wearable system that can gauge the tone of a conversation based on a person’s speech patterns and vitals with 83 percent accuracy, writes Brian Barrett for Wired. The system could be useful for people with social anxiety or Asperger’s, Barrett explains.


Boston Magazine

Hallie Smith writes for Boston Magazine that CSAIL researchers have developed a system that can help detect the tone of a conversation. The system could be especially useful “for those who struggle with emotional and social cues, such as individuals with Asperger’s Syndrome,” Smith explains.

