Monday, June 13, 2011

Paper Summary - Implicit Emotional Tagging of Multimedia Using EEG Signals and Brain Computer Interface

Notable Quote:
Our system analyzes the P300 evoked potential recorded from user’s brain to recognize what kind of emotion was dominant while he/she was watching a video clip.
Summary:
Attaching metadata to multimedia content (tagging) has become a common practice. Explicit tags (such as the labels on this blog) are manually assigned and remain the most common form of tagging. Implicit tagging, however, allows tags to be generated and assigned based on the user's observed behavior. Physiological signals, facial expressions, and acoustic sensors have all been used to gauge emotional responses to stimuli for implicit tagging. In this paper, the authors focus on using EEG signals to determine emotions and subsequently tag multimedia content.
Emotional response selection screen. Taken from the authors' paper.


The authors introduce the concept of "Emotional Taggability" (ET), a measure of how easily given content can be tagged. A video with low ET elicits ambiguous or mixed emotions, which are harder to recognize and classify. The authors used P300 evoked potentials to recognize the dominant emotions of their study participants. In the experiment, a total of 24 clips spanning 6 emotional categories (joy, sadness, surprise, disgust, fear, and anger) were shown to participants. After watching a clip, participants were shown a screen containing 6 images, one per emotional category. The images were highlighted pseudo-randomly so that the P300 evoked potential for each image, and thus its associated emotion, could be measured. Eight subjects trained the classifier explicitly: each actively focused on a target image, and accuracy was measured as how often the computer's P300-based selection matched the image the subject intended to choose. This explicitly trained classifier was then used to gauge the implicit emotional states of four actual study participants. A sketch of how such a selection step might look in code follows below.
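Here's a quick sketch (my own illustration, not the authors' code) of how a P300-based selection like this could work: EEG epochs are time-locked to each image highlight, averaged per image, and the image whose averaged response is largest in the typical P300 window (~250-500 ms after the highlight) wins. The single-channel setup, sampling rate, and scoring window are all assumptions on my part.

```python
import numpy as np

EMOTIONS = ["joy", "sadness", "surprise", "disgust", "fear", "anger"]
FS = 256                     # assumed sampling rate (Hz)
P300_WINDOW = (0.25, 0.50)   # assumed scoring window, seconds after a highlight

def p300_score(epochs: np.ndarray) -> float:
    """Mean amplitude in the P300 window of the epoch-averaged response.

    epochs: shape (n_highlights, n_samples); each row is a single-channel
    EEG segment (e.g. electrode Pz) time-locked to one highlight of an image.
    """
    # Averaging suppresses activity that isn't time-locked to the highlight.
    avg = epochs.mean(axis=0)
    lo, hi = int(P300_WINDOW[0] * FS), int(P300_WINDOW[1] * FS)
    return float(avg[lo:hi].mean())

def select_emotion(epochs_per_image: dict) -> str:
    """Pick the emotion whose image highlights evoked the strongest P300."""
    return max(epochs_per_image, key=lambda emo: p300_score(epochs_per_image[emo]))

# Toy demonstration with simulated 1-second epochs: the "sadness" image gets
# an artificial P300-like deflection and is therefore selected.
rng = np.random.default_rng(0)
epochs_per_image = {e: rng.normal(0.0, 1.0, (10, FS)) for e in EMOTIONS}
epochs_per_image["sadness"][:, int(0.25 * FS):int(0.50 * FS)] += 2.0
print(select_emotion(epochs_per_image))  # -> sadness
```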

Classification results, taken from the authors' paper.
The authors bring up the ambiguous nature of some of the chosen videos to explain the lower annotation accuracy. Videos with low ET produced mixed emotional responses that were difficult to classify. To determine ET for the videos and support this point, 18 additional participants were asked to watch the chosen videos and rate each one, on a scale of 0 to 10, for every emotional category. When a video received high marks in more than one category, it was said to have a lower ET value. The authors found a correlation between higher ET values and correct classifications by their implicit tagging system.
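The summary above describes ET only qualitatively (ratings spread across several categories means lower taggability). One plausible way to operationalize that idea, which is my own sketch and not necessarily the authors' formula, is to score each video by how concentrated its ratings are, e.g. one minus the normalized entropy of the rating distribution over the six categories:

```python
import math

EMOTIONS = ["joy", "sadness", "surprise", "disgust", "fear", "anger"]

def emotional_taggability(ratings: dict) -> float:
    """ET in [0, 1]: 1 = all rating mass on one emotion, 0 = evenly spread.

    ratings: mean 0-10 rating per emotional category across annotators.
    """
    total = sum(ratings.values())
    if total == 0:
        return 0.0  # no emotional response at all: treat as untaggable
    probs = [v / total for v in ratings.values() if v > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(EMOTIONS))

# A clip rated highly on a single emotion scores near 1 (high ET); one with
# high marks in several categories scores much lower.
print(emotional_taggability({"joy": 9, "sadness": 0, "surprise": 1,
                             "disgust": 0, "fear": 0, "anger": 0}))
print(emotional_taggability({"joy": 7, "sadness": 1, "surprise": 8,
                             "disgust": 2, "fear": 7, "anger": 1}))
```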


Discussion:
I like the idea of implicitly tagging content based on emotional responses. The concept of Emotional Taggability, though I barely touched on it in the summary, is also really interesting. To me, ET has the benefit of marking content at a broader level before soliciting implicit emotional tags. The issue with ET is that at some point, say with a very large corpus, it would be just as much work to explicitly tag items as it would be to classify them as low/high ET and then go back and allow for implicit tagging. I want to see an extension of this project that actually uses the tags in some way! And did participants see any benefit from implicit tagging, or was it for the researchers' benefit alone?

Outlook:
You want to watch a movie on Netflix, but cannot decide which one. Somehow Netflix is able to read your emotional state and, given the ET ratings on various videos that you and similar customers have provided, brings up a list of videos that match your mood. It could go even further and use P300 evoked potentials with key scenes tagged as having high ET to determine which video you are most interested in watching at that moment, even if you aren't consciously aware of it. I could see this scenario being implemented within a few years! The biggest issue would be a reliable, unobtrusive way to measure emotional states. As new consumer-level EEG headsets become available, and with Netflix's interest in prediction algorithms (including the $1 million Netflix Prize), who is to say that something like this isn't the future of personalized entertainment? But if Netflix does do something like this, I had better get some credit for it! $1 million will do quite nicely...

Full Reference:
Ashkan Yazdani, Jong-Seok Lee, and Touradj Ebrahimi. 2009. Implicit emotional tagging of multimedia using EEG signals and brain computer interface. In Proceedings of the first SIGMM workshop on Social media (WSM '09). ACM, New York, NY, USA, 81-88. DOI=10.1145/1631144.1631160 http://doi.acm.org/10.1145/1631144.1631160