Monday, June 13, 2011

Paper Summary - Implicit Emotional Tagging of Multimedia Using EEG Signals and Brain Computer Interface

Notable Quote:
Our system analyzes the P300 evoked potential recorded from user’s brain to recognize what kind of emotion was dominant while he/she was watching a video clip.
Summary:
Attaching metadata to multimedia content (tagging) has become a common practice. Explicit tags (such as the labels on this blog) are manually assigned and are the most common form of tagging. Implicit tagging, however, allows tags to be generated and assigned based on the user's observed behavior. Physiological signals, facial expressions, and acoustic sensors have all been used to gauge emotional responses to stimuli for implicit tagging. In this paper, the authors focus on the use of EEG signals to determine emotions and subsequently tag multimedia content.
The emotional response selection screen. Taken from the authors' paper.


The authors introduce the concept of "Emotional Taggability" (ET), a measure of how easily given content can be tagged. A video with low ET elicits ambiguous or varying emotions that are harder to recognize and classify. The authors used P300 evoked potentials to recognize dominant emotions in their study participants. In the experiment, a total of 24 clips from 6 emotional categories (joy, sadness, surprise, disgust, fear, and anger) were shown to participants. After watching a clip, participants were shown a screen containing 6 images, one per emotional category. The images were highlighted pseudo-randomly in order to measure the P300 evoked potential for each image and thus its associated emotion. Eight subjects trained the classifier by actively focusing on a target image while the system counted how often its prediction matched the subject's intended choice. This explicit classification was then used to gauge the implicit states of four actual study participants.
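
As a rough sketch of how this kind of P300-driven selection could work in code, here is a toy epoch-averaging classifier. It is not the authors' actual method; the sampling rate, array shapes, and the 300-500 ms scoring window are my own assumptions.

```python
import numpy as np

EMOTIONS = ["joy", "sadness", "surprise", "disgust", "fear", "anger"]

def pick_dominant_emotion(epochs, fs=128, window=(0.30, 0.50)):
    """epochs: dict mapping emotion -> array of shape (n_highlights, n_samples),
    each row an EEG epoch time-locked to that emotion's image being highlighted.
    Returns the emotion whose averaged epoch has the largest mean amplitude in
    the P300 window (a crude stand-in for a real P300 classifier)."""
    start, stop = int(window[0] * fs), int(window[1] * fs)
    scores = {}
    for emotion, data in epochs.items():
        avg = data.mean(axis=0)           # averaging reinforces the P300, cancels noise
        scores[emotion] = avg[start:stop].mean()
    return max(scores, key=scores.get), scores

# Toy usage with random data standing in for real EEG epochs.
rng = np.random.default_rng(0)
fake_epochs = {e: rng.normal(size=(10, 128)) for e in EMOTIONS}
print(pick_dominant_emotion(fake_epochs)[0])
```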

The authors bring up the ambiguous nature of some of the chosen videos to explain the lower annotation results. Videos with low ET resulted in mixed emotional responses that were difficult to classify. In order to determine ET for the videos and prove this point, 18 different participants were asked to watch the chosen videos and rate them, on a scale of 0 to 10, for each emotional category. When a video received high marks in more than one category, it was said to have a lower ET value. The authors found a correlation between higher ET values and correct classifications by their implicit tagging system.
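
To make the idea concrete, here is one simple way such ratings could be turned into a taggability score. This margin-based metric is my own illustration, not the authors' formulation.

```python
import numpy as np

def emotional_taggability(ratings):
    """ratings: array of shape (n_raters, 6), each row one participant's
    0-10 scores for the six emotion categories. Returns a value in [0, 1]:
    high when one category clearly dominates, low when several categories
    score highly (ambiguous content)."""
    means = np.asarray(ratings).mean(axis=0)
    top, runner_up = np.sort(means)[::-1][:2]
    return (top - runner_up) / 10.0   # normalize by the rating scale

clear_clip = [[9, 1, 0, 1, 0, 1], [8, 2, 1, 0, 1, 0]]   # mostly "joy"
mixed_clip = [[7, 1, 6, 1, 7, 2], [6, 0, 7, 2, 6, 1]]   # joy/surprise/fear mix
print(emotional_taggability(clear_clip), emotional_taggability(mixed_clip))
```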


Discussion:
I like the idea of being able to implicitly tag content based on emotional responses. The concept of Emotional Taggability, though I barely touched on it in the summary, is also really interesting. To me, ET has the benefit of marking content on a broader scale before soliciting implicit emotional tags. The issue that arises with ET is that at some point, say with a very large corpus, it would be just as much work to explicitly tag items as it would be to classify them as low/high ET and then go back and allow for implicit tagging. I want to see an extension of this project that uses the tags in some way! And did participants see any benefit in implicit tagging, or was it just for the researchers' benefit alone?

Outlook:
You want to watch a movie on Netflix, but cannot decide which one. Somehow Netflix is able to read your emotional state and, given the ET ratings on various videos that you and similar customers have input, brings up a list of videos that match your mood. It could go even farther and use P300 evoked potentials with key scenes from the movie that are tagged as having high ET ratings to determine which video you are most interested in watching at that time, even if you aren't consciously aware of it. I could see this scenario being implemented within a few years even! The biggest issue would be a reliable, unobtrusive way to measure emotional states. As new consumer-level headsets become available, and with Netflix's interest in prediction algorithms (including their $1 million contests), who is to say that something like this isn't the future of personalized entertainment in the coming years? But if Netflix does do something like this, I better get some credit for it! $1 million will do quite nicely...

Full Reference:
Ashkan Yazdani, Jong-Seok Lee, and Touradj Ebrahimi. 2009. Implicit emotional tagging of multimedia using EEG signals and brain computer interface. In Proceedings of the first SIGMM workshop on Social media (WSM '09). ACM, New York, NY, USA, 81-88. DOI=10.1145/1631144.1631160 http://doi.acm.org/10.1145/1631144.1631160

Thursday, May 26, 2011

Paper Summary - A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

Summary:
This summary also functions as a very rough and basic introduction to HMMs for myself, which accounts for the different format and greater length of this entry.


Overview
In this paper, Rabiner provides a tutorial on hidden Markov models and how they may be applied to the area of speech recognition. A hidden Markov model (HMM) is a stochastic signal model, meaning that the signals can be well characterized as parametric random processes. Rabiner provides a formal look at discrete Markov processes with the example of predicting the weather for a week. The example is roughly as follows:

The weather state transition matrix. Taken from the author's paper.
On any day it may be raining, cloudy, or sunny (so 3 possible states). For a given day t, the matrix of state transition probabilities is as shown to the right. Given that it is sunny (state 3) on the first day, we define an observation sequence O = {S3, S3, S3, S1, S1, S3, S2, S3} and wish to determine its probability. As Rabiner shows in his paper, the following expression and evaluation give the probability for this observation sequence:
Determining the probability for the weather observation sequence. Taken from the author's paper.
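
The same calculation is easy to reproduce in code. The transition matrix below is the one I remember from Rabiner's weather example, so verify it against the paper before relying on it; the result should come out around 1.5e-4.

```python
import numpy as np

# States: 1 = rain, 2 = cloudy, 3 = sunny (0-indexed in the code).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sequence_probability(A, states, p_start=1.0):
    """Probability of observing the given state sequence, assuming the
    first state is known with certainty (hence p_start = 1)."""
    prob = p_start
    for prev, curr in zip(states[:-1], states[1:]):
        prob *= A[prev, curr]
    return prob

O = [3, 3, 3, 1, 1, 3, 2, 3]            # S3 S3 S3 S1 S1 S3 S2 S3
O_idx = [s - 1 for s in O]              # convert to 0-indexed states
print(sequence_probability(A, O_idx))   # Rabiner reports ~1.536e-4
```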

In this example the states are observable, physical events. In many problems, however, the process of interest is hidden: the outcome of the process can be observed, but not the states that produced it. The problem then becomes finding an accurate model for the observed sequence of results, which is where building an HMM to explain the observed sequence comes into play. See Rabiner's paper for two initial examples that can be modeled with an HMM.

Three Basic Problems for HMMs
There are three basic problems that must be solved for an HMM to be of use.

Problem 1 - Given a sequence of observations and a model, how do you compute the probability that the model produced the observed sequence? An efficient procedure for solving this problem is known as the forward-backward procedure. This procedure is based on the lattice structure of states. Given N states, each change from one state to another will again result in one of these N states. Therefore, calculating the probability of a partial observation sequence up to a given time can be somewhat reduced to calculating the probabilities along the connections between states for a given time. (Note: You are much better off reading up on the procedure than trusting my summary).
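
For Problem 1, the forward half of the procedure is all that is needed. Here is a minimal, unscaled sketch in my own notation; it will underflow on long sequences without the scaling Rabiner describes.

```python
import numpy as np

def forward(A, B, pi, obs):
    """A: (N, N) state transition matrix, B: (N, M) emission matrix,
    pi: (N,) initial state distribution, obs: sequence of symbol indices.
    Returns P(obs | model), summing over all state paths via the lattice."""
    alpha = pi * B[:, obs[0]]              # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction: one lattice column per step
    return alpha.sum()                     # termination

# Toy 2-state, 2-symbol model.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(forward(A, B, pi, [0, 1, 0]))
```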

Problem 2 - Given a sequence of observations and a model, how do you get a state sequence which is optimal in a meaningful way? In most instances, you may be interested in finding the single best state sequence. The Viterbi Algorithm uses dynamic programming to do so. Again, a lattice structure effectively illustrates the process. (Note: Read up on the process because I'm not rehashing it here).
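
A correspondingly minimal Viterbi decoder for the same kind of discrete HMM; again just a sketch, and a real implementation would work in log space to avoid underflow.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Returns the single most likely state sequence for obs, found by
    dynamic programming over the state lattice, plus its probability."""
    N, T = A.shape[0], len(obs)
    delta = pi * B[:, obs[0]]                    # best path prob ending in each state
    psi = np.zeros((T, N), dtype=int)            # backpointers
    for t in range(1, T):
        trans = delta[:, None] * A               # trans[i, j]: best path so far via i -> j
        psi[t] = trans.argmax(axis=0)
        delta = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # backtrack through the pointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta.max()

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi(A, B, pi, [0, 1, 1, 0]))
```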

Problem 3 - How do you optimize model parameters to best describe a given sequence of observations? Rabiner states that this is the most difficult problem of HMMs. He focuses on the Baum-Welch method for finding a locally maximized probability for an observation sequence with a chosen model. In essence, it can be used to find unknown parameters. (Note: You should read up on that as well).
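
For completeness, here is one re-estimation pass in the spirit of Baum-Welch. It is only a sketch: a single observation sequence, no scaling, and none of the numerical care a production implementation needs.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One EM update of (pi, A, B) for a discrete HMM and one observation
    sequence of symbol indices. Returns the re-estimated parameters."""
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                                  # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):                         # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood                      # P(state i at time t | obs)
    xi = np.array([alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
                   for t in range(T - 1)]) / likelihood    # P(i at t, j at t+1 | obs)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.vstack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])]).T
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(baum_welch_step(A, B, pi, [0, 0, 1, 0, 1, 1]))
```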

Types of HMMs
Types of HMMs. Taken from the author's paper (not his Figure 7).
The basic problem explanations and walkthroughs that Rabiner provides are based on a fully connected HMM. Also known as ergodic, such HMMs allow any state to be reached from any other in a finite number of steps. There are, however, different types of HMMs that may be encountered. Rabiner discusses the three types shown to the right.

Given his focus on speech recognition, Rabiner then brings up the issue of monitoring continuous signals as opposed to discrete symbols measured at set intervals or times. With continuous observations, the probability density function must be restricted to ensure that parameters can be re-estimated consistently. (Note: This process is best left for your own reading).
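
That restriction amounts to using densities with closed-form re-estimation formulas, finite mixtures of Gaussians being the common choice. A tiny sketch of evaluating such a mixture emission density; the component weights and parameters below are arbitrary toy values.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_emission_prob(x, weights, means, covs):
    """b_j(x) as a finite Gaussian mixture for one state: the restricted
    form that keeps continuous-density HMM re-estimation tractable."""
    return sum(w * multivariate_normal(mean=m, cov=c).pdf(x)
               for w, m, c in zip(weights, means, covs))

# Example: a 2-component mixture for one state, 2-D observation vectors.
weights = [0.6, 0.4]
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(mixture_emission_prob(np.array([0.2, 0.1]), weights, means, covs))
```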


Discussion:
Rabiner's look at continuous signals in speech recognition is relevant for BCI research. When sensor data is being generated continuously (and from 14 points if you're using the EPOC), something has to be done to make sense of the information in polynomial time. In particular, a researcher may want to know what signal sequence led up to an observed state, or how best to classify a set of signals from various sensors. HMM-based classification has previously been researched and you can find different publications on the subject by performing a simple online search (a few examples being found here and here). As for Rabiner's paper itself, I stopped summarizing around page 12 of 30 because I needed to read it through a couple times before I understood anything beyond that point. A mathematically rigorous paper is the foundation of solid research projects in many fields, but give me an applications paper any day. As is evident from my summary, I identified more with Rabiner's examples than with his equations, and as such I basically avoided their reproduction here in this entry. Also, I found that the Wikipedia entry on HMMs was beneficial for getting more examples and quickly examining the key problems as stated by Rabiner.

Full Reference:
Lawrence R. Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 2 (1989), 257-286.

Monday, May 23, 2011

Paper Summary - The Emotional Economy for the Augmented Human

Notable Quote:
...new Commercial Off-The-Shelf (COTS) Brain-Computer Interfaces (BCI) can be used to provide real-time happiness feedback as people live their life.
Summary:
This paper by Jean-Marc Seigneur investigates capturing and using real-time happiness measures. Previous and current methods for evaluating happiness work a posteriori, which leaves room for subjects to change their opinions and to be influenced by their current emotions while being expected to relive the past events in question. Using a BCI, the author proposes detecting happiness in the moment, and at a more basic level. Seigneur describes this as the Emotional Economy (EE) model: given a user and a service, the model looks at which emotions reach a threshold worth recording (the example here being happiness). This information can then be used to make decisions or to catalog emotions at desired checkpoints. The proofs-of-concept afford measures of engagement, frustration, meditation, instantaneous excitement, long-term excitement, and happiness. Note that these measures are easily accommodated by the Emotiv EPOC headset used.
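
As a toy illustration of that threshold idea, here is a sketch that scans a stream of emotion readings and records a tag whenever the chosen measure crosses a threshold. The data structures, the string measure names, and the 0.7 threshold are my own assumptions, not Seigneur's implementation.

```python
from dataclasses import dataclass

@dataclass
class EmotionSample:
    timestamp: float      # seconds since the start of the session
    measure: str          # e.g. "happiness", "frustration", "engagement"
    value: float          # normalized 0..1 reading

def tag_checkpoints(samples, measure="happiness", threshold=0.7):
    """Return the timestamps at which the chosen measure rose above the
    threshold, i.e. the moments a service could record an emotional tag."""
    tags, above = [], False
    for s in sorted(samples, key=lambda s: s.timestamp):
        if s.measure != measure:
            continue
        if s.value >= threshold and not above:
            tags.append(s.timestamp)        # rising edge: emit one tag
        above = s.value >= threshold
    return tags

stream = [EmotionSample(1.0, "happiness", 0.4),
          EmotionSample(2.0, "happiness", 0.8),
          EmotionSample(3.0, "happiness", 0.9),
          EmotionSample(4.0, "happiness", 0.3)]
print(tag_checkpoints(stream))   # -> [2.0]
```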

The Facebook scenario. Taken from the author's paper.
As an example, the author created two use scenarios. In the first, a user watches a video via Facebook, and if happiness is detected, the video is automatically liked. The second scenario incorporates location-based computing: the wearer is given a backpack containing a portable computer and a GPS unit. While moving along an outdoor tour, the GPS and emotion readings can be synced via their timestamps to determine at which geographic points the wearer felt different emotions, and thus whether and when they enjoyed themselves. As stated by the author, this scenario could allow for automatic tourist reviews and, when combined with written testimonials, lend credence to both positive and negative reviews overall.
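
The GPS scenario essentially boils down to joining two timestamped streams. Here is a rough sketch of that alignment, matching each emotion reading to the nearest GPS fix; the field layout and matching rule are my assumptions, not the author's code.

```python
from bisect import bisect_left

def align_emotions_to_track(gps_points, emotion_samples):
    """gps_points: list of (timestamp, lat, lon), sorted by timestamp.
    emotion_samples: list of (timestamp, value). Returns (lat, lon, value)
    triples, matching each emotion reading to the nearest GPS fix."""
    times = [t for t, _, _ in gps_points]
    out = []
    for ts, value in emotion_samples:
        i = bisect_left(times, ts)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gps_points)]
        j = min(candidates, key=lambda j: abs(times[j] - ts))
        _, lat, lon = gps_points[j]
        out.append((lat, lon, value))
    return out

# Toy tour: three GPS fixes and two happiness readings.
track = [(0, 46.20, 6.14), (30, 46.21, 6.15), (60, 46.22, 6.16)]
happiness = [(28, 0.9), (61, 0.4)]
print(align_emotions_to_track(track, happiness))
```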

Discussion:
I highly recommend this short paper to those interested in ubiquitous computing and/or emotional measures. The thing that really sticks out is that as BCIs become more functional and effective at a higher level, they can become just another input device or sensor, albeit one with a wider gamut of potential uses. In essence, this is what the author demonstrates in his Facebook scenario: the code he needed was for interacting with Facebook, not for manipulating the headset sensors themselves. This work is also open to quick extensions. For instance, measuring frustration could indicate not liking the video in the Facebook scenario, which could cause it to be removed from the viewer's news feed. The second scenario opens up a wider range of applications via the incorporation of positioning and timestamps. Such a scenario could be extended to testing amusement parks and rides, or to movies and commercials without the need for GPS data. In fact, a similar article can be found here on Engadget.

Outlook:
Very engaging. For some reason the extension I thought of was relationship evaluation. Are you really happy with the person you're with? Who makes you happier? Brain signals are all an online dating site needs to further prove connections between paying customers and their romantic matches. Or therapists, for that matter! Pop on a headset and really see what you feel given various stimuli. With the potential perfection of emotional classification in the future, who knows what can be learned about others. This raises an ethical concern as well. If I CAN see how everything makes you feel, SHOULD I? Could BCIs be used to weed out unfit Soldiers or track down criminals? Could they be used to detect biases, fallacies, and overall ethnocentric beliefs? To me that is very heavy stuff worth much further consideration.

Full Reference:
Jean-Marc Seigneur. 2011. The emotional economy for the augmented human. In Proceedings of the 2nd Augmented Human International Conference (AH '11). ACM, New York, NY, USA, , Article 24 , 4 pages. DOI=10.1145/1959826.1959850 http://doi.acm.org/10.1145/1959826.1959850

Paper Summary - NeuroPhone: brain-mobile phone interface using a wireless EEG headset

Notable Quote:
...users on-the-go can simply “think” their way through all of their mobile applications.

Summary:
NeuroPhone is a system which essentially allows people to dial contacts by thinking about them. Campbell et al. use the Emotiv EPOC to read EEG signals and pass them to an iPhone. The iPhone then runs a lightweight classifier that can distinguish the desired signals from noise. More specifically, the authors use P300 signals (a positive peak associated with attending to a desired image within a set) and physical winks. The process is as follows:
  1. A set of contact photos appear on the screen
  2. Each photo is highlighted in turn 
  3. The user concentrates on the photo for the contact they wish to dial
  4. When said photo is highlighted, a P300 signal is generated (or the user winks)
  5. The iPhone gets the positive acknowledgement from the wearer and dials that contact
    The contact selection process. Graciously taken from the authors' paper.
The authors make note of design considerations based on the implementation of NeuroPhone. First, noise is an issue with both the EPOC and EEG signals in general. To reduce noise, they propose averaging data over many trials, thus increasing the signal-to-noise ratio at the expense of increased delay. They also use a filter to remove any noise outside of the desired P300 frequency range. Finally, designing a mobile-based classifier requires efficient design choices: in the case of NeuroPhone, only a subset of the EEG channels is passed to the iPhone for classification, and there is no continuous streaming of data to the device (think battery drain).
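
Here is a sketch of those two noise-reduction steps using SciPy. The 0.5-12 Hz band is my guess at a reasonable range for the slow P300 component, not the cutoffs the authors report.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_p300(trials, fs=128, band=(0.5, 12.0)):
    """trials: array of shape (n_trials, n_samples), raw EEG epochs for one
    flashed photo. Band-pass filters each trial, then averages across trials
    to raise the signal-to-noise ratio at the cost of needing more flashes."""
    b, a = butter(4, band, btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, trials, axis=1)   # zero-phase filtering per trial
    return filtered.mean(axis=0)                # averaged epoch for classification

rng = np.random.default_rng(1)
fake_trials = rng.normal(size=(20, 256))        # 20 flashes, 2 s at 128 Hz
print(preprocess_p300(fake_trials).shape)       # -> (256,)
```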

The authors conducted an initial user study for both the 'wink' and 'think' selection modes with 3 subjects. They found that their classifier for winks worked best on relaxed, seated subjects. Actions that led to muscle contraction and distracted users (music was used in their test) led to significantly lower accuracy measures. The authors also showed that accuracy increases as the data accumulation time increases.

A video overview of NeuroPhone is also available.

Discussion:
When reading the evaluation I couldn't help but wonder if the users liked the application itself. Besides its unquestionable novelty, does it serve a function? Neither I nor the authors claim that NeuroPhone was designed to solve the specific problem of calling someone, but I believe the paper could have done a better job stressing the function of NeuroPhone in the greater realm of mobile applications. This project serves as an excellent example of how signal processing and BCI research can be meshed into HCI and mobile and ubiquitous computing. I have read previous papers on P300-spellers (typing letters by selecting them from a visible grid) and was glad to see the concept extended in a way that now seems completely obvious. I also enjoy the fact that they used the Emotiv EPOC because it is obviously my chosen headset for my fledgling research. The authors did a great job of explaining the benefits of using cheap(er) headsets like the EPOC and framing their use within the context of mobile computing. Overall the focus of the paper seemed to be torn between the success of the mobile classifier and the contact dialer itself. Both points came across, but I read sections out of order to better follow each thread of their contribution. Great stuff.

Outlook:
Beyond the project itself, I really connected with section 2 of the paper. The future outlook posed by the authors literally made me stop and consider the implications of BCI research. How long until we have to worry about people intercepting our 'thoughts' or emotional maps, or forging them in order to interface with technologies? I feel like Tom Clancy wrote something about this already... Regardless, the authors pose an excellent point. Emotion-driven interfaces are on their way, and we have much to consider.

Full Reference:
Andrew Campbell, Tanzeem Choudhury, Shaohan Hu, Hong Lu, Matthew K. Mukerjee, Mashfiqui Rabbi, and Rajeev D.S. Raizada. 2010. NeuroPhone: brain-mobile phone interface using a wireless EEG headset. In Proceedings of the second ACM SIGCOMM workshop on Networking, systems, and applications on mobile handhelds (MobiHeld '10). ACM, New York, NY, USA, 3-8. DOI=10.1145/1851322.1851326 http://doi.acm.org/10.1145/1851322.1851326

Tuesday, May 17, 2011

The Emotiv EPOC - Introduction

Overview:
I am using the Emotiv EPOC neuroheadset with the Education Edition SDK. The headset and this SDK cost $2500 together. I recommend this version if you are conducting research that is a) non-commercial and (optionally) b) open to multiple collaborators within your department. You will also need to be at some form of academic or educational institution. Emotiv offers 6 different SDKs to choose from, including the free SDKLite. Read up and choose the one that is right for you or your organization.

Why the Emotiv EPOC?
I chose to use the EPOC for my research for a few different reasons. First, it has 14 electrodes and can also measure head rotation. The more you can measure, the more you can do with the signals (well hopefully). The EPOC can be trained to detect conscious thoughts, emotions, and facial expressions. The provided Control Panel is great for viewing sensor contacts, doing some basic training, and just getting acquainted with the headset. The first time you put on the headset and hook it up to the computer, you will feel awesome. Or at least I felt awesome.
The white outer box is essentially a layer of paper.
Electroencephalographic (EEG) signals are fluctuations of electrical potential along the scalp created by neurons in the brain. EEG signals can thus be measured outside the brain itself in a non-invasive way via wearable headsets. The EPOC is one such headset. A very brief comparison of current consumer headsets is available on Wikipedia. More information on BCIs and how EEG fits within the field can also be found on Wikipedia.

Contents unpacked.

Out of the box
The EPOC comes in a flimsy box that in no way made me feel comfortable with my choice. Upon opening, however, you will find that everything is securely packed. My box contained a headset, a small bottle of solution, a case containing the 16 sensors, a charger, and the Bluetooth connector. Additional sensor packs and headsets can be purchased from the Emotiv Store.

Obtaining the SDK
The software that is available through the Education Edition DOES NOT come in the box. It is available for download, linked to the purchaser's email address, from the Emotiv website. This can be a slight pain in the ass to obtain, but I can say from experience that the Customer Care personnel are both patient and timely in their responses. In my case it took 5 days of correspondence (and a lot of confusion) before I was given access to the Education Edition SDK. If you have a departmental purchasing officer, please do the following to avoid any issues:
  • Don't Panic.
  • Have the purchaser forward you all emails received from Emotiv about the purchase made. This includes order number (which is actually order ID) and confirmations.
  • When you register on the Emotiv site, use your departmental email address. They will check that the emails between purchaser and researcher match (i.e., department.school.edu). Again, the SDK is licensed to your department and not your entire university.
  • Do not take everything they say literally. For example, if they ask for the order number, it might not be the thing labeled 'order number'. Or if they ask for your school's email ID, they really want your department's email ID that will match the purchaser's.
The person who helped me was awesome, and I naively assume that everyone in Emotiv's employ is equally awesome.

First Things First!
Make sure that your headset turns on before you do anything. If it is not holding a slight charge, then simply plug it in and give it a few minutes. Red light is charging, green light is charged. If you get a blue light when you flip the switch (located on the bottom rear of the headband), then you are set. Once you know the headset turns on, read the digital manual and begin finding that perfectly frustrating level of solution needed for proper conductivity.

Wednesday, April 13, 2011

The effects of alcohol as measured with EEG signals

Using the Emotiv EPOC, I propose to measure the emotions and level of focus of participants as they consume alcohol. The Affectiv Suite will be used to gauge the wearer's levels of Engagement/Boredom, Frustration, Meditation, Instantaneous Excitement, and Long Term Excitement. The Cognitiv Suite's initial training program will be used for testing levels of focus.

Purpose/Questions:
Essentially a variation on the Ballmer Peak. At what BAC do people have the highest/lowest focus? Highest/lowest emotional levels? Are there differences between the different 'types of drunks' for these measures? Can EEG signals be used to measure the effects of alcohol? Finally, and possibly most importantly, will the IRB give me approval to get people drunk in a controlled environment in the name of science?

User Study Format:
  • Get participants who are at least 21 years of age (for this to be official research).
    • Survey to determine exposure to BCIs, drinking tendencies, what kind of 'drunk' they are, gender, age, etc.
  • Perform initial training set on participants. Number of training sets per participant to be determined.
    • Record neutral states for each participant.
    • Have each participant train one of the movements (i.e., lift the cube) in the Emotiv Control Panel training program.
    • Measure peak level of focus behind movement.
    • Using Affectiv Suite, measure emotional levels.
  • Have participants drink. At set consumption intervals, have participants try and perform their trained movement.
    • Set number of tries to perform movement.
    • Measure peak level of focus behind movement.
    • Using Affectiv Suite, measure emotional levels.
    • Bonus: Create secondary, tertiary, etc. profiles for each participant and record new neutral states for later comparison.
  • Repeat until everyone is legally drunk (.08 BAC in Texas). Same measurements taken at each step.
  • Ask participants at which point they felt they were most focused/emotional.
Analysis:
  • Comparisons of neutral states at varying BAC levels.
  • Comparisons of focus levels for task.
  • Comparisons of emotional readings.
  • Correlation between 'type of drunk' and results? (A rough sketch of such a test follows this list.)
  • Agreement between a participant's perceived moments of highest focus/emotion and the recorded measurements.
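
Here is the promised sketch of that correlation check, assuming 'type of drunk' has been coded as an ordinal self-report score. The numbers are made-up placeholders and Spearman's rho is my choice of test, not part of the proposal above.

```python
from scipy.stats import spearmanr

# One row per participant: self-reported 'type of drunk' coded 1-5
# (e.g., 1 = sleepy ... 5 = rowdy) and peak focus score at 0.08 BAC.
# Placeholder numbers only, for illustrating the analysis step.
drunk_type = [1, 2, 2, 3, 4, 4, 5, 5]
peak_focus = [0.72, 0.65, 0.70, 0.55, 0.48, 0.52, 0.40, 0.44]

rho, p_value = spearmanr(drunk_type, peak_focus)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```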
 
Note: This proposed study, and any others that I post, are my own ideas unless otherwise stated. Please do not steal them without giving the proper credit!

Monday, April 11, 2011

Think about it

There should be a way to monitor the emotional state of other people for signs of depression, uncontrolled anger, suicidal indicators, etc., using BCIs. I know that the military has identified issues with Soldiers' mental health and has begun to incorporate new total fitness programs geared towards eliminating the stigma that showing emotion or needing medication is a weakness. Could a BCI not be used to check how a person feels below the surface, beyond their responses? Could we peer into the mind of the wearer and see what they feel?