Detecting oscillations of different frequencies in EEG is something we can easily accomplish with our naked eyes. It is also easy to spot different kinds of muscle artefacts in the EEG. Trained neurologists can even detect the onset of epileptic seizures or distinguish between different sleep stages. But other tasks, e.g., deciding which of many words a subject was listening to, are impossible without the mathematical methods of signal processing and machine learning. These algorithms allow us to extract further information from the data, e.g., to decide which of several flashing stimuli on a computer screen was attended, or to detect whether a user has just perceived an error.
When this information is processed in real time, the user can actively or passively control a computer or machine. This is called a Brain-Computer Interface (BCI). The most popular application is a speller that allows impaired users to write a sentence, albeit slowly, using only their brain activity. Traditionally, the system needs to be calibrated before use. During this calibration phase, the user performs a series of predefined tasks for which the user's intentions are known. These data are then used to train a machine learning classifier. My theoretical research is concerned with eliminating this calibration phase by transferring classifier information from other subjects and by updating the classifier during actual use. This is a difficult challenge because, in this scenario, the decoder has to learn from unlabelled data (i.e., data for which the user's intentions are unknown).
Learning from Label Proportions (LLP)
This is a simple, yet extremely powerful concept. Imagine a scenario where you do not have label information, but you know the proportions of the labels in subgroups of the data. Think of two e-mail mailboxes: one is a pure honey-pot that only contains spam, and the other is used regularly and contains both spam and normal e-mails. If you can estimate the proportion of spam e-mails in the regular mailbox, you can apply LLP to draw conclusions about what an average spam and non-spam e-mail look like. You could, for instance, find that certain words occur only in spam, while others are more typical of regular e-mails. Applying this concept is surprisingly easy: all you need to know is how to solve a linear system of equations.
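To make the mailbox example concrete, here is a minimal sketch with made-up numbers. All word counts are hypothetical, and we assume that the spam proportion of the regular mailbox (30 %) is known. Each mailbox average is a proportion-weighted mix of the unknown spam and non-spam averages. Stacking the known proportions into a matrix and calling `numpy.linalg.solve` therefore recovers the class averages for every word at once.

```python
import numpy as np

# Hypothetical average counts per e-mail for three words: "lottery", "meeting", "invoice".
# Row 0: honey-pot mailbox (100 % spam); row 1: regular mailbox (assumed 30 % spam).
group_means = np.array([
    [2.000, 0.10, 0.50],
    [0.635, 0.73, 0.50],
])

# Known label proportions: each row is (share of spam, share of non-spam) in one mailbox.
proportions = np.array([
    [1.0, 0.0],
    [0.3, 0.7],
])

# The model is group_means = proportions @ class_means, so solve for class_means.
class_means = np.linalg.solve(proportions, group_means)
spam_mean, ham_mean = class_means
print("spam:    ", spam_mean)
print("non-spam:", ham_mean)
```

In this toy setup, "lottery" turns out to be typical of spam and "meeting" typical of regular e-mail. "invoice" occurs equally often in both classes, so it carries no information.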
In the context of BCIs based on event-related potentials (ERPs), LLP allows the target and non-target class means to be estimated without label information (i.e., without knowing what the user tried to accomplish), while still guaranteeing convergence (i.e., we eventually find out what he/she tried to accomplish, provided we record long enough). A clever modification of the paradigm is needed to unlock the power of LLP: in a visual speller, we enlarge the traditional spelling matrix with '#' symbols. The user should always ignore these symbols, so they are non-targets by definition.
Next, we split the highlighting events into two sequences (S1, S2). Events from S1 highlight only normal symbols, while events from S2 also highlight '#' symbols. This leads to a higher target proportion in S1 than in S2. The exact values of the target and non-target proportions are not important. What matters is that the two sequences enforce different, known (!) proportions by construction.
The relationship between the target/non-target class means and the means of S1 and S2 can be expressed as a simple linear system of two equations, which can easily be solved for the target and non-target class means. Finally, one plugs the averages of S1 and S2 (which do not require label information) into the solved system and obtains estimates of the target and non-target means. This whole procedure is summarized in the Figure.
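Written out, with $\pi_1$ and $\pi_2$ denoting the known target proportions of S1 and S2 (symbols introduced here for illustration), the system reads

$$\mu_{S1} = \pi_1 \mu_{t} + (1-\pi_1)\mu_{nt}, \qquad \mu_{S2} = \pi_2 \mu_{t} + (1-\pi_2)\mu_{nt}.$$

For $\pi_1 \neq \pi_2$, its solution is

$$\mu_{t} = \frac{(1-\pi_2)\mu_{S1} - (1-\pi_1)\mu_{S2}}{\pi_1 - \pi_2}, \qquad \mu_{nt} = \frac{\pi_1\mu_{S2} - \pi_2\mu_{S1}}{\pi_1 - \pi_2}.$$

The minimal Python sketch below plugs simulated, label-free sequence averages into this solution. The proportions (0.3 and 0.1), the class means and the Gaussian noise model are hypothetical choices and not the values of our speller. The sketch only illustrates that the estimation error shrinks as more unlabeled events are recorded.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 6
mu_t_true = np.array([0.0, 1.5, 3.0, 2.0, 0.5, 0.0])  # hypothetical target ERP features
mu_nt_true = np.zeros(n_features)                     # hypothetical non-target ERP features
pi_1, pi_2 = 0.3, 0.1  # assumed known target proportions of S1 and S2 (must differ)


def simulate_sequence(n_events, pi):
    """Draw noisy epochs of which a fraction pi are targets; the labels are not returned."""
    n_targets = int(round(pi * n_events))
    is_target = np.arange(n_events) < n_targets
    means = np.where(is_target[:, None], mu_t_true, mu_nt_true)
    return means + rng.normal(scale=2.0, size=(n_events, n_features))


def llp_means(X1, X2, pi_1, pi_2):
    """Solve the 2x2 LLP system using only the averages of S1 and S2."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    mu_t = ((1 - pi_2) * m1 - (1 - pi_1) * m2) / (pi_1 - pi_2)
    mu_nt = (pi_1 * m2 - pi_2 * m1) / (pi_1 - pi_2)
    return mu_t, mu_nt


errors = {}
for n_events in (100, 1000, 10000):
    X1 = simulate_sequence(n_events, pi_1)
    X2 = simulate_sequence(n_events, pi_2)
    mu_t_hat, mu_nt_hat = llp_means(X1, X2, pi_1, pi_2)
    errors[n_events] = np.linalg.norm(mu_t_hat - mu_t_true)
    print(f"{n_events:>6} events per sequence: target-mean error {errors[n_events]:.3f}")
```

The closer $\pi_1$ and $\pi_2$ are, the larger the factor $1/(\pi_1 - \pi_2)$ becomes, which amplifies the noise in the estimates. In this sketch, that is why the proportions of the two sequences should differ clearly.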
Figure: Learning from label proportions (LLP) principle
The estimated means are then fed into a classifier similar to linear discriminant analysis (LDA), an extremely popular and well-performing linear classifier for BCIs. To compute a separating hyperplane with LDA, one only needs the class means and the class-wise covariances. One can show that the class-wise covariance matrix can be replaced by the global covariance matrix $\Sigma$, which can be computed without label information. The projection vector $w$ is then computed as $w = \Sigma^{-1}(\mu_{t} - \mu_{nt})$, where $\mu_{t}$ and $\mu_{nt}$ denote the estimated target and non-target class means.
With this vector, a new, unknown data point x is classified as a target if w · x exceeds a bias term b (for standard LDA with equal class priors, the midpoint between the projected target and non-target means), and as a non-target otherwise. This is the basic principle of LLP: it yields an unsupervised machine learning method with guaranteed convergence. You can read more in our paper about LLP, where we describe an online study with 13 subjects using this classifier. [ PDF ]
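The following minimal sketch checks the covariance claim numerically and then applies the classification rule. It assumes simulated Gaussian data with hypothetical class means. For brevity, the class means are computed here from the simulated labels. In the LLP setting, they would be replaced by the label-free estimates described above. The within-class covariance also uses labels and serves only as a reference. The label-free global covariance yields a projection vector pointing in the same direction.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 5
mu_t = np.full(n_features, 1.5)  # hypothetical target mean
mu_nt = np.zeros(n_features)     # hypothetical non-target mean
n_t, n_nt = 200, 1000            # fewer targets than non-targets, as in an ERP speller

X_t = mu_t + rng.standard_normal((n_t, n_features))
X_nt = mu_nt + rng.standard_normal((n_nt, n_features))
X = np.vstack([X_t, X_nt])
y = np.concatenate([np.ones(n_t, dtype=bool), np.zeros(n_nt, dtype=bool)])

# Label-free: covariance of all epochs pooled together.
sigma_global = np.cov(X, rowvar=False)

# Reference only (needs labels): pooled within-class covariance.
m_t, m_nt = X_t.mean(axis=0), X_nt.mean(axis=0)
scatter_within = (X_t - m_t).T @ (X_t - m_t) + (X_nt - m_nt).T @ (X_nt - m_nt)
sigma_within = scatter_within / (len(X) - 2)

# w = Sigma^{-1} (m_t - m_nt); solve() avoids forming an explicit inverse.
w_global = np.linalg.solve(sigma_global, m_t - m_nt)
w_within = np.linalg.solve(sigma_within, m_t - m_nt)
cosine = w_global @ w_within / (np.linalg.norm(w_global) * np.linalg.norm(w_within))

# Bias at the midpoint of the projected class means (equal-prior LDA threshold).
b = w_global @ (m_t + m_nt) / 2
predicted_target = X @ w_global > b
accuracy = np.mean(predicted_target == y)
print(f"cosine(w_global, w_within) = {cosine:.6f}, accuracy = {accuracy:.3f}")
```

The cosine is 1 up to rounding because the global covariance differs from the within-class covariance only by a rank-one term along the mean difference. This changes the length of $w$, but not its direction.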
Mixing Unsupervised Model Estimators
In the previous section, you have seen how a simple paradigm modification can yield a completely unsupervised classifier. This classifier is robust in the sense that it performs reasonably well when only limited unlabeled data are available, but it improves only slowly as more data are acquired.
Previously, Pieter-Jan Kindermans proposed an Expectation-Maximization (EM) algorithm for ERP decoding. This model exploits a special structure of ERP speller data: once you know the attended symbol, you know for each event whether it was a target or a non-target. This so-called 'latent variable' substantially simplifies the unsupervised learning problem. Online studies and simulations have shown that this classifier struggles when only limited data are available, because it relies on a good initialization. Once more data are available, however, it generally finds a very good decoder.
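The latent-variable idea can be illustrated with a simplified EM loop. This is a hypothetical sketch and not Kindermans' published model. It assumes a 6 x 6 row/column speller, Gaussian ERP features with a shared covariance, and a uniform prior over symbols. All numbers are made up. In the E-step, each candidate symbol determines which events would be targets, which yields a posterior over the attended symbol. In the M-step, the class means are re-estimated, with each event weighted by its posterior probability of being a target. For simplicity, the label-free global covariance is kept fixed, and the loop starts from a rough initial guess.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rows = n_cols = 6
n_symbols, n_features, n_reps, n_trials = n_rows * n_cols, 8, 10, 10
mu_t_true = np.full(n_features, 1.0)  # hypothetical target ERP features
mu_nt_true = np.zeros(n_features)     # hypothetical non-target ERP features


def highlight_matrix():
    """H[e, s] is True if event e highlights symbol s (rows/columns, shuffled per repetition)."""
    events = []
    for _ in range(n_reps):
        for k in rng.permutation(n_rows + n_cols):
            grid = np.zeros((n_rows, n_cols), dtype=bool)
            if k < n_rows:
                grid[k, :] = True
            else:
                grid[:, k - n_rows] = True
            events.append(grid.ravel())
    return np.array(events)


def simulate_trial(attended):
    """Simulate one character: the attended symbol fixes which events are targets."""
    H = highlight_matrix()
    is_target = H[:, attended]
    X = np.where(is_target[:, None], mu_t_true, mu_nt_true)
    return X + rng.standard_normal(X.shape), H


def e_step(X, H, mu_t, mu_nt, sigma_inv):
    """Posterior over the attended symbol (uniform prior, Gaussians with shared covariance)."""
    w = sigma_inv @ (mu_t - mu_nt)
    b = 0.5 * (mu_t @ sigma_inv @ mu_t - mu_nt @ sigma_inv @ mu_nt)
    llr = X @ w - b                     # per-event log-likelihood ratio, target vs. non-target
    log_post = H.T.astype(float) @ llr  # summed over the events that highlight each symbol
    post = np.exp(log_post - log_post.max())
    return post / post.sum()


def m_step(trials, posteriors):
    """Re-estimate the class means, weighting each event by its posterior target probability."""
    X_all = np.vstack([X for X, _ in trials])
    r = np.concatenate([H.astype(float) @ q for (_, H), q in zip(trials, posteriors)])
    return r @ X_all / r.sum(), (1 - r) @ X_all / (1 - r).sum()


true_symbols = rng.integers(n_symbols, size=n_trials)
trials = [simulate_trial(s) for s in true_symbols]
# Label-free global covariance, kept fixed (a simplification).
sigma_inv = np.linalg.inv(np.cov(np.vstack([X for X, _ in trials]), rowvar=False))

# Hypothetical rough initial guess, e.g. standing in for a decoder from other subjects.
mu_t = 0.5 * mu_t_true + rng.normal(scale=0.2, size=n_features)
mu_nt = rng.normal(scale=0.2, size=n_features)
for _ in range(10):
    posteriors = [e_step(X, H, mu_t, mu_nt, sigma_inv) for X, H in trials]
    mu_t, mu_nt = m_step(trials, posteriors)

decoded = np.array([np.argmax(q) for q in posteriors])
print("all characters decoded correctly:", np.array_equal(decoded, true_symbols))
```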
In work led by Thibault Verhoeven, we proposed to combine these two estimators (called MIX). The resulting classifier is astonishing: zero calibration, continuous learning from unlabeled data, guaranteed convergence and high decoding performance. It really combines the strengths of the unsupervised world (no calibration) and the supervised world (high decoding performance). Read more about the idea and how it performed in simulations in our Journal of Neural Engineering paper. [ Link ]
The gold standard for verifying BCI methods is to test them in an actual online application, where no overfitting (e.g., post-hoc selection of suitable hyperparameters of the machine learning model) is possible. In our latest work, we have not only reviewed other unsupervised learning and adaptation approaches, but also tested LLP, EM and the MIX method in a spelling application with 12 healthy subjects. The results confirm the astonishing performance obtained in simulations in Thibault's work.
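One generic way to combine two estimators of the same class mean is a convex combination $\gamma\, a + (1-\gamma)\, b$ whose weight minimises the expected squared error. The sketch below shows only this textbook idea. It is not the published MIX method, which derives its own data-driven mixing coefficient. The sketch assumes both estimators are unbiased and independent, with known total variances. EM estimates in particular need not be unbiased early on. The true mean and the noise levels are hypothetical.

```python
import numpy as np


def mix_weight(var_a, var_b):
    """Weight gamma for gamma * a + (1 - gamma) * b that minimises the expected squared
    error, assuming a and b are independent, unbiased estimators with total variances
    var_a and var_b."""
    return var_b / (var_a + var_b)


rng = np.random.default_rng(3)
mu = np.array([0.0, 1.5, 3.0, 2.0])  # hypothetical true class mean
n_draws = 5000
sd_a, sd_b = 1.0, 0.5                # hypothetical per-dimension standard deviations
est_a = mu + rng.normal(scale=sd_a, size=(n_draws, mu.size))
est_b = mu + rng.normal(scale=sd_b, size=(n_draws, mu.size))

gamma = mix_weight(mu.size * sd_a**2, mu.size * sd_b**2)
est_mix = gamma * est_a + (1 - gamma) * est_b

# Empirical mean squared error of each estimator over many simulated draws.
mse = {name: np.mean(np.sum((est - mu) ** 2, axis=1))
       for name, est in [("a", est_a), ("b", est_b), ("mix", est_mix)]}
print(f"gamma = {gamma:.2f}, MSE: {mse}")
```

In such a scheme, the weight shifts automatically toward whichever estimator is currently more reliable. It would therefore be recomputed as the variances change during a spelling session.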
Figure: Overview of correctly and incorrectly spelled characters for all 12 subjects. Blue squares denote incorrectly spelled characters, while yellow squares indicate correctly spelled characters. The MIX method only needs around 3 minutes of unlabeled data to achieve almost perfect spelling accuracy. The Figure is adapted from the IEEE Computational Intelligence Magazine publication (© IEEE).
After a few initial trials, the MIX method is as good as a supervised classifier. Even these initial trials can be corrected by re-applying the improved classifier from a later stage to re-evaluate the initial spelling attempts.
Figure: Comparison of the unsupervised MIX method with a supervised regularized LDA classifier. Both classifiers were trained on the first N-1 characters and tested on the Nth character. The thick lines depict the grand average over 12 subjects, while the shaded area shows the standard deviation across subjects. The red dotted line shows the p-value of a Wilcoxon rank sum test comparing the supervised and unsupervised performance for character N. The Figure is adapted from the IEEE Computational Intelligence Magazine publication (© IEEE).
Read more about it in our paper: 'Unsupervised Learning for Brain-Computer Interfaces Based on Event-Related Potentials: Review and Online Comparison' which appeared in IEEE Computational Intelligence Magazine. [ PDF ] [ Link ]
Aphasia Rehabilitation with BCI
BCIs can be used not only for communication and control, but also for rehabilitation. Studies on motor rehabilitation after stroke have already shown very promising results. In these approaches, the BCI directly decodes a movement intention or motor imagery by classifying task-specific brain signal features, such as the event-related (de-)synchronization of motor-related rhythmic activity over sensorimotor areas. Upon classification of a motor attempt, the user receives immediate sensory feedback via functional electrical stimulation, a robotic/orthotic device or an avatar. Compared to traditional physiotherapy, this approach has two advantages: a movement intention can be detected even if its execution is infeasible, and time-locked afferent feedback can be provided.
Motivated by this success, we explore the possibility of applying a BCI-based training protocol to language deficits (aphasia) after stroke. Like motor deficits, language deficits frequently occur after a stroke. It is estimated that around 20 % of patients still suffer from language deficits in the chronic phase (>6 months after the stroke), even after speech and language therapy. During a target detection task based on auditory words, a BCI can provide information about task-related activation (e.g., about speech processing in the brain) and feed this information back to the user in almost real time. By constantly receiving feedback, patients can then learn to modulate their brain activity such that they successfully process language. Interestingly, training this task also significantly improves other language functions such as reading, writing and naming. The Figure shows preliminary results for 8 chronic patients.
Figure: Taken from the abstract that we submitted to the Seventh International BCI Meeting 2018 at the Asilomar Conference Center in Pacific Grove, California, USA.