Detecting oscillations of different frequencies in EEG is something we can easily accomplish with the naked eye. It is also easy to spot different kinds of muscle artefacts in the EEG. Trained neurologists can even detect the onset of epileptic seizures or distinguish between different sleep stages. But other tasks, e.g., deciding which of many words a subject was listening to, are impossible without mathematical methods from signal processing and machine learning. These algorithms allow us to extract further information from the data, e.g., deciding which of several flashing stimuli on a computer screen was attended or detecting whether a user has just perceived an error.
When this information is processed in real time, the user can actively or passively control a computer or machine. This is called a Brain-Computer Interface (BCI). The most popular application is a speller that allows impaired users to slowly write a sentence using only their brain activity. Traditionally, the system needs to be calibrated before use. During this period, the user performs a series of predefined tasks in which the user's intentions are known. This data is then used to train a machine learning classifier. My theoretical research is concerned with eliminating this calibration phase by transferring information about the classifier from other subjects and updating the classifier during actual usage. This is a difficult challenge because in that scenario the decoder needs to be able to learn from unlabelled data (i.e., data where the user's intentions are unknown).
Learning from Label Proportions (LLP)
This is a simple, yet extremely powerful concept. Imagine a scenario where you do not have label information, but you know the proportions of the labels in subgroups of the data. Think of two e-mail mailboxes: one is a pure honeypot and contains only spam, and the other is used regularly and contains both spam and normal e-mails. If you can estimate the proportion of spam e-mails in the regular mailbox, then you can apply LLP to draw conclusions about what an average spam and an average non-spam e-mail look like. You could, for instance, find that certain words appear only in the spam group, or that others are more typical of regular e-mails. Applying this concept is surprisingly easy: all you need to know is how to solve a linear system of equations.
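The mailbox example can be sketched in a few lines of Python. This is a minimal illustration, not code from our study: the two word-frequency features, the 40% spam share and the function name `class_means_from_proportions` are all made up for the example.

```python
import numpy as np

def class_means_from_proportions(group_means, proportions):
    """Solve proportions @ class_means = group_means for class_means.

    group_means: (n_groups, n_features), the average feature vector per group
    proportions: (n_groups, n_classes), known class share per group, rows sum to 1
    Requires n_groups == n_classes and a non-singular proportion matrix.
    """
    return np.linalg.solve(np.asarray(proportions, float),
                           np.asarray(group_means, float))

# Hypothetical features: relative frequency of the words "lottery" and "meeting".
true_spam = np.array([0.30, 0.01])
true_ham = np.array([0.02, 0.20])

spam_share_regular = 0.4  # assumed known share of spam in the regular mailbox

# Only these group averages are observable; no e-mail is individually labelled.
honeypot_mean = true_spam
regular_mean = spam_share_regular * true_spam + (1 - spam_share_regular) * true_ham

proportions = [[1.0, 0.0],                                    # honeypot: spam only
               [spam_share_regular, 1 - spam_share_regular]]  # regular mailbox
spam_est, ham_est = class_means_from_proportions(
    [honeypot_mean, regular_mean], proportions)
print(spam_est, ham_est)
```

Because the group averages here are exact, the solver recovers the class means exactly. With real data the group averages are noisy, and the estimates improve as more e-mails are collected.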
In the context of BCIs based on event-related potentials (ERP), LLP allows the estimation of the target and non-target class means without label information (without knowing what the user tried to accomplish), but with the guarantee of convergence (finding out what they tried to accomplish, provided that we record long enough). A clever modification is needed to unlock the power of LLP: in the context of a visual speller, we extend the traditional spelling matrix with '#' symbols. These should always be ignored by the user and are therefore non-targets by definition.
Next, we split the highlighting events into two sequences (S1, S2). Events from S1 highlight only normal symbols, while events from S2 also highlight '#' symbols. This leads to a higher target proportion in S1 than in S2. The exact target and non-target proportions are not important; it is only necessary that we can enforce certain (known!) proportions by construction of the two sequences.
The relationship between the target/non-target class means and the S1 and S2 means can be expressed as a simple linear system of two equations, which can easily be solved for the target/non-target class means. Finally, one plugs the estimated averages of S1 and S2 (which do not require label information) into the solved system and obtains estimates of the target and non-target means. This whole procedure is summarized in the Figure.
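To make the two-equation system concrete, here is a small simulation sketch. The target proportions (3/8 for S1, 1/9 for S2), the five-dimensional Gaussian features and all variable names are assumptions chosen for illustration; the text only requires that the proportions are known and differ between the two sequences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth, used only to simulate data and check the result.
n_features = 5
mu_target = np.ones(n_features)
mu_nontarget = np.zeros(n_features)

def simulate_sequence(n_events, n_targets):
    """Return unlabelled feature vectors containing an exact number of targets."""
    means = np.vstack([np.tile(mu_target, (n_targets, 1)),
                       np.tile(mu_nontarget, (n_events - n_targets, 1))])
    return means + rng.normal(size=means.shape)

# Assumed target proportions, enforced by construction of S1 and S2.
p1, p2 = 3 / 8, 1 / 9
S1 = simulate_sequence(16000, 6000)   # 6000 / 16000 = 3/8 targets
S2 = simulate_sequence(18000, 2000)   # 2000 / 18000 = 1/9 targets

# mean(S1) = p1 * mu_T + (1 - p1) * mu_N
# mean(S2) = p2 * mu_T + (1 - p2) * mu_N
P = np.array([[p1, 1 - p1],
              [p2, 1 - p2]])
sequence_means = np.vstack([S1.mean(axis=0), S2.mean(axis=0)])

# Solving the system yields the class means without using any labels.
mu_target_est, mu_nontarget_est = np.linalg.solve(P, sequence_means)
print(mu_target_est.round(2), mu_nontarget_est.round(2))
```

The closer p1 and p2 are to each other, the closer the matrix P is to being singular and the more the noise in the sequence means is amplified. This is why the two sequences are designed to have clearly different target proportions.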
The estimated means are then fed into a classifier similar to linear discriminant analysis (LDA), an extremely popular and well-performing linear classifier for BCIs. To compute a linear separating hyperplane with it, one only needs the class means and the class-wise covariances. One can show that the class-wise covariance matrix can be replaced by the global covariance matrix Σ, which can be computed without label information. As in standard LDA, the projection vector w is then finally computed as: w = Σ⁻¹ (μ_T − μ_N), where μ_T and μ_N denote the estimated target and non-target means.
With that vector, new unknown points x can be classified as targets if w · x > 0 and as non-targets otherwise. This is the basic principle of LLP. It gives an unsupervised machine learning method with guaranteed convergence. You can read more in our paper about LLP, where we describe an online study with 13 subjects using this classifier. [ PDF ]
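The classification step can be sketched as follows, assuming the standard LDA form w = Σ⁻¹ (μ_T − μ_N) with Σ estimated from all data regardless of class. The function names and the toy data are hypothetical. For brevity the true class means are passed in directly, where in practice the LLP estimates would be used. The toy classes are placed symmetrically around the origin so that the threshold at zero from the text is appropriate.

```python
import numpy as np

def lda_projection(mu_target, mu_nontarget, X_unlabelled):
    """w = Sigma^-1 (mu_T - mu_N), with Sigma estimated from all data."""
    sigma = np.cov(X_unlabelled, rowvar=False)  # global covariance, no labels needed
    return np.linalg.solve(sigma, mu_target - mu_nontarget)

def is_target(w, X):
    """Classify each row x of X as target if w . x > 0."""
    return X @ w > 0

rng = np.random.default_rng(0)
mu_t = np.array([1.0, 0.5])
mu_n = -mu_t                                  # toy setup: midpoint at the origin
X_t = rng.normal(size=(500, 2)) + mu_t
X_n = rng.normal(size=(500, 2)) + mu_n
X = np.vstack([X_t, X_n])                     # labels are not used to fit Sigma

w = lda_projection(mu_t, mu_n, X)
accuracy = np.mean(np.concatenate([is_target(w, X_t), ~is_target(w, X_n)]))
print(w, accuracy)
```

Using `np.linalg.solve` instead of explicitly inverting Σ is a common numerical choice; it gives the same w but is more stable when Σ is poorly conditioned.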
Mixing Unsupervised Model Estimators
In the previous section, you have seen how a simple paradigm modification can yield a completely unsupervised classifier. This classifier is robust in the sense that it performs relatively well when only limited unlabelled data is available, but it improves only slowly as more data is acquired.
Previously, Pieter-Jan Kindermans proposed using an Expectation-Maximization (EM) algorithm for ERP decoding. This model exploits a special data structure in an ERP speller: once you know the attended symbol, you know for each event whether it was a target or a non-target. This so-called 'latent variable' substantially simplifies the unsupervised learning problem. Online studies and simulations have shown that this classifier has problems when limited data is available because it relies on a good initialization, but that it generally finds a very good decoder when more data is available.
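The role of the attended symbol as a latent variable can be illustrated with a short sketch. This is not the EM algorithm itself, only the structural idea it builds on. The four-symbol layout, the scores and both function names are hypothetical. The decoding rule shown, picking the symbol whose highlighting events have the largest summed classifier score, is a common simplification assumed here for illustration.

```python
import numpy as np

def labels_given_symbol(highlighted, symbol):
    """If `symbol` is attended, an event is a target exactly when it highlights it."""
    return np.array([symbol in event for event in highlighted])

def decode_symbol(scores, highlighted, symbols):
    """Pick the symbol whose highlighting events have the largest summed score."""
    totals = {s: sum(score for score, event in zip(scores, highlighted) if s in event)
              for s in symbols}
    return max(totals, key=totals.get)

symbols = "ABCD"
highlighted = [{"A", "B"}, {"C", "D"}, {"A", "C"}, {"B", "D"}]  # one set per event
scores = [1.2, -0.8, 0.9, -1.1]   # hypothetical classifier outputs w . x per event

attended = decode_symbol(scores, highlighted, symbols)
labels = labels_given_symbol(highlighted, attended)
print(attended, labels)
```

Instead of one unknown label per event, there is only one unknown per trial, namely the attended symbol. Each hypothesis about that symbol fixes all event labels at once.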
In our recent work led by Thibault Verhoeven, we propose combining these two estimators. The resulting classifier is astonishing: zero calibration, continuous learning from unlabelled data, guaranteed convergence and high decoding performance. This classifier really combines the strengths of the unsupervised and supervised worlds. Read more about it in our recent paper. [ Link ]