EEGDM: EEG Representation Learning via Generative Diffusion Model

Jia Hong Puah, Sim Kuan Goh*, Ziwei Zhang, Zixuan Ye, Chow Khuen Chan, Kheng Seang Lim*, Si Lei Fong, Kok Sin Woon, Cuntai Guan
Xiamen University Malaysia.
Universiti Malaya.
Nanyang Technological University.

Abstract

While the electroencephalogram (EEG) has been a crucial tool for monitoring the brain and diagnosing neurological disorders (e.g., epilepsy), learning meaningful representations from raw EEG signals remains challenging due to limited annotations and high signal variability. Recently, EEG foundation models (FMs) have shown promise by adopting transformer architectures and self-supervised pre-training methods from large language models (e.g., masked prediction) to learn representations from diverse EEG data, followed by fine-tuning on specific EEG tasks. Nonetheless, these large models often incur high computational costs during both training and inference, with only marginal performance gains. In this work, we propose an EEG representation learning framework built upon a generative diffusion model (EEGDM). Specifically, we develop a structured state-space model for diffusion pretraining (SSMDP) to better capture the temporal dynamics of EEG signals and train the architecture as a denoising diffusion probabilistic model (DDPM). The resulting latent EEG representations are then used for downstream classification tasks via our proposed latent fusion transformer (LFT). To evaluate our method, we use the multi-event Temple University EEG Event Corpus and compare EEGDM with current state-of-the-art approaches, including EEG FMs. Empirical results show that our method outperforms existing methods while being approximately 19× more lightweight. These findings suggest that EEGDM offers a promising alternative to current FMs.

What is SSMDP and how does it work?

This component is responsible for learning robust representations from raw EEG signals. The method draws inspiration from generative diffusion models, a class of generative models that can be trained in a self-supervised manner. Instead of the typical masking-based objectives, SSMDP learns to reverse a noise-injection process.

The core of SSMDP is a structured state-space model (SSM) architecture, which is particularly well suited for capturing the temporal dynamics of EEG signals. During pretraining, the model is tasked with progressively denoising a noise-corrupted EEG signal to restore the original data. The resulting latent activations from this process are then used as representations for downstream tasks.
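The denoising objective above follows the standard DDPM recipe: clean EEG is corrupted by a fixed forward noising schedule, and the network learns to predict the injected noise. As a minimal sketch (the schedule values, window length, and the stand-in sine-wave "EEG" segment are illustrative assumptions, not the paper's actual settings, and the SSM denoiser itself is only indicated in a comment):

```python
import numpy as np

def make_noise_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM variance schedule; alpha_bars[t] shrinks toward 0."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Forward diffusion: corrupt clean signal x0 to noise level t."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

# Toy single-channel segment standing in for one EEG window.
rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 8 * np.pi, 256))
betas, alpha_bars = make_noise_schedule()
xt, eps = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=rng)

# During pretraining, the SSM denoiser would take (xt, t) and predict eps;
# the training loss is the mean squared error between prediction and eps,
# and the SSM's internal activations become the latent representations.
```

The key point for representation learning is that the denoiser must model the signal's temporal structure at every noise level, so its intermediate latents encode that structure without any labels.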

How to fuse latent representations for classification tasks?

LFT is a transformer-based architecture designed to take the learned latent representations from the SSMDP and fuse them for classification tasks. Because the latent representations are high-dimensional, they are first pooled to reduce computational complexity.

The LFT has a "latent fusion module" that uses a decoder-only transformer to aggregate information from different EEG channels and layers. This module uses "fusion tokens" to create a context-aware feature for each time window of the signal. The "classification module" then passes these fused representations through an encoder-only transformer and, finally, a linear classification head to make predictions. This fine-tuning phase is supervised on the specific downstream task.
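The fusion step can be pictured as cross-attention: a learned fusion token queries the pooled SSMDP latents from all channels and layers and returns one context-aware feature vector. Below is a single-head NumPy sketch under assumed dimensions (22 channels, 4 layers, 16-dim latents); the actual LFT is a multi-head, multi-layer decoder-only transformer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_latents(latents, fusion_token, d_k):
    """Single-head cross-attention: the fusion token (query) attends over
    pooled latents (keys/values) and returns one fused feature vector."""
    q = fusion_token[None, :]                 # (1, d)
    scores = q @ latents.T / np.sqrt(d_k)     # (1, n_latents)
    weights = softmax(scores)                 # attention over channels/layers
    return (weights @ latents)[0]             # (d,)

rng = np.random.default_rng(1)
d = 16
latents = rng.standard_normal((22 * 4, d))    # e.g. 22 channels x 4 layers, pooled
fusion_token = rng.standard_normal(d)
fused = fuse_latents(latents, fusion_token, d_k=d)
# `fused` (one vector per time window) would then feed the encoder-only
# classification module and its linear head.
```

Repeating this per time window yields a sequence of fused features, which is what the encoder-only classification module consumes.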

Experimental evidence

In this section, we present the main results of the proposed EEGDM framework on downstream tasks. EEGDM achieved new state-of-the-art results in multi-event classification, highlighting the effectiveness of its diffusion-based representation learning. These findings demonstrate that EEGDM offers a promising and computationally efficient alternative, outperforming existing methods while being approximately 19× more lightweight.