
ICA Decomposition & Classification

Independent Component Analysis is the most powerful artifact rejection tool in the EEG signal processing toolkit—and the most consequential. When ICA works well, it separates a mixed scalp signal into its constituent sources: brain activity, eye movements, muscle tension, cardiac rhythm, line noise. When it works poorly, it creates components that are mixtures of multiple sources, leading to either artifact bleeding into the clean signal or brain activity being discarded as artifact.

The Coherence Workstation uses ICA as a core preprocessing step, with automated classification via ICLabel and an interactive clinician review stage for borderline components. Every decision in this section—algorithm choice, threshold values, the two-stage filtering strategy—is designed to maximize decomposition quality on 19-channel clinical recordings.

The pipeline uses the Picard algorithm for ICA decomposition, rather than the more traditional Infomax (runica) used by EEGLAB:

preprocessing:
  ica:
    method: picard
    extended: true
    max_iter: 500
    random_state: 42

Picard is a preconditioned version of Infomax that converges faster and more reliably—typically reaching the same solution quality in a fraction of the iterations. It uses L-BFGS optimization with Hessian preconditioning, which handles the curvature of the objective function more efficiently than the natural gradient descent used by standard Infomax.

Extended mode (extended: true) enables separation of both sub-Gaussian and super-Gaussian sources. Standard ICA assumes all sources are super-Gaussian (peaky distributions), but some EEG sources—particularly line noise and some muscle artifacts—have sub-Gaussian (flat) distributions. Extended mode handles both, which improves separation quality at a modest computational cost.
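The sub-Gaussian/super-Gaussian distinction is easy to make concrete: excess kurtosis is positive for peaky (super-Gaussian) distributions and negative for flat (sub-Gaussian) ones. A small numpy illustration with toy distributions (not EEG data) shows the quantity extended-mode ICA is sensitive to:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

spiky = rng.laplace(size=n)           # super-Gaussian: peaky, heavy-tailed
flat = rng.uniform(-1, 1, size=n)     # sub-Gaussian: flat (a pure sinusoid,
                                      # like line noise, is also sub-Gaussian)
gauss = rng.standard_normal(n)

assert excess_kurtosis(spiky) > 0.5   # Laplace: excess kurtosis = +3
assert excess_kurtosis(flat) < -0.5   # uniform: excess kurtosis = -1.2
assert abs(excess_kurtosis(gauss)) < 0.1
```

Standard Infomax assumes every source looks like the first case; extended mode estimates the sign of the kurtosis per component and adapts, which is why it can isolate line noise and some muscle sources that plain Infomax leaves mixed.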

Reproducibility is guaranteed by the fixed random seed (random_state: 42). ICA involves random initialization, and different seeds can produce different component orderings (though the underlying decomposition should be equivalent). A fixed seed ensures that reprocessing the same data produces the same components in the same order.

The practical difference between Picard and Infomax is speed, not quality. Both algorithms find the same independent components (they’re solving the same optimization problem); Picard just gets there faster. On a typical 19-channel, 5-minute resting recording, Picard completes in seconds where Infomax might take minutes. This matters in a clinical workflow where wait times affect usability.

preprocessing:
  ica:
    n_components: 0.999

The number of ICA components is set by a variance threshold rather than a fixed count. The value 0.999 means “use as many components as needed to explain 99.9% of the data variance.” For a typical 19-channel recording, this produces 13–17 components—fewer than the total channel count, which is correct. The remaining 0.1% of variance is dominated by noise that doesn’t decompose into meaningful independent sources.

Why not use all 19 components? Because the last few components in a 19-channel decomposition typically capture sensor noise, rank-deficient dimensions created by re-referencing, and numerical artifacts. Including them degrades the decomposition by forcing the algorithm to find “independent” sources in what is essentially noise. The variance-based threshold automatically adapts to the effective dimensionality of the data.
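The mapping from a variance threshold to a component count can be sketched with a PCA-style eigenvalue calculation. This is an illustrative helper, not the pipeline's actual code; the synthetic data mimics 14 strong sources plus 5 near-silent dimensions of the kind lost to re-referencing and sensor noise:

```python
import numpy as np

def n_components_for_variance(data, threshold=0.999):
    """Number of components needed to explain `threshold` of the variance.

    data: (n_channels, n_samples) array. Illustrative sketch only.
    """
    centered = data - data.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / centered.shape[1]
    eigvals = np.linalg.eigvalsh(cov)[::-1]            # descending order
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(explained, threshold)) + 1

# 19 "channels": 14 with real signal, 5 carrying almost nothing.
rng = np.random.default_rng(42)
stds = np.array([10.0] * 14 + [0.001] * 5)
data = stds[:, None] * rng.standard_normal((19, 100_000))

print(n_components_for_variance(data))  # → 14, not 19
```

The threshold lands on the effective rank of the data automatically, which is exactly the adaptive behavior the `n_components: 0.999` setting buys.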

Here is the single most important technical decision in the ICA stage. The pipeline filters the data at 0.5 Hz for analysis (to preserve delta) but fits ICA on a copy filtered at 1.0 Hz:

preprocessing:
  ica:
    two_stage_filter: true
    ica_highpass: 1.0

Why? ICA decomposition is an optimization problem that works by finding statistical independence between sources. Low-frequency signals—particularly those below 1 Hz—have high autocorrelation and large amplitude swings that dominate the optimization landscape. They make it harder for the algorithm to identify the subtler statistical patterns that distinguish brain activity, eye movements, and muscle tension from one another.

The solution, recommended by both MNE-Python and EEGLAB developers, is to fit ICA on a high-pass filtered copy of the data (1.0 Hz), then apply the resulting unmixing matrix to the original lower-filtered data (0.5 Hz). The unmixing matrix describes the spatial structure of the sources—which electrodes contribute to which component. This spatial structure doesn’t change with the high-pass cutoff. What changes is the algorithm’s ability to find that structure, which is better with the higher cutoff.

The result: we get the convergence benefits of a 1.0 Hz high-pass for ICA fitting, and the delta-preserving benefits of a 0.5 Hz high-pass for all downstream analysis. The two-stage approach is the reason the pipeline can use an aggressive low cutoff without sacrificing ICA quality.
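The reason this is legitimate is that temporal filtering acts channel-wise along time while the unmixing matrix acts across channels, so the two operations commute: an unmixing matrix fitted on the filtered copy applies unchanged to the lightly filtered data. A toy numpy demonstration, using a crude first-difference filter in place of the 1.0 Hz high-pass and the true inverse of the mixing matrix as a stand-in for the fitted unmixing matrix:

```python
import numpy as np

rng = np.random.default_rng(42)
n_times = 2000
sources = rng.laplace(size=(3, n_times))     # toy super-Gaussian sources
A = rng.standard_normal((3, 3))              # mixing (scalp projection)
X = A @ sources                              # the "0.5 Hz" recording

def highpass(x):
    # first-difference filter, standing in for the 1.0 Hz high-pass
    return np.diff(x, axis=1, prepend=x[:, :1])

X_hp = highpass(X)                           # copy used only for ICA fitting
W = np.linalg.inv(A)                         # stand-in for the fitted unmixing matrix

# Filtering is linear and channel-wise, so it commutes with spatial unmixing:
assert np.allclose(W @ X_hp, highpass(W @ X))

# Hence W, "fitted" on the high-passed copy, unmixes the original data too:
assert np.allclose(W @ X, sources)
```

In the real pipeline the spatial structure (W) is estimated on the 1.0 Hz copy, where the optimization is well behaved, and then applied to the 0.5 Hz data, where the delta band survives.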

ICLabel: Automated Component Classification


After ICA decomposition, every component must be classified: is it brain, or is it artifact? The pipeline uses ICLabel, a deep-learning classifier trained on a crowd-labeled database of hundreds of thousands of ICA components drawn from thousands of EEG recordings in the EEGLAB ecosystem (Pion-Tonachini et al., 2019).

ICLabel assigns each component a probability distribution across seven categories: Brain, Muscle, Eye, Heart, Line Noise, Channel Noise, and Other. The pipeline uses these probabilities to make automated keep/reject decisions, with a clinician review stage for uncertain cases:

preprocessing:
  iclabel:
    brain_threshold: 0.80
    review_threshold: 0.50

Auto-keep (brain probability ≥ 0.80): Components where ICLabel is confident the source is brain activity are kept automatically. The 0.80 threshold is conservative by design—the literature standard is 0.50 (keep if brain is the highest-probability class). We require higher confidence because 19-channel recordings produce fewer components than high-density arrays, and each component captures a larger proportion of the total signal. A false positive—keeping a contaminated component—has a proportionally larger impact on the clean signal.

Auto-reject (non-brain dominant class ≥ 0.50): Components where a non-brain class (muscle, eye, heart, line, channel noise) has probability ≥ 0.50 are automatically rejected. This means ICLabel is more confident that the component is artifact than brain, and the dominant artifact class exceeds the confidence threshold.

Flag for review (everything else): Components that satisfy neither rule (brain probability below 0.80, and no non-brain class reaching 0.50) are flagged for clinician review. These are the borderline cases where automated classification isn’t confident enough in either direction.
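The three-way decision rule can be written in a few lines. A minimal sketch, assuming the probabilities arrive as a class-to-probability mapping (the function name is illustrative, not the pipeline's API):

```python
def classify_component(probs, brain_threshold=0.80, review_threshold=0.50):
    """Three-way keep/reject/review decision from ICLabel probabilities.

    probs: dict mapping ICLabel class name -> probability (sums to ~1).
    Illustrative sketch of the decision rule, not pipeline code.
    """
    if probs.get("Brain", 0.0) >= brain_threshold:
        return "keep"                       # confident brain: auto-keep
    top_class = max(probs, key=probs.get)
    if top_class != "Brain" and probs[top_class] >= review_threshold:
        return "reject"                     # dominant, confident artifact
    return "review"                         # borderline: send to clinician

print(classify_component({"Brain": 0.92, "Eye": 0.05, "Other": 0.03}))   # keep
print(classify_component({"Brain": 0.10, "Eye": 0.85, "Other": 0.05}))   # reject
print(classify_component({"Brain": 0.65, "Muscle": 0.30, "Other": 0.05}))  # review
print(classify_component({"Brain": 0.45, "Muscle": 0.48, "Other": 0.07}))  # review
```

Note the last case: even though muscle narrowly beats brain, it falls short of the 0.50 confidence bar, so the component goes to review rather than being silently discarded.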

The choice of 0.80 as the brain threshold deserves explicit justification, because it’s more conservative than what the literature typically recommends.

With 256 channels, ICLabel’s false positive rate is diluted across many components—even if a few components are misclassified, the impact on the reconstructed signal is small because each component represents a tiny fraction of the total. With 19 channels, a single misclassified component might represent 6–8% of the total signal variance. Keeping a muscle component as brain at 19 channels is qualitatively different from keeping one at 256 channels.

The conservative threshold means more components end up in the review queue. This is intentional. The interactive data preparation step presents flagged components with their topographic maps, spectral profiles, and ICLabel probabilities, allowing the clinician to make informed decisions about borderline cases. The pipeline assists classification; it doesn’t replace clinical judgment.

When a component is rejected (either automatically or by clinician review), its contribution is subtracted from every channel at every time point. The reconstructed signal is the original minus the weighted sum of all rejected components. This is a subtractive process—no data is deleted or zeroed out. The time series maintains its continuity, with the artifact signals mathematically removed.
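The subtraction can be verified in a few lines of numpy. In this toy setup (random mixing matrix and activations, not real EEG), removing the rejected components' projections from the scalp signal is identical to reconstructing from the kept components alone:

```python
import numpy as np

rng = np.random.default_rng(7)
n_comp, n_times = 5, 1000
S = rng.standard_normal((n_comp, n_times))   # component activations
A = rng.standard_normal((19, n_comp))        # mixing matrix (topographies)
X = A @ S                                    # recorded 19-channel signal

rejected = [1, 3]                            # e.g. an eye and a muscle component
kept = [i for i in range(n_comp) if i not in rejected]

# Subtract each rejected component's projection from every channel:
clean = X - A[:, rejected] @ S[rejected]

# Identical to rebuilding the signal from the kept components only:
assert np.allclose(clean, A[:, kept] @ S[kept])
assert clean.shape == X.shape                # continuity: nothing deleted
```

This is why the clean signal keeps its full length and channel count: rejection zeroes a component's contribution, not any stretch of data.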

Common artifact types and how they appear as ICA components:

Eye blinks produce components with strong frontal topography (highest weights at Fp1/Fp2) and large, regular amplitude deflections. Their spectra are dominated by delta-frequency power. ICLabel typically classifies these with >0.95 eye probability.

Lateral eye movements show a left-right frontal dipolar pattern (positive at F7, negative at F8, or vice versa) with sustained rectangular deflections corresponding to saccades.

Muscle artifact produces components with peripheral topography (highest weights at temporal or occipital edges) and broadband high-frequency power (>20 Hz) without clear spectral peaks. Multiple muscle components may be present simultaneously, each corresponding to a different muscle group.

Cardiac artifact appears as a component with a regular, QRS-like waveform at the heart rate frequency. Its topography is typically broad and posterior. This component is often small-amplitude and may not significantly affect EEG analysis even if retained.

Line noise produces a component with nearly flat topography and a sharp spectral peak at 60 Hz (or 50 Hz). If notch filtering was applied before ICA, this component may not be present.

The ICA unmixing matrix and the component rejection decisions are stored in the stage output and applied to all downstream processing. The spectral analysis, connectivity analysis, microstate analysis, and source localization all operate on the ICA-cleaned signal.

This means that changing a component classification—keeping a component that was previously rejected, or vice versa—changes every subsequent analysis. The desktop application allows the clinician to modify component decisions in the data preparation step and recompute downstream stages. The pipeline is designed so that ICA decisions are always traceable: you can see exactly which components were kept, which were rejected, and what the clean signal looks like with those decisions applied.