Parameter Logging & Reproducibility
Reproducibility in clinical EEG means that if you process the same recording twice with the same configuration, you get the same results. Not “similar” results, not “results within tolerance”—identical results, bit for bit. This is a stronger guarantee than most scientific software provides, and it requires deliberate engineering at every stage of the pipeline.
The Coherence Workstation achieves reproducibility through three mechanisms: fixed random seeds for all stochastic operations, deterministic algorithm selection, and comprehensive parameter logging that records exactly what was done and how.
The Stage Output Format
Every pipeline stage writes its results as a standardized JSON file through a common serialization layer (`stage_output.py`). The structure is consistent across all stages:
{ "stage": "spectral_resting_eo", "generated": "2026-03-05T21:52:43Z", "data": { "band_power": { "Delta": [...], "Theta": [...] }, "aperiodic": { "slope": 1.23, "offset": 2.1 }, "asymmetry": { "F3/F4": { "Alpha": -0.05 } } }, "plots": { "topo_delta": "topo_delta_eo.png", "stacked_spectra": "stacked_spectra_eo.png" }}stage identifies which pipeline stage produced this output. The stage name is a fixed string that the desktop application uses to locate and parse the data.
`generated` is the ISO 8601 timestamp of when the stage was processed. This allows tracking of reprocessing events: if a session shows multiple `generated` timestamps for the same stage, the data was reprocessed.

`data` contains the structured numerical results of the analysis: band power values, aperiodic parameters, connectivity matrices, ERP amplitudes. This is what the interactive React components in the desktop application read to render visualizations. All values are JSON-native types (numbers, strings, arrays, objects); NumPy arrays are converted to nested Python lists via `.tolist()` before serialization.

`plots` maps visualization names to PNG file paths for backward compatibility. Older sessions that predate the interactive React visualizations can still display these static images. New sessions produce both structured data and PNG plots.
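As an illustration, a writer in the common serialization layer might look like the sketch below. The function name `write_stage_output`, its signature, and the import path for `NumpySafeEncoder` (described in the next section) are assumptions for this sketch; only the payload shape is taken from the format above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

from stage_output import NumpySafeEncoder  # import path assumed; class described below


def write_stage_output(stage: str, data: dict, plots: dict, out_dir: Path) -> Path:
    """Hypothetical sketch of the common write path for stage outputs."""
    payload = {
        "stage": stage,        # fixed string the desktop application keys on
        "generated": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": data,          # JSON-native analysis results
        "plots": plots,        # PNG paths kept for backward compatibility
    }
    out_path = out_dir / f"{stage}.json"
    out_path.write_text(json.dumps(payload, cls=NumpySafeEncoder, indent=2))
    return out_path
```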
NumPy Safety
EEG signal processing produces NumPy arrays, and NumPy arrays don't serialize to JSON natively. The pipeline enforces serialization safety through a custom `NumpySafeEncoder` that:
- Converts `numpy.ndarray` to nested Python lists via `.tolist()`
- Converts `numpy.float64`, `numpy.int64`, and other NumPy scalar types to native Python `float` and `int`
- Replaces `NaN` and `Inf` values with `null` (JSON doesn't have a NaN representation)
- Raises an error if an unrecognized type is encountered, preventing silent data corruption
This encoder is used by every stage output write. If a developer adds new data to a stage output without converting NumPy types, the encoder catches it at serialization time rather than producing invalid JSON that fails silently downstream.
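A minimal sketch of an encoder with this behavior follows. The real `NumpySafeEncoder` in `stage_output.py` may be structured differently; this only illustrates the four rules listed above.

```python
import json
import math

import numpy as np


def _to_json_safe(obj):
    """Recursively convert NumPy containers/scalars; replace NaN/Inf with None."""
    if isinstance(obj, np.ndarray):
        return _to_json_safe(obj.tolist())
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, (float, np.floating)):
        value = float(obj)
        return value if math.isfinite(value) else None  # NaN/Inf -> JSON null
    if isinstance(obj, dict):
        return {key: _to_json_safe(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_to_json_safe(value) for value in obj]
    return obj  # str, int, bool, None pass through; unknown types reach default()


class NumpySafeEncoder(json.JSONEncoder):
    """Sketch: sanitize the whole tree, then fail loudly on anything unrecognized."""

    def iterencode(self, o, _one_shot=False):
        return super().iterencode(_to_json_safe(o), _one_shot)

    def default(self, o):
        # Only reached for types _to_json_safe passed through unconverted.
        raise TypeError(f"Stage output contains unserializable type: {type(o)!r}")
```

For example, `json.dumps({"psd": np.array([1.0, float("nan")])}, cls=NumpySafeEncoder)` yields `{"psd": [1.0, null]}` rather than emitting an invalid `NaN` token.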
What Gets Logged
The parameter sidecar records the processing configuration used for each session. This includes the following (an illustrative sidecar appears after the list):
Pipeline version: The exact version of the Coherence Workstation code that produced the output. This allows identifying which code path was used, even after software updates.
Configuration snapshot: The full contents of `configs/default.yaml` (or the custom configuration file) as it existed at processing time. This is the authoritative record of every parameter: filter cutoffs, band boundaries, epoch lengths, thresholds, algorithm selections.
Channel information: The channel names present in the recording, any channel renaming that was applied, bad channels detected and interpolated, and the final channel set used for analysis.
ICA decisions: Which components were identified, their ICLabel classifications, and the keep/reject decisions (auto and manual). This is critical for reproducibility—the same ICA decomposition with different component decisions produces different downstream results.
Artifact rejection: Which time segments were flagged by each filter (voltage, slow-wave, fast-wave), the total good-data duration, and the percentage of data retained.
Processing timestamps: When each stage started and completed, allowing reconstruction of the processing timeline.
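Put together, a parameter sidecar might look like the following. The field names and values are illustrative assumptions; only the categories of information match the list above.

```json
{
  "pipeline_version": "x.y.z",
  "config_snapshot": { "...": "full contents of configs/default.yaml at processing time" },
  "channels": {
    "renamed": { "T3": "T7" },
    "bad_interpolated": ["O2"],
    "final": ["Fp1", "Fp2", "F3", "F4", "..."]
  },
  "ica": {
    "classifications": { "ICA000": "brain", "ICA003": "eye blink" },
    "rejected_auto": ["ICA003"],
    "rejected_manual": []
  },
  "artifact_rejection": {
    "flagged_segments": { "voltage": 4, "slow_wave": 2, "fast_wave": 1 },
    "good_data_seconds": 412.5,
    "retained_percent": 91.7
  },
  "timestamps": {
    "ica": { "started": "2026-03-05T21:50:02Z", "completed": "2026-03-05T21:52:43Z" }
  }
}
```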
Reproducibility Guarantees
The pipeline provides deterministic output for deterministic input. The mechanisms:
Fixed random seed (`random_state: 42`): ICA decomposition involves random initialization. The fixed seed ensures that the same data always produces the same decomposition. If the seed were different, the components would be equivalent but appear in a different order; deterministic seeds eliminate even this cosmetic variation.
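In MNE-Python, this seed is supplied when the ICA object is constructed. A minimal sketch; parameters beyond `method` and `random_state` are omitted rather than guessed:

```python
from mne.preprocessing import ICA

# Same data + same random_state -> same decomposition, same component order.
ica = ICA(method="picard", random_state=42, max_iter="auto")
```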
Deterministic algorithm selection: Picard with L-BFGS optimization is deterministic given the same input, parameters, and random state. No algorithm in the pipeline involves non-deterministic operations (no GPU-dependent floating-point ordering, no parallel-dependent reduction ordering).
Version-locked dependencies: The pipeline’s dependencies (MNE-Python, specparam, mne-icalabel, scipy) are pinned to specific versions. Different versions of these libraries may produce slightly different numerical results due to bug fixes, algorithm improvements, or floating-point implementation changes. Version pinning ensures that the same software produces the same results.
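One way to capture the installed versions at runtime uses only the standard library; recording them alongside the parameter sidecar is consistent with the logging described above, though the exact mechanism here is an assumption:

```python
import importlib.metadata

# Record the installed version of each pinned dependency for the audit trail.
PINNED = ("mne", "specparam", "mne-icalabel", "scipy")
versions = {pkg: importlib.metadata.version(pkg) for pkg in PINNED}
```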
Verifying Processing Settings
To verify what settings were used for a particular session, examine the stage JSON files in the session's `processed/stages/` directory. Each file contains the `generated` timestamp and the structured data output. The configuration snapshot in the parameter sidecar records the exact settings.
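A quick way to list what is present for a session; the session directory name here is hypothetical:

```python
import json
from pathlib import Path

stages_dir = Path("sessions/2026-03-05_patient/processed/stages")  # hypothetical path

for stage_file in sorted(stages_dir.glob("*.json")):
    output = json.loads(stage_file.read_text())
    print(f"{output['stage']:<28} generated {output['generated']}")
```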
For clinical audit purposes, the combination of the parameter sidecar and the stage JSON outputs provides a complete chain of evidence: what data went in, what configuration was applied, what decisions were made (channel exclusions, ICA rejections, artifact flagging), and what results came out. This chain is maintained automatically by the pipeline—no manual documentation is required.
When to Reprocess
Reprocessing a session (running the pipeline again on the same data) is necessary when:
- The pipeline configuration has changed (different filter cutoffs, band boundaries, or thresholds)
- ICA component decisions have been modified in the interactive review step
- A software update has changed the processing algorithms
- An error in the original processing was identified and corrected
The pipeline supports selective reprocessing: if only ICA decisions changed, only the post-ICA stages need to be recomputed. The desktop application handles this automatically when the clinician modifies component decisions in the ICA Review step.
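Conceptually, selective reprocessing reruns everything downstream of the changed step. A sketch under the assumption of a linear stage order; the stage names besides `spectral_resting_eo` are invented for illustration:

```python
# Assumed linear ordering; the real pipeline's stage graph may differ.
STAGE_ORDER = ["filtering", "ica", "artifact_rejection", "spectral_resting_eo", "connectivity"]


def stages_to_rerun(changed: str) -> list[str]:
    """Everything at and after the changed stage must be recomputed."""
    return STAGE_ORDER[STAGE_ORDER.index(changed):]


# Example: modified ICA decisions invalidate all post-ICA stages.
assert stages_to_rerun("ica") == ["ica", "artifact_rejection", "spectral_resting_eo", "connectivity"]
```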
Each reprocessing event produces new stage JSON files with updated generated timestamps. The previous outputs are not automatically deleted—the session maintains a history of processing runs. The most recent output is used for display; previous outputs remain available for comparison.