Fix BNCI2020_002 reshape, MartinezCagigal2023 stim_trial, and MAMEM Figshare rate-limiting by bruAristimunha · Pull Request #1068 · NeuroTechX/moabb

bruAristimunha · 2026-05-06T19:25:43Z

Summary

Fixes three independent loader bugs surfaced by neuralfetch's MOABB wrappers, where the datasets were either silently corrupted or unusable from datacenter IPs:

BNCI2020_002 reshape produced wrong EEG layout (moabb/datasets/bnci/bnci_2020.py). bciexp.data is (n_channels, n_samples, n_trials) and F-contiguous out of scipy.io.loadmat; the default C-order reshape(n_channels, -1) produced a trial-fastest interleaved layout (flat[c, k] = data[c, k // n_trials, k % n_trials]) instead of the trial-major layout that the per-trial stim markers and downstream epoching both assumed. Every per-stimulus epoch was therefore sampled from the wrong trial and the canonical N2pc analysis sat at chance. Fix: transpose(0, 2, 1).reshape(...) so flat[c, t * n_samples + s] == data[c, s, t].
MartinezCagigal2023Checker / MartinezCagigal2023Pary stim_trial carried the wrong values (moabb/datasets/martinezcagigal2023_checker_cvep.py, ..._pary_cvep.py). The channel was passed trial_labels = unique_trials.astype(int) (per-recording trial index 0..n_trials-1) instead of the attended command id. The same marker value therefore pointed at different attended symbols across recordings and broke multiclass classification. The other c-VEP loaders (Castillos*, Thielen*) already encode the command id. Fix: resolve command_ids from cvep_data["command_idx"] (train) or from the commands_info label map (test), pass them to add_stim_channel_trial, and add command_id to the _trial_meta annotation extras so the attended symbol survives BIDS export. The epoch-level codebook keeps using per-recording trial id since codes is built per recording.
MAMEM SSVEP loaders 403'd on the Figshare API (moabb/datasets/ssvep_mamem.py, moabb/datasets/download.py). fs_get_file_list was called once per subject inside both data_path and _get_single_subject_data, which triggered the public endpoint's rate limit from datacenter IPs whenever a user iterated subjects. Two layers of caching now apply: (1) a process-level @functools.lru_cache(maxsize=None) on fs_get_file_list keyed by (article_id, version), and (2) an on-disk JSON cache (figshare_filelist_<id>.json) inside the dataset directory, so once a MAMEM dataset has been downloaded the loader never contacts Figshare again. force_update=True bypasses both layers.

Adds moabb/tests/test_dataset_fixes.py with three offline regression tests (mocking loadmat, _fs_paginated_file_list, and fs_get_file_list) so the suite stays network-free.

Test plan

python -m ruff check and python -m ruff format --check clean on all touched files.
pytest moabb/tests/test_dataset_fixes.py (3 passed) — covers the trial-major layout assertion, the lru_cache short-circuit, and the on-disk filelist cache + force_update bypass.
pytest moabb/tests/test_martinezcagigal_verbosity.py still passes.
CI offline test suite passes.
Spot-check on real data (BNCI2020_002 N2pc decoding, MartinezCagigal multi-recording multiclass).

Notes

The BNCI2020_002 reshape fix is independently verified by neuralfetch's test_reichert2020_impact_eeg_layout (which previously had to override _load_raw to work around the bug). With this PR, the override can be removed downstream.

- BNCI2020_002 (`_convert_attention_shift`): the F-contiguous `bciexp.data` array was being flattened with the default C-order reshape, which produced a trial-fastest interleaved layout (`flat[c, k] = data[c, k // n_trials, k % n_trials]`) instead of the trial-major layout that every per-trial stim marker assumes. Every per-stimulus epoch was therefore sampled from the wrong trial and the canonical N2pc analysis sat at chance. Reshape via `transpose(0, 2, 1).reshape(...)` so `flat[c, t * n_samples + s]` equals `data[c, s, t]`. - MartinezCagigal2023Checker / MartinezCagigal2023Pary: `stim_trial` was carrying the per-recording trial index `0..n_trials-1` instead of the attended command id, breaking multi-recording multiclass classification (the same marker value pointed at different commands across recordings). Resolve `command_ids` from `cvep_data["command_idx"]` for train mode and from the `commands_info` label map for test mode, pass them to `add_stim_channel_trial`, and add `command_id` to the `_trial_meta` annotation extras so the attended symbol survives BIDS export. Codebook indexing (`add_stim_channel_epoch`) keeps using the per-recording trial id since `codes` is built per recording. - MAMEM SSVEP loaders: `fs_get_file_list` was being hammered once per subject inside both `data_path` and `_get_single_subject_data`, which triggered Figshare's 403 rate limit from datacenter IPs. Add a process-level `lru_cache` keyed by `(article_id, version)` on `fs_get_file_list`, and persist the listing as JSON next to the data in `BaseMAMEM` so subsequent runs of an already-downloaded dataset skip the API entirely. `force_update=True` bypasses both layers. Add `moabb/tests/test_dataset_fixes.py` with offline regression tests for all three fixes (mocking `loadmat`, `_fs_paginated_file_list`, and `fs_get_file_list` so the suite stays network-free).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 118ff462df

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Tighten the on-disk filelist cache so an already-downloaded MAMEM dataset never contacts Figshare: - Centralise local-path resolution in `BaseMAMEM._dataset_root`, so `data_path` and `_get_single_subject_data` never disagree on where the filelist cache should live. - Document `_load_or_fetch_filelist`'s resolution order (disk cache first, API only as a fallback) and add INFO/DEBUG logs so users can see "loaded from cache" vs "fetching from Figshare" without enabling network tracing. - Surface the cache failure path: if the JSON cache is unreadable (corrupt / wrong version) we now log the underlying error before falling back to the API. Add `test_mamem_already_downloaded_does_not_ping_figshare` -- an end-to-end test that pre-creates local files and the cache JSON, then iterates `data_path` over three subjects with `fs_get_file_list` patched to raise on call. Confirms zero Figshare API hits in the "already downloaded" case (the user's primary concern).

- Hoist the c-VEP train/test command-id resolution into a shared ``resolve_cvep_command_ids`` helper in ``moabb/datasets/utils.py``; both MartinezCagigal files now call it instead of carrying their own copy of the train/test branch. - Replace the per-trial ``np.where(trial_idx == t)[0][0]`` scans (one for command_ids, one for first_trial_onsets — both O(n_trials × n)) with a single ``np.unique(trial_idx, return_index=True)`` pass that feeds both arrays. - Drop the redundant ``osp.exists`` TOCTOU check in ``BaseMAMEM._load_or_fetch_filelist``; let the open() attempt and ``FileNotFoundError`` decide. - Trim verbose docstrings, comments, and log messages across the changed files (BNCI2020_002 reshape comment, ``_load_or_fetch_filelist`` / ``_dataset_root`` / ``_filelist_cache_path``, ``fs_get_file_list``, whats_new entries, and the test docstrings).

chatgpt-codex-connector Bot reviewed May 6, 2026

View reviewed changes

Comment thread moabb/datasets/ssvep_mamem.py

bruAristimunha added 2 commits May 6, 2026 21:33

bruAristimunha enabled auto-merge (squash) May 7, 2026 08:54

bruAristimunha disabled auto-merge May 7, 2026 08:59

bruAristimunha merged commit 4f44a90 into develop May 7, 2026
13 of 14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix BNCI2020_002 reshape, MartinezCagigal2023 stim_trial, and MAMEM Figshare rate-limiting#1068

Fix BNCI2020_002 reshape, MartinezCagigal2023 stim_trial, and MAMEM Figshare rate-limiting#1068
bruAristimunha merged 3 commits into
developfrom
fix/bnci2020-mc2023-mamem-loaders

bruAristimunha commented May 6, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bruAristimunha commented May 6, 2026

Summary

Test plan

Notes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant