Skip to content

Fix BNCI2020_002 reshape, MartinezCagigal2023 stim_trial, and MAMEM Figshare rate-limiting#1068

Merged
bruAristimunha merged 3 commits into
developfrom
fix/bnci2020-mc2023-mamem-loaders
May 7, 2026
Merged

Fix BNCI2020_002 reshape, MartinezCagigal2023 stim_trial, and MAMEM Figshare rate-limiting#1068
bruAristimunha merged 3 commits into
developfrom
fix/bnci2020-mc2023-mamem-loaders

Conversation

@bruAristimunha
Copy link
Copy Markdown
Collaborator

Summary

Fixes three independent loader bugs surfaced by neuralfetch's MOABB wrappers, where the datasets were either silently corrupted or unusable from datacenter IPs:

  • BNCI2020_002 reshape produced wrong EEG layout (moabb/datasets/bnci/bnci_2020.py). bciexp.data is (n_channels, n_samples, n_trials) and F-contiguous out of scipy.io.loadmat; the default C-order reshape(n_channels, -1) produced a trial-fastest interleaved layout (flat[c, k] = data[c, k // n_trials, k % n_trials]) instead of the trial-major layout that the per-trial stim markers and downstream epoching both assumed. Every per-stimulus epoch was therefore sampled from the wrong trial and the canonical N2pc analysis sat at chance. Fix: transpose(0, 2, 1).reshape(...) so flat[c, t * n_samples + s] == data[c, s, t].

  • MartinezCagigal2023Checker / MartinezCagigal2023Pary stim_trial carried the wrong values (moabb/datasets/martinezcagigal2023_checker_cvep.py, ..._pary_cvep.py). The channel was passed trial_labels = unique_trials.astype(int) (per-recording trial index 0..n_trials-1) instead of the attended command id. The same marker value therefore pointed at different attended symbols across recordings and broke multiclass classification. The other c-VEP loaders (Castillos*, Thielen*) already encode the command id. Fix: resolve command_ids from cvep_data["command_idx"] (train) or from the commands_info label map (test), pass them to add_stim_channel_trial, and add command_id to the _trial_meta annotation extras so the attended symbol survives BIDS export. The epoch-level codebook keeps using per-recording trial id since codes is built per recording.

  • MAMEM SSVEP loaders 403'd on the Figshare API (moabb/datasets/ssvep_mamem.py, moabb/datasets/download.py). fs_get_file_list was called once per subject inside both data_path and _get_single_subject_data, which triggered the public endpoint's rate limit from datacenter IPs whenever a user iterated subjects. Two layers of caching now apply: (1) a process-level @functools.lru_cache(maxsize=None) on fs_get_file_list keyed by (article_id, version), and (2) an on-disk JSON cache (figshare_filelist_<id>.json) inside the dataset directory, so once a MAMEM dataset has been downloaded the loader never contacts Figshare again. force_update=True bypasses both layers.

Adds moabb/tests/test_dataset_fixes.py with three offline regression tests (mocking loadmat, _fs_paginated_file_list, and fs_get_file_list) so the suite stays network-free.

Test plan

  • python -m ruff check and python -m ruff format --check clean on all touched files.
  • pytest moabb/tests/test_dataset_fixes.py (3 passed) — covers the trial-major layout assertion, the lru_cache short-circuit, and the on-disk filelist cache + force_update bypass.
  • pytest moabb/tests/test_martinezcagigal_verbosity.py still passes.
  • CI offline test suite passes.
  • Spot-check on real data (BNCI2020_002 N2pc decoding, MartinezCagigal multi-recording multiclass).

Notes

The BNCI2020_002 reshape fix is independently verified by neuralfetch's test_reichert2020_impact_eeg_layout (which previously had to override _load_raw to work around the bug). With this PR, the override can be removed downstream.

- BNCI2020_002 (`_convert_attention_shift`): the F-contiguous
  `bciexp.data` array was being flattened with the default C-order
  reshape, which produced a trial-fastest interleaved layout
  (`flat[c, k] = data[c, k // n_trials, k % n_trials]`) instead of the
  trial-major layout that every per-trial stim marker assumes. Every
  per-stimulus epoch was therefore sampled from the wrong trial and the
  canonical N2pc analysis sat at chance. Reshape via
  `transpose(0, 2, 1).reshape(...)` so `flat[c, t * n_samples + s]`
  equals `data[c, s, t]`.
- MartinezCagigal2023Checker / MartinezCagigal2023Pary: `stim_trial`
  was carrying the per-recording trial index `0..n_trials-1` instead of
  the attended command id, breaking multi-recording multiclass
  classification (the same marker value pointed at different commands
  across recordings). Resolve `command_ids` from `cvep_data["command_idx"]`
  for train mode and from the `commands_info` label map for test mode,
  pass them to `add_stim_channel_trial`, and add `command_id` to the
  `_trial_meta` annotation extras so the attended symbol survives BIDS
  export. Codebook indexing (`add_stim_channel_epoch`) keeps using the
  per-recording trial id since `codes` is built per recording.
- MAMEM SSVEP loaders: `fs_get_file_list` was being hammered once per
  subject inside both `data_path` and `_get_single_subject_data`, which
  triggered Figshare's 403 rate limit from datacenter IPs. Add a
  process-level `lru_cache` keyed by `(article_id, version)` on
  `fs_get_file_list`, and persist the listing as JSON next to the data
  in `BaseMAMEM` so subsequent runs of an already-downloaded dataset
  skip the API entirely. `force_update=True` bypasses both layers.

Add `moabb/tests/test_dataset_fixes.py` with offline regression tests
for all three fixes (mocking `loadmat`, `_fs_paginated_file_list`, and
`fs_get_file_list` so the suite stays network-free).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 118ff462df

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread moabb/datasets/ssvep_mamem.py
Tighten the on-disk filelist cache so an already-downloaded MAMEM
dataset never contacts Figshare:

- Centralise local-path resolution in `BaseMAMEM._dataset_root`, so
  `data_path` and `_get_single_subject_data` never disagree on where
  the filelist cache should live.
- Document `_load_or_fetch_filelist`'s resolution order (disk cache
  first, API only as a fallback) and add INFO/DEBUG logs so users can
  see "loaded from cache" vs "fetching from Figshare" without enabling
  network tracing.
- Surface the cache failure path: if the JSON cache is unreadable
  (corrupt / wrong version) we now log the underlying error before
  falling back to the API.

Add `test_mamem_already_downloaded_does_not_ping_figshare` -- an
end-to-end test that pre-creates local files and the cache JSON, then
iterates `data_path` over three subjects with `fs_get_file_list`
patched to raise on call.  Confirms zero Figshare API hits in the
"already downloaded" case (the user's primary concern).
- Hoist the c-VEP train/test command-id resolution into a shared
  ``resolve_cvep_command_ids`` helper in ``moabb/datasets/utils.py``;
  both MartinezCagigal files now call it instead of carrying their own
  copy of the train/test branch.
- Replace the per-trial ``np.where(trial_idx == t)[0][0]`` scans (one
  for command_ids, one for first_trial_onsets — both O(n_trials × n))
  with a single ``np.unique(trial_idx, return_index=True)`` pass that
  feeds both arrays.
- Drop the redundant ``osp.exists`` TOCTOU check in
  ``BaseMAMEM._load_or_fetch_filelist``; let the open() attempt and
  ``FileNotFoundError`` decide.
- Trim verbose docstrings, comments, and log messages across the
  changed files (BNCI2020_002 reshape comment, ``_load_or_fetch_filelist``
  / ``_dataset_root`` / ``_filelist_cache_path``, ``fs_get_file_list``,
  whats_new entries, and the test docstrings).
@bruAristimunha bruAristimunha enabled auto-merge (squash) May 7, 2026 08:54
@bruAristimunha bruAristimunha disabled auto-merge May 7, 2026 08:59
@bruAristimunha bruAristimunha merged commit 4f44a90 into develop May 7, 2026
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant