refactor: centralize zarr object-blob writes in _write_object_array #4527

Open
galenlynch wants to merge 1 commit into SpikeInterface:main from galenlynch:zarr-v3-migration

@galenlynch
Contributor

Groups the six call sites that write JSON/Pickle-coded object arrays (recording provenance, sorting provenance, and extension dict/object data) behind a single helper in zarrextractors.py. Behavior-preserving on zarr v2 — verified by analyzer round-trip diff against main (identical bytes except for the non-deterministic runtime_s field).

Why

The create_dataset(..., object_codec=...) pattern, with numcodecs.JSON() or numcodecs.Pickle() as the codec, is a zarr v2 API shape: object_codec= is gone in zarr-python 3, where object codecs live in filters= (wrapped via numcodecs.zarr3.*). Centralizing the six scattered call sites now means the zarr v3 cut-over only has to change one function instead of chasing the pattern across two files.

This is prep work for the eventual zarr-python 3 migration (currently blocked on hdmf-zarr upstream, which pins zarr<3). The refactor stands on its own and is safe to merge independently of that migration's timeline.

Changes

  • Add _write_object_array(group, name, data, codec="json"|"pickle") to src/spikeinterface/core/zarrextractors.py.
  • Replace four provenance-write sites in SortingAnalyzer.create_zarr and two extension-write sites in the zarr extension path with calls to the helper.
  • Drop the now-unused import numcodecs from two functions in sortinganalyzer.py.
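
For readers who have not opened the diff, here is a rough sketch of what a helper with this signature could look like under zarr v2. It is not the code from this PR: the name write_object_array_sketch, the validation step, and the choice to wrap the payload in a one-element object array are all assumptions made for illustration. Only create_dataset(..., object_codec=...), numcodecs.JSON and numcodecs.Pickle come from the description above.

```python
# Hypothetical sketch, not the helper added by this PR.
_CODEC_NAMES = ("json", "pickle")


def write_object_array_sketch(group, name, data, codec="json"):
    """Store one Python object in `group` as a 1-element object-dtype array."""
    if codec not in _CODEC_NAMES:
        raise ValueError(f"codec must be one of {_CODEC_NAMES}, got {codec!r}")

    # Function-level imports, so the module loads without these dependencies.
    import numcodecs
    import numpy as np

    object_codec = numcodecs.JSON() if codec == "json" else numcodecs.Pickle()

    # Assumption: wrap the payload so that lists and dicts are both stored
    # as a single element rather than being expanded into array dimensions.
    wrapped = np.empty(1, dtype=object)
    wrapped[0] = data

    # zarr v2 API shape: the object codec is passed via object_codec=.
    # A zarr v3 port would presumably change only this call.
    return group.create_dataset(name, data=wrapped, object_codec=object_codec)
```

A call site would then shrink to something like write_object_array_sketch(zarr_root, "recording", rec_dict, codec="json"), where zarr_root and rec_dict are placeholder names for a zarr v2 group and a provenance dict.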

Test plan

  • pytest src/spikeinterface/core/tests/test_zarrextractors.py
  • pytest src/spikeinterface/core/tests/test_sortinganalyzer.py
  • pytest src/spikeinterface/core/tests/test_baserecording.py src/spikeinterface/core/tests/test_basesorting.py src/spikeinterface/core/tests/test_analyzer_extension_core.py
  • Byte-level analyzer round-trip diff vs. main — identical modulo runtime_s.
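
The description does not show how the byte-level diff was run. One possible way to do such a check, using only the standard library, is to hash every file in two saved analyzer folders and list the paths that differ. The function names and the folder names in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(left, right):
    """Report files present on one side only, and files whose bytes differ."""
    a, b = file_digests(left), file_digests(right)
    return {
        "only_left": sorted(a.keys() - b.keys()),
        "only_right": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Running diff_trees("analyzer_main.zarr", "analyzer_branch.zarr") on outputs written from main and from this branch would list any file that differs under "changed". Files that hold a non-deterministic value such as runtime_s are expected to show up there and need to be inspected by hand.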

Introduces a single helper in zarrextractors for writing JSON/Pickle-coded
object-dtype arrays, and routes the six existing call sites
(recording/sorting provenance + extension dict/object data) through it.
Byte-identical to the previous inline create_dataset(..., object_codec=...)
pattern under zarr v2, verified by analyzer round-trip diff.

This prepares the call sites for the upcoming zarr-python 3 migration,
where object_codec= is gone and object codecs live in filters= instead.

@alejoe91 added the "refactor" label (Refactor of code, with no change to functionality) on Apr 17, 2026
Member

@alejoe91 left a comment

Thanks @galenlynch
