MLArray (MIC-DKFZ/mlarray): Array format specialized for machine learning with a Blosc2 backend and standardized metadata.

tl;dr: Working with large medical or scientific images for machine learning? -> Use MLArray.

MLArray is a purpose-built file format for N-dimensional medical and scientific array data in machine learning workflows. It replaces the usual patchwork of source formats and late-stage conversions to NumPy/Zarr/Blosc2 by layering standardized metadata on top of a Blosc2-backed storage layout, so the same files work reliably across training, analysis, and visualization tools (including Napari and MITK).

Installation

You can install mlarray via pip:

pip install mlarray

To enable the mlarray_convert CLI command, install MLArray with the necessary extra dependencies:

pip install "mlarray[all]"

Documentation

See the documentation for the API reference, the metadata schema, usage examples, and CLI usage.

Usage

Below are common usage patterns for loading, saving, and working with metadata.

Default usage

import numpy as np
from mlarray import MLArray

array = np.random.random((128, 256, 256))
image = MLArray(array)  # Create MLArray image
image.save("sample.mla")

image = MLArray("sample.mla")  # Loads image

Memory-mapped usage

from mlarray import MLArray
import numpy as np

# read-only, partial access (default)
image = MLArray.open("sample.mla", mmap_mode='r')  
crop = image[10:20, 50:60]  # Read crop

# read/write, partial access
image = MLArray.open("sample.mla", mmap_mode='r+')  
image[10:20, 50:60] *= 5  # Modify crop in memory and on disk

# read/write, partial access, create/overwrite
array = np.random.random((128, 256, 256))
image = MLArray.create("sample.mla", shape=array.shape, dtype=array.dtype, mmap_mode='w+')
image[...] = array  # Write the full array in memory and on disk
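
The three mmap_mode values used above ('r', 'r+', 'w+') match the mode strings of NumPy's numpy.memmap. Assuming they carry the same meaning, the following sketch shows the equivalent read-only, read/write, and create/overwrite patterns with plain numpy.memmap on a raw binary file. It only illustrates the access semantics; it is not how MLArray stores data (MLArray uses a Blosc2-backed layout), and the file name is hypothetical.

```python
import os
import tempfile

import numpy as np

# Analogy only: a raw binary file mapped with numpy.memmap, not an .mla file.
path = os.path.join(tempfile.mkdtemp(), "sample.dat")  # hypothetical file name
shape, dtype = (8, 16, 16), np.float64

# 'w+': create (or overwrite) the file, then read/write through the mapping
data = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
data[...] = 1.0
data.flush()  # write pending changes to disk
del data

# 'r+': open an existing file and modify a region in place
data = np.memmap(path, dtype=dtype, mode="r+", shape=shape)
data[2:4, 5:7] *= 5  # only this region changes, in memory and on disk
data.flush()
del data

# 'r': read-only mapping; data is paged in from disk only when accessed
data = np.memmap(path, dtype=dtype, mode="r", shape=shape)
crop = np.array(data[2:4, 5:7])  # copy the crop into an ordinary in-memory array
```

With 'r' the mapping is not writeable, so an accidental assignment raises a ValueError instead of modifying the file.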

Metadata inspection and manipulation

import numpy as np
from mlarray import MLArray

array = np.random.random((64, 128, 128))
image = MLArray(
    array,
    spacing=(1.0, 1.0, 1.5),
    origin=(10.0, 10.0, 30.0),
    direction=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    meta={"patient_id": "123", "modality": "CT"},  # Any metadata from the original image source (for example raw DICOM metadata)
)

print(image.spacing)  # [1.0, 1.0, 1.5]
print(image.origin)  # [10.0, 10.0, 30.0]
print(image.meta.source)  # {"patient_id": "123", "modality": "CT"}

image.spacing[1] = 5.3
image.meta.source["study_id"] = "study-001"
image.save("with-metadata.mla")

# Open memory-mapped
image = MLArray.open("with-metadata.mla", mmap_mode='r+')  
image.meta.source["study_id"] = "new-study"  # Modify metadata
image.close()  # Close the file and write metadata changes to disk; only needed after modifying metadata
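
Spacing, origin, and direction together describe where each voxel sits in physical (world) space. This README does not spell out the formula, so the sketch below assumes the common ITK-style convention, world = origin + direction @ (spacing * index) with an elementwise spacing * index, and assumes spacing, origin, and direction are ordered like the array axes. The helpers index_to_world and world_to_index are hypothetical and not part of MLArray.

```python
import numpy as np

# Values from the example above (before spacing[1] is changed).
spacing = np.array([1.0, 1.0, 1.5])
origin = np.array([10.0, 10.0, 30.0])
direction = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)


def index_to_world(index, spacing, origin, direction):
    """Voxel index -> world point, assuming the ITK-style convention (hypothetical helper)."""
    # Scale each index by its spacing, rotate by the direction matrix, shift by the origin.
    return origin + direction @ (spacing * np.asarray(index, dtype=float))


def world_to_index(point, spacing, origin, direction):
    """Inverse of index_to_world; returns continuous (unrounded) indices."""
    return np.linalg.solve(direction, np.asarray(point, dtype=float) - origin) / spacing


world = index_to_world((2, 4, 6), spacing, origin, direction)
back = world_to_index(world, spacing, origin, direction)
```

For the identity direction above, voxel index (2, 4, 6) is scaled by the spacing to (2.0, 4.0, 9.0) and shifted by the origin to the world point (12.0, 14.0, 39.0).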

Copy metadata with overrides

import numpy as np
from mlarray import MLArray

base = MLArray("sample.mla")
array = np.random.random(base.shape)

image = MLArray(
    array,
    spacing=(0.8, 0.8, 1.0),
    copy=base,  # Take every argument not set explicitly here from base
)

image.save("copied-metadata.mla")
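
The copy= argument is described as taking every argument that is not set explicitly from base. A minimal sketch of that merge rule uses dataclasses.replace with a hypothetical ImageMeta dataclass and a hypothetical copy_with_overrides helper; it is not MLArray's internal code. Deep-copying base first is a design choice of this sketch so that edits to the new object's source dict do not leak back into base; whether MLArray copies or shares such values is not stated here.

```python
import copy
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class ImageMeta:
    """Hypothetical stand-in for the arguments an MLArray can take from a base."""
    spacing: Optional[tuple] = None
    origin: Optional[tuple] = None
    direction: Optional[list] = None
    source: dict = field(default_factory=dict)


def copy_with_overrides(base: ImageMeta, **overrides) -> ImageMeta:
    """Keep every field of base except those passed explicitly in overrides.

    base is deep-copied first so mutable values (e.g. the source dict) are not
    shared between the two objects. Unknown field names raise TypeError.
    """
    return replace(copy.deepcopy(base), **overrides)


base = ImageMeta(
    spacing=(1.0, 1.0, 1.5),
    origin=(10.0, 10.0, 30.0),
    direction=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    source={"patient_id": "123", "modality": "CT"},
)
image = copy_with_overrides(base, spacing=(0.8, 0.8, 1.0))
image.source["study_id"] = "study-001"  # does not affect base.source
```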

Standardized metadata usage

import numpy as np
from mlarray import MLArray, Meta

array = np.random.random((64, 128, 128))
image = MLArray(
    array,
    meta=Meta(source={"patient_id": "123", "modality": "CT"}, is_seg=True),  # Add metadata in a pre-defined format
)

print(image.meta.source)  # {"patient_id": "123", "modality": "CT"}
print(image.meta.is_seg)  # True

image.meta.source["study_id"] = "study-001"
image.meta.is_seg = False
image.save("with-metadata.mla")

Patch size variants

Default patch size (192):

from mlarray import MLArray

image = MLArray("sample.mla")  # Existing file
image.save("default-patch.mla")  # Keeps existing layout metadata

loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size='default')
image.save("default-patch-relayout.mla")  # Uses constructor patch_size='default' (192)

Custom isotropic patch size (512):

from mlarray import MLArray

loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=512)
image.save("patch-512.mla")

Custom non-isotropic patch size:

from mlarray import MLArray

loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=(128, 192, 256))
image.save("patch-non-iso.mla")
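
The variants in this section pass patch_size as 'default', a single integer (isotropic), a per-axis tuple, or None. The hypothetical normalize_patch_size helper below sketches one way such an argument can be expanded into one size per array axis, using the default of 192 stated in this README. It does not reproduce MLArray's own handling, in particular how a patch size is then translated into Blosc2 chunk and block sizes.

```python
from typing import Optional, Sequence, Tuple, Union

DEFAULT_PATCH_SIZE = 192  # default stated in this README


def normalize_patch_size(
    patch_size: Union[None, str, int, Sequence[int]], ndim: int
) -> Optional[Tuple[int, ...]]:
    """Expand a patch_size argument into one size per array axis (hypothetical helper)."""
    if patch_size is None:
        return None  # no patch size: layout comes from chunk_size/block_size or Blosc2
    if isinstance(patch_size, str):
        if patch_size != "default":
            raise ValueError(f"unknown patch_size string: {patch_size!r}")
        patch_size = DEFAULT_PATCH_SIZE
    if isinstance(patch_size, int):
        return (patch_size,) * ndim  # isotropic: same size along every axis
    sizes = tuple(int(p) for p in patch_size)
    if len(sizes) != ndim:
        raise ValueError(f"expected {ndim} sizes, got {len(sizes)}")
    return sizes
```

Note that an isotropic 512 is larger than every axis of the (128, 256, 256) sample array; how MLArray treats a patch larger than the array is not covered here.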

Manual chunk/block size:

from mlarray import MLArray

loaded = MLArray("sample.mla")
image = MLArray(
    loaded.to_numpy(),
    patch_size=None,
    chunk_size=(1, 128, 128),
    block_size=(1, 32, 32),
)
image.save("manual-chunk-block.mla")
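
Blosc2 stores an array as compressed chunks, and each chunk is further divided into blocks, so the chunk and block shapes determine how much data a partial read has to touch. Assuming a regular grid starting at index 0, the sketch below counts chunks and blocks for the manual layout above applied to the (128, 256, 256) float64 array from the default usage example; grid_counts is a hypothetical helper, and the byte counts are uncompressed sizes.

```python
import math


def grid_counts(shape, chunk_size, block_size):
    """Return (chunks in the array, blocks per chunk) for a regular grid.

    Partial chunks/blocks at the edges still count, hence the ceiling division.
    """
    chunks_per_axis = [math.ceil(s / c) for s, c in zip(shape, chunk_size)]
    blocks_per_axis = [math.ceil(c / b) for c, b in zip(chunk_size, block_size)]
    return math.prod(chunks_per_axis), math.prod(blocks_per_axis)


shape = (128, 256, 256)  # sample.mla from the default usage example
chunk_size, block_size = (1, 128, 128), (1, 32, 32)
itemsize = 8  # float64, the dtype produced by np.random.random

n_chunks, blocks_per_chunk = grid_counts(shape, chunk_size, block_size)
chunk_nbytes = math.prod(chunk_size) * itemsize  # uncompressed bytes per chunk
block_nbytes = math.prod(block_size) * itemsize  # uncompressed bytes per block
```

Here the array splits into 128 × 2 × 2 = 512 chunks of 128 KiB each, and every chunk holds 1 × 4 × 4 = 16 blocks of 8 KiB.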

Let Blosc2 itself configure chunk/block size:

from mlarray import MLArray

loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=None)
# If patch_size, chunk_size and block_size are all None, Blosc2 will auto-configure chunk and block size
image.save("blosc2-auto.mla")

CLI

mlarray_header

Print the metadata header from a .mla file.

mlarray_header sample.mla

mlarray_convert

Convert between MLArray and NIfTI/NRRD files.

When converting from NIfTI/NRRD to MLArray, source metadata is copied into meta.source.

When converting from MLArray to NIfTI/NRRD, only meta.source is copied into the output header. Spatial metadata (spacing, origin, direction) is set explicitly from meta.spatial.

meta.spatial.coord_system is propagated conservatively.

For NRRD input, the CLI reads explicit NRRD space metadata when available, maps right-anterior-superior to RAS and left-posterior-superior to LPS, and preserves other explicit NRRD space strings verbatim.

For NIfTI input, the CLI sets coord_system to LPS. This follows the current MedVol -> SimpleITK/ITK import path, in which the imported geometry is represented in the ITK physical-space convention; MedVol's reindexing to NumPy layout does not change that world-space convention.

On NRRD output, RAS and LPS are written back as NRRD space metadata when possible. Arbitrary custom coord_system strings are not emitted as NRRD space declarations unless they already match a supported explicit NRRD space string.

NIfTI output preserves geometry but does not export coord_system as an explicit NIfTI metadata field.

mlarray_convert sample.nii.gz output.mla
mlarray_convert sample.mla output.nii.gz
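
The coord_system rules described for mlarray_convert can be summarized in a small sketch. The function and dictionary names are hypothetical and not the CLI's implementation; returning None when no explicit space is present is an assumption. The ras_to_lps helper shows why the distinction matters: RAS and LPS differ only in the sign of the first two world axes, so the same physical point has coordinates (x, y, z) in one and (-x, -y, z) in the other.

```python
from typing import Optional, Tuple

# Hypothetical mapping from explicit NRRD 'space' values to coord_system labels,
# following the rules described above.
NRRD_SPACE_TO_COORD_SYSTEM = {
    "right-anterior-superior": "RAS",
    "left-posterior-superior": "LPS",
}


def coord_system_from_nrrd_space(space: Optional[str]) -> Optional[str]:
    """RAS/LPS for the two known spaces; any other explicit space kept verbatim."""
    if space is None:
        return None  # assumption: no explicit space metadata leaves coord_system unset
    return NRRD_SPACE_TO_COORD_SYSTEM.get(space, space)


def ras_to_lps(point: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Flip the first two world axes; the same function converts LPS back to RAS."""
    x, y, z = point
    return (-x, -y, z)
```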

Contributing

Contributions are welcome! Please open a pull request with clear changes and add tests when appropriate.

Acknowledgments

This repository is developed and maintained by the Applied Computer Vision Lab (ACVL) of Helmholtz Imaging and the Division of Medical Image Computing at DKFZ.
