tl;dr: Working with large medical or scientific images for machine learning? -> Use MLArray.
MLArray is a purpose-built file format for N-dimensional medical and scientific array data in machine learning workflows. It replaces the usual patchwork of source formats and late-stage conversions to NumPy/Zarr/Blosc2 by layering standardized metadata on top of a Blosc2-backed storage layout, so the same files work reliably across training, analysis, and visualization tools (including Napari and MITK).
You can install mlarray via pip:
pip install mlarray
To enable the mlarray_convert CLI command, install MLArray with the necessary extra dependencies:
pip install mlarray[all]
See the documentation for the API reference, the metadata schema, usage examples, and CLI usage.
Below are common usage patterns for loading, saving, and working with metadata.
import numpy as np
from mlarray import MLArray
array = np.random.random((128, 256, 256))
image = MLArray(array) # Create MLArray image
image.save("sample.mla")
image = MLArray("sample.mla") # Loads image
Memory-mapped (partial) access:
from mlarray import MLArray
import numpy as np
# read-only, partial access (default)
image = MLArray.open("sample.mla", mmap_mode='r')
crop = image[10:20, 50:60] # Read crop
# read/write, partial access
image = MLArray.open("sample.mla", mmap_mode='r+')
image[10:20, 50:60] *= 5 # Modify crop in memory and on disk
# read/write, partial access, create/overwrite
array = np.random.random((128, 256, 256))
image = MLArray.create("sample.mla", shape=array.shape, dtype=array.dtype, mmap_mode='w+')
image[...] = array # Write the full array in memory and on disk
Working with metadata:
import numpy as np
from mlarray import MLArray
array = np.random.random((64, 128, 128))
image = MLArray(
array,
spacing=(1.0, 1.0, 1.5),
origin=(10.0, 10.0, 30.0),
direction=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
meta={"patient_id": "123", "modality": "CT"}, # Any metadata from the original image source (for example raw DICOM metadata)
)
print(image.spacing) # [1.0, 1.0, 1.5]
print(image.origin) # [10.0, 10.0, 30.0]
print(image.meta.source) # {"patient_id": "123", "modality": "CT"}
image.spacing[1] = 5.3
image.meta.source["study_id"] = "study-001"
image.save("with-metadata.mla")
# Open memory-mapped
image = MLArray.open("with-metadata.mla", mmap_mode='r+')
image.meta.source["study_id"] = "new-study" # Modify metadata
image.close() # Close and save metadata; only needed if metadata was modified
Copying metadata from an existing image:
import numpy as np
from mlarray import MLArray
base = MLArray("sample.mla")
array = np.random.random(base.shape)
image = MLArray(
array,
spacing=(0.8, 0.8, 1.0),
copy=base, # Copies all non-explicitly set arguments from base
)
image.save("copied-metadata.mla")
Using the predefined Meta class:
import numpy as np
from mlarray import MLArray, Meta
array = np.random.random((64, 128, 128))
image = MLArray(
array,
meta=Meta(source={"patient_id": "123", "modality": "CT"}, is_seg=True), # Add metadata in a pre-defined format
)
print(image.meta.source) # {"patient_id": "123", "modality": "CT"}
print(image.meta.is_seg) # True
image.meta.source["study_id"] = "study-001"
image.meta.is_seg = False
image.save("with-metadata.mla")
Default patch size (192):
from mlarray import MLArray
image = MLArray("sample.mla") # Existing file
image.save("default-patch.mla") # Keeps existing layout metadata
loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size='default')
image.save("default-patch-relayout.mla") # Uses constructor patch_size='default' (192)
Custom isotropic patch size (512):
from mlarray import MLArray
loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=512)
image.save("patch-512.mla")
Custom non-isotropic patch size:
from mlarray import MLArray
loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=(128, 192, 256))
image.save("patch-non-iso.mla")
Manual chunk/block size:
from mlarray import MLArray
loaded = MLArray("sample.mla")
image = MLArray(
loaded.to_numpy(),
patch_size=None,
chunk_size=(1, 128, 128),
block_size=(1, 32, 32),
)
image.save("manual-chunk-block.mla")
Let Blosc2 itself configure chunk/block size:
from mlarray import MLArray
loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=None)
# If patch_size, chunk_size and block_size are all None, Blosc2 will auto-configure chunk and block size
image.save("blosc2-auto.mla")
Print the metadata header from a .mla file:
mlarray_header sample.mla
Convert between MLArray and NIfTI/NRRD files.
When converting from NIfTI/NRRD to MLArray, source metadata is copied into
meta.source.
When converting from MLArray to NIfTI/NRRD, only meta.source is copied into
the output header. Spatial metadata (spacing, origin, direction) is set
explicitly from meta.spatial.
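The metadata flow described above can be sketched with plain dictionaries. This is a minimal illustration under stated assumptions, not the CLI's implementation: the function names `to_mlarray_meta` and `to_output_header` are hypothetical, and real NIfTI/NRRD headers store geometry under format-specific keys rather than the simplified `spacing`/`origin`/`direction` keys used here. The point it shows is that the source header travels along unchanged, while exported geometry always comes from `meta.spatial`.

```python
from copy import deepcopy


def to_mlarray_meta(source_header: dict, spacing, origin, direction) -> dict:
    """Hypothetical NIfTI/NRRD -> MLArray step (simplified header keys)."""
    return {
        "source": deepcopy(source_header),  # whole source header kept verbatim
        "spatial": {
            "spacing": list(spacing),
            "origin": list(origin),
            "direction": [list(row) for row in direction],
        },
    }


def to_output_header(meta: dict) -> dict:
    """Hypothetical MLArray -> NIfTI/NRRD step.

    Only meta["source"] is copied into the output header; spacing, origin and
    direction are then set explicitly from meta["spatial"], overriding any
    geometry values left over in the source header.
    """
    header = deepcopy(meta["source"])
    spatial = deepcopy(meta["spatial"])
    header["spacing"] = spatial["spacing"]
    header["origin"] = spatial["origin"]
    header["direction"] = spatial["direction"]
    return header


# Import, edit the spatial metadata, then export.
meta = to_mlarray_meta(
    {"descrip": "CT scan", "origin": [10.0, 10.0, 30.0]},
    spacing=(1.0, 1.0, 1.5),
    origin=(10.0, 10.0, 30.0),
    direction=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
)
meta["spatial"]["origin"][2] = 42.0  # the copy in meta["source"] is now stale
header = to_output_header(meta)
print(header["origin"])  # [10.0, 10.0, 42.0]
```

Because the exported geometry is taken from `meta.spatial`, editing the source header's geometry by hand would have no effect on the output in this model.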
meta.spatial.coord_system is propagated conservatively. For NRRD input, the
CLI reads explicit NRRD space metadata when available, maps
right-anterior-superior to RAS and left-posterior-superior to LPS, and
preserves any other explicit NRRD space string verbatim. For NIfTI input, the
CLI sets coord_system to LPS: NIfTI files are currently imported via MedVol
through SimpleITK/ITK, which represents the imported geometry in the ITK
physical-space convention (LPS). MedVol's reindexing to NumPy axis order does
not change that world-space convention. On NRRD output, RAS and LPS are
written back as NRRD space metadata when possible; custom coord_system strings
are emitted as NRRD space declarations only if they already match a supported
explicit NRRD space string. NIfTI output preserves the geometry but does not
export coord_system as an explicit NIfTI metadata field.
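The coord_system rules above can be summarized as two small lookup functions. This is a sketch under assumptions, not the CLI's code: the function names are hypothetical, returning `None` when an NRRD file has no explicit space field is an assumption, and the set of "supported explicit NRRD space strings" is taken from the long-form space names in the NRRD format specification, since the README does not list which ones `mlarray_convert` accepts. The helper `lps_to_ras` shows why the label matters: LPS and RAS differ only in the sign of the first two world axes.

```python
from typing import Optional

# Long-form space names from the NRRD format specification.
# Assumption: the CLI's notion of "supported" may be narrower than this set.
NRRD_SPACES = {
    "right-anterior-superior",
    "left-anterior-superior",
    "left-posterior-superior",
    "right-anterior-superior-time",
    "left-anterior-superior-time",
    "left-posterior-superior-time",
    "scanner-xyz",
    "scanner-xyz-time",
    "3D-right-handed",
    "3D-left-handed",
    "3D-right-handed-time",
    "3D-left-handed-time",
}

# The two NRRD spaces the README says are shortened on input.
TO_SHORT = {"right-anterior-superior": "RAS", "left-posterior-superior": "LPS"}
TO_LONG = {abbrev: name for name, abbrev in TO_SHORT.items()}


def coord_system_from_input(fmt: str, nrrd_space: Optional[str] = None) -> Optional[str]:
    """Hypothetical: derive meta.spatial.coord_system when converting to MLArray."""
    if fmt == "nifti":
        # NIfTI is imported through SimpleITK/ITK, whose physical space is LPS.
        return "LPS"
    if fmt != "nrrd":
        raise ValueError(f"unsupported input format: {fmt!r}")
    if nrrd_space is None:
        return None  # assumption: no explicit space field, nothing to propagate
    # Shorten RAS/LPS; keep any other explicit space string verbatim.
    return TO_SHORT.get(nrrd_space, nrrd_space)


def nrrd_space_for_output(coord_system: Optional[str]) -> Optional[str]:
    """Hypothetical: NRRD 'space' value to write, or None to omit the field."""
    if coord_system in TO_LONG:
        return TO_LONG[coord_system]
    if coord_system in NRRD_SPACES:
        return coord_system
    return None  # custom strings are not emitted as NRRD space declarations


def lps_to_ras(point):
    """Convert a world coordinate from LPS to RAS by flipping x and y."""
    x, y, z = point
    return (-x, -y, z)


print(coord_system_from_input("nrrd", "right-anterior-superior"))  # RAS
print(nrrd_space_for_output("my-custom-frame"))  # None
print(lps_to_ras((1.0, 2.0, 3.0)))  # (-1.0, -2.0, 3.0)
```

Keeping the mapping as explicit lookup tables makes the "conservative" behaviour easy to audit: anything not listed is either passed through unchanged on input or dropped on NRRD output, never guessed.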
mlarray_convert sample.nii.gz output.mla
mlarray_convert sample.mla output.nii.gz
Contributions are welcome! Please open a pull request with clear changes and add tests when appropriate.
This repository is developed and maintained by the Applied Computer Vision Lab (ACVL) of Helmholtz Imaging and the Division of Medical Image Computing at DKFZ.


