Skip to content

Feature/binary data operation#89

Open
jwasikpsnc wants to merge 15 commits into
iterorganization:developfrom
jwasikpsnc:feature/binary_data_operation
Open

Feature/binary data operation#89
jwasikpsnc wants to merge 15 commits into
iterorganization:developfrom
jwasikpsnc:feature/binary_data_operation

Conversation

@jwasikpsnc

Copy link
Copy Markdown
Collaborator
  • adds /data/plot_data/ operations parameter
  • adds operations description in /info/data_manipulation_methods endpoint response
  • accepts operations as list ["operation:value", "..."] e.g. ["add:1","mul:2","div:3"]`
  • operations are executed in order determined by position in the list
  • allowed operations:
    • add
    • sub
    • mul
    • div
    • pow
    • root
  • updated docs

Waiting for #72 to be merged first

@read-the-docs-community

read-the-docs-community Bot commented Jun 18, 2026

Copy link
Copy Markdown

Documentation build overview

📚 imas-ibex | 🛠️ Build #33328168 | 📁 Comparing 24610a6 against latest (024d53b)

  🔍 Preview build  

133 files changed · + 83 added · ± 50 modified

+ Added

± Modified

…re/binary_data_operation

# Conflicts:
#	backend/ibex/data_source/imas_python_source.py
@jwasikpsnc jwasikpsnc assigned jwasikpsnc and unassigned jwasikpsnc Jun 24, 2026
@jwasikpsnc jwasikpsnc marked this pull request as ready for review June 24, 2026 09:51
@olivhoenen

olivhoenen commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Code Review — Backend Issues

Reviewing the backend changes introduced by this PR. No style comments — only genuine bugs and a performance concern (copilot: sonnet 4.6).


🔴 Bug 1: root:0 causes unhandled ZeroDivisionError → HTTP 500 instead of 466

File: backend/ibex/data_source/imas_python_source_utils.py

The root operation computes 1 / v in plain Python before calling np.power. When a user sends root:0, Python raises ZeroDivisionError before np.power is ever called. This exception is not a subclass of IbexException, so it becomes an unhandled 500 instead of 466.

"root": lambda r, v: np.power(r, 1 / v),  # ZeroDivisionError if v=0

This is inconsistent with div, which correctly raises InvalidParametersException via _safe_division. There is also no test for root:0 in test_data_manipulation.py.

Fix: Add a guard mirroring _safe_division:

def _safe_root(data, v):
    if v == 0:
        raise InvalidParametersException("Root with degree 0 is not allowed")
    return np.power(data, 1.0 / v)

"root": _safe_root,

🔴 Bug 2: Malformed operation strings cause unhandled ValueError → HTTP 500 instead of 466

File: backend/ibex/data_source/imas_python_source_utils.py

The operations field in PlotDataBasicParameters is Optional[List[str]] with no format validation at the schema level. Two unguarded parsing steps can raise bare ValueError:

op_type, value_str = op_str.split(":", 1)  # ValueError if no ':' present
value = float(value_str)                    # ValueError if value is non-numeric

Both become 500 responses instead of 466.

Fix:

parts = op_str.split(":", 1)
if len(parts) != 2:
    raise InvalidParametersException(f"Invalid operation format '{op_str}'. Expected 'type:value'")
op_type, value_str = parts
try:
    value = float(value_str)
except ValueError:
    raise InvalidParametersException(f"Operation value '{value_str}' is not a valid number")

🔴 Bug 3: IMAS_URI.occurrence is a str when present, int when absent

File: backend/ibex/core/utils.py

occurrence: int = 0  # declared as int
...
self.occurrence = match.group("occurrence") if match.group("occurrence") else 0
#                 ^^^ str, e.g. "1"                                          ^^^ int

When a URI contains an explicit occurrence (e.g. imas:...#core_profiles:1/...), occurrence is stored as the string "1". When absent, it is 0 (int). In get_plot_data, occurrence is then passed directly to entry.get(ids, occurrence=occurrence, ...) without casting. Other parts of the PR correctly use int(occurrence) explicitly — this path does not.

Fix:

self.occurrence = int(match.group("occurrence")) if match.group("occurrence") else 0

🟠 Bug 4: calculate_coordinate_shapesn_coords is declared but never used

File: backend/ibex/data_source/imas_python_source_utils.py

The docstring promises:

raises ValueError: If len(shape) != dims + n_coords

But this check is never implemented — n_coords is accepted and immediately ignored. If the caller passes inconsistent values, the wrong number of shapes is silently produced, and the downstream enumerate(coordinates_to_be_returned) loop will either update fewer coordinates than expected or raise an IndexError.

Fix:

if len(shape) != dims + n_coords:
    raise ValueError(
        f"len(shape)={len(shape)} must equal dims + n_coords = {dims} + {n_coords}"
    )

🟠 Bug 5: _get_descendant_node_names silently drops struct_array children

File: backend/ibex/data_source/imas_python_source.py (inner function in get_geometry_overlay_nodes)

def _get_descendant_node_names(metadata: IDSMetadata):
    res = []
    if metadata.data_type == IDSDataType.STRUCTURE:
        for child in metadata:
            res.extend([f"{metadata.name}/{x}" for x in _get_descendant_node_names(child)])
    else:
        res.append(f"{metadata.name}")  # struct_array hits this branch — only its name returned
    return res

If a geometry node has a struct_array child (e.g. element[:] inside a vessel structure), the function returns only the struct_array's name as though it were a leaf, rather than recursing into its fields (r, z, etc.). Those parameters are then missing from parameters_entry["parameters"], so the geometry node is either excluded or returned with an incomplete parameter list.

Fix:

if metadata.data_type in (IDSDataType.STRUCTURE, IDSDataType.STRUCT_ARRAY):
    for child in metadata:
        res.extend([f"{metadata.name}/{x}" for x in _get_descendant_node_names(child)])
else:
    res.append(f"{metadata.name}")

🟡 Bug 6: _walk_outline_nodes uses plain in instead of path_in_filled_paths

File: backend/ibex/data_source/imas_python_source.py

node_filled = any(
    f"{metadata.path_string}/{parameter}" in filled_paths
    for parameter in parameters_entry["parameters"]
)

filled_paths from IMAS-Python contains paths with array indices (e.g. wall/description_2d[0]/vessel/unit[1]/element[1]/outline/r). But metadata.path_string is an abstract path without those indices. The concatenation will not match, so filled geometry nodes can be incorrectly treated as empty and excluded.

The module already provides path_in_filled_paths() which handles prefix and index-agnostic matching — it just isn't used here.

Fix:

node_filled = any(
    path_in_filled_paths(f"{metadata.path_string}/{parameter}", filled_paths)
    for parameter in parameters_entry["parameters"]
)

⚡ Performance: _OP_FUNCS dict rebuilt on every recursive call

File: backend/ibex/data_source/imas_python_source_utils.py

_OP_FUNCS is defined inside apply_simple_operations. Because the function recurses over lists ([apply_simple_operations(x, ops) for x in data]), the dict and all 6 lambda objects are recreated for every element. For typical AoS IMAS data (a list of 100 arrays), that is 100 unnecessary dict + lambda constructions per call.

Benchmarking a list of 100 arrays with 2 operations shows ~9% speedup by moving the dict to module level. Replacing the lambdas with operator module functions (pure C) makes no meaningful additional difference for NumPy arrays since the array computation itself dominates.

Suggested refactor (also fixes Bugs 1 and 2):

import operator as _op

def _safe_division(data, divisor):
    if divisor == 0:
        raise InvalidParametersException("Division by zero is not allowed")
    return data / divisor

def _safe_root(data, v):
    if v == 0:
        raise InvalidParametersException("Root with degree 0 is not allowed")
    return np.power(data, 1.0 / v)

# Defined once at module level — not rebuilt on every recursive call
_SIMPLE_OP_FUNCS = {
    "add": _op.add,
    "sub": _op.sub,
    "mul": _op.mul,
    "div": _safe_division,
    "pow": np.power,
    "root": _safe_root,
}

def apply_simple_operations(data: list | np.ndarray, operations: list[str]):
    if isinstance(data, list):
        return [apply_simple_operations(x, operations) for x in data]
    elif isinstance(data, (np.ndarray, IDSNumericArray)):
        result = data
        for op_str in operations:
            parts = op_str.split(":", 1)
            if len(parts) != 2:
                raise InvalidParametersException(
                    f"Invalid operation format '{op_str}'. Expected 'type:value'"
                )
            op_type, value_str = parts
            try:
                value = float(value_str)
            except ValueError:
                raise InvalidParametersException(
                    f"Operation value '{value_str}' is not a valid number"
                )
            func = _SIMPLE_OP_FUNCS.get(op_type)
            if func is None:
                raise InvalidParametersException(f"Unknown operation type: {op_type}")
            result = func(result, value)
        return result
    else:
        raise InvalidParametersException(
            "Simple operations can be executed only on numeric arrays, not single values or strings."
        )

@jwasikpsnc

Copy link
Copy Markdown
Collaborator Author

🔴 Bug 1: root:0 causes unhandled ZeroDivisionError → HTTP 500 instead of 466

  • added safe root function

🔴 Bug 2: Malformed operation strings cause unhandled ValueError → HTTP 500 instead of 466

  • added format validation at pydantic model level

🔴 Bug 3: IMAS_URI.occurrence is a str when present, int when absent

  • cast to int in IMAS_URI class
  • regex accepts only numbers now
  • now can throw 466 (InvalidParameter) if uri doesn't match regex
  • added tests for IMAS_URI

🟠 Bug 4: calculate_coordinate_shapes — n_coords is declared but never used

  • deleted unused parameter

🟠 Bug 5: _get_descendant_node_names silently drops struct_array children

  • IMO irrelevant because there are no AoS in geometry structures, but added STRUCT_ARRAY check anyway

🟡 Bug 6: _walk_outline_nodes uses plain in instead of path_in_filled_paths

  • fixed

⚡ Performance: _OP_FUNCS dict rebuilt on every recursive call

  • moved _OP_FUNCS to module level and renamed it _SIMPLE_OPERATIONS_FUNCTIONS

@olivhoenen

@olivhoenen

Copy link
Copy Markdown
Contributor

@jwasikpsnc please fix conflict and then it should be good to be merged

…re/binary_data_operation

# Conflicts:
#	backend/ibex/data_source/imas_python_source.py
@jwasikpsnc

Copy link
Copy Markdown
Collaborator Author

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants