fix(perf): support composite (dual-encoder) models in winml perf#866

Open
xieofxie wants to merge 3 commits into main from hualxie/fix_siglip

xieofxie commented Jun 10, 2026

Problem

winml perf crashed on composite (dual-encoder) models such as SigLIP/CLIP:

winml perf --ep openvino --device cpu -m google/siglip-base-patch16-224 \
  --task zero-shot-image-classification
...
AttributeError: 'WinMLModelForZeroShotImageClassification' object has no attribute 'io_config'

PerfBenchmark assumed every model exposes a single io_config / _session. Composite
models have neither — they orchestrate multiple sub-models (e.g. an image encoder and a
text encoder), each with its own ONNX session. The failure is device-independent: the
(model_type, task) registry routes SigLIP to the composite class regardless of --device.

Fix

Make PerfBenchmark composite-aware while leaving the single-session path's measurement
semantics untouched:

  • _aggregate_io_config() — unions the sub-models' inputs (deduped by name, order
    preserved). Their union is exactly the composite forward() kwargs, so random-input
    generation and the info display work unchanged.
  • End-to-end timing — composites time the full forward() pass (both encoders + the
    similarity step) via an external PerfStats. Single-session models keep recording
    pure-ORT time inside session.perf(). The monitored loop now takes a run-iteration
    callable so both paths share it.
  • Device / EP / task resolved from a representative sub-model.
  • _probe_composite_outputs() — runs one forward() and introspects the result so the
    reported outputs are the composite's real task-level tensors (e.g. logits_per_image)
    instead of a deduped union of sub-model ONNX outputs. Best-effort: falls back to the
    aggregated view if the probe fails.
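
The aggregation step can be pictured with a minimal sketch. `TensorSpec` and `aggregate_inputs` are hypothetical names used only for illustration; the real `io_config` layout in this repository is not shown in the PR, so the per-input fields here are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for one input entry of a sub-model's io_config."""
    name: str
    shape: tuple[int, ...]
    dtype: str


def aggregate_inputs(sub_model_inputs):
    """Union the sub-models' inputs, deduped by name, first-seen order preserved."""
    merged: dict[str, TensorSpec] = {}
    for specs in sub_model_inputs:
        for spec in specs:
            # setdefault keeps the first spec seen for a given input name.
            merged.setdefault(spec.name, spec)
    # Plain dicts preserve insertion order, so the result follows sub-model order.
    return list(merged.values())


# Shapes and dtypes taken from the Result section of this PR.
image_encoder = [TensorSpec("pixel_values", (1, 3, 224, 224), "float32")]
text_encoder = [TensorSpec("input_ids", (1, 64), "int32")]
inputs = aggregate_inputs([image_encoder, text_encoder])
names = [spec.name for spec in inputs]
```

Keeping the first spec seen for a duplicated name is one possible policy; the PR does not say how conflicting shapes or dtypes for the same name are handled.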

The output describer (_describe_outputs) is architecture-agnostic (handles HF
ModelOutput / dict / sequence / single tensor) — no model-specific field names.
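
A rough sketch of what an architecture-agnostic describer might look like follows. `describe_outputs` and `FakeTensor` are hypothetical names, and the `output_<i>` naming for positional results is an assumption, not necessarily what `_describe_outputs` does. The sketch relies on duck typing (anything with a `.shape`) so it needs no tensor library; HF `ModelOutput` subclasses `OrderedDict`, so the mapping branch covers it.

```python
from collections.abc import Mapping, Sequence


class FakeTensor:
    """Hypothetical tensor stand-in: only exposes a shape, like real tensors do."""
    def __init__(self, *shape: int) -> None:
        self.shape = shape


def describe_outputs(result):
    """Return (name, shape) pairs for whatever forward() returned."""
    if isinstance(result, Mapping):
        # Covers plain dicts and HF ModelOutput (an OrderedDict subclass).
        items = list(result.items())
    elif isinstance(result, Sequence) and not isinstance(result, (str, bytes)):
        # Tuples/lists get positional names (assumed naming scheme).
        items = [(f"output_{i}", value) for i, value in enumerate(result)]
    else:
        # A single tensor-like object.
        items = [("output_0", result)]

    described = []
    for name, value in items:
        shape = getattr(value, "shape", None)
        if shape is not None:  # skip non-tensor fields such as None
            described.append((name, list(shape)))
    return described


example = describe_outputs(
    {"logits_per_image": FakeTensor(1, 1), "text_embeds": FakeTensor(1, 768), "loss": None}
)
```

Because nothing in the sketch names a specific field, the same function would describe a CLIP-style or SigLIP-style result without changes.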

Result

Device:      cpu / OpenVINOExecutionProvider
Task:        zero-shot-image-classification
Inputs:      pixel_values   [1, 3, 224, 224]   float32
             input_ids      [1, 64]            int32
Outputs:     logits_per_image     [1, 1]
             logits_per_text      [1, 1]
             text_embeds          [1, 768]
             image_embeds         [1, 768]

Tests

tests/unit/commands/test_perf_composite.py (new, 15 cases) covers io_config aggregation,
the output describer/probe, input generation, device/EP/task resolution, and the
full-forward() timing path. Existing test_perf_cli.py / test_perf_module.py (31 cases)
still pass — no regression.

🤖 Generated with Claude Code

`winml perf` assumed every model exposes a single `io_config`/`_session`,
so composite models (CLIP/SigLIP zero-shot-image-classification) crashed
with `AttributeError: ... has no attribute io_config` during input
generation.

Make `PerfBenchmark` composite-aware:
- `_aggregate_io_config()` unions the sub-models' inputs (their union is
  exactly the composite forward() kwargs) for input generation/display.
- Time the full `forward()` pass via an external PerfStats; single-session
  models keep recording pure-ORT time inside session.perf(). The monitored
  loop is refactored to take a run-iteration callable so both paths share it.
- Device/EP/task are resolved from a representative sub-model.
- `_probe_composite_outputs()` runs one forward() and introspects the result
  so reported outputs are the composite task-level tensors (e.g.
  logits_per_image) rather than a deduped union of sub-model ONNX outputs.
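
The shared-loop idea can be sketched as below. `SimpleStats`, `monitored_loop`, and `make_composite_iteration` are hypothetical names standing in for `PerfStats` and the refactored loop; their real signatures are not shown in this PR. The point is only that the loop drives iterations while each path decides what gets timed.

```python
import time
from collections.abc import Callable


class SimpleStats:
    """Hypothetical stand-in for PerfStats: collects per-iteration latencies."""
    def __init__(self) -> None:
        self.latencies_ms: list[float] = []

    def record(self, seconds: float) -> None:
        self.latencies_ms.append(seconds * 1000.0)


def monitored_loop(run_iteration: Callable[[], None], iterations: int) -> None:
    """Shared loop: it only drives iterations and does no timing itself."""
    for _ in range(iterations):
        run_iteration()


def make_composite_iteration(
    forward: Callable[[], object], stats: SimpleStats
) -> Callable[[], None]:
    """Composite path: time the whole forward() call from the outside."""
    def run_once() -> None:
        start = time.perf_counter()
        forward()  # both encoders plus the similarity step
        stats.record(time.perf_counter() - start)
    return run_once


calls: list[str] = []
stats = SimpleStats()
monitored_loop(
    make_composite_iteration(lambda: calls.append("forward"), stats),
    iterations=3,
)
```

A single-session model would pass a callable that simply runs the session, leaving the timing to the session itself, so its measurement semantics stay as they were.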

Add tests/unit/commands/test_perf_composite.py covering aggregation, output
describing/probing, input generation, device/EP/task resolution, and the
full-forward timing path.
xieofxie requested a review from a team as a code owner June 10, 2026 09:21
from collections.abc import Callable, Iterable

from ..models.winml.base import WinMLPreTrainedModel
from ..models.winml.composite_model import WinMLCompositeModel