feat(perf): add --ep-options to pass runtime EP provider options (#865) by xieofxie · Pull Request #889 · microsoft/winml-cli

xieofxie · 2026-06-12T09:26:38Z

Summary

Fixes #865.

Adds a repeatable --ep-options KEY=VALUE flag to winml perf that forwards runtime execution-provider options to the inference session. These options (e.g. QNN HTP htp_performance_mode) significantly affect runtime latency independently of the build-time quantization config, so being able to set them at benchmark time is essential for tuning.

The flag works for both input modes:

HuggingFace model IDs (winml perf -m microsoft/resnet-50 --ep-options htp_performance_mode=burst)
Pre-exported ONNX files (winml perf -m model.onnx --ep-options htp_performance_mode=burst)

It is also wired into per-module (--module) benchmarking.
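
To illustrate the KEY=VALUE handling, here is a minimal sketch of how repeated flag values could be folded into a dict. It is not the PR's actual `parse_ep_options()` code. The error type and the rule that a repeated key keeps its last value are assumptions, since the description lists those cases as tested without stating the outcomes.

```python
from typing import Iterable


def parse_ep_options(pairs: Iterable[str]) -> dict[str, str]:
    """Turn repeated KEY=VALUE strings into a dict (hypothetical sketch)."""
    options: dict[str, str] = {}
    for pair in pairs:
        # Split on the first "=" only, so a value may itself contain "=".
        key, sep, value = pair.partition("=")
        key = key.strip()
        if not sep or not key:
            # Assumption: malformed input and empty keys are both rejected.
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        # Assumption: a repeated key keeps the last value given.
        options[key] = value
    return options


if __name__ == "__main__":
    print(parse_ep_options(["htp_performance_mode=burst", "extra=a=b"]))
```

Splitting on the first `=` only is what lets a value such as `a=b` pass through unchanged.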

How it works

The options are threaded down to session.py's add_ep_for_device, which already accepts an ep_options dict:

--ep-options (CLI, repeatable)
  -> parse_ep_options() -> dict
  -> BenchmarkConfig.ep_options
  -> WinMLAutoModel.from_pretrained / from_onnx (provider_options=)
  -> WinMLPreTrainedModel (provider_options=)
  -> WinMLSession (provider_options=)
  -> _build_session_options -> add_ep_for_device(opts, ep, device_type, provider_options)

WinMLSession gains a provider_options kwarg that merges on top of, and overrides, any build-time ep_config.provider_options. Unlike passing a full ep_config, it does not flip EPContext persistence (persist_jit). These options tune the runtime session, not the compiled graph.
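
The merge semantics can be sketched as follows. `EPConfig` and `resolve_session_settings` are hypothetical stand-ins, not the real winml-cli classes. The rule that `persist_jit` is derived from `ep_config` alone is also an assumption, used only to show that runtime options leave it untouched.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EPConfig:
    """Hypothetical stand-in for the build-time EP configuration."""
    provider_options: dict[str, str] = field(default_factory=dict)


def resolve_session_settings(
    ep_config: Optional[EPConfig],
    provider_options: Optional[dict[str, str]] = None,
) -> tuple[dict[str, str], bool]:
    """Return (effective provider options, persist_jit) for a session."""
    # Start from a copy of the build-time options so the config is not mutated.
    merged = dict(ep_config.provider_options) if ep_config else {}
    # Runtime options are applied last, so they win on key conflicts.
    merged.update(provider_options or {})
    # Assumption: persistence depends only on whether an ep_config was given,
    # never on the runtime provider_options.
    persist_jit = ep_config is not None
    return merged, persist_jit


if __name__ == "__main__":
    build = EPConfig({"htp_performance_mode": "balanced", "backend_type": "htp"})
    print(resolve_session_settings(build, {"htp_performance_mode": "burst"}))
    print(resolve_session_settings(None, {"htp_performance_mode": "burst"}))
```

In this sketch the second call returns the runtime options with `persist_jit` still false, which mirrors the intent that these options tune the session without changing how the compiled graph is persisted.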

Changes

utils/cli.py: new ep_options_option() decorator and parse_ep_options() helper (reusable by other commands).
session/session.py: WinMLSession accepts provider_options, merged over ep_config options.
models/winml/base.py, models/auto.py: thread provider_options through from_pretrained / from_onnx (including the composite-model and skip-build paths).
commands/perf.py: add --ep-options, parse once, pass to both single-model and --module paths.
docs/commands/perf.md: document the new flag with an example.

Tests

tests/unit/utils/test_cli.py: parse_ep_options (empty/single/multiple/=-in-value/duplicate-key/invalid/empty-key).
tests/unit/commands/test_perf_cli.py: --ep-options forwarded as provider_options for both ONNX and HF paths; CLI parsing into BenchmarkConfig; invalid-format rejection; help text.
tests/unit/session/test_winml_session.py: runtime provider_options forwarded to add_provider_for_devices; runtime options override ep_config options.

All affected suites pass (225 passed; 2 pre-existing OpenVINO-EP-dependent failures unrelated to this change, deselected on this machine where OpenVINO is not installed).

🤖 Generated with Claude Code

Adds a repeatable `--ep-options KEY=VALUE` flag to `winml perf` that forwards runtime execution-provider options (e.g. QNN `htp_performance_mode`) to the inference session via `add_provider_for_devices`, for both HuggingFace model IDs and pre-exported ONNX file inputs. The options are threaded through WinMLAutoModel.from_pretrained/from_onnx -> WinMLPreTrainedModel -> WinMLSession. WinMLSession gains a `provider_options` kwarg that merges on top of (and overrides) any build-time `ep_config.provider_options` without affecting EPContext persistence, so these tune the runtime session rather than the compiled graph. Also wired into per-module (`--module`) benchmarking.

xieofxie requested a review from a team as a code owner June 12, 2026 09:26

xieofxie wants to merge 1 commit into main from hualxie/perf_ep_option
