feat(perf): add --ep-options to pass runtime EP provider options (#865) #889

Open
xieofxie wants to merge 1 commit into main from hualxie/perf_ep_option

@xieofxie

Contributor

Summary

Fixes #865.

Adds a repeatable --ep-options KEY=VALUE flag to winml perf that forwards runtime execution-provider options to the inference session. These options (e.g. QNN HTP htp_performance_mode) significantly affect runtime latency independently of the build-time quantization config, so being able to set them at benchmark time is essential for tuning.

The flag works for both input modes:

  • HuggingFace model IDs (winml perf -m microsoft/resnet-50 --ep-options htp_performance_mode=burst)
  • Pre-exported ONNX files (winml perf -m model.onnx --ep-options htp_performance_mode=burst)

It is also wired into per-module (--module) benchmarking.

How it works

The options are threaded down to session.py's add_ep_for_device, which already accepts an ep_options dict:

--ep-options (CLI, repeatable)
  -> parse_ep_options() -> dict
  -> BenchmarkConfig.ep_options
  -> WinMLAutoModel.from_pretrained / from_onnx (provider_options=)
  -> WinMLPreTrainedModel (provider_options=)
  -> WinMLSession (provider_options=)
  -> _build_session_options -> add_ep_for_device(opts, ep, device_type, provider_options)
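
The first hop in the chain above, turning repeated KEY=VALUE strings into a dict, could look roughly like the sketch below. This is not the code from utils/cli.py: the error type, the whitespace handling and the last-value-wins rule for duplicate keys are assumptions made for illustration, and the key `some_key` is hypothetical.

```python
from typing import Dict, Iterable


def parse_ep_options(values: Iterable[str]) -> Dict[str, str]:
    """Turn repeated KEY=VALUE strings into a dict (illustrative sketch)."""
    options: Dict[str, str] = {}
    for item in values:
        # Split on the first '=' only, so a value may itself contain '='.
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or not key:
            # Assumption: a missing '=' or an empty key is rejected.
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        # Assumption: a repeated key keeps the last value given.
        options[key] = value
    return options


if __name__ == "__main__":
    # 'some_key' is a made-up option name used to show '=' inside a value.
    print(parse_ep_options(["htp_performance_mode=burst", "some_key=a=b"]))
```

Splitting on the first `=` only is what lets a value carry its own `=`, which matches the "=-in-value" case listed under Tests.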

WinMLSession gains a provider_options kwarg that merges on top of, and overrides, any build-time ep_config.provider_options. Unlike passing a full ep_config, it does not flip EPContext persistence (persist_jit). These options tune the runtime session, not the compiled graph.
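
The override behaviour can be pictured as a plain dict merge in which runtime keys win. The helper name `merge_provider_options` is hypothetical and the option values are illustrative; the actual merge lives inside WinMLSession and may be structured differently.

```python
from typing import Dict, Optional


def merge_provider_options(
    build_time: Optional[Dict[str, str]],
    runtime: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Layer runtime options over build-time ones (hypothetical helper)."""
    # Copy first so the build-time config object is never mutated.
    merged = dict(build_time or {})
    # Runtime keys overwrite build-time keys of the same name.
    merged.update(runtime or {})
    return merged


if __name__ == "__main__":
    # Illustrative values only; not taken from the PR.
    build = {"htp_performance_mode": "balanced", "backend_path": "QnnHtp.dll"}
    runtime = {"htp_performance_mode": "burst"}
    print(merge_provider_options(build, runtime))
```

Build-time keys that the runtime dict does not mention pass through unchanged, so a single `--ep-options` value only replaces the option it names.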

Changes

  • utils/cli.py: new ep_options_option() decorator and parse_ep_options() helper (reusable by other commands).
  • session/session.py: WinMLSession accepts provider_options, merged over ep_config options.
  • models/winml/base.py, models/auto.py: thread provider_options through from_pretrained / from_onnx (including the composite-model and skip-build paths).
  • commands/perf.py: add --ep-options, parse once, pass to both single-model and --module paths.
  • docs/commands/perf.md: document the new flag with an example.
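
For readers unfamiliar with repeatable flags, the sketch below shows the general pattern using the standard-library argparse as a stand-in. The PR does not say which CLI framework ep_options_option() wraps, so the parser, the `perf-sketch` program name and the `collect_ep_options` helper here are assumptions, as is the option key `some_key`.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a repeatable --ep-options flag (stand-in)."""
    parser = argparse.ArgumentParser(prog="perf-sketch")
    parser.add_argument(
        "--ep-options",
        action="append",  # each occurrence appends one KEY=VALUE string
        default=None,
        metavar="KEY=VALUE",
        help="Runtime EP provider option; may be given multiple times.",
    )
    return parser


def collect_ep_options(argv: List[str]) -> List[str]:
    """Return the raw KEY=VALUE strings in the order they were given."""
    args = build_parser().parse_args(argv)
    # argparse leaves the attribute as None when the flag never appears.
    return args.ep_options or []


if __name__ == "__main__":
    print(collect_ep_options(
        ["--ep-options", "htp_performance_mode=burst",
         "--ep-options", "some_key=1"]
    ))
```

Collecting raw strings first and parsing them once afterwards mirrors the "parse once" note for commands/perf.py, where the resulting dict is then passed to both the single-model and --module paths.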

Tests

  • tests/unit/utils/test_cli.py: parse_ep_options (empty/single/multiple/=-in-value/duplicate-key/invalid/empty-key).
  • tests/unit/commands/test_perf_cli.py: --ep-options forwarded as provider_options for both ONNX and HF paths; CLI parsing into BenchmarkConfig; invalid-format rejection; help text.
  • tests/unit/session/test_winml_session.py: runtime provider_options forwarded to add_provider_for_devices; runtime options override ep_config options.

All affected suites pass (225 passed; 2 pre-existing OpenVINO-EP-dependent failures unrelated to this change, deselected on this machine where OpenVINO is not installed).

🤖 Generated with Claude Code

Adds a repeatable `--ep-options KEY=VALUE` flag to `winml perf` that forwards
runtime execution-provider options (e.g. QNN `htp_performance_mode`) to the
inference session via `add_provider_for_devices`, for both HuggingFace model
IDs and pre-exported ONNX file inputs.

The options are threaded through WinMLAutoModel.from_pretrained/from_onnx ->
WinMLPreTrainedModel -> WinMLSession. WinMLSession gains a `provider_options`
kwarg that merges on top of (and overrides) any build-time
`ep_config.provider_options` without affecting EPContext persistence, so these
tune the runtime session rather than the compiled graph. Also wired into
per-module (`--module`) benchmarking.
@xieofxie requested a review from a team as a code owner on June 12, 2026 09:26
Development

Successfully merging this pull request may close these issues.

feat: add --ep-options to winml perf (and other commands) for runtime EP provider options
