docs: add user-facing documentation site by hi-brenda · Pull Request #884 · microsoft/winml-cli

hi-brenda · 2026-06-12T07:53:36Z

Summary

Adds a full MkDocs Material documentation site covering all commands, concepts, tutorials, and reference pages
Fixes factual inaccuracies found during source review (flags, defaults, EP behavior, pipeline steps)
Simplifies quickstart by removing redundant admonitions and notes

Adds a complete MkDocs Material documentation site for the winml-cli project, served from /docs and built locally and via GitHub Actions (manual dispatch). Site infrastructure: - mkdocs.yml with Material theme, mermaid superfences, tabbed code, light/dark palette toggle - pyproject.toml dev deps: mkdocs-material, mkdocs-jupyter, pymdown-extensions - .github/workflows/docs.yml (workflow_dispatch only) - .gitignore exception for docs/superpowers/specs/ User-facing chapters: Home — tagline + Goals/Promises bullets sourced from the MVP transcript; describes the toolkit's three workflows (primitives, pipeline, one-command) plus the EP × Device coverage promise Getting Started (3 pages): - Installation — Win 11 24H2 + Copilot+PC + Python 3.10 + uv + git prereqs table; 'No NPU?' callout pointing at --device auto with the winml eval caveat - Quickstart — 5-minute export + inspect with 'winml sys --list-device --list-ep' verify step - End-to-End Tour — universal --device auto walkthrough that works on Copilot+ PC NPU, DirectML GPU, or CPU; tabbed example outputs for sys and perf so each reader sees their own machine Concepts (12 pages in two sub-groups): - Fundamentals (5): How winml-cli works, Graph and IR, Weight and Activation, EP and Device (with the full 7-EP × Device matrix), Datatype and Quantization (8-precision family from _KNOWN_PRECISIONS with w4a16 marked 'Planned — not yet supported') - WinML CLI (7 workflow-concept pages): Primitives and pipeline, Load and export, Analyze and optimize, Compile and EPContext, Perf and monitoring, Eval and datasets, Config and build (with the full WinMLBuildConfig schema inline) Commands (13 pages): - Overview with the four user-intent groups (Discover / Configure / Build / Measure) - Per-command reference for: sys, inspect, hub, analyze, config, optimize, export, quantize, compile, build, perf, eval Samples (3 pages): - ConvNeXt — Primitives Walkthrough (CPU/GPU/NPU device comparison) - BERT — Config + Build + Perf (workflow 
demonstration) - Qwen3 — Composite Models (placeholder for the in-progress feature) Tutorials (2 pages): - Overview - ConvNeXt on NPU — 2200-word linear walkthrough with both QNN and OpenVINO compile paths shown via tabbed blocks, plus the 'winml build' one-shot variant P2 stubs preserved in nav: Reference, Troubleshooting, Contributing Source-grounding: - Every flag mentioned in user-facing docs is verified against src/winml/modelkit/ - Non-functional flags (--torch-module, --dynamo on export; --no-quant on compile) are explicitly marked - All URLs target the canonical microsoft/winml-cli destination - mkdocs build --strict passes with zero warnings Internal artifacts kept under docs/superpowers/ for reference: - Spec and plan files for the v1 and v2 design iterations - 2026-05-26-v3-known-issues.md — fact-checked review findings Existing internal docs (docs/design/, docs/naming-convention.md, docs/pytest-best-practices.md) are unchanged and excluded from the user-facing nav via exclude_docs in mkdocs.yml.

…he site Adds a contributor-facing README at docs/README.md covering: - uv-based dev setup - mkdocs serve / build --strict workflow - gh-deploy publish (local one-shot) - .github/workflows/docs.yml CI workflow (currently workflow_dispatch only) - Authoring conventions (winml-cli name, flag verification, admonitions, tabbed code blocks) - Excluded paths reference Updates mkdocs.yml exclude_docs to include /README.md so the new file doesn't collide with docs/index.md as the chapter index.

…source Six parallel review agents fact-checked all 34 user-facing doc files against microsoft/winml-cli @ 5e25579. Output: one issue file per source doc at docs/superpowers/2026-05-27-doc-issues/. A validator agent then cross-checked every Critical and Important claim and produced the consolidated, false-positive-filtered list at docs/superpowers/2026-05-27-validated-issues.md. Summary: 25 Critical + 22 Important kept; 6 rejected as false positives. Major theme: docs were authored against feat/mvp source where some symbols and defaults differ from main (e.g., _KNOWN_PRECISIONS in _options.py vs _NAMED_PRECISIONS in precision.py; winml hub vs winml catalog; many flag defaults flipped to 'auto'; DML/CPU no longer produce _ctx.onnx artifacts). Next step: per-file fix agents will apply the validated list.

…eview 5 parallel fix agents applied the validated-issues list. Net: 25 Critical + 22 Important defects resolved across 20 doc files + mkdocs.yml. Major fixes by area: Concepts (4 pages): - quantization.md: NPU auto-precision corrected to w8a16 (was int8); w4a16 description corrected (rejected at validation, not 'recognized but raises at quantization'); _KNOWN_PRECISIONS/_options.py references replaced with the actual _NAMED_PRECISIONS/precision.py - compile-and-epcontext.md: removed non-existent --no-quant flag mention - config-and-build.md: JSON 'compile' section flattened to use execution_provider (not nested ep_config.provider); table expanded to the actual 7 sub-configs (added eval, auto) - perf-and-monitoring.md: --device documented as accepting auto; output path corrected to ~/.cache/winml/perf/<slug>/<timestamp>.json; --monitor not NPU-specific; --op-tracing marked hidden Commands (11 pages): - overview.md: winml hub renamed to winml catalog throughout; _options.py reference replaced with cli.py - hub.md: H1 and all invocations changed to 'winml catalog'; removed non-existent --model/-m flag; rewrote 'How it works' (no per-EP latency / accuracy-verdict columns exist); added --ep/--device filter flags - build.md: --config marked optional (was required); --random-init and --qnn-sdk-root removed (don't exist); --no-compile/--compile toggle pair documented; --trust-remote-code added; --max-optim-iterations default corrected to None - compile.md: --device default corrected to auto; --no-quant flag removed (doesn't exist on compile) - config.md: --no-compile/--compile framing corrected (compile is EXCLUDED by default; users need --compile to include) - eval.md: --device includes auto (default auto, not cpu); -n short alias removed; class reference replaced with actual evaluate function - analyze.md: --device default corrected to auto; --ep default to auto; --run-unknown-op default to False; -m/-v/-q/-c flags added - optimize.md: --preset/-p flag and entire 
Built-in presets table removed (flag doesn't exist); --verbose added; 'Configuration precedence' reduced from 4 levels to 3 - inspect.md: --list-tasks, --model-type, --model-class, --verbose flags added - perf.md: --compare-devices removed (not registered at all); output path corrected; --op-tracing marked hidden - sys.md: --verbose/-v added to flag table Samples / Tutorials / Getting Started (5 pages): - installation.md: Python 3.10 corrected to 3.11; 'No NPU?' callout no longer claims winml eval rejects auto (it accepts auto on main) - end-to-end.md: dropped incorrect _ctx.onnx CPU/DML artifacts; QNNExecutionProvider mapped to NPU/GPU (not just NPU) - convnext-primitives.md: CPU/GPU compile clarified (no _ctx.onnx produced; uses convnext_int8.onnx directly); winml eval auto reverted - bert-config-build.md: build final artifact corrected to model.onnx (was bert-base-uncased_ctx.onnx) - npu-convnext.md: Python 3.10 -> 3.11; OpenVINO artifact filename corrected to use device string (_npu_ctx.onnx not _openvino_ctx.onnx); CPU compile tab dropped (CPU doesn't produce _ctx.onnx) mkdocs.yml: nav label 'hub' renamed to 'catalog' to match the actual command name on microsoft/winml-cli main.

…meration) The opening paragraph re-stated the project tagline (already on the home page one click above) and enumerated 4 EPs (QNN, OpenVINO, DML, ONNX Runtime) — which goes stale; the canonical list in concepts/eps-and-devices.md has 7. Removing the paragraph; the page now starts with the Prereqs table. Matches the convention used by quickstart.md and end-to-end.md (neither re-states the tagline).

## Summary - Rewrote `docs/concepts/analyze-and-optimize.md` with source-verified content: SupportLevel classification table, lint vs autoconf outputs, analysis modes, optimizer pipe architecture (4 pipes, 43 capabilities, 5 rewrite groups / 12 rules), and autoconf loop SVG diagram - Updated `docs/commands/analyze.md` with corrected EP aliases, exit-code table, and additional CLI examples - Renamed `hub.md` → `catalog.md` and updated all cross-references (inspect, overview, sys, mkdocs.yml) - Fixed `check-yaml` pre-commit hook to support `!!python/name` tags in mkdocs.yml (`--unsafe`) 🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Zhipeng Wang <zhiwang@microsoft.com> Co-authored-by: Qiong Wu (qiowu) <qiowu@microsoft.com> Co-authored-by: hualxie <hualxie@microsoft.com> Co-authored-by: Charles Zhang <zhangchao@microsoft.com> Co-authored-by: Zhenchao Ni <zhenni@microsoft.com>

only unit test _skip_winml_ep_init

## Summary - Drop the `WindowsAppRuntimeVersion` class, attribute, property, and `windowsAppRuntimeVersion` field in `SysInfo.to_dict()` from `src/winml/modelkit/sysinfo/sysinfo.py`. - Remove the now-unused `import re`. Nothing else in the codebase referenced these symbols. Integration `runtime_checker` fixtures still contain the field inside their stored `sys_info` blob, but the test helper ignores `sys_info` during comparison, and the field will disappear naturally next time those fixtures are regenerated.

…763) ## Summary - **VitisAI EP ordering**: Move `VitisAIExecutionProvider` to end of `EP_SUPPORTED_DEVICES` so it appears last in `analyze --ep all` output, since it is not yet fully supported. - **Catalog table width**: Set `expand=False` on both `Table` and `Panel` in `_build_list_renderable` so the catalog table fits its content width instead of stretching to the full terminal width.

…tection (#779) Also update scripts/e2e_eval/run_pytorch_baseline.py to include pytorch model latency --------- Co-authored-by: hualxie <hualxie@microsoft.com>

## Summary - Reorganized README into 5 sections: Title + Description, Features / Scope, Getting Started, Commands, Contributing + License - Updated status badge to `preview`, rewrote description and Features (✅ bullets) - Scope section: added supported EPs, built-in model catalog reference, accepted inputs; removed verbose LLM/not-supported block - Getting Started: consolidated Prerequisites + Installation + Quick Start; added Config-Build Pipeline and Step-by-step through primitive commands walkthroughs - Commands: BYOM workflow with pipeline diagram, command table + collapsible details, comparison table (Config-Driven first) - Reference tables at end: Supported Hardware, Supported Tasks, Supported Model Types, Built-in Models --------- Co-authored-by: Qiong Wu (qiowu) <qiowu@microsoft.com> Co-authored-by: Zhipeng Wang <zhiwang@microsoft.com>

## Summary - Removed the duplicated `WinML CLI (Python wheel) | [Releases]` row in the Prerequisites table. - Updated the install step from `uv pip install winml_cli-<version>-py3-none-any.whl` to `pip install winml-cli`. - Updated the Prerequisites entry to point at PyPI instead of GitHub Releases, keeping the table and install instructions consistent.

## Summary - Adds `resolve_check_device_ep` helper that validates a (device, EP) combination without requiring the device/EP to actually exist on the system. Closes #765. - `commands/config.py` and `config/build.py` now use `resolve_check_device_ep` instead of `resolve_device` so `winml config` no longer hard-fails on hosts where the requested EP isn't installed. - When `device=auto` or `ep=None`, the helper delegates to the existing `resolve_device` + `resolve_eps` flow (system-aware behavior preserved). When both `device` and `ep` are explicit, it only validates against the static `EP_SUPPORTED_DEVICES` mapping. - CLI cleanup: `-m/--model`, `-c/--config`, `--device` for the config command now use the shared `cli_utils.*_option` decorators. ## Tests - New `TestResolveCheckDeviceEp` class in `tests/unit/sysinfo/test_device.py` covering both code paths (delegation and static-only) plus error cases (unknown EP, unsupported device, case-insensitivity). - Existing config-test mocks updated from `resolve_device` to `resolve_check_device_ep` (`tests/unit/config/conftest.py`, `tests/unit/config/test_build.py`, `tests/unit/config/test_build_onnx.py`, `tests/unit/commands/test_config_cli.py`) so the lazy import in `config/build.py` is intercepted. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: hualxie <hualxie@microsoft.com>
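The two code paths described above (delegation versus static-only validation) can be sketched as follows. This is a minimal illustration, not the helper from `src/winml/modelkit/`: the mapping contents, the `system_aware_resolve` stand-in, the return shape, and the error type are all assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical static mapping; the real EP_SUPPORTED_DEVICES in winml-cli may differ.
EP_SUPPORTED_DEVICES: dict[str, tuple[str, ...]] = {
    "QNNExecutionProvider": ("npu", "gpu"),
    "OpenVINOExecutionProvider": ("cpu", "gpu", "npu"),
    "CPUExecutionProvider": ("cpu",),
}


def system_aware_resolve(device: str, ep: str | None) -> tuple[str, str]:
    """Stand-in for the existing resolve_device + resolve_eps flow (assumption)."""
    return ("cpu", "CPUExecutionProvider")  # placeholder result for the sketch


def resolve_check_device_ep(device: str, ep: str | None) -> tuple[str, str]:
    """Validate a (device, EP) pair; only probe the host when something is left open."""
    if device.lower() == "auto" or ep is None:
        # Delegation path: keep the system-aware behavior.
        return system_aware_resolve(device, ep)

    # Static-only path: case-insensitive lookup against the mapping, no host probing.
    canonical = {name.lower(): name for name in EP_SUPPORTED_DEVICES}.get(ep.lower())
    if canonical is None:
        raise ValueError(f"Unknown EP: {ep!r}")
    dev = device.lower()
    if dev not in EP_SUPPORTED_DEVICES[canonical]:
        raise ValueError(f"{canonical} does not support device {dev!r}")
    return dev, canonical
```

With both arguments explicit, the pair is checked purely against the table, so a host without the EP installed would still pass validation in this sketch.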

Co-authored-by: hualxie <hualxie@microsoft.com>

…gs) (#785) Adds curated recipe configs for the 12 builtin models — those that pass fp16 eval on all 9 (EP, device) buckets.

## Summary Fixes `scripts/e2e_eval/run_eval.py` crashing on VitisAI EP (AMD Ryzen AI NPU) and a latent bug in `winml build` that prevented the script's `--no-quant` workaround from actually taking effect. The crash: VitisAI ships its own internal quantizer and runs it at session-create time. Layering winml's generic QDQ quantization pass on top produces a model VitisAI cannot consume, which manifests as `DpuKernelRunner.cpp:1920 DPU timeout` during `winml perf`. The fix is to tell winml to skip its own quantization when the selected EP quantizes natively. ## Changes ### `src/winml/modelkit/commands/build.py` — root-cause fix (1 line) When `--device` was passed to `winml build`, the internal `_patch_device` helper unconditionally re-populated `cfg.quant` with the device's default quantization config, silently undoing any prior `--no-quant`. The condition now respects `no_quant`: ```python if no_quant or resolved_quant is None: cfg.quant = None ``` Without this, `winml build … --device npu --no-quant` still produced a `_quantized.onnx` artifact. ### `scripts/e2e_eval/run_eval.py` — script wiring - New canonical-name set `_NATIVE_QUANT_EPS = {"VitisAIExecutionProvider"}` plus a helper `_ep_quantizes_natively(ep)` that funnels both canonical names and user aliases (e.g. `vitisai`) through `winml.modelkit.utils.constants.normalize_ep_name`. No hardcoded aliases. - `_resolve_precision(...)` gained an `ep` parameter; for native-quant EPs it returns `None` so no precision flag is sent. - `_run_build` now passes `--no-quant` to **both** `winml config` (so the persisted `build_config.json` has `quant: null` up-front) and `winml build` (defense in depth) when the EP quantizes natively. - Call sites in `run_model` and `main` updated to thread `ep` through `_resolve_precision`. ## Why the earlier commits in this branch weren't enough The first attempt (`fix(run_eval): skip quantize when VitisAI EP is selected`) wired `--no-quant` only into `winml build`. 
That didn't take effect because of the `_patch_device` bug above. The second attempt (`fix(vitisai): resolve auto-precision to w8a8 for VitisAI NPU`) tried to switch precision instead of skipping, which was also wrong, since VitisAI wants an fp32 input and quantizes it itself. The final state keeps the script clean (`--no-quant`, no precision override) and fixes the actual `winml build` bug.

## Verification

Manual end-to-end on AMD Ryzen AI (VitisAI NPU), with a clean `~/.cache/winml/artifacts/...` and output dir:

```pwsh
uv run --no-sync python scripts/e2e_eval/run_eval.py `
  --hf-model facebook/convnext-tiny-224 `
  --task image-classification `
  --device npu --ep vitisai `
  --eval-type perf --no-report --verbose --timeout 1800 `
  --output-dir e2e-test\vitisai_npu
```

Before: `winml perf` crashed with `DpuKernelRunner.cpp:1920 DPU timeout`.

After:

- Cached `imgcls_*_winml_build_config.json` has `"quant": null`.
- No `_quantized.onnx` artifact produced.
- Perf step: **PASS** in ~120 s.
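As a rough illustration of the two pieces of this fix, the sketch below pairs a native-quantization check with the corrected quant-selection condition. The alias table and the simplified `normalize_ep_name` are stand-ins for the real `winml.modelkit.utils.constants.normalize_ep_name`, and `select_quant` is a hypothetical name for the decision made inside `_patch_device`.

```python
from __future__ import annotations

_NATIVE_QUANT_EPS = {"VitisAIExecutionProvider"}

# Hypothetical alias table; the real normalization lives in winml-cli.
_ALIASES = {
    "vitisai": "VitisAIExecutionProvider",
    "qnn": "QNNExecutionProvider",
}


def normalize_ep_name(ep: str) -> str:
    """Simplified stand-in: map a user alias to a canonical EP name."""
    return _ALIASES.get(ep.lower(), ep)


def ep_quantizes_natively(ep: str | None) -> bool:
    """True when the EP runs its own quantizer, so winml should skip its QDQ pass."""
    return ep is not None and normalize_ep_name(ep) in _NATIVE_QUANT_EPS


def select_quant(no_quant: bool, resolved_quant: dict | None) -> dict | None:
    """Mirror of the corrected condition: --no-quant wins over the device default."""
    if no_quant or resolved_quant is None:
        return None
    return resolved_quant
```

Before the fix, the equivalent of `select_quant` ignored `no_quant`, which is why the device default was silently restored.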

…771) ## Summary Closes #546. `winml inspect --task bogus-task` was leaking optimum's internal `TasksManager` class name and pointing users to optimum docs: > Error: Inspection error: Task 'bogus-task' not supported by TasksManager. Check optimum documentation for supported tasks. Now the value is validated at Click parse time against the hand-coded `KNOWN_TASKS` set, before any heavy imports: ``` $ winml inspect -m microsoft/resnet-50 --task bogus-task Usage: winml inspect [OPTIONS] Try 'winml inspect --help' for help. Error: Invalid task 'bogus-task'. Valid: audio-classification, audio-frame-classification, audio-xvector, automatic-speech-recognition, depth-estimation, ... (35 total). See 'winml inspect --list-tasks' for the full list. ``` - Exit code 2 (Click UsageError) - No third-party class names; no optimum-docs pointer - Callback imports only `..loader.task.KNOWN_TASKS` — avoids the ~10s optimum/transformers cold start, so the fail-fast stays fast - `--list-tasks` and valid `--task` paths unchanged Co-authored-by: Ziyuan Guo (WE TEAM) <ziyuanguo@microsoft.com>
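The fail-fast check can be pictured as a small pure function that runs before any heavy import. The sketch below uses only the standard library; the task set is a tiny illustrative subset (the real `KNOWN_TASKS` is reported to have 35 entries), and `ValueError` stands in for the Click usage error the actual callback raises.

```python
from __future__ import annotations

# Illustrative subset only; not the real KNOWN_TASKS contents.
KNOWN_TASKS = frozenset({
    "audio-classification",
    "audio-frame-classification",
    "audio-xvector",
    "automatic-speech-recognition",
    "depth-estimation",
    "image-classification",
})


def validate_task(value: str | None, known: frozenset[str] = KNOWN_TASKS,
                  preview: int = 5) -> str | None:
    """Return the task unchanged if valid; otherwise raise with a short preview."""
    if value is None or value in known:
        return value
    names = sorted(known)
    shown = ", ".join(names[:preview])
    more = ", ..." if len(names) > preview else ""
    raise ValueError(
        f"Invalid task {value!r}. Valid: {shown}{more} ({len(names)} total). "
        "See 'winml inspect --list-tasks' for the full list."
    )
```

Because the function touches nothing but a frozen set of strings, it stays cheap enough to run at argument-parse time.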

…#772)

## Summary

Fixes #541. `winml catalog` was the only command where `-t` did NOT mean `--task`:

| Command   | `-t` means                    |
|-----------|-------------------------------|
| `inspect` | `--task`                      |
| `export`  | `--task`                      |
| `config`  | `--task`                      |
| `catalog` | `--model-type` (inconsistent) |

A user who has memorized `-t` to mean `--task` in 3 commands would type `-t image-classification` against `winml catalog` and silently get `--model-type=image-classification` (no such model type) instead.

## Change

In `src/winml/modelkit/commands/catalog.py`:

- Dropped the `-t` short from `--model-type` (no short alias now).
- Moved `-t` to `--task` (replacing the previous `-k`).

`--model-type` is still fully supported via its long form. Adds a regression guard test (`test_model_type_has_no_short_flag`) that checks both the `--help` output AND that passing a model_type via `-t` is interpreted as a task. All 115 catalog tests pass.

Co-authored-by: Ziyuan Guo (WE TEAM) <ziyuanguo@microsoft.com>

**Skips compilation related cases** Some models fail to compile in the VitisAI Execution Provider. The error is an "Access Violation" that crashes the Python process. This appears to be an EP-side problem. To unblock our e2e tests, I have skipped these cases for VitisAI.

**Skips NPU usage assertion for small model** Running a small mock model can be very fast, in which case the measured NPU usage is zero. However, our assertion logic still expects some NPU usage, which makes the e2e test unstable. Since we already have this assertion in the real-model e2e test cases, I skip it for the small model only.

**Skips eval metric value range assertion** The eval e2e test uses only 10 samples because the aim is to check that the eval pipeline works, not to truly evaluate a model. The assertion logic has a metric range, but that range was calculated on a QNN device and may not hold for other devices. Using the same range could make the e2e tests unstable. Therefore, I assert the metric range only for QNN; for other devices, I just assert that the metric value is available.

uv run ~\ModelKit\examples\microsoft-swin-large-patch4-window7-224\example.py --onnx ~\.cache\winml\artifacts\microsoft_swin-large-patch4-window7-224\imgcls_ec485f4653d962b9_quantized.onnx True label: house finch, linnet, Carpodacus mexicanus (synset=n01532829, id=12) Top 5 predictions: 1. house finch, linnet, Carpodacus mexicanus (0.9127) 2. brambling, Fringilla montifringilla (0.0122) 3. goldfinch, Carduelis carduelis (0.0028) 4. chickadee (0.0013) 5. junco, snowbird (0.0013) Verdict (top-1): PASS Annotated image written to prediction.png --------- Co-authored-by: hualxie <hualxie@microsoft.com>

…ng (#790)

## Summary

timm checkpoints load through transformers' generic `TimmWrapper` (`model_type="timm_wrapper"`) and previously failed in **every** `winml` command with *"Cannot detect task: config has no 'architectures' field"*. Two gaps:

1. **Task/class detection**: timm repos load as `TimmWrapperConfig` with `architectures=None`, so auto-detection could not resolve a task or class.
2. **OnnxConfig location**: Optimum registers timm's config (`TimmDefaultOnnxConfig`) only under `library_name="timm"`, but every `winml` lookup defaults to `transformers`.

`timm_wrapper` is transformers' generic bridge for the whole timm library, not a model architecture, so it is resolved at the **shared resolution layer**, not as a per-model config. Only the library is recorded; the task is derived from Optimum.

## Changes (no `models/hf/` entry)

- **`loader/task.py`**: `WRAPPED_LIBRARY_MODEL_TYPES` (`model_type -> optimum_library`) + `resolve_optimum_library()`. When a config has no `architectures`, `_detect_task_and_class_from_config` derives the task from Optimum's task list for the library (`get_supported_tasks("timm_wrapper", "timm")` -> `["image-classification"]`) and the class from `get_model_class_for_task` (generic `AutoModelForImageClassification`, which transformers dispatches to `TimmWrapper` at load). The task is not hardcoded; the branch imports `optimum.exporters.onnx.model_configs` first to populate Optimum's registry (scoped so normal model loading never pays for it).
- **`export/io.py`**: `_get_onnx_config` routes the library via `resolve_optimum_library`, so `timm_wrapper` resolves Optimum's `TimmDefaultOnnxConfig` from every call site (config/build/export/inspect) with no `--library` flag.
- **`commands/inspect.py`** + **`inspect/resolver.py`**: route both the CLI inspect path and the public `inspect_model` path the same way: library routing for the OnnxConfig lookup, plus wrapped-library task detection so the task is not mislabeled.
- Tests: `resolve_optimum_library` + wrapped-library architectures fallback with task derivation (loader); timm library routing for `resolve_io_specs` / `_get_onnx_config` (export); public inspect path `detect_task` / `resolve_exporter` for timm (inspect).

## Validation

**Functional (end-to-end)** on a timm image-classification model:

| Command | Before | After |
|---|---|---|
| `winml config` | exit 2: *no 'architectures' field* | task=image-classification, 1 input |
| `winml export` | exit 2: same | `model.onnx` (pixel_values to logits) |
| `winml inspect` | exit 1: same | `AutoModelForImageClassification` + `TimmDefaultOnnxConfig`, full I/O table |

`config` -> `export` -> `optimize` -> `model.onnx` validated end-to-end for multiple timm CNN classifiers. Also resolves on a timm ViT backbone (`num_labels=0`) -> task=image-classification, matching Optimum's own `infer_task_from_model`, so it generalizes across timm architectures (CNN + ViT).

**No impact on existing models**: scanned all 439 entries / 401 unique models in `scripts/e2e_eval/testsets/models_all.json`: **0** are `timm_wrapper` (by JSON metadata and by loaded config; 330 loadable). Since `timm_wrapper` is the only trigger of the new branch, no existing model changes behavior. (71 fail to load a config: custom/GGUF/tabular types that fail at `AutoConfig` regardless; 7 have empty `architectures` but are not timm, which is a pre-existing "Cannot detect task", identical before and after the PR.)

**No overhead for normal (non-timm) models**: `winml config` on a standard non-timm model, this branch vs base, min ~12.6s vs ~12.5s (within run-to-run noise). Non-timm configs have `architectures`, so they skip the new branch; the only added cost is one dict lookup.

**Unit tests**: `tests/unit/loader` + `tests/unit/export` + `tests/unit/inspect`: green.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Yi Ren <reny@microsoft.com>
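The library routing described in this change comes down to a single dictionary lookup. A minimal sketch, assuming a `default` parameter and signature that the real `resolve_optimum_library` in `loader/task.py` may not share:

```python
# model_type -> Optimum library name, as described for WRAPPED_LIBRARY_MODEL_TYPES.
WRAPPED_LIBRARY_MODEL_TYPES = {"timm_wrapper": "timm"}


def resolve_optimum_library(model_type: str, default: str = "transformers") -> str:
    """Pick the Optimum library to query for a given transformers model_type.

    Wrapped-library bridges (such as timm_wrapper) route to their own library;
    everything else falls back to the default, costing one dict lookup.
    """
    return WRAPPED_LIBRARY_MODEL_TYPES.get(model_type, default)
```

For example, `resolve_optimum_library("timm_wrapper")` yields `"timm"`, while an ordinary model type such as `"bert"` falls through to `"transformers"`.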

## Fix model-task inconsistency for vision feature-extraction models Fixes #777, #778, #782. ### Principle `winml inspect` is the source of truth for valid `(model_id, task)` pairs. Both `feature-extraction` and `image-feature-extraction` are valid ways to address an image-embedding model like `facebook/dinov2-base`. Downstream commands must accept whichever name `winml inspect` accepts, then use `(model_id, task)` to locate the concrete class to act on. ### Root cause Optimum's `TasksManager.get_exporter_config_constructor` only knows canonical Optimum task names. Several call sites passed the raw user-supplied task straight through, so HF aliases like `image-feature-extraction` were rejected with "Unsupported". The evaluator additionally needs to know which HF pipeline name to dispatch on, which the canonical Optimum task name doesn't carry by itself for bimodal tasks like `feature-extraction`. ### Fix - **Inspect / export / HTP exporter**: normalize via `_map_task_synonym(task)` (in `export/io.py`) before any `TasksManager` lookup because it requires normalized task input. This is a single function reused at each `TasksManager` boundary — no new global table. - **Quantize**: `_resolve_dataset_class(task, io_config)` in `datasets/__init__.py` dispatches to `TextDataset` / `ImageDataset` based on the actual ONNX input names. No `AutoConfig.from_pretrained` round-trip. Bimodal io_configs fall back to `RandomDataset` with a warning. - **Evaluate**: Because HF pipeline and evaluate library have their task name convention, `to_hf_pipeline_task(task, model_id)` in `eval/evaluate.py` translates to the HF pipeline name the underlying `evaluate` library expects. Uses `OnnxConfig.inputs` (no weights loaded) to pick the modality. Bimodal models (e.g. CLIP combined: both `pixel_values` and `input_ids`) keep the task unchanged via a `len(hits) == 1` guard, preserving the explicit user task. 
### Validation

`facebook/dinov2-base`:

| Command | Before | After |
|---|---|---|
| `winml inspect -m facebook/dinov2-base --task image-feature-extraction` | "Unsupported" | Resolves via `Dinov2OnnxConfig` |
| `winml export -m facebook/dinov2-base -t image-feature-extraction` | KeyError on TasksManager | Valid ONNX with `last_hidden_state` |
| `winml eval -m facebook/dinov2-base --task feature-extraction` | `RuntimeError: Failed to create feature-extraction dataset` | kNN metrics on mini-imagenet |
| `winml quantize <onnx> --task feature-extraction -m facebook/dinov2-small` | Fails because `TextDataset` is used | Routes to `ImageDataset` |

`openai/clip-vit-base-patch32` (bimodal, regression check):

- `winml eval -m openai/clip-vit-base-patch32 --task feature-extraction` → stays `feature-extraction` (text STS evaluator); not silently rerouted to image.
- `winml eval -m openai/clip-vit-base-patch32` (auto-detect) → resolves to `feature-extraction` (text).

### Tests

Unit:

- `tests/unit/eval/test_eval.py::TestResolveTask`: auto-detect, explicit task, bimodal guard, HF pipeline translation.
- test_random_dataset.py: `TASK_DATASET_MAPPING` covers all registered tasks, including bimodal dict-of-dict.

E2E (`-m e2e`, dinov2 chosen because it isn't in `MODEL_BUILD_CONFIGS` and so actually exercises the `TasksManager` path):

- `tests/e2e/test_inspect_e2e.py::TestInspectDinoV2`: both `image-feature-extraction` and `feature-extraction` resolve.
- `tests/e2e/test_export_e2e.py::TestExportDinoV2::test_image_feature_extraction`.
- `tests/e2e/test_eval_e2e.py::TestEvalPerTask::test_image_feature_extraction` parameterized over both task names.
- `tests/e2e/test_quantize_e2e.py::test_feature_extraction_with_pixel_values_uses_image_dataset`.
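The bimodal guard described for the evaluator can be sketched as below. This is an illustration only: the real `to_hf_pipeline_task(task, model_id)` reads `OnnxConfig.inputs`, whereas this version takes the input names directly, and the input-name-to-pipeline table is an assumption.

```python
from __future__ import annotations

# Assumed mapping from a distinguishing ONNX input name to an HF pipeline task.
_INPUT_TO_PIPELINE = {
    "pixel_values": "image-feature-extraction",
    "input_ids": "feature-extraction",
}
_EMBEDDING_TASKS = {"feature-extraction", "image-feature-extraction"}


def to_hf_pipeline_task(task: str, input_names: list[str]) -> str:
    """Translate an embedding task to a pipeline name based on input modality."""
    if task not in _EMBEDDING_TASKS:
        return task
    hits = [name for name in _INPUT_TO_PIPELINE if name in input_names]
    if len(hits) == 1:
        # Exactly one modality present: dispatch on it.
        return _INPUT_TO_PIPELINE[hits[0]]
    # Bimodal (or unrecognized) inputs: keep the explicit user task unchanged.
    return task
```

Under these assumptions an image-only model asked for `feature-extraction` is routed to the image pipeline, while a CLIP-style model with both `pixel_values` and `input_ids` keeps whatever task the user passed.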

Table for stub

| Lib | py.typed | Reality | Override status |
|---|---|---|---|
| torch | yes | Has inline types (v2.11) | Override is a no-op; mypy uses real types |
| torchvision | no | No types, no community stubs | Genuinely needed |
| onnx | yes | Has inline types (v1.18) | Override is a no-op |
| onnxruntime | no | Untyped; no community stubs | Genuinely needed |
| transformers | yes | Inline types but partial/loose | Override is a no-op; types ARE used |
| datasets | no | Untyped | Genuinely needed |
| optimum | no | Untyped | Genuinely needed |
| timm | yes | Has inline types (v1.0.26) | Override is a no-op |
| onnxscript | yes | Has inline types (v0.7) | Override is a no-op |
| snakemd | no | Untyped | Genuinely needed |
| openvino | n/a | Not installed locally | n/a |

plotext added to ignore_missing_imports (no community stubs, untyped library)

---------

Co-authored-by: Hualiang Xie <hualxie@microsoft.com>

…m_task (#801) ## What PR1 of #800. Relocate `map_task_synonym` -> `loader/task.py::to_optimum_task` to establish a single WinML->Optimum task-collapse boundary. ## Changes - `loader/task.py`: add `to_optimum_task` + `TASK_SYNONYM_EXTENSIONS` (moved from `export/io.py`); exported via `loader/__init__.py`. - `export/io.py`: local implementation removed; `map_task_synonym` kept as a backward-compatible alias (`= to_optimum_task`); internal use repointed. - Optimum-boundary call sites repointed to `to_optimum_task`: `commands/inspect.py`, `export/htp/exporter.py`, `inspect/resolver.py`. - `commands/build.py`: `TASK_SYNONYM_EXTENSIONS` now imported from `loader`. - New `tests/unit/loader/test_task_boundary.py` pins the collapse contract. ## Behavior No behavior change. `map_task_synonym` stays importable from `export.io`; the collapse semantics (`image-feature-extraction` -> `feature-extraction`, WinML extensions preserved) are byte-identical. Existing synonym and #777/#782 regression tests stay green. Sets up PR2 (#800), which adds the modality-aware `detect_task` and relies on this single collapse boundary.
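The relocation keeps the old import path alive through a plain alias. The sketch below shows that pattern in one file; the collapse table holds only the one mapping this message names, and the pass-through for other task names is an assumption about how WinML extensions are preserved.

```python
# Hypothetical collapse table; only this one mapping is confirmed by the PR text.
_OPTIMUM_COLLAPSE = {"image-feature-extraction": "feature-extraction"}


def to_optimum_task(task: str) -> str:
    """Collapse a WinML/HF task name to the canonical Optimum task name.

    Names without an entry (including WinML-specific extensions, in this
    sketch) pass through unchanged.
    """
    return _OPTIMUM_COLLAPSE.get(task, task)


# Backward-compatible alias, as kept in export/io.py after the move.
map_task_synonym = to_optimum_task
```

Since the alias is the same function object, callers that still import `map_task_synonym` get identical behavior.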

…#793)

Fixes #566.

## Problem

- Top-level group declared ``-v/--verbose`` (count) and ``-q/--quiet``, but 12 of 13 subcommands redeclared ``--verbose`` as ``is_flag=True``, so ``winml export -vv …`` errored with ``extra argument``.
- No subcommand exposed ``-q/--quiet``, so ``winml export --quiet …`` failed with ``no such option``.
- Each command wired logging differently; DEBUG/INFO lines interleaved with Rich tables on stdout, breaking ``cmd > out 2> log.txt``.

## Changes

- ``utils/cli.py``: ``verbosity_options`` decorator (``-v`` count, ``-q`` flag) + new ``resolve_verbosity(ctx, verbose, quiet)`` helper that merges top-level and subcommand-level values (max of verbose, OR of quiet). Honors the legacy ``ctx.obj["debug"]`` so tests that bypass ``main()`` still raise the verbosity floor.
- ``utils/logging.py``: format ``[%(asctime)s %(levelname)-7s %(name)s] %(message)s`` with ``datefmt=%H:%M:%S``, ``stream=sys.stderr``. Idempotent: re-creates the WinML handler bound to the current ``sys.stderr`` on each call so Click ``CliRunner`` stream redirection keeps working, and leaves non-WinML handlers (notably pytest ``caplog``) intact.
- ``cli.py``: top-level group uses ``@verbosity_options`` (replaces inline declarations); ``--debug`` alias preserved.
- 12 subcommands (``build``, ``compile``, ``config``, ``eval``, ``export``, ``inspect``, ``optimize``, ``perf``, ``quantize``, ``sys``, plus ``analyze`` cleanup): replace ad-hoc ``--verbose`` (``is_flag=True``) with ``@cli_utils.verbosity_options``, add ``quiet: bool`` param, call ``configure_logging(verbosity=verbose, quiet=quiet)`` after ``resolve_verbosity``. Removes the legacy ``if ctx.obj.get("debug"): verbose = True`` blocks (folded into the helper).
- ``serve/app.py``: pre-existing latent bug: module-level ``logging.getLogger("winml.modelkit").setLevel(INFO)`` ran at import, which muted DEBUG capture in unrelated tests that got collected alongside the serve test module.
Split into ``_attach_log_handler()`` (idempotent, called from ``_register_routes``) and a paired ``_ensure_log_capture_level`` / ``_restore_log_capture_level`` invoked from the production lifespan. Tests that build the app via ``_register_routes`` + a mock lifespan no longer leak global logger state. ## Behavior Both flag positions work; subcommand value wins when both are passed (max/OR merge): ```text winml -v export -m … -o … # top-level: works winml export -vv -m … -o … # subcommand: now works (was: extra argument) winml --quiet export -m … -o … # top-level: works winml export --quiet -m … -o … # subcommand: now works (was: no such option) winml inspect -vv -m … --format json > out 2> log.txt # clean stdout/stderr split ``` ## Tests - ``tests/cli/`` (23): pass - ``tests/unit/`` (5061 collected): **5058 pass**, 3 fail — all 3 pre-existing on main and unrelated to this change: - ``test_winml_session.py::TestOpenVINODeviceRouting::test_compile_openvino_cpu_device_succeeds`` - ``test_winml_session.py::TestOpenVINODeviceRouting::test_compile_openvino_cpu_provider_not_npu`` (both env: no OpenVINO EP installed) - ``test_config_utils.py::TestMergeConfigNoneHandling::test_none_to_value_transition`` (test isolation, passes alone) --------- Co-authored-by: hualxie <hualxie@microsoft.com>
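The merge rule described above (max of verbose, OR of quiet) can be sketched in a few lines. This is a minimal illustration, not the PR's `resolve_verbosity`: the function names `merge_verbosity` and `to_log_level` are hypothetical, and the mapping from verbosity count to logging level is an assumption that the PR text does not spell out.

```python
import logging


def merge_verbosity(top_verbose: int, top_quiet: bool,
                    sub_verbose: int, sub_quiet: bool) -> tuple[int, bool]:
    """Merge top-level and subcommand flags: max of -v counts, OR of --quiet."""
    return max(top_verbose, sub_verbose), (top_quiet or sub_quiet)


def to_log_level(verbose: int, quiet: bool) -> int:
    """Map merged flags to a logging level (assumed mapping, not from the PR)."""
    if quiet:
        return logging.ERROR
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    return logging.WARNING


# `winml -v export -vv ...` -> the larger count (2) is used.
verbose, quiet = merge_verbosity(1, False, 2, False)
level = to_log_level(verbose, quiet)
```

Taking the maximum means a flag given at either position can only raise verbosity, which is why both `winml -v export …` and `winml export -vv …` behave consistently.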

## Summary

- Replace hardcoded 4-EP list in `analyze_from_proto(ep=None)` with dynamic lookup from `EP_SUPPORTED_DEVICES`, filtered by target device
- Remove `max_length=4` constraint on `AnalysisOutput.results` to support more than 4 EPs per device
- Change uniqueness validator from IHV type to EP type (multiple EPs can share the same IHV, e.g. CUDA and DML both map to MICROSOFT)

**Before:** `analyzer.analyze(ep=None)` always analyzed QNN, OpenVINO, VitisAI, NvTensorRTRTX regardless of device. NvTensorRTRTX was analyzed on NPU even though it only supports GPU.

**After:** EP list is derived from `EP_SUPPORTED_DEVICES` filtered by the target device, matching the CLI `--ep all` behavior exactly.
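A rough sketch of the device-filtered lookup follows. The table contents here are illustrative assumptions: only "NvTensorRTRTX supports GPU only" and "QNN maps to NPU/GPU" come from this PR, the OpenVINO and VitisAI rows are placeholders, and `eps_for_device` is a hypothetical helper name rather than the project's API.

```python
# Illustrative stand-in for the project's EP_SUPPORTED_DEVICES table.
# Only the QNN and NvTensorRTRTX rows are grounded in the PR text;
# the other rows are assumptions for the sake of the example.
EP_SUPPORTED_DEVICES = {
    "QNN": {"NPU", "GPU"},
    "OpenVINO": {"CPU", "GPU", "NPU"},
    "VitisAI": {"NPU"},
    "NvTensorRTRTX": {"GPU"},
}


def eps_for_device(device: str, table: dict[str, set[str]] = EP_SUPPORTED_DEVICES) -> list[str]:
    """Return the EPs that list `device` as supported, in table order."""
    return [ep for ep, devices in table.items() if device in devices]


npu_eps = eps_for_device("NPU")  # NvTensorRTRTX is no longer included for NPU
gpu_eps = eps_for_device("GPU")
```

Deriving the list from one table removes the need for a fixed length limit on the results, since the number of EPs per device now follows whatever the table contains.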

Resolves #326.

Adds `WinMLDepthEstimationEvaluator` and `DepthMetric` (Absolute Relative error, RMSE, delta-1) following the NYU/KITTI evaluation protocol. HuggingFace `evaluate` doesn't ship a depth-estimation evaluator, so the metric loop is implemented manually.

### Background

Depth-estimation models fall into a few groups, and the same input image gives wildly different prediction scales depending on which group the model belongs to.

- Metric-depth models (ZoeDepth, DepthPro) predict depth in meters directly.
- Relative-depth models (Depth-Anything, Marigold) predict depth up to an unknown scale and shift.
- Disparity models (DPT, MiDaS) predict `1 / depth` (inverse depth) up to scale and shift.

Comparing predictions against the NYU ground truth therefore requires (1) optionally inverting disparity into depth and (2) aligning the prediction to the ground truth before computing metrics. This is what AbsRel/RMSE/delta-1 benchmarks in the literature do, and what this PR adds as user-selectable options.

### Options

Two `columns_mapping` keys, both overridable via `--column`, and both visible in `winml eval --schema --task depth-estimation`.

`align` controls how each prediction is rescaled against the ground truth depth map before metrics are computed:

- `affine` (default): per-image least-squares fit of `pred_aligned = s * pred + t`, where `s` is a scalar scale and `t` is a scalar shift, solved on the valid pixels (those passing the depth range mask). Suitable for relative-depth and disparity models.
- `median`: scale-only alignment, `pred_aligned = (median(gt) / median(pred)) * pred`. No shift. Cheaper but less accurate when the model has a non-zero offset.
- `none`: use the prediction as-is. Suitable for metric-depth models that already output meters.

`depth_kind` indicates what the model outputs:

- `depth` (default): prediction is interpreted as depth.
- `disparity`: prediction is interpreted as inverse depth, so it is inverted (`pred := 1 / pred`) before alignment. Needed for DPT/MiDaS-style outputs.

The depth range used for the valid-pixel mask is also overridable: `min_depth` (default 1e-3, NYU convention) and `max_depth` (default 10.0 meters, NYU convention). Only pixels with `min_depth <= gt <= max_depth` contribute to the metrics.

### Default dataset and testset

Default dataset is `sayakpaul/nyu_depth_v2`. All 11 depth-estimation entries from `models_all.json` are added to `models_with_acc.json`, with per-model overrides only where the defaults don't match the model family:

- `Intel/zoedepth-nyu-kitti` and `apple/DepthPro-hf` set `align=none` (metric-depth).
- `Intel/dpt-hybrid-midas` and `Intel/dpt-large` set `depth_kind=disparity`.
- The remaining 7 entries (Depth-Anything family, Marigold, etc.) rely on the defaults (`align=affine`, `depth_kind=depth`).

### Tests

Unit tests cover the new evaluator and the metric, including the affine-fit path and the disparity inversion path. The slow/network integration test runs the full pipeline end-to-end on Depth-Anything V2, ZoeDepth, and DPT.
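The alignment and metric steps described above can be sketched on flat lists of pixel values. This is a minimal pure-Python illustration, not the PR's `DepthMetric`: the names `fit_affine` and `depth_metrics` are hypothetical, the 1.25 threshold for delta-1 is the conventional value from the depth-estimation literature rather than something stated in this PR, and the small epsilon guarding the disparity inversion is an added assumption.

```python
import math
from statistics import median


def fit_affine(pred, gt):
    """Return (s, t) minimizing sum((s * p + t - g) ** 2) over the given pixels."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    s = cov / var_p if var_p > 0 else 1.0  # constant prediction: fall back to shift only
    return s, mean_g - s * mean_p


def depth_metrics(pred, gt, align="affine", depth_kind="depth",
                  min_depth=1e-3, max_depth=10.0):
    """Compute AbsRel, RMSE and delta-1 on pixels inside the depth range."""
    # Keep only pixels whose ground truth passes the depth-range mask.
    valid = [(p, g) for p, g in zip(pred, gt) if min_depth <= g <= max_depth]
    if not valid:
        raise ValueError("no valid pixels inside the depth range")
    p = [x for x, _ in valid]
    g = [y for _, y in valid]

    # Disparity is inverted before alignment (epsilon guard is an assumption).
    if depth_kind == "disparity":
        p = [1.0 / max(x, 1e-8) for x in p]

    if align == "affine":
        s, t = fit_affine(p, g)
        p = [s * x + t for x in p]
    elif align == "median":
        scale = median(g) / median(p)
        p = [scale * x for x in p]
    elif align != "none":
        raise ValueError(f"unknown align mode: {align}")

    n = len(g)
    abs_rel = sum(abs(x - y) / y for x, y in zip(p, g)) / n
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(p, g)) / n)
    # delta-1: fraction of pixels with max(pred/gt, gt/pred) < 1.25 (conventional threshold).
    delta1 = sum(1 for x, y in zip(p, g) if x > 0 and max(x / y, y / x) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# A relative-depth prediction that differs from ground truth by scale 2 and shift 1.
relative = depth_metrics([0.5, 1.0, 1.5, 2.0], [2.0, 3.0, 4.0, 5.0], align="affine")
```

In this toy case the affine fit recovers the scale and shift exactly, so the errors vanish; with `align="none"` the same prediction would score poorly, which is why metric-depth and relative-depth models need different defaults.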

…xample

- JSON key 'avg' -> 'mean' (matches actual output)
- Add missing JSON fields: task, precision, timestamp, std, warmup_mean, batches_per_sec
- Fix terminal label 'Precision' -> 'Model Precision'
- Add missing 'Task:' line in terminal example
- Remove false claim about --module using ONNX hierarchy tags (it uses torchinfo to discover PyTorch submodules, not ONNX metadata)
- Remove 'per-operator timings' from intro (op-tracing not ready)

- Add model_info block to JSON example (always emitted)
- Soften --monitor 'no effect' to acknowledge small system overhead
- Change 'not executing' to 'strong signal to investigate'
- Add 'monitor' field to NPU JSON example
- Fix 'on-chip memory' -> 'dedicated adapter memory'
- Note that JSON always includes device_memory even for CPU (zeroed)

Fix docs for eval, compile and quantize

- compile: remove invalid 'cuda' and 'tensorrt' from --ep list, add correct aliases
- quantize: --weight-type/--activation-type default is resolved (not hardcoded uint8)

- sys.md: fix EP mapping (QNN -> NPU/GPU, not just NPU)
- CONTRIBUTING.md: remove Linux/macOS unzip comment (Windows-only project)
- docs/contributing.md: sync with CONTRIBUTING.md
- .pre-commit-config.yaml: remove unnecessary --unsafe arg from check-yaml
- .gitignore: add comment explaining docs/versions.json

The docs workflow was failing because uv sync --extra dev installs onnxruntime-windowsml which is Windows-only and unavailable on the Ubuntu runner. Add a dedicated 'docs' dependency group with only the packages needed for building docs.

…cation

microsoft-github-policy-service · 2026-06-12T08:00:39Z

@hi-brenda please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

(default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
(when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

Contribution License Agreement

This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
and conveys certain license rights to Microsoft Corporation and its affiliates (“Microsoft”) for Your
contributions to Microsoft open source projects. This Agreement is effective as of the latest signature
date below.

1. Definitions.
“Code” means the computer software code, whether in human-readable or machine-executable form,
that is delivered by You to Microsoft under this Agreement.
“Project” means any of the projects owned or managed by Microsoft and offered under a license
approved by the Open Source Initiative (www.opensource.org).
“Submit” is the act of uploading, submitting, transmitting, or distributing code or other content to any
Project, including but not limited to communication on electronic mailing lists, source code control
systems, and issue tracking systems that are managed by, or on behalf of, the Project for the purpose of
discussing and improving that Project, but excluding communication that is conspicuously marked or
otherwise designated in writing by You as “Not a Submission.”
“Submission” means the Code and any other copyrightable material Submitted by You, including any
associated comments and documentation.
2. Your Submission. You must agree to the terms of this Agreement before making a Submission to any
Project. This Agreement covers any and all Submissions that You, now or in the future (except as
described in Section 4 below), Submit to any Project.
3. Originality of Work. You represent that each of Your Submissions is entirely Your original work.
Should You wish to Submit materials that are not Your original work, You may Submit them separately
to the Project if You (a) retain all copyright and license information that was in the materials as You
received them, (b) in the description accompanying Your Submission, include the phrase “Submission
containing materials of a third party:” followed by the names of the third party and any licenses or other
restrictions of which You are aware, and (c) follow any other instructions in the Project’s written
guidelines concerning Submissions.
4. Your Employer. References to “employer” in this Agreement include Your employer or anyone else
for whom You are acting in making Your Submission, e.g. as a contractor, vendor, or agent. If Your
Submission is made in the course of Your work for an employer or Your employer has intellectual
property rights in Your Submission by contract or applicable law, You must secure permission from Your
employer to make the Submission before signing this Agreement. In that case, the term “You” in this
Agreement will refer to You and the employer collectively. If You change employers in the future and
desire to Submit additional Submissions for the new employer, then You agree to sign a new Agreement
and secure permission from the new employer before Submitting those Submissions.
5. Licenses.

a. Copyright License. You grant Microsoft, and those who receive the Submission directly or
indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license in the
Submission to reproduce, prepare derivative works of, publicly display, publicly perform, and distribute
the Submission and such derivative works, and to sublicense any or all of the foregoing rights to third
parties.
b. Patent License. You grant Microsoft, and those who receive the Submission directly or
indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license under
Your patent claims that are necessarily infringed by the Submission or the combination of the
Submission with the Project to which it was Submitted to make, have made, use, offer to sell, sell and
import or otherwise dispose of the Submission alone or with the Project.
c. Other Rights Reserved. Each party reserves all rights not expressly granted in this Agreement.
No additional licenses or rights whatsoever (including, without limitation, any implied licenses) are
granted by implication, exhaustion, estoppel or otherwise.

6. Representations and Warranties. You represent that You are legally entitled to grant the above
licenses. You represent that each of Your Submissions is entirely Your original work (except as You may
have disclosed under Section 3). You represent that You have secured permission from Your employer to
make the Submission in cases where Your Submission is made in the course of Your work for Your
employer or Your employer has intellectual property rights in Your Submission by contract or applicable
law. If You are signing this Agreement on behalf of Your employer, You represent and warrant that You
have the necessary authority to bind the listed employer to the obligations contained in this Agreement.
You are not expected to provide support for Your Submission, unless You choose to do so. UNLESS
REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING, AND EXCEPT FOR THE WARRANTIES
EXPRESSLY STATED IN SECTIONS 3, 4, AND 6, THE SUBMISSION PROVIDED UNDER THIS AGREEMENT IS
PROVIDED WITHOUT WARRANTY OF ANY KIND, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY OF
NONINFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
7. Notice to Microsoft. You agree to notify Microsoft in writing of any facts or circumstances of which
You later become aware that would make Your representations in this Agreement inaccurate in any
respect.
8. Information about Submissions. You agree that contributions to Projects and information about
contributions may be maintained indefinitely and disclosed publicly, including Your name and other
information that You submit with Your Submission.
9. Governing Law/Jurisdiction. This Agreement is governed by the laws of the State of Washington, and
the parties consent to exclusive jurisdiction and venue in the federal courts sitting in King County,
Washington, unless no federal subject matter jurisdiction exists, in which case the parties consent to
exclusive jurisdiction and venue in the Superior Court of King County, Washington. The parties waive all
defenses of lack of personal jurisdiction and forum non-conveniens.
10. Entire Agreement/Assignment. This Agreement is the entire agreement between the parties, and
supersedes any and all prior agreements, understandings or communications, written or oral, between
the parties relating to the subject matter hereof. This Agreement may be assigned by Microsoft.

tezheng and others added 30 commits May 27, 2026 00:08

fix integration test(only unit test _skip_winml_ep_init) (#760)

a30d5a9

only unit test _skip_winml_ep_init

example: add readme and example.py for microsoft/table-transformer-de…

5bdb1fb

…tection (#779) Also update scripts/e2e_eval/run_pytorch_baseline.py to include pytorch model latency --------- Co-authored-by: hualxie <hualxie@microsoft.com>

chore: enable checking types and fix analyze folder (#768)

9ec0345

Co-authored-by: hualxie <hualxie@microsoft.com>

examples: add 12 builtin model recipes (fp16 + w8a8 + w8a16, 36 confi…

80bc7e0

…gs) (#785) Adds curated recipe configs for the 12 builtin models — those that pass fp16 eval on all 9 (EP, device) buckets.

Validate model task in config. (#723)

7152b82

Fix integration tests. (#773)

2d967c1

DingmaomaoBJTU and others added 26 commits June 10, 2026 19:12

docs: add UI Quickstart to nav

e40ad9c

docs: remove ConvNeXt primitives page and fix all references

6c8f2c3

docs: rename ConvNeXt tutorial, remove site logo icon

570f9b4

docs: expand What you learned section in BERT sample

bcdd421

docs: add repo access link to index and tutorials pages

621ff23

docs: rename site to Windows ML CLI, hide logo icon

e4ecb6c

docs: expand hierarchy tagging section in load-and-export

5484ab6

docs: add concrete tag examples, mermaid diagram, and real export data

dfad05f

docs: fix inaccuracies in load-and-export tagging section

6d4f4d4

docs: move Load and export before Primitives in nav

191a8e9

docs: remove unimplemented optimizer scoping claim

cbe63fc

docs: enrich perf-and-monitoring with real output, flag table, JSON e…

facf95e

…xample

docs: fix perf output table rendering

94b6ac0

docs: remove per-operator tracing section (not ready)

29c8da6

Merge remote-tracking branch 'origin/main' into docs/draft

49c8cde

docs: add memory measurement details to perf monitoring section

6090cf8

docs(perf): separate live monitoring and memory metrics sections

e692180

docs(perf): fix hw_monitor JSON to match actual output

ddbdf04

docs(perf): add per-device metrics breakdown (CPU/GPU/NPU)

e43d8c3

Fix docs for eval, compile and quantize (#874)

559cd77

Fix docs for eval, compile and quantize

docs: fix --ep choices in compile.md, clarify quantize defaults

0426ec8

- compile: remove invalid 'cuda' and 'tensorrt' from --ep list, add correct aliases
- quantize: --weight-type/--activation-type default is resolved (not hardcoded uint8)

docs: simplify quickstart by removing redundant notes and admonitions

344df16

hi-brenda requested a review from a team as a code owner June 12, 2026 07:53

docs: merge main into docs/draft, keep docs/draft quickstart simplifi…

c7879ad

…cation

hi-brenda closed this Jun 12, 2026

docs: add user-facing documentation site #884
hi-brenda wants to merge 150 commits into main from docs/draft


14 participants
