docs: add user-facing documentation site#884

Closed
hi-brenda wants to merge 150 commits into main from docs/draft
@hi-brenda

Summary

  • Adds a full MkDocs Material documentation site covering all commands, concepts, tutorials, and reference pages
  • Fixes factual inaccuracies found during source review (flags, defaults, EP behavior, pipeline steps)
  • Simplifies quickstart by removing redundant admonitions and notes

tezheng and others added 30 commits May 27, 2026 00:08
Adds a complete MkDocs Material documentation site for the winml-cli
project, served from /docs and built locally and via GitHub Actions
(manual dispatch).

Site infrastructure:
- mkdocs.yml with Material theme, mermaid superfences, tabbed code,
  light/dark palette toggle
- pyproject.toml dev deps: mkdocs-material, mkdocs-jupyter,
  pymdown-extensions
- .github/workflows/docs.yml (workflow_dispatch only)
- .gitignore exception for docs/superpowers/specs/

User-facing chapters:

Home — tagline + Goals/Promises bullets sourced from the MVP
transcript; describes the toolkit's three workflows (primitives,
pipeline, one-command) plus the EP × Device coverage promise

Getting Started (3 pages):
- Installation — Win 11 24H2 + Copilot+PC + Python 3.10 + uv + git
  prereqs table; 'No NPU?' callout pointing at --device auto with
  the winml eval caveat
- Quickstart — 5-minute export + inspect with
  'winml sys --list-device --list-ep' verify step
- End-to-End Tour — universal --device auto walkthrough that works
  on Copilot+ PC NPU, DirectML GPU, or CPU; tabbed example outputs
  for sys and perf so each reader sees their own machine

Concepts (12 pages in two sub-groups):
- Fundamentals (5): How winml-cli works, Graph and IR, Weight and
  Activation, EP and Device (with the full 7-EP × Device matrix),
  Datatype and Quantization (8-precision family from _KNOWN_PRECISIONS
  with w4a16 marked 'Planned — not yet supported')
- WinML CLI (7 workflow-concept pages): Primitives and pipeline,
  Load and export, Analyze and optimize, Compile and EPContext,
  Perf and monitoring, Eval and datasets, Config and build (with
  the full WinMLBuildConfig schema inline)

Commands (13 pages):
- Overview with the four user-intent groups (Discover / Configure /
  Build / Measure)
- Per-command reference for: sys, inspect, hub, analyze, config,
  optimize, export, quantize, compile, build, perf, eval

Samples (3 pages):
- ConvNeXt — Primitives Walkthrough (CPU/GPU/NPU device comparison)
- BERT — Config + Build + Perf (workflow demonstration)
- Qwen3 — Composite Models (placeholder for the in-progress feature)

Tutorials (2 pages):
- Overview
- ConvNeXt on NPU — 2200-word linear walkthrough with both QNN and
  OpenVINO compile paths shown via tabbed blocks, plus the
  'winml build' one-shot variant

P2 stubs preserved in nav: Reference, Troubleshooting, Contributing

Source-grounding:
- Every flag mentioned in user-facing docs is verified against
  src/winml/modelkit/
- Non-functional flags (--torch-module, --dynamo on export;
  --no-quant on compile) are explicitly marked
- All URLs target the canonical microsoft/winml-cli destination
- mkdocs build --strict passes with zero warnings

Internal artifacts kept under docs/superpowers/ for reference:
- Spec and plan files for the v1 and v2 design iterations
- 2026-05-26-v3-known-issues.md — fact-checked review findings

Existing internal docs (docs/design/, docs/naming-convention.md,
docs/pytest-best-practices.md) are unchanged and excluded from the
user-facing nav via exclude_docs in mkdocs.yml.
…he site

Adds a contributor-facing README at docs/README.md covering:
- uv-based dev setup
- mkdocs serve / build --strict workflow
- gh-deploy publish (local one-shot)
- .github/workflows/docs.yml CI workflow (currently workflow_dispatch only)
- Authoring conventions (winml-cli name, flag verification, admonitions,
  tabbed code blocks)
- Excluded paths reference

Updates mkdocs.yml exclude_docs to include /README.md so the new file
doesn't collide with docs/index.md as the chapter index.
…source

Six parallel review agents fact-checked all 34 user-facing doc files
against microsoft/winml-cli @ 5e25579. Output: one issue file per
source doc at docs/superpowers/2026-05-27-doc-issues/.

A validator agent then cross-checked every Critical and Important
claim and produced the consolidated, false-positive-filtered list at
docs/superpowers/2026-05-27-validated-issues.md.

Summary: 25 Critical + 22 Important kept; 6 rejected as false
positives. Major theme: docs were authored against feat/mvp source
where some symbols and defaults differ from main (e.g., _KNOWN_PRECISIONS
in _options.py vs _NAMED_PRECISIONS in precision.py; winml hub vs
winml catalog; many flag defaults flipped to 'auto'; DML/CPU no
longer produce _ctx.onnx artifacts).

Next step: per-file fix agents will apply the validated list.
…eview

5 parallel fix agents applied the validated-issues list. Net: 25 Critical
+ 22 Important defects resolved across 20 doc files + mkdocs.yml.

Major fixes by area:

Concepts (4 pages):
- quantization.md: NPU auto-precision corrected to w8a16 (was int8);
  w4a16 description corrected (rejected at validation, not 'recognized
  but raises at quantization'); _KNOWN_PRECISIONS/_options.py references
  replaced with the actual _NAMED_PRECISIONS/precision.py
- compile-and-epcontext.md: removed non-existent --no-quant flag mention
- config-and-build.md: JSON 'compile' section flattened to use
  execution_provider (not nested ep_config.provider); table expanded to
  the actual 7 sub-configs (added eval, auto)
- perf-and-monitoring.md: --device documented as accepting auto;
  output path corrected to ~/.cache/winml/perf/<slug>/<timestamp>.json;
  --monitor not NPU-specific; --op-tracing marked hidden

Commands (11 pages):
- overview.md: winml hub renamed to winml catalog throughout;
  _options.py reference replaced with cli.py
- hub.md: H1 and all invocations changed to 'winml catalog'; removed
  non-existent --model/-m flag; rewrote 'How it works' (no per-EP latency
  / accuracy-verdict columns exist); added --ep/--device filter flags
- build.md: --config marked optional (was required); --random-init and
  --qnn-sdk-root removed (don't exist); --no-compile/--compile toggle
  pair documented; --trust-remote-code added; --max-optim-iterations
  default corrected to None
- compile.md: --device default corrected to auto; --no-quant flag
  removed (doesn't exist on compile)
- config.md: --no-compile/--compile framing corrected (compile is
  EXCLUDED by default; users need --compile to include)
- eval.md: --device includes auto (default auto, not cpu); -n short
  alias removed; class reference replaced with actual evaluate function
- analyze.md: --device default corrected to auto; --ep default to
  auto; --run-unknown-op default to False; -m/-v/-q/-c flags added
- optimize.md: --preset/-p flag and entire Built-in presets table
  removed (flag doesn't exist); --verbose added; 'Configuration
  precedence' reduced from 4 levels to 3
- inspect.md: --list-tasks, --model-type, --model-class, --verbose
  flags added
- perf.md: --compare-devices removed (not registered at all); output
  path corrected; --op-tracing marked hidden
- sys.md: --verbose/-v added to flag table

Samples / Tutorials / Getting Started (5 pages):
- installation.md: Python 3.10 corrected to 3.11; 'No NPU?' callout
  no longer claims winml eval rejects auto (it accepts auto on main)
- end-to-end.md: dropped incorrect _ctx.onnx CPU/DML artifacts;
  QNNExecutionProvider mapped to NPU/GPU (not just NPU)
- convnext-primitives.md: CPU/GPU compile clarified (no _ctx.onnx
  produced; uses convnext_int8.onnx directly); winml eval auto reverted
- bert-config-build.md: build final artifact corrected to model.onnx
  (was bert-base-uncased_ctx.onnx)
- npu-convnext.md: Python 3.10 -> 3.11; OpenVINO artifact filename
  corrected to use device string (_npu_ctx.onnx not _openvino_ctx.onnx);
  CPU compile tab dropped (CPU doesn't produce _ctx.onnx)

mkdocs.yml: nav label 'hub' renamed to 'catalog' to match the actual
command name on microsoft/winml-cli main.
…meration)

The opening paragraph re-stated the project tagline (already on the
home page one click above) and enumerated 4 EPs (QNN, OpenVINO, DML,
ONNX Runtime) — which goes stale; the canonical list in
concepts/eps-and-devices.md has 7. Removing the paragraph; the page
now starts with the Prereqs table. Matches the convention used by
quickstart.md and end-to-end.md (neither re-states the tagline).
## Summary

- Rewrote `docs/concepts/analyze-and-optimize.md` with source-verified
content: SupportLevel classification table, lint vs autoconf outputs,
analysis modes, optimizer pipe architecture (4 pipes, 43 capabilities, 5
rewrite groups / 12 rules), and autoconf loop SVG diagram
- Updated `docs/commands/analyze.md` with corrected EP aliases,
exit-code table, and additional CLI examples
- Renamed `hub.md` → `catalog.md` and updated all cross-references
(inspect, overview, sys, mkdocs.yml)
- Fixed `check-yaml` pre-commit hook to support `!!python/name` tags in
mkdocs.yml (`--unsafe`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Zhipeng Wang <zhiwang@microsoft.com>
Co-authored-by: Qiong Wu (qiowu) <qiowu@microsoft.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Charles Zhang <zhangchao@microsoft.com>
Co-authored-by: Zhenchao Ni <zhenni@microsoft.com>
## Summary

- Drop the `WindowsAppRuntimeVersion` class, attribute, property, and
`windowsAppRuntimeVersion` field in `SysInfo.to_dict()` from
`src/winml/modelkit/sysinfo/sysinfo.py`.
- Remove the now-unused `import re`.

Nothing else in the codebase referenced these symbols. Integration
`runtime_checker` fixtures still contain the field inside their stored
`sys_info` blob, but the test helper ignores `sys_info` during
comparison, and the field will disappear naturally next time those
fixtures are regenerated.
…763)

## Summary
- **VitisAI EP ordering**: Move `VitisAIExecutionProvider` to end of
`EP_SUPPORTED_DEVICES` so it appears last in `analyze --ep all` output,
since it is not yet fully supported.
- **Catalog table width**: Set `expand=False` on both `Table` and
`Panel` in `_build_list_renderable` so the catalog table fits its
content width instead of stretching to the full terminal width.
…tection (#779)

Also update scripts/e2e_eval/run_pytorch_baseline.py to include pytorch
model latency

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
## Summary

- Reorganized README into 5 sections: Title + Description, Features /
Scope, Getting Started, Commands, Contributing + License
- Updated status badge to `preview`, rewrote description and Features (✅
bullets)
- Scope section: added supported EPs, built-in model catalog reference,
accepted inputs; removed verbose LLM/not-supported block
- Getting Started: consolidated Prerequisites + Installation + Quick
Start; added Config-Build Pipeline and Step-by-step through primitive
commands walkthroughs
- Commands: BYOM workflow with pipeline diagram, command table +
collapsible details, comparison table (Config-Driven first)
- Reference tables at end: Supported Hardware, Supported Tasks,
Supported Model Types, Built-in Models

---------

Co-authored-by: Qiong Wu (qiowu) <qiowu@microsoft.com>
Co-authored-by: Zhipeng Wang <zhiwang@microsoft.com>
## Summary

- Removed the duplicated `WinML CLI (Python wheel) | [Releases]` row in
the Prerequisites table.
- Updated the install step from `uv pip install
winml_cli-<version>-py3-none-any.whl` to `pip install winml-cli`.
- Updated the Prerequisites entry to point at PyPI instead of GitHub
Releases, keeping the table and install instructions consistent.
## Summary

- Adds `resolve_check_device_ep` helper that validates a (device, EP)
combination without requiring the device/EP to actually exist on the
system. Closes #765.
- `commands/config.py` and `config/build.py` now use
`resolve_check_device_ep` instead of `resolve_device` so `winml config`
no longer hard-fails on hosts where the requested EP isn't installed.
- When `device=auto` or `ep=None`, the helper delegates to the existing
`resolve_device` + `resolve_eps` flow (system-aware behavior preserved).
When both `device` and `ep` are explicit, it only validates against the
static `EP_SUPPORTED_DEVICES` mapping.
- CLI cleanup: `-m/--model`, `-c/--config`, `--device` for the config
command now use the shared `cli_utils.*_option` decorators.
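
The two code paths described above can be sketched roughly as follows. This is a minimal illustration, not the helper's actual source: the entries shown for `EP_SUPPORTED_DEVICES`, the injected `system_resolver` parameter (standing in for the `resolve_device` + `resolve_eps` flow), and the use of `ValueError` are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple

# Illustrative subset only; the real EP_SUPPORTED_DEVICES mapping in winml-cli
# is not shown in this PR, so these entries are assumptions.
EP_SUPPORTED_DEVICES = {
    "QNNExecutionProvider": ("npu", "gpu"),
    "CPUExecutionProvider": ("cpu",),
}

Resolved = Tuple[str, Optional[str]]


def resolve_check_device_ep(
    device: str,
    ep: Optional[str],
    system_resolver: Callable[[str, Optional[str]], Resolved],
) -> Resolved:
    """Validate a (device, EP) pair without probing the host when both are explicit."""
    # Delegation path: anything left to auto-detection needs the system-aware flow.
    if device.lower() == "auto" or ep is None:
        return system_resolver(device, ep)

    # Static-only path: case-insensitive lookup against the mapping.
    by_lower = {name.lower(): name for name in EP_SUPPORTED_DEVICES}
    canonical = by_lower.get(ep.lower())
    if canonical is None:
        raise ValueError(f"Unknown EP: {ep!r}")
    if device.lower() not in EP_SUPPORTED_DEVICES[canonical]:
        raise ValueError(f"{canonical} does not support device {device!r}")
    return device.lower(), canonical
```

With this shape, an explicit pair such as `("NPU", "qnnexecutionprovider")` validates on a host that has no QNN installed, because the system resolver is never called on that path.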

## Tests

- New `TestResolveCheckDeviceEp` class in
`tests/unit/sysinfo/test_device.py` covering both code paths (delegation
and static-only) plus error cases (unknown EP, unsupported device,
case-insensitivity).
- Existing config-test mocks updated from `resolve_device` to
`resolve_check_device_ep` (`tests/unit/config/conftest.py`,
`tests/unit/config/test_build.py`,
`tests/unit/config/test_build_onnx.py`,
`tests/unit/commands/test_config_cli.py`) so the lazy import in
`config/build.py` is intercepted.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
…gs) (#785)

Adds curated recipe configs for the 12 builtin models — those that pass
fp16 eval on all 9 (EP, device) buckets.
## Summary

Fixes `scripts/e2e_eval/run_eval.py` crashing on VitisAI EP (AMD Ryzen
AI NPU) and a latent bug in `winml build` that prevented the script's
`--no-quant` workaround from actually taking effect.

The crash: VitisAI ships its own internal quantizer and runs it at
session-create time. Layering winml's generic QDQ quantization pass on
top produces a model VitisAI cannot consume, which manifests as
`DpuKernelRunner.cpp:1920 DPU timeout` during `winml perf`. The fix is
to tell winml to skip its own quantization when the selected EP
quantizes natively.

## Changes

### `src/winml/modelkit/commands/build.py` — root-cause fix (1 line)

When `--device` was passed to `winml build`, the internal
`_patch_device` helper unconditionally re-populated `cfg.quant` with the
device's default quantization config, silently undoing any prior
`--no-quant`. The condition now respects `no_quant`:

```python
if no_quant or resolved_quant is None:
    cfg.quant = None
```

Without this, `winml build … --device npu --no-quant` still produced a
`_quantized.onnx` artifact.

### `scripts/e2e_eval/run_eval.py` — script wiring

- New canonical-name set `_NATIVE_QUANT_EPS =
{"VitisAIExecutionProvider"}` plus a helper `_ep_quantizes_natively(ep)`
that funnels both canonical names and user aliases (e.g. `vitisai`)
through `winml.modelkit.utils.constants.normalize_ep_name`. No hardcoded
aliases.
- `_resolve_precision(...)` gained an `ep` parameter; for native-quant
EPs it returns `None` so no precision flag is sent.
- `_run_build` now passes `--no-quant` to **both** `winml config` (so
the persisted `build_config.json` has `quant: null` up-front) and `winml
build` (defense in depth) when the EP quantizes natively.
- Call sites in `run_model` and `main` updated to thread `ep` through
`_resolve_precision`.
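
A rough sketch of the script wiring listed above, for readers who want to see how the pieces fit. The `normalize_ep_name` below is a stand-in with an invented alias table (the real one lives in `winml.modelkit.utils.constants` and is not shown here), the helper names drop the leading underscore, and `quant_flags` is a hypothetical helper added only for illustration.

```python
from typing import List, Optional

_NATIVE_QUANT_EPS = {"VitisAIExecutionProvider"}

# Hypothetical alias table standing in for whatever the real
# normalize_ep_name consults.
_ALIASES = {"vitisai": "VitisAIExecutionProvider", "qnn": "QNNExecutionProvider"}


def normalize_ep_name(ep: str) -> str:
    """Stand-in normalizer: maps aliases and any casing to a canonical EP name."""
    key = ep.lower()
    suffix = "executionprovider"
    if key.endswith(suffix):
        key = key[: -len(suffix)]
    return _ALIASES.get(key, ep)


def ep_quantizes_natively(ep: Optional[str]) -> bool:
    """True when the selected EP runs its own quantizer at session-create time."""
    return ep is not None and normalize_ep_name(ep) in _NATIVE_QUANT_EPS


def resolve_precision(requested: Optional[str], ep: Optional[str]) -> Optional[str]:
    """Return None for native-quant EPs so no precision flag is sent."""
    return None if ep_quantizes_natively(ep) else requested


def quant_flags(ep: Optional[str]) -> List[str]:
    """Extra flags passed to both `winml config` and `winml build`."""
    return ["--no-quant"] if ep_quantizes_natively(ep) else []
```

Keeping only canonical names in `_NATIVE_QUANT_EPS` means a new alias needs no change here, as long as the normalizer knows about it.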

## Why the earlier commits in this branch weren't enough

The first attempt (`fix(run_eval): skip quantize when VitisAI EP is
selected`) wired `--no-quant` only into `winml build`. That didn't take
effect because of the `_patch_device` bug above. The second attempt
(`fix(vitisai): resolve auto-precision to w8a8 for VitisAI NPU`) tried
to switch precision instead of skipping — also wrong, since VitisAI
wants an fp32 input and quantizes it itself. The final state keeps the
script clean (`--no-quant`, no precision override) and fixes the actual
`winml build` bug.

## Verification

Manual end-to-end on AMD Ryzen AI (VitisAI NPU), with a clean
`~/.cache/winml/artifacts/...` and output dir:

```pwsh
uv run --no-sync python scripts/e2e_eval/run_eval.py `
  --hf-model facebook/convnext-tiny-224 `
  --task image-classification `
  --device npu --ep vitisai `
  --eval-type perf --no-report --verbose --timeout 1800 `
  --output-dir e2e-test\vitisai_npu
```

Before: `winml perf` crashed with `DpuKernelRunner.cpp:1920 DPU
timeout`.
After:
- Cached `imgcls_*_winml_build_config.json` has `"quant": null`.
- No `_quantized.onnx` artifact produced.
- Perf step: **PASS** in ~120 s.
…771)

## Summary

Closes #546.

`winml inspect --task bogus-task` was leaking optimum's internal
`TasksManager` class name and pointing users to optimum docs:

> Error: Inspection error: Task 'bogus-task' not supported by
TasksManager. Check optimum documentation for supported tasks.

Now the value is validated at Click parse time against the hand-coded
`KNOWN_TASKS` set, before any heavy imports:

```
$ winml inspect -m microsoft/resnet-50 --task bogus-task
Usage: winml inspect [OPTIONS]
Try 'winml inspect --help' for help.

Error: Invalid task 'bogus-task'. Valid: audio-classification, audio-frame-classification, audio-xvector, automatic-speech-recognition, depth-estimation, ... (35 total). See 'winml inspect --list-tasks' for the full list.
```

- Exit code 2 (Click UsageError)
- No third-party class names; no optimum-docs pointer
- Callback imports only `..loader.task.KNOWN_TASKS` — avoids the ~10s
optimum/transformers cold start, so the fail-fast stays fast
- `--list-tasks` and valid `--task` paths unchanged
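
The fail-fast check can be sketched as below. This is a dependency-free illustration under assumptions: the `KNOWN_TASKS` subset is made up from the names visible in the error output, `validate_task` is a hypothetical name, and it raises `ValueError` where the real callback would raise a Click usage error to get exit code 2.

```python
from typing import FrozenSet, Optional

# Illustrative subset; the real KNOWN_TASKS set has 35 entries per the output above.
KNOWN_TASKS: FrozenSet[str] = frozenset({
    "audio-classification",
    "audio-frame-classification",
    "audio-xvector",
    "automatic-speech-recognition",
    "depth-estimation",
    "image-classification",
})


def validate_task(
    value: Optional[str],
    known: FrozenSet[str] = KNOWN_TASKS,
    preview: int = 5,
) -> Optional[str]:
    """Accept a known task (or None) and reject anything else with a short, actionable message."""
    if value is None or value in known:
        return value
    names = sorted(known)
    shown = ", ".join(names[:preview])
    more = ", ..." if len(names) > preview else ""
    raise ValueError(
        f"Invalid task {value!r}. Valid: {shown}{more} ({len(names)} total). "
        "See 'winml inspect --list-tasks' for the full list."
    )
```

Because the check only touches a plain set of strings, it can run before any heavy import, which is what keeps the failure path fast.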

Co-authored-by: Ziyuan Guo (WE TEAM) <ziyuanguo@microsoft.com>
…#772)

## Summary

Fixes #541.

`winml catalog` was the only command where `-t` did NOT mean `--task`:

| Command   | `-t` means       |
|-----------|------------------|
| `inspect` | `--task`         |
| `export`  | `--task`         |
| `config`  | `--task`         |
| `catalog` | `--model-type` (inconsistent) |

A user who has memorized `-t` to mean `--task` in 3 commands would type
`-t image-classification` against `winml catalog` and silently get
`--model-type=image-classification` (no such model type) instead.

## Change

In `src/winml/modelkit/commands/catalog.py`:
- Dropped the `-t` short from `--model-type` (no short alias now).
- Moved `-t` to `--task` (replacing the previous `-k`).

`--model-type` is still fully supported via its long form.

Adds a regression guard test (`test_model_type_has_no_short_flag`) that
checks both the `--help` output AND that passing a model_type via `-t`
is interpreted as a task. All 115 catalog tests pass.

Co-authored-by: Ziyuan Guo (WE TEAM) <ziyuanguo@microsoft.com>
**Skips compilation related cases**

Some models fail to compile with the VitisAI Execution Provider. The
failure is an "Access Violation" that crashes the Python process, which
points to an EP-side problem. To unblock our e2e tests, I have skipped
these cases for VitisAI.

**Skips npu usage assertion for small model**

A small mock model can run so fast that the reported NPU usage is zero.
Our assertion logic still expects some NPU usage, which makes the e2e
test unstable. Since the real-model e2e test cases already cover this
assertion, I skip it for the small model only.

**Skips eval metric value range assertion**

The eval e2e test uses only 10 samples because the goal is to check that
the eval pipeline works, not to truly evaluate a model. The assertion
logic has a metric range, but that range was calculated on a qnn device
and may not hold for other devices. Using the same range everywhere can
make the e2e test unstable. Therefore, I assert the metric range only
for qnn; for other devices, I just assert that a metric value is available.
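
A small sketch of the relaxed assertion described above. The function name `check_eval_metric` and its parameters are hypothetical and only illustrate the rule; the real test helper may be structured differently.

```python
from typing import Optional, Tuple


def check_eval_metric(
    ep: str,
    value: Optional[float],
    qnn_range: Tuple[float, float],
) -> None:
    """Require a metric value everywhere, but enforce the range only on qnn."""
    assert value is not None, "eval produced no metric value"
    if ep.lower() == "qnn":
        low, high = qnn_range
        assert low <= value <= high, f"metric {value} outside [{low}, {high}]"
```

On any other EP the call passes as long as a value exists, so a metric that drifts on a different device no longer fails the pipeline check.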
uv run
~\ModelKit\examples\microsoft-swin-large-patch4-window7-224\example.py
--onnx
~\.cache\winml\artifacts\microsoft_swin-large-patch4-window7-224\imgcls_ec485f4653d962b9_quantized.onnx
True label: house finch, linnet, Carpodacus mexicanus (synset=n01532829,
id=12)

Top 5 predictions:
  1. house finch, linnet, Carpodacus mexicanus (0.9127)
  2. brambling, Fringilla montifringilla (0.0122)
  3. goldfinch, Carduelis carduelis (0.0028)
  4. chickadee (0.0013)
  5. junco, snowbird (0.0013)

Verdict (top-1): PASS

Annotated image written to prediction.png

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
…ng (#790)

## Summary

timm checkpoints load through transformers' generic `TimmWrapper`
(`model_type="timm_wrapper"`) and previously failed in **every** `winml`
command with *"Cannot detect task: config has no 'architectures'
field"*. Two gaps:

1. **Task/class detection**: timm repos load as `TimmWrapperConfig`
with `architectures=None`, so auto-detection could not resolve a task or
class.
2. **OnnxConfig location**: Optimum registers timm's config
(`TimmDefaultOnnxConfig`) only under `library_name="timm"`, but every
`winml` lookup defaults to `transformers`.

`timm_wrapper` is transformers' generic bridge for the whole timm
library, not a model architecture, so it is resolved at the **shared
resolution layer**, not as a per-model config. Only the library is
recorded; the task is derived from Optimum.

## Changes (no `models/hf/` entry)

- **`loader/task.py`** — `WRAPPED_LIBRARY_MODEL_TYPES` (`model_type ->
optimum_library`) + `resolve_optimum_library()`. When a config has no
`architectures`, `_detect_task_and_class_from_config` derives the task
from Optimum's task list for the library
(`get_supported_tasks("timm_wrapper", "timm")` ->
`["image-classification"]`) and the class from
`get_model_class_for_task` (generic `AutoModelForImageClassification`,
which transformers dispatches to `TimmWrapper` at load). The task is not
hardcoded; the branch imports `optimum.exporters.onnx.model_configs`
first to populate Optimum's registry (scoped so normal model loading
never pays for it).
- **`export/io.py`** — `_get_onnx_config` routes the library via
`resolve_optimum_library`, so `timm_wrapper` resolves Optimum's
`TimmDefaultOnnxConfig` from every call site
(config/build/export/inspect) with no `--library` flag.
- **`commands/inspect.py`** + **`inspect/resolver.py`** — route both the
CLI inspect path and the public `inspect_model` path the same way:
library routing for the OnnxConfig lookup, plus wrapped-library task
detection so the task is not mislabeled.
- Tests: `resolve_optimum_library` + wrapped-library architectures
fallback with task derivation (loader); timm library routing for
`resolve_io_specs` / `_get_onnx_config` (export); public inspect path
`detect_task` / `resolve_exporter` for timm (inspect).
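
The library routing and the wrapped-library task fallback can be sketched like this. It is a simplified illustration under assumptions: `derive_wrapped_task` is a hypothetical name, the Optimum task lookup is injected as a plain callable so the sketch stays dependency-free, and taking the first supported task is an assumption about how a single task is chosen.

```python
from typing import Callable, List, Optional

# model_type -> Optimum library; timm_wrapper is the only entry named in this PR.
WRAPPED_LIBRARY_MODEL_TYPES = {"timm_wrapper": "timm"}
DEFAULT_LIBRARY = "transformers"


def resolve_optimum_library(model_type: str) -> str:
    """Route wrapped model types to their own Optimum library, everything else to the default."""
    return WRAPPED_LIBRARY_MODEL_TYPES.get(model_type, DEFAULT_LIBRARY)


def derive_wrapped_task(
    model_type: str,
    architectures: Optional[List[str]],
    get_supported_tasks: Callable[[str, str], List[str]],
) -> Optional[str]:
    """Derive a task for wrapped libraries whose config has no architectures."""
    if architectures:
        return None  # normal architectures-based detection handles this case
    if model_type not in WRAPPED_LIBRARY_MODEL_TYPES:
        return None  # unchanged "Cannot detect task" path
    library = resolve_optimum_library(model_type)
    tasks = get_supported_tasks(model_type, library)
    # Assumption: pick the first task Optimum reports for the library.
    return tasks[0] if tasks else None
```

For non-wrapped models the only added work is the dictionary lookup, which matches the overhead claim in the validation notes.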

## Validation

**Functional (end-to-end)** on a timm image-classification model:

| Command | Before | After |
|---|---|---|
| `winml config` | exit 2: *no 'architectures' field* | task=image-classification, 1 input |
| `winml export` | exit 2: same | `model.onnx` (pixel_values to logits) |
| `winml inspect` | exit 1: same | `AutoModelForImageClassification` + `TimmDefaultOnnxConfig`, full I/O table |

`config` -> `export` -> `optimize` -> `model.onnx` validated end-to-end
for multiple timm CNN classifiers. Also resolves on a timm ViT backbone
(`num_labels=0`) -> task=image-classification, matching Optimum's own
`infer_task_from_model`, so it generalizes across timm architectures
(CNN + ViT).

**No impact on existing models** — scanned all 439 entries / 401 unique
models in `scripts/e2e_eval/testsets/models_all.json`: **0** are
`timm_wrapper` (by JSON metadata and by loaded config; 330 loadable).
Since `timm_wrapper` is the only trigger of the new branch, no existing
model changes behavior. (71 fail to load a config — custom/GGUF/tabular
types that fail at `AutoConfig` regardless; 7 have empty `architectures`
but are not timm — a pre-existing "Cannot detect task", identical before
and after the PR.)

**No overhead for normal (non-timm) models** — `winml config` on a
standard non-timm model: this branch vs base, min ~12.6s vs ~12.5s
(within run-to-run noise). Non-timm configs have `architectures`, so
they skip the new branch; the only added cost is one dict lookup.

**Unit tests** — `tests/unit/loader` + `tests/unit/export` +
`tests/unit/inspect`: green.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Yi Ren <reny@microsoft.com>
## Fix model-task inconsistency for vision feature-extraction models

Fixes #777, #778, #782.

### Principle

`winml inspect` is the source of truth for valid `(model_id, task)`
pairs. Both `feature-extraction` and `image-feature-extraction` are
valid ways to address an image-embedding model like
`facebook/dinov2-base`. Downstream commands must accept whichever name
`winml inspect` accepts, then use `(model_id, task)` to locate the
concrete class to act on.

### Root cause

Optimum's `TasksManager.get_exporter_config_constructor` only knows
canonical Optimum task names. Several call sites passed the raw
user-supplied task straight through, so HF aliases like
`image-feature-extraction` were rejected with "Unsupported". The
evaluator additionally needs to know which HF pipeline name to dispatch
on, which the canonical Optimum task name doesn't carry by itself for
bimodal tasks like `feature-extraction`.

### Fix

- **Inspect / export / HTP exporter**: normalize via
`_map_task_synonym(task)` (in `export/io.py`) before any `TasksManager`
lookup because it requires normalized task input. This is a single
function reused at each `TasksManager` boundary — no new global table.
- **Quantize**: `_resolve_dataset_class(task, io_config)` in
`datasets/__init__.py` dispatches to `TextDataset` / `ImageDataset`
based on the actual ONNX input names. No `AutoConfig.from_pretrained`
round-trip. Bimodal io_configs fall back to `RandomDataset` with a
warning.
- **Evaluate**: The HF pipeline and the `evaluate` library follow their own
task-name convention, so `to_hf_pipeline_task(task, model_id)` in
`eval/evaluate.py` translates to the HF pipeline name the underlying
`evaluate` library expects. Uses `OnnxConfig.inputs` (no weights loaded)
to pick the modality. Bimodal models (e.g. CLIP combined: both
`pixel_values` and `input_ids`) keep the task unchanged via a `len(hits)
== 1` guard, preserving the explicit user task.
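
The modality-based dispatch and the `len(hits) == 1` guard can be sketched as follows. This is an illustration under assumptions: the functions take ONNX input names directly (the real `to_hf_pipeline_task` takes a `model_id`), dataset classes are represented by their names as strings, the translated pipeline name `image-feature-extraction` is inferred from the validation results, and falling back to `RandomDataset` when no known input is present is an assumption.

```python
import warnings
from typing import Iterable, List

IMAGE_INPUT_NAMES = {"pixel_values"}
TEXT_INPUT_NAMES = {"input_ids"}


def detect_modalities(input_names: Iterable[str]) -> List[str]:
    """Return the modalities implied by the model's input names."""
    names = set(input_names)
    hits = []
    if names & IMAGE_INPUT_NAMES:
        hits.append("image")
    if names & TEXT_INPUT_NAMES:
        hits.append("text")
    return hits


def resolve_dataset_kind(input_names: Iterable[str]) -> str:
    """Pick a calibration dataset from the inputs; ambiguous cases fall back with a warning."""
    hits = detect_modalities(input_names)
    if len(hits) == 1:
        return "ImageDataset" if hits[0] == "image" else "TextDataset"
    warnings.warn("ambiguous or unknown modality; falling back to RandomDataset")
    return "RandomDataset"


def to_hf_pipeline_task(task: str, input_names: Iterable[str]) -> str:
    """Translate feature-extraction to the image pipeline only for image-only models."""
    hits = detect_modalities(input_names)
    if task == "feature-extraction" and len(hits) == 1 and hits[0] == "image":
        return "image-feature-extraction"
    return task  # bimodal or text: keep the explicit user task
```

A CLIP-style model with both `pixel_values` and `input_ids` produces two hits, so its task passes through unchanged instead of being silently rerouted to image.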

### Validation

`facebook/dinov2-base`:

| Command | Before | After |
|---|---|---|
| `winml inspect -m facebook/dinov2-base --task image-feature-extraction` | "Unsupported" | Resolves via `Dinov2OnnxConfig` |
| `winml export -m facebook/dinov2-base -t image-feature-extraction` | KeyError on TasksManager | Valid ONNX with `last_hidden_state` |
| `winml eval -m facebook/dinov2-base --task feature-extraction` | `RuntimeError: Failed to create feature-extraction dataset` | kNN metrics on mini-imagenet |
| `winml quantize <onnx> --task feature-extraction -m facebook/dinov2-small` | Fails because it uses `TextDataset` | Routes to `ImageDataset` |

`openai/clip-vit-base-patch32` (bimodal, regression check):

- `winml eval -m openai/clip-vit-base-patch32 --task feature-extraction`
→ stays `feature-extraction` (text STS evaluator); not silently rerouted
to image.
- `winml eval -m openai/clip-vit-base-patch32` (auto-detect) → resolves
to `feature-extraction` (text).

### Tests

Unit:
- `tests/unit/eval/test_eval.py::TestResolveTask` — auto-detect,
explicit task, bimodal guard, HF pipeline translation.
- test_random_dataset.py — `TASK_DATASET_MAPPING` covers all registered
tasks, including bimodal dict-of-dict.

E2E (`-m e2e`, dinov2 chosen because it isn't in `MODEL_BUILD_CONFIGS`
and so actually exercises the `TasksManager` path):
- `tests/e2e/test_inspect_e2e.py::TestInspectDinoV2` — both
`image-feature-extraction` and `feature-extraction` resolve.
- `tests/e2e/test_export_e2e.py::TestExportDinoV2::test_image_feature_extraction`.
- `tests/e2e/test_eval_e2e.py::TestEvalPerTask::test_image_feature_extraction` parameterized over both task names.
- `tests/e2e/test_quantize_e2e.py::test_feature_extraction_with_pixel_values_uses_image_dataset`.
Table for stub

```
  ┌──────────────┬──────────┬────────────────────────────────┬────────────────────────────────────────────┐
  │     Lib      │ py.typed │            Reality             │              Override status               │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ torch        │ yes      │ Has inline types (v2.11)       │ Override is a no-op — mypy uses real types │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ torchvision  │ no       │ No types, no community stubs   │ Genuinely needed                           │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ onnx         │ yes      │ Has inline types (v1.18)       │ Override is a no-op                        │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ onnxruntime  │ no       │ Untyped; no community stubs    │ Genuinely needed                           │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ transformers │ yes      │ Inline types but partial/loose │ Override is a no-op — types ARE used       │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ datasets     │ no       │ Untyped                        │ Genuinely needed                           │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ optimum      │ no       │ Untyped                        │ Genuinely needed                           │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ timm         │ yes      │ Has inline types (v1.0.26)     │ Override is a no-op                        │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ onnxscript   │ yes      │ Has inline types (v0.7)        │ Override is a no-op                        │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ snakemd      │ no       │ Untyped                        │ Genuinely needed                           │
  ├──────────────┼──────────┼────────────────────────────────┼────────────────────────────────────────────┤
  │ openvino     │ n/a      │ Not installed locally          │ n/a                                        │
  └──────────────┴──────────┴────────────────────────────────┴────────────────────────────────────────────┘
```

plotext added to ignore_missing_imports (no community stubs, untyped
library)

---------

Co-authored-by: Hualiang Xie <hualxie@microsoft.com>
…m_task (#801)

## What

PR1 of #800. Relocate `map_task_synonym` ->
`loader/task.py::to_optimum_task` to establish a single WinML->Optimum
task-collapse boundary.

## Changes

- `loader/task.py`: add `to_optimum_task` + `TASK_SYNONYM_EXTENSIONS`
(moved from `export/io.py`); exported via `loader/__init__.py`.
- `export/io.py`: local implementation removed; `map_task_synonym` kept
as a backward-compatible alias (`= to_optimum_task`); internal use
repointed.
- Optimum-boundary call sites repointed to `to_optimum_task`:
`commands/inspect.py`, `export/htp/exporter.py`, `inspect/resolver.py`.
- `commands/build.py`: `TASK_SYNONYM_EXTENSIONS` now imported from
`loader`.
- New `tests/unit/loader/test_task_boundary.py` pins the collapse
contract.

## Behavior

No behavior change. `map_task_synonym` stays importable from
`export.io`; the collapse semantics (`image-feature-extraction` ->
`feature-extraction`, WinML extensions preserved) are byte-identical.
Existing synonym and #777/#782 regression tests stay green.

Sets up PR2 (#800), which adds the modality-aware `detect_task` and
relies on this single collapse boundary.
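
The relocation pattern, one implementation plus a backward-compatible alias, can be sketched as below. The synonym table and the contents and type of `TASK_SYNONYM_EXTENSIONS` are assumptions for illustration; only the `image-feature-extraction` -> `feature-extraction` collapse is taken from this PR, and the extension name is deliberately fake.

```python
# Only the first entry is confirmed by the PR text; the table shape is assumed.
_OPTIMUM_SYNONYMS = {"image-feature-extraction": "feature-extraction"}

# Placeholder content: the real WinML extension task names are not listed here.
TASK_SYNONYM_EXTENSIONS = frozenset({"hypothetical-winml-task"})


def to_optimum_task(task: str) -> str:
    """Collapse a WinML/HF task name to the canonical Optimum name."""
    if task in TASK_SYNONYM_EXTENSIONS:
        return task  # WinML extensions are preserved as-is
    return _OPTIMUM_SYNONYMS.get(task, task)


# Backward-compatible alias, as kept in export/io.py.
map_task_synonym = to_optimum_task
```

Binding the old name to the same function object means existing imports keep working and cannot drift from the new implementation.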
…#793)

Fixes #566.

## Problem

- Top-level group declared ``-v/--verbose`` (count) and ``-q/--quiet``,
but 12 of 13 subcommands redeclared ``--verbose`` as ``is_flag=True``,
so ``winml export -vv …`` errored with ``extra argument``.
- No subcommand exposed ``-q/--quiet``, so ``winml export --quiet …``
failed with ``no such option``.
- Each command wired logging differently; DEBUG/INFO lines interleaved
with Rich tables on stdout, breaking ``cmd > out 2> log.txt``.

## Changes

- ``utils/cli.py``: ``verbosity_options`` decorator (``-v`` count,
``-q`` flag) + new ``resolve_verbosity(ctx, verbose, quiet)`` helper
that merges top-level and subcommand-level values (max of verbose, OR of
quiet). Honors the legacy ``ctx.obj["debug"]`` so tests that bypass
``main()`` still raise the verbosity floor.
- ``utils/logging.py``: format ``[%(asctime)s %(levelname)-7s %(name)s]
%(message)s`` with ``datefmt=%H:%M:%S``, ``stream=sys.stderr``.
Idempotent — re-creates the WinML handler bound to the current
``sys.stderr`` on each call so Click ``CliRunner`` stream redirection
keeps working, and leaves non-WinML handlers (notably pytest ``caplog``)
intact.
- ``cli.py``: top-level group uses ``@verbosity_options`` (replaces
inline declarations); ``--debug`` alias preserved.
- 12 subcommands (``build``, ``compile``, ``config``, ``eval``,
``export``, ``inspect``, ``optimize``, ``perf``, ``quantize``, ``sys``,
plus ``analyze`` cleanup): replace ad-hoc ``--verbose``
(``is_flag=True``) with ``@cli_utils.verbosity_options``, add ``quiet:
bool`` param, call ``configure_logging(verbosity=verbose, quiet=quiet)``
after ``resolve_verbosity``. Removes the legacy ``if
ctx.obj.get("debug"): verbose = True`` blocks (folded into the
helper).
- ``serve/app.py``: pre-existing latent bug — module-level
``logging.getLogger("winml.modelkit").setLevel(INFO)`` ran at import,
which muted DEBUG capture in unrelated tests that got collected
alongside the serve test module. Split into ``_attach_log_handler()``
(idempotent, called from ``_register_routes``) and a paired
``_ensure_log_capture_level`` / ``_restore_log_capture_level`` invoked
from the production lifespan. Tests that build the app via
``_register_routes`` + a mock lifespan no longer leak global logger
state.
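
The idempotent logging setup described for ``utils/logging.py`` might look roughly like the sketch below. The format string, ``datefmt``, and the stderr stream come from the change list; the marker attribute, the logger name ``winml``, and the verbosity-to-level mapping are assumptions made for illustration.

```python
import logging
import sys

LOG_FORMAT = "[%(asctime)s %(levelname)-7s %(name)s] %(message)s"
DATE_FORMAT = "%H:%M:%S"
_MARKER = "_winml_sketch_handler"  # hypothetical tag identifying our own handler


def configure_logging(verbosity: int = 0, quiet: bool = False,
                      name: str = "winml") -> logging.Logger:
    logger = logging.getLogger(name)
    # Remove only the handler installed by a previous call; handlers added
    # by others (for example pytest's caplog) are left untouched.
    for handler in list(logger.handlers):
        if getattr(handler, _MARKER, False):
            logger.removeHandler(handler)
    # Bind to whatever sys.stderr is right now, so stream redirection works.
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
    setattr(handler, _MARKER, True)
    logger.addHandler(handler)
    # Assumed mapping: quiet -> ERROR, 0 -> WARNING, 1 -> INFO, 2+ -> DEBUG.
    if quiet:
        level = logging.ERROR
    elif verbosity >= 2:
        level = logging.DEBUG
    elif verbosity == 1:
        level = logging.INFO
    else:
        level = logging.WARNING
    logger.setLevel(level)
    return logger
```

Calling the function twice leaves exactly one tagged handler, which is the property that keeps repeated CLI invocations in one process from duplicating log lines.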

## Behavior

Both flag positions work. When a flag is passed at both levels, the
values are merged (max of the verbose counts, OR of the quiet flags):

```text
winml -v export -m … -o …            # top-level: works
winml export -vv -m … -o …            # subcommand: now works (was: extra argument)
winml --quiet export -m … -o …        # top-level: works
winml export --quiet -m … -o …        # subcommand: now works (was: no such option)
winml inspect -vv -m … --format json > out 2> log.txt   # clean stdout/stderr split
```
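
The max/OR merge can be sketched in a few lines. This is a simplified stand-in, not the real helper: it takes a plain dict in place of Click's ``ctx.obj``, the ``"verbose"`` and ``"quiet"`` keys are assumed names, and treating the legacy ``"debug"`` flag as a floor of 1 is inferred from the removed ``verbose = True`` blocks.

```python
def resolve_verbosity(ctx_obj, verbose, quiet):
    """Merge top-level and subcommand-level verbosity settings.

    ctx_obj stands in for Click's ctx.obj (a dict set by the top-level group).
    """
    obj = ctx_obj or {}
    parent_verbose = int(obj.get("verbose", 0))
    if obj.get("debug"):
        # The legacy debug flag raises the verbosity floor (assumed: to 1).
        parent_verbose = max(parent_verbose, 1)
    merged_verbose = max(parent_verbose, int(verbose))  # max of verbose
    merged_quiet = bool(obj.get("quiet", False)) or bool(quiet)  # OR of quiet
    return merged_verbose, merged_quiet
```

With this shape, ``winml -v export -vv`` and ``winml export -vv`` resolve to the same verbosity count of 2.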

## Tests

- ``tests/cli/`` (23): pass
- ``tests/unit/`` (5061 collected): **5058 pass**, 3 fail; all 3 are
pre-existing on main and unrelated to this change:
  - ``test_winml_session.py::TestOpenVINODeviceRouting::test_compile_openvino_cpu_device_succeeds``
  - ``test_winml_session.py::TestOpenVINODeviceRouting::test_compile_openvino_cpu_provider_not_npu``
    (both env: no OpenVINO EP installed)
  - ``test_config_utils.py::TestMergeConfigNoneHandling::test_none_to_value_transition``
    (test isolation, passes alone)

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
## Summary
- Replace hardcoded 4-EP list in `analyze_from_proto(ep=None)` with
dynamic lookup from `EP_SUPPORTED_DEVICES`, filtered by target device
- Remove `max_length=4` constraint on `AnalysisOutput.results` to
support more than 4 EPs per device
- Change uniqueness validator from IHV type to EP type (multiple EPs can
share the same IHV, e.g. CUDA and DML both map to MICROSOFT)

**Before:** `analyzer.analyze(ep=None)` always analyzed QNN, OpenVINO,
VitisAI, NvTensorRTRTX regardless of device — NvTensorRTRTX was analyzed
on NPU even though it only supports GPU.

**After:** EP list is derived from `EP_SUPPORTED_DEVICES` filtered by
the target device, matching the CLI `--ep all` behavior exactly.
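
A minimal sketch of the device-filtered lookup, assuming `EP_SUPPORTED_DEVICES` is a mapping from EP name to the set of devices it supports. The table below is illustrative only: the NvTensorRTRTX (GPU only) and QNN (NPU/GPU) rows follow statements in this PR, while the other rows are placeholders.

```python
# Illustrative table; the real EP_SUPPORTED_DEVICES may differ.
EP_SUPPORTED_DEVICES = {
    "QNN": {"NPU", "GPU"},
    "OpenVINO": {"NPU", "GPU", "CPU"},  # placeholder device set
    "VitisAI": {"NPU"},                 # placeholder device set
    "NvTensorRTRTX": {"GPU"},
}


def eps_for_device(device: str) -> list[str]:
    """Return the EPs that support the target device, in table order."""
    return [ep for ep, devices in EP_SUPPORTED_DEVICES.items()
            if device in devices]
```

With this table, `eps_for_device("NPU")` no longer includes NvTensorRTRTX, which is the behavior change described above.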
Resolves #326.

Adds `WinMLDepthEstimationEvaluator` and `DepthMetric` (Absolute
Relative error, RMSE, delta-1) following the NYU/KITTI evaluation
protocol. HuggingFace `evaluate` doesn't ship a depth-estimation
evaluator, so the metric loop is implemented manually.
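
For reference, the three metrics are conventionally defined as follows, where $d_i$ is the (aligned) prediction and $d_i^{*}$ the ground-truth depth at valid pixel $i$, and $N$ is the number of valid pixels. These are the standard definitions from the depth-estimation literature; that `DepthMetric` matches them exactly, including the 1.25 threshold, is an assumption.

$$
\mathrm{AbsRel} = \frac{1}{N}\sum_{i=1}^{N}\frac{\lvert d_i - d_i^{*}\rvert}{d_i^{*}},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(d_i - d_i^{*}\right)^{2}},
\qquad
\delta_1 = \frac{1}{N}\,\Bigl\lvert\Bigl\{\, i : \max\!\Bigl(\frac{d_i}{d_i^{*}}, \frac{d_i^{*}}{d_i}\Bigr) < 1.25 \Bigr\}\Bigr\rvert
$$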

### Background

Depth-estimation models fall into a few groups, and the same input image
gives wildly different prediction scales depending on which group the
model belongs to.

- Metric-depth models (ZoeDepth, DepthPro) predict depth in meters
directly.
- Relative-depth models (Depth-Anything, Marigold) predict depth up to
an unknown scale and shift.
- Disparity models (DPT, MiDaS) predict `1 / depth` (inverse depth) up
to scale and shift.

Comparing predictions against the NYU ground truth therefore requires
(1) optionally inverting disparity into depth and (2) aligning the
prediction to the ground truth before computing metrics. This is what
AbsRel/RMSE/delta-1 benchmarks in the literature do, and what this PR
adds as user-selectable options.

### Options

Two `columns_mapping` keys, both overridable via `--column`, and both
visible in `winml eval --schema --task depth-estimation`.

`align` controls how each prediction is rescaled against the ground
truth depth map before metrics are computed:

- `affine` (default): per-image least-squares fit of `pred_aligned = s *
pred + t`, where `s` is a scalar scale and `t` is a scalar shift, solved
on the valid pixels (those passing the depth range mask). Suitable for
relative-depth and disparity models.
- `median`: scale-only alignment, `pred_aligned = (median(gt) /
median(pred)) * pred`. No shift. Cheaper but less accurate when the
model has a non-zero offset.
- `none`: use the prediction as-is. Suitable for metric-depth models
that already output meters.

`depth_kind` indicates what the model outputs:

- `depth` (default): prediction is interpreted as depth.
- `disparity`: prediction is interpreted as inverse depth, so it is
inverted (`pred := 1 / pred`) before alignment. Needed for
DPT/MiDaS-style outputs.

The depth range used for the valid-pixel mask is also overridable:
`min_depth` (default 1e-3, NYU convention) and `max_depth` (default 10.0
meters, NYU convention). Only pixels with `min_depth <= gt <= max_depth`
contribute to the metrics.
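
The options above can be sketched in plain Python on flattened depth maps. This is an illustrative sketch, not the evaluator's implementation: the function names and the epsilon guard are hypothetical, and a real implementation would most likely vectorize these steps with NumPy.

```python
import math
from statistics import mean, median


def align_prediction(pred, gt, align="affine", depth_kind="depth",
                     min_depth=1e-3, max_depth=10.0):
    """Return (aligned_pred, gt) restricted to valid pixels, as flat lists."""
    eps = 1e-6  # hypothetical guard against division by zero
    if depth_kind == "disparity":
        pred = [1.0 / max(p, eps) for p in pred]  # invert before alignment
    # Valid-pixel mask: only min_depth <= gt <= max_depth contributes.
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth <= g <= max_depth]
    if not pairs:
        raise ValueError("no valid pixels in the depth range")
    p_valid = [p for p, _ in pairs]
    g_valid = [g for _, g in pairs]
    if align == "affine":
        # Closed-form least squares for gt ~= s * pred + t.
        p_mean, g_mean = mean(p_valid), mean(g_valid)
        var = sum((p - p_mean) ** 2 for p in p_valid)
        cov = sum((p - p_mean) * (g - g_mean) for p, g in pairs)
        s = cov / var if var > 0 else 1.0
        t = g_mean - s * p_mean
        p_valid = [s * p + t for p in p_valid]
    elif align == "median":
        scale = median(g_valid) / median(p_valid)  # scale only, no shift
        p_valid = [scale * p for p in p_valid]
    elif align != "none":
        raise ValueError(f"unknown align mode: {align!r}")
    return p_valid, g_valid


def depth_metrics(pred, gt):
    """AbsRel, RMSE, and delta-1 over already-masked, aligned pixels."""
    n = len(gt)
    abs_rel = sum(abs(p - g) / g for p, g in zip(pred, gt)) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)
    delta1 = sum(1 for p, g in zip(pred, gt)
                 if p > 0 and max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

A prediction that differs from the ground truth only by a scale and shift scores a near-zero AbsRel under `affine`, while the same prediction under `none` would be penalized for the offset.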

### Default dataset and testset

Default dataset is `sayakpaul/nyu_depth_v2`. All 11 depth-estimation
entries from `models_all.json` are added to `models_with_acc.json`, with
per-model overrides only where the defaults don't match the model
family:

- `Intel/zoedepth-nyu-kitti` and `apple/DepthPro-hf` set `align=none`
(metric-depth).
- `Intel/dpt-hybrid-midas` and `Intel/dpt-large` set
`depth_kind=disparity`.
- The remaining 7 entries (Depth-Anything family, Marigold, etc.) rely
on the defaults (`align=affine`, `depth_kind=depth`).
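
The per-model overrides listed above amount to a defaults-plus-overrides merge. The sketch below shows that idea with plain dictionaries; the model IDs and option values come from this PR, but the dictionary layout and the function name are hypothetical and do not reflect the actual schema of `models_with_acc.json`.

```python
# Defaults as stated in the PR description.
DEFAULTS = {"align": "affine", "depth_kind": "depth",
            "min_depth": 1e-3, "max_depth": 10.0}

# Per-model overrides, only where the defaults do not fit the model family.
OVERRIDES = {
    "Intel/zoedepth-nyu-kitti": {"align": "none"},        # metric-depth
    "apple/DepthPro-hf": {"align": "none"},               # metric-depth
    "Intel/dpt-hybrid-midas": {"depth_kind": "disparity"},
    "Intel/dpt-large": {"depth_kind": "disparity"},
}


def resolve_eval_options(model_id: str) -> dict:
    """Merge the defaults with any overrides registered for the model."""
    return {**DEFAULTS, **OVERRIDES.get(model_id, {})}
```

Models without an entry in the override table fall through to the defaults unchanged.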

### Tests

Unit tests cover the new evaluator and the metric, including the
affine-fit path and the disparity inversion path. The slow/network
integration test runs the full pipeline end-to-end on Depth-Anything V2,
ZoeDepth, and DPT.
DingmaomaoBJTU and others added 26 commits June 10, 2026 19:12
- JSON key 'avg' -> 'mean' (matches actual output)
- Add missing JSON fields: task, precision, timestamp, std, warmup_mean, batches_per_sec
- Fix terminal label 'Precision' -> 'Model Precision'
- Add missing 'Task:' line in terminal example
- Remove false claim about --module using ONNX hierarchy tags
  (it uses torchinfo to discover PyTorch submodules, not ONNX metadata)
- Remove 'per-operator timings' from intro (op-tracing not ready)
- Add model_info block to JSON example (always emitted)
- Soften --monitor 'no effect' to acknowledge small system overhead
- Change 'not executing' to 'strong signal to investigate'
- Add 'monitor' field to NPU JSON example
- Fix 'on-chip memory' -> 'dedicated adapter memory'
- Note that JSON always includes device_memory even for CPU (zeroed)
Fix docs for eval, compile and quantize
- compile: remove invalid 'cuda' and 'tensorrt' from --ep list, add correct aliases
- quantize: --weight-type/--activation-type default is resolved (not hardcoded uint8)
- sys.md: fix EP mapping (QNN -> NPU/GPU, not just NPU)
- CONTRIBUTING.md: remove Linux/macOS unzip comment (Windows-only project)
- docs/contributing.md: sync with CONTRIBUTING.md
- .pre-commit-config.yaml: remove unnecessary --unsafe arg from check-yaml
- .gitignore: add comment explaining docs/versions.json
The docs workflow was failing because uv sync --extra dev installs
onnxruntime-windowsml which is Windows-only and unavailable on the
Ubuntu runner. Add a dedicated 'docs' dependency group with only
the packages needed for building docs.
@hi-brenda hi-brenda requested a review from a team as a code owner June 12, 2026 07:53
@hi-brenda hi-brenda closed this Jun 12, 2026
@microsoft-github-policy-service


@hi-brenda please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"
Contributor License Agreement

Contribution License Agreement

This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
and conveys certain license rights to Microsoft Corporation and its affiliates (“Microsoft”) for Your
contributions to Microsoft open source projects. This Agreement is effective as of the latest signature
date below.

  1. Definitions.
    “Code” means the computer software code, whether in human-readable or machine-executable form,
    that is delivered by You to Microsoft under this Agreement.
    “Project” means any of the projects owned or managed by Microsoft and offered under a license
    approved by the Open Source Initiative (www.opensource.org).
    “Submit” is the act of uploading, submitting, transmitting, or distributing code or other content to any
    Project, including but not limited to communication on electronic mailing lists, source code control
    systems, and issue tracking systems that are managed by, or on behalf of, the Project for the purpose of
    discussing and improving that Project, but excluding communication that is conspicuously marked or
    otherwise designated in writing by You as “Not a Submission.”
    “Submission” means the Code and any other copyrightable material Submitted by You, including any
    associated comments and documentation.
  2. Your Submission. You must agree to the terms of this Agreement before making a Submission to any
    Project. This Agreement covers any and all Submissions that You, now or in the future (except as
    described in Section 4 below), Submit to any Project.
  3. Originality of Work. You represent that each of Your Submissions is entirely Your original work.
    Should You wish to Submit materials that are not Your original work, You may Submit them separately
    to the Project if You (a) retain all copyright and license information that was in the materials as You
    received them, (b) in the description accompanying Your Submission, include the phrase “Submission
    containing materials of a third party:” followed by the names of the third party and any licenses or other
    restrictions of which You are aware, and (c) follow any other instructions in the Project’s written
    guidelines concerning Submissions.
  4. Your Employer. References to “employer” in this Agreement include Your employer or anyone else
    for whom You are acting in making Your Submission, e.g. as a contractor, vendor, or agent. If Your
    Submission is made in the course of Your work for an employer or Your employer has intellectual
    property rights in Your Submission by contract or applicable law, You must secure permission from Your
    employer to make the Submission before signing this Agreement. In that case, the term “You” in this
    Agreement will refer to You and the employer collectively. If You change employers in the future and
    desire to Submit additional Submissions for the new employer, then You agree to sign a new Agreement
    and secure permission from the new employer before Submitting those Submissions.
  5. Licenses.
  • Copyright License. You grant Microsoft, and those who receive the Submission directly or
    indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license in the
    Submission to reproduce, prepare derivative works of, publicly display, publicly perform, and distribute
    the Submission and such derivative works, and to sublicense any or all of the foregoing rights to third
    parties.
  • Patent License. You grant Microsoft, and those who receive the Submission directly or
    indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license under
    Your patent claims that are necessarily infringed by the Submission or the combination of the
    Submission with the Project to which it was Submitted to make, have made, use, offer to sell, sell and
    import or otherwise dispose of the Submission alone or with the Project.
  • Other Rights Reserved. Each party reserves all rights not expressly granted in this Agreement.
    No additional licenses or rights whatsoever (including, without limitation, any implied licenses) are
    granted by implication, exhaustion, estoppel or otherwise.
  6. Representations and Warranties. You represent that You are legally entitled to grant the above
    licenses. You represent that each of Your Submissions is entirely Your original work (except as You may
    have disclosed under Section 3). You represent that You have secured permission from Your employer to
    make the Submission in cases where Your Submission is made in the course of Your work for Your
    employer or Your employer has intellectual property rights in Your Submission by contract or applicable
    law. If You are signing this Agreement on behalf of Your employer, You represent and warrant that You
    have the necessary authority to bind the listed employer to the obligations contained in this Agreement.
    You are not expected to provide support for Your Submission, unless You choose to do so. UNLESS
    REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING, AND EXCEPT FOR THE WARRANTIES
    EXPRESSLY STATED IN SECTIONS 3, 4, AND 6, THE SUBMISSION PROVIDED UNDER THIS AGREEMENT IS
    PROVIDED WITHOUT WARRANTY OF ANY KIND, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY OF
    NONINFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
  7. Notice to Microsoft. You agree to notify Microsoft in writing of any facts or circumstances of which
    You later become aware that would make Your representations in this Agreement inaccurate in any
    respect.
  8. Information about Submissions. You agree that contributions to Projects and information about
    contributions may be maintained indefinitely and disclosed publicly, including Your name and other
    information that You submit with Your Submission.
  9. Governing Law/Jurisdiction. This Agreement is governed by the laws of the State of Washington, and
    the parties consent to exclusive jurisdiction and venue in the federal courts sitting in King County,
    Washington, unless no federal subject matter jurisdiction exists, in which case the parties consent to
    exclusive jurisdiction and venue in the Superior Court of King County, Washington. The parties waive all
    defenses of lack of personal jurisdiction and forum non-conveniens.
  10. Entire Agreement/Assignment. This Agreement is the entire agreement between the parties, and
    supersedes any and all prior agreements, understandings or communications, written or oral, between
    the parties relating to the subject matter hereof. This Agreement may be assigned by Microsoft.
