
feat: add --precision fp16 to optimize, build, and export commands #872

Open
DingmaomaoBJTU wants to merge 1 commit into main from dingmaomaobjtu/feat-fp16-conversion


DingmaomaoBJTU (Collaborator) commented Jun 11, 2026

Summary

Add --precision fp16 flag to winml optimize, winml build, and winml export commands for FP32→FP16 model conversion.

Fixes #867

Design

FP16 lives as a command-layer utility function (optim/fp16.py), not in the optimizer pipe registry. All three commands share the same convert_to_fp16() entry point:

graph TD
    subgraph "CLI Commands"
        subgraph optimize
            O1[Optimizer.optimize] --> O2[convert_to_fp16]
        end
        subgraph build
            B1[Export] --> B2[Optimize] --> B3[FP16] --> B4[Quantize] --> B5[Compile]
        end
        subgraph export
            E1[torch.onnx.export] --> E2[convert_to_fp16]
        end
    end

    O2 --> FP16
    B3 --> FP16
    E2 --> FP16

    FP16["optim/fp16.py<br/><code>convert_to_fp16(model, keep_io_types, op_block_list)</code>"]

    style FP16 fill:#e1f5fe,stroke:#0288d1
    style B3 fill:#e1f5fe,stroke:#0288d1

Key decisions

  1. op_block_list=None preserves ORT defaults: ORT's DEFAULT_OP_BLOCK_LIST contains 24 ops that are unsupported or numerically unsafe in FP16 (TopK, CumSum, NonMaxSuppression, etc.), and ORT applies it only when op_block_list is None. Passing [] would bypass this safety net.
  2. --fp16-keep-io-types defaults to True — model I/O stays FP32 by inserting Cast nodes at boundaries. This ensures compatibility with inference runtimes that feed float32 tensors.
  3. Node count logged before in-place mutation — ORT's converter mutates the model in-place, so we capture len(model.graph.node) before calling it.
  4. Already-FP16 skip — if all floating-point initializers are already FLOAT16, conversion is skipped with a log message.
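Taken together, these decisions suggest roughly how a wrapper like convert_to_fp16() could be structured. The following is a minimal sketch, not the PR's code. It assumes the underlying converter is ORT's onnxruntime.transformers.float16.convert_float_to_float16 (the commit message says the utility "wraps ORT float16"), which accepts keep_io_types and op_block_list keyword arguments. The helper name is_already_fp16 and the wording of the skip log are illustrative. The model is duck-typed so the skip check can run without onnx installed.

```python
import logging
from typing import Any, Optional, Sequence

logger = logging.getLogger(__name__)

# onnx.TensorProto.DataType values, inlined so the pure helper below
# does not need onnx at import time.
FLOAT, FLOAT16, DOUBLE, BFLOAT16 = 1, 10, 11, 16
FLOAT_TYPES = {FLOAT, FLOAT16, DOUBLE, BFLOAT16}


def is_already_fp16(model: Any) -> bool:
    """True if the model has floating-point initializers and all are FLOAT16."""
    float_types = [
        init.data_type
        for init in model.graph.initializer
        if init.data_type in FLOAT_TYPES
    ]
    # Assumption: a model with no float initializers is not treated as FP16.
    return bool(float_types) and all(t == FLOAT16 for t in float_types)


def convert_to_fp16(
    model: Any,
    keep_io_types: bool = True,
    op_block_list: Optional[Sequence[str]] = None,
) -> Any:
    """Sketch of an FP32 -> FP16 wrapper around ORT's float16 converter."""
    if is_already_fp16(model):
        # Illustrative wording; the PR's actual skip message is not shown.
        logger.info("Model is already FP16, skipping conversion")
        return model

    # Record the count first: the converter mutates the model in place.
    nodes_before = len(model.graph.node)
    logger.info("Converting model to FP16...")
    if keep_io_types:
        logger.info("  Keeping I/O types as FP32")

    # Imported lazily so the skip path above does not require onnxruntime.
    from onnxruntime.transformers.float16 import convert_float_to_float16

    model = convert_float_to_float16(
        model,
        keep_io_types=keep_io_types,
        # None lets ORT apply its DEFAULT_OP_BLOCK_LIST; [] would disable it.
        op_block_list=None if op_block_list is None else list(op_block_list),
    )
    logger.info(
        "FP16 conversion complete: %d -> %d nodes",
        nodes_before,
        len(model.graph.node),
    )
    return model
```

In real use the model would come from onnx.load(...). Only the conversion path needs onnxruntime, and the log lines it emits match the ones shown in the verbose sample output.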

CLI Flags

| Command | Flags |
| --- | --- |
| optimize | --precision fp16, --fp16-keep-io-types / --no-fp16-keep-io-types, --fp16-op-block-list |
| build | --precision fp16 |
| export | --precision fp16 |
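The --fp16-keep-io-types / --no-fp16-keep-io-types pair follows the common --flag/--no-flag convention. The PR's shared precision_option() decorator in utils/cli.py is not shown, so this is only an analogous sketch using the standard library's argparse (BooleanOptionalAction, Python 3.9+). The function name add_precision_options, the fp32 default, and the fine_control switch are all assumptions.

```python
import argparse


def add_precision_options(
    parser: argparse.ArgumentParser, fine_control: bool = False
) -> argparse.ArgumentParser:
    """Register --precision and, optionally, the FP16 fine-control flags."""
    # Assumption: fp32 (no conversion) is the default precision.
    parser.add_argument(
        "--precision",
        choices=["fp32", "fp16"],
        default="fp32",
        help="Target precision; fp16 converts FP32 weights to FP16.",
    )
    if fine_control:
        # BooleanOptionalAction also generates --no-fp16-keep-io-types.
        parser.add_argument(
            "--fp16-keep-io-types",
            action=argparse.BooleanOptionalAction,
            default=True,
            help="Keep model inputs/outputs FP32 by inserting Cast nodes.",
        )
        parser.add_argument(
            "--fp16-op-block-list",
            default=None,
            help="Comma-separated op types to keep in FP32.",
        )
    return parser
```

Mirroring the table, optimize would call this with fine_control=True, while build and export would register only --precision.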

Sample Usage & Output

Basic FP16 optimization

$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16
Input: model.onnx
Output: model_fp16.onnx
Loading model...
Running optimizer...
Converting to FP16...
Saving optimized model...
Success! Model optimized: model_fp16.onnx
Nodes: 2 -> 4 (-100.0% reduction)

Node count increases because Cast nodes are inserted at the I/O boundaries when --fp16-keep-io-types is enabled (the default). The reported "-100.0% reduction" is therefore a 100% increase (2 -> 4 nodes).
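The "-100.0% reduction" appears to come from computing (before - after) / before, which goes negative whenever conversion adds nodes. Here is a minimal sketch of a formatter that reports the change by direction instead. describe_node_change is a hypothetical name, not a function from the PR.

```python
def describe_node_change(before: int, after: int) -> str:
    """Format a node-count change as a reduction, increase, or no change."""
    if before <= 0:
        # No meaningful percentage for an empty starting graph.
        return f"Nodes: {before} -> {after}"
    pct = abs(after - before) / before * 100
    if after < before:
        return f"Nodes: {before} -> {after} ({pct:.1f}% reduction)"
    if after > before:
        return f"Nodes: {before} -> {after} ({pct:.1f}% increase)"
    return f"Nodes: {before} -> {after} (unchanged)"
```

For the sample above, describe_node_change(2, 4) returns "Nodes: 2 -> 4 (100.0% increase)".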

Verbose output with logging

$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16 -v
Input: model.onnx
Output: model_fp16.onnx
Loading model...
Running optimizer...
[INFO winml.modelkit.optim.optimizer] Running shape inference (pre-stage)...
[INFO winml.modelkit.optim.optimizer] ✔ Shape inference (pre-stage) completed in 4.25s
[INFO winml.modelkit.optim.optimizer] Starting optimization pipeline (4 pipes)...
[INFO winml.modelkit.optim.optimizer] ⚙ Executing ort_graph...
[INFO winml.modelkit.optim.optimizer] ✔ ort_graph completed in 0.03s
[INFO winml.modelkit.optim.optimizer] ⏩ Skipping rewrite (no capabilities enabled)
[INFO winml.modelkit.optim.optimizer] ⏩ Skipping ort_fusion (no capabilities enabled)
[INFO winml.modelkit.optim.optimizer] ⏩ Skipping surgery (no capabilities enabled)
[INFO winml.modelkit.optim.optimizer] Running shape inference...
[INFO winml.modelkit.optim.optimizer] ✔ Shape inference completed in 0.00s
Converting to FP16...
[INFO winml.modelkit.optim.fp16] Converting model to FP16...
[INFO winml.modelkit.optim.fp16]   Keeping I/O types as FP32
[INFO winml.modelkit.optim.fp16] FP16 conversion complete: 2 -> 4 nodes
Saving optimized model...
Success! Model optimized: model_fp16.onnx
Nodes: 2 -> 4 (-100.0% reduction)

FP16 without preserving I/O types

$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16 --no-fp16-keep-io-types
Input: model.onnx
Output: model_fp16.onnx
Loading model...
Running optimizer...
Converting to FP16...
Saving optimized model...
Success! Model optimized: model_fp16.onnx
Nodes: 2 -> 2 (0.0% reduction)

No Cast nodes inserted — model I/O uses FP16 directly. Node count stays the same.
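To check which mode produced a given file, you can read the element types of the graph's inputs and outputs and count its Cast nodes. The sketch below is duck-typed, and io_summary is a hypothetical helper. With the onnx package, it would be called on onnx.load("model_fp16.onnx"). In onnx.TensorProto.DataType, element type 1 is FLOAT and 10 is FLOAT16.

```python
from typing import Any, Dict

FLOAT, FLOAT16 = 1, 10  # onnx.TensorProto.DataType values


def io_summary(model: Any) -> Dict[str, Any]:
    """Map graph inputs/outputs to their element types and count Cast nodes."""
    graph = model.graph

    def elem_types(values: Any) -> Dict[str, int]:
        return {v.name: v.type.tensor_type.elem_type for v in values}

    return {
        "inputs": elem_types(graph.input),
        "outputs": elem_types(graph.output),
        "cast_nodes": sum(1 for node in graph.node if node.op_type == "Cast"),
    }
```

With --fp16-keep-io-types (the default), the boundaries should report FLOAT. With --no-fp16-keep-io-types, they should report FLOAT16. Cast nodes can also come from the original model or from casts around blocked ops, so the Cast count is not purely the boundary casts.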

Mixed precision (block specific ops)

$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16 \
    --fp16-op-block-list LayerNormalization,Softmax
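ORT's convert_float_to_float16 falls back to DEFAULT_OP_BLOCK_LIST only when op_block_list is None, and any explicit list replaces it. If the comma-separated --fp16-op-block-list value were forwarded as-is, ops such as TopK or CumSum would lose that protection. The PR does not show how it handles this case. The sketch below illustrates one option, merging user ops into the defaults; resolve_op_block_list is hypothetical. Entries are matched against each node's op_type, so they must be ONNX operator names such as LayerNormalization.

```python
from typing import Iterable, List, Optional


def resolve_op_block_list(
    cli_value: Optional[str], defaults: Iterable[str]
) -> Optional[List[str]]:
    """Turn a comma-separated CLI value into a block list that keeps the defaults."""
    if cli_value is None:
        # Flag not given: return None so ORT applies its own defaults.
        return None
    user_ops = [op.strip() for op in cli_value.split(",") if op.strip()]
    merged = list(defaults)
    # Append user ops after the defaults, skipping duplicates.
    for op in user_ops:
        if op not in merged:
            merged.append(op)
    return merged
```

In practice, defaults would be DEFAULT_OP_BLOCK_LIST imported from onnxruntime.transformers.float16, and the result would be passed as op_block_list.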

Build pipeline with FP16

$ winml build microsoft/resnet-50 --precision fp16

Export with FP16

$ winml export microsoft/resnet-50 --precision fp16

Files Changed

| File | Change |
| --- | --- |
| src/winml/modelkit/optim/fp16.py | NEW: convert_to_fp16() utility |
| src/winml/modelkit/commands/optimize.py | Added --precision fp16 + fine-control flags |
| src/winml/modelkit/commands/build.py | Added --precision fp16 stage |
| src/winml/modelkit/commands/export.py | Added --precision fp16 post-export |
| src/winml/modelkit/utils/cli.py | Shared precision_option() decorator |
| tests/unit/optim/test_fp16.py | NEW: 7 unit tests |

DingmaomaoBJTU requested a review from a team as a code owner June 11, 2026 03:04
Comment thread tests/unit/optim/pipes/test_pipe_fp16.py Fixed
DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 8f5a1d2 to 9e7d8fd June 11, 2026 04:15
DingmaomaoBJTU changed the title from "feat: add --enable-fp16-conversion to winml optimize" to "feat: add --precision fp16 to optimize, build, and export commands" Jun 11, 2026
Comment thread tests/unit/optim/test_fp16.py Fixed
DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 9e7d8fd to 7d7a0ae June 11, 2026 04:22


from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import onnx
DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 7d7a0ae to 328b5ab June 11, 2026 04:32

timenick (Collaborator) left a comment

Three findings on PR #872.

🤖 Generated with GitHub Copilot CLI

Comment thread src/winml/modelkit/commands/build.py Outdated
current_path = _run_fp16_stage(
    model_path=current_path,
    stage_timings=stage_timings,
)


Pipeline order is inconsistent between the pytorch and ONNX build paths.

In _build_pytorch_pipeline (here) FP16 runs between Export and Optimize, but elsewhere FP16 runs after Optimize:

  • _build_onnx_pipeline (line ~1547): Optimize → FP16 → Quantize
  • Standalone winml optimize --precision fp16 (optimize.py:435): converts to FP16 after optimizer.optimize(...) returns
  • The PR description's mermaid diagram: Export → Optimize → FP16 → Quantize → Compile

Practical impact: winml build -m hf-model --precision fp16 hands an FP16 graph to the optimizer, while winml build -c cfg.json -m model.onnx --precision fp16 optimizes the FP32 graph first. The same logical request therefore produces two different graphs depending on the input format.

Suggest moving the FP16 stage to run after _run_optimize_stage so both pipelines match the documented order.

🤖 Generated with GitHub Copilot CLI
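One way to keep the two build paths from drifting, as the review asks, is to derive both from a single ordered stage list. This is a minimal sketch. BUILD_STAGE_ORDER and plan_stages are hypothetical names, and the quantize/compile_model switches are placeholders rather than the PR's actual options.

```python
from typing import List

# Single source of truth for the documented order:
# Export -> Optimize -> FP16 -> Quantize -> Compile.
BUILD_STAGE_ORDER = ["export", "optimize", "fp16", "quantize", "compile"]


def plan_stages(
    input_is_onnx: bool,
    precision: str = "fp32",
    quantize: bool = False,
    compile_model: bool = False,
) -> List[str]:
    """Return the enabled stages, always in BUILD_STAGE_ORDER."""
    enabled = {
        "export": not input_is_onnx,  # ONNX inputs skip export
        "optimize": True,
        "fp16": precision == "fp16",
        "quantize": quantize,
        "compile": compile_model,
    }
    return [stage for stage in BUILD_STAGE_ORDER if enabled[stage]]
```

With this approach, both paths run FP16 after Optimize by construction. plan_stages(False, "fp16") gives ["export", "optimize", "fp16"], and plan_stages(True, "fp16") gives ["optimize", "fp16"].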

Comment thread src/winml/modelkit/commands/build.py Outdated
elapsed = time.monotonic() - t0
sl.set_done(elapsed)
sl.detail("[dim]I/O types preserved as FP32[/dim]")
sl.artifact(str(model_path), 0)


sl.artifact(str(model_path), 0) hardcodes the size to 0. Every other stage in this file passes _safe_size(...) (see lines 1121, 1249, 1322, 1421), so the FP16 stage will render as 0 B in the stage summary. Should be:

sl.artifact(str(model_path), _safe_size(model_path))

🤖 Generated with GitHub Copilot CLI
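The body of _safe_size is not shown in the review. Its name and usage suggest it returns a file's size without raising. A minimal stand-in might look like the sketch below; safe_size is a hypothetical name, not the PR's helper.

```python
import os
from typing import Union


def safe_size(path: Union[str, os.PathLike]) -> int:
    """Return the file size in bytes, or 0 if the file cannot be stat'ed."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0
```

For models saved with ONNX external data, the .onnx file alone excludes the weights, so its size understates the artifact.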

Comment thread tests/unit/optim/test_fp16.py Outdated
import onnx
from onnx import TensorProto, numpy_helper

from winml.modelkit.optim.fp16 import convert_to_fp16


Per CLAUDE.md, test code must use absolute imports at the package level rather than reach into internal submodules for public (non-underscore-prefixed) symbols. This line imports from the deep submodule winml.modelkit.optim.fp16, but convert_to_fp16 is a new public function used by three commands (build, optimize, export).

Suggest exporting it from src/winml/modelkit/optim/__init__.py (add to both the imports and __all__) and changing this import to:

from winml.modelkit.optim import convert_to_fp16

🤖 Generated with GitHub Copilot CLI

DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 328b5ab to b859627 June 11, 2026 05:26
from __future__ import annotations

import numpy as np
import onnx
DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from b859627 to 837330d June 11, 2026 06:57
Add FP16 precision conversion support across all model pipeline commands:

- Create optim/fp16.py with convert_to_fp16() utility (wraps ORT float16)
- optimize: --precision fp16 with --fp16-keep-io-types and --fp16-op-block-list
- build: --precision fp16 stage between optimize and quantize
- export: --precision fp16 as post-export conversion
- Add shared precision_option() CLI decorator in utils/cli.py

Design: FP16 is a precision transformation (not a graph optimization), so it
lives as a command-layer utility rather than an optimizer pipe. All three
commands share the same convert_to_fp16() function.

Fixes #867
DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 837330d to fede96c June 11, 2026 07:43

Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close the issue "feat: add --enable-fp16-conversion to winml optimize and --precision to winml build/export"
3 participants