[quantization] Complete QuantGemma4Model PTQ Wrapper #797

Open
dvsav wants to merge 1 commit into Samsung:main from dvsav:model

@dvsav dvsav commented Jun 30, 2026

Contributor

What

This PR completes the QuantGemma4Model PTQ wrapper, the top-level multimodal (image-text) model for Gemma4 E2B. It implements the full calibration forward path (multimodal placeholder replacement, PLE computation, and fixed-slot fusion) and adds an export-friendly forward_export() together with as_export_module() for Circle conversion.

Why

The QuantGemma4Model wrapper previously had a skeleton forward() that handled only the simplest text-only path: multimodal placeholder tokens (image/video/audio) were not replaced before embedding, PLE was not computed, and the wrapper was not registered for automatic discovery by prepare(). As a result, calibration through the full model was incorrect (wrong embeddings at image positions, no PLE statistics), and the wrapper was never instantiated. There was also no export path to convert the quantized model to Circle format, which is the final deliverable of the TICO pipeline.

Key Design Decisions

  1. Placeholder replacement via torch.where instead of masked_scatter: The original HuggingFace Gemma4Model.forward uses masked_scatter to insert image features, which is not export-friendly. Instead, we use torch.where to replace placeholder token IDs with pad_token_id before embedding, then use fixed_slot_fuse to insert image features at a static position range. This matches the StaticGemma4Runtime design, where the CPU owns dynamic operations and the NPU owns static compute.

  2. Two forward methods, forward() for calibration and forward_export() for export: The calibration forward() contains dynamic control flow (if pixel_values is not None, if multimodal_mask.any(), conditional PLE) that cannot be exported. The export forward_export() takes pre-fused inputs_embeds and precomputed masks/RoPE/PLE and has no dynamic control flow, following the pattern established by QuantGemma4VisionModel.

  3. CPU/NPU split aligned with StaticGemma4Runtime: The export path assumes the CPU runtime has already performed token embedding, vision tower, MM fusion, PLE computation, and mask/RoPE generation. The NPU subgraph runs only the text decoder layers and final norm.
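
To make the first design decision concrete, here is a minimal sketch of placeholder replacement followed by fusion at a fixed slot. It is not the code from this PR: the helper names replace_placeholders and fuse_at_fixed_slot, the token IDs, and the tensor shapes are all hypothetical and chosen only for illustration. The actual _get_placeholder_mask() and fixed_slot_fuse may differ in signature and behavior.

```python
import torch

# Hypothetical token IDs, chosen only for this illustration.
IMAGE_TOKEN_ID = 99
PAD_TOKEN_ID = 0


def replace_placeholders(input_ids, placeholder_id, pad_id):
    """Swap placeholder IDs for pad_id before the embedding lookup."""
    mask = input_ids == placeholder_id
    safe_ids = torch.where(mask, torch.full_like(input_ids, pad_id), input_ids)
    return mask, safe_ids


def fuse_at_fixed_slot(inputs_embeds, image_features, slot_start):
    """Place image features in the static range [slot_start, slot_start + n)."""
    n = image_features.shape[1]
    return torch.cat(
        [
            inputs_embeds[:, :slot_start],
            image_features,
            inputs_embeds[:, slot_start + n:],
        ],
        dim=1,
    )


input_ids = torch.tensor([[5, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 7]])
mask, safe_ids = replace_placeholders(input_ids, IMAGE_TOKEN_ID, PAD_TOKEN_ID)

embed = torch.nn.Embedding(100, 4)      # toy embedding table
text_embeds = embed(safe_ids)           # shape (1, 4, 4)
image_features = torch.ones(1, 2, 4)    # stand-in for vision tower output
fused = fuse_at_fixed_slot(text_embeds, image_features, slot_start=1)
```

In this sketch the slice bounds depend only on slot_start and the number of image features, not on where the mask happens to be true, so the output shape is known ahead of time.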

Changes

  • tico/quantization/wrapq/wrappers/gemma4/quant_model.py — Added _get_placeholder_mask() helper; completed forward() with placeholder replacement, PLE computation, and input validation; added forward_export() (static-shape export path) and as_export_module() (export preparation)
  • tico/quantization/wrapq/wrappers/gemma4/export_adapters.py — Added Gemma4ModelPrefillExportAdapter that delegates to forward_export()
  • tico/quantization/wrapq/wrappers/registry.py — Enabled quant_model in _CORE_MODULES for automatic discovery
  • tico/quantization/wrapq/examples/gemma4/quantize_model.py — New example script demonstrating full PTQ flow (text-only + image-text calibration, PEIR evaluation, Circle export)
  • test/quantization/wrapq/wrappers/gemma4/test_quant_model.py — New unit tests for _get_placeholder_mask and QuantGemma4Model wrapper
  • test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py — New smoke tests for prepare-calibrate-convert flow (text-only, image-text, export adapter)
  • tico/quantization/recipes/debug/wrapper_smoke/cases/gemma4.py — Added Gemma4ModelCase and registered it in GEMMA4_CASES

Tests

  • test_quantize_model.py: 5 smoke tests (no-quant parity, prepare-convert text-only flow, prepare-convert image-text flow, as_export_module flow, _get_placeholder_mask unit test)
  • test_quant_model.py: Unit tests for QuantGemma4Model wrapper and helper functions
  • Wrapper smoke runner: Gemma4ModelCase passes with Mean |diff| = 0.006353, PEIR = 0.028361, and successful Circle export
  • All 149 tests in test/quantization/wrapq/wrappers/gemma4/ pass
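
For readers unfamiliar with the two error metrics reported above, the following sketch shows one way they could be computed. It assumes PEIR means the peak absolute error divided by the value range (max minus min) of the reference output. That definition is an assumption made for this illustration, and the function names are hypothetical rather than TICO's API.

```python
def mean_abs_diff(ref, test):
    """Average of |ref - test| over all elements."""
    return sum(abs(r - t) for r, t in zip(ref, test)) / len(ref)


def peir(ref, test):
    """Assumed definition: peak |error| divided by the reference value range."""
    interval = max(ref) - min(ref)
    peak = max(abs(r - t) for r, t in zip(ref, test))
    return peak / interval if interval > 0 else 0.0


# Toy flattened outputs: a reference (FP) tensor and a quantized one.
fp_out = [-1.0, 0.0, 1.0, 3.0]
q_out = [-1.0, 0.5, 1.0, 3.0]

mad_value = mean_abs_diff(fp_out, q_out)   # 0.5 / 4 = 0.125
peir_value = peir(fp_out, q_out)           # 0.5 / (3.0 - (-1.0)) = 0.125
```

Under this definition a ratio can also be printed as a percentage by multiplying by 100.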

Unit Tests

$ python -m pytest test/quantization/wrapq/wrappers/gemma4/test_quant_model.py -v 2>&1
========================================================= test session starts ==========================================================
platform linux -- Python 3.10.12, pytest-9.0.3, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.13.0
collected 12 items                                                                                                                     

test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestGetPlaceholderMask::test_image_placeholders                                    PASSED [  8%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestGetPlaceholderMask::test_missing_token_ids                                     PASSED [ 16%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestGetPlaceholderMask::test_mixed_placeholders                                    PASSED [ 25%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestGetPlaceholderMask::test_no_placeholders                                       PASSED [ 33%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelValidation::test_reject_both_input_ids_and_inputs_embeds       PASSED [ 41%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelValidation::test_reject_input_ids_with_per_layer_inputs        PASSED [ 50%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelValidation::test_reject_neither_input_ids_nor_inputs_embeds    PASSED [ 58%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_inputs_embeds_with_per_layer_inputs_works          PASSED [ 66%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_inputs_embeds_without_per_layer_inputs_raises      PASSED [ 75%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_placeholder_replacement_produces_pad_embedding     PASSED [ 83%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_prepare_convert_flow                               PASSED [ 91%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_text_only_forward_is_finite                        PASSED [100%]

=================================================================== 12 passed in 48.35s ===================================================================
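
The validation and smoke test names in the log above suggest a set of input rules that can be sketched as follows. This is inferred from the test names only: the function name validate_inputs, the exception type, and the error messages are assumptions, not the wrapper's actual code.

```python
def validate_inputs(input_ids=None, inputs_embeds=None, per_layer_inputs=None):
    """Hypothetical input checks inferred from the test names."""
    # Exactly one of input_ids / inputs_embeds must be given.
    if (input_ids is None) == (inputs_embeds is None):
        raise ValueError("Pass exactly one of input_ids or inputs_embeds.")
    # With input_ids, per-layer inputs are not accepted from the caller.
    if input_ids is not None and per_layer_inputs is not None:
        raise ValueError("per_layer_inputs cannot be combined with input_ids.")
    # With inputs_embeds, the caller must supply per-layer inputs.
    if inputs_embeds is not None and per_layer_inputs is None:
        raise ValueError("inputs_embeds requires per_layer_inputs.")


def rejected(**kwargs):
    """Return True if validate_inputs raises for the given arguments."""
    try:
        validate_inputs(**kwargs)
    except ValueError:
        return True
    return False


both = rejected(input_ids=[1, 2], inputs_embeds=[[0.1]])
neither = rejected()
ids_with_ple = rejected(input_ids=[1, 2], per_layer_inputs=[[0.1]])
embeds_without_ple = rejected(inputs_embeds=[[0.1]])
embeds_with_ple = rejected(inputs_embeds=[[0.1]], per_layer_inputs=[[0.2]])
text_only = rejected(input_ids=[1, 2])
```

Only the last two calls are accepted in this sketch. The first four correspond to the rejection cases named in the log.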

Internal Tests

$ RUN_INTERNAL_TESTS=1 python -m pytest test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py -v
=================================================================== test session starts ===================================================================
platform linux -- Python 3.10.12, pytest-9.0.3, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.13.0
collected 5 items                                                                                                                                         

test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_as_export_module_flow                    PASSED [ 20%]
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_get_placeholder_mask                     PASSED [ 40%]
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_no_quant_model_matches_reference         PASSED [ 60%]
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_prepare_convert_model_flow               PASSED [ 80%]
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_prepare_convert_model_flow_with_image    PASSED [100%]

=================================================================== 5 passed in 34.18s ====================================================================

Smoke Test

$ python -m tico.quantization.examples.inspect \
    --config tico/quantization/examples/configs/wrapper_smoke.yaml \
    --mode wrapper-smoke \
    --case gemma4_model \
    --export circle \
    --output-dir ./out/wrapper_smoke
┌───────────── Wrapper Smoke Summary ─────────────
│ Case             : gemma4_model
│ Status           : PASS
│ Mean |diff|      : 0.006353
│ Max |diff|       : 0.169459
│ PEIR             : 0.028361
│ Shape match      : True
│ Quant finite     : True
└─────────────────────────────────────────────────
Artifacts:
  - circle: out/wrapper_smoke/gemma4_model.q.circle
    ┌────────────────────────────────────────────┐
 3.1┤                                            │
    │                                        ••  │
    │                                      •••   │
 2.0┤                                    •••     │
    │                                  •••       │
    │                               ••••         │
    │                             ••••           │
 0.9┤                            •••             │
    │                          •••               │
    │                      • •••                 │
-0.2┤                      •••                   │
    │                    ••                      │
    │                 •••                        │
    │               •••                          │
-1.3┤             •••                            │
    │           •••                              │
    │          ••                                │
-2.4┤       •••                                  │
    │      •                                     │
    │    ••                                      │
    │  •                                         │
-3.5┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -3.5       -1.8       -0.2       1.5       3.1 

Example Script

tico/quantization/wrapq/examples/gemma4/quantize_model.py demonstrates the complete workflow:

  1. Creates a tiny Gemma4Model with random weights (no download)
  2. Prepares with build_gemma4_e2b_ptq_config()
  3. Calibrates with text-only data (20 samples) and image-text data (20 samples)
  4. Converts to fake-quantized model and compares FP vs. quantized outputs (PEIR)
  5. Exports via as_export_module("prefill") and converts to Circle format (gemma4_model.q.circle)
$ python tico/quantization/wrapq/examples/gemma4/quantize_model.py
Preparing model for quantization...
Calibrating (text-only)...
Converting to quantized model...

┌───────────── Quantization Error Summary ─────────────
│ FP output shape    : (1, 16, 64)
│ Quant output shape : (1, 16, 64)
│ Mean |diff|        : 0.000134
│ PEIR               : 0.009668 %
└──────────────────────────────────────────────────────
     ┌───────────────────────────────────────────┐
 2.25┤                                           │
     │                                        •  │
     │                                      •    │
 1.46┤                                    •      │
     │                                  •        │
     │                               ••          │
     │                             •••           │
 0.67┤                           •••             │
     │                         ••                │
     │                       •••                 │
-0.12┤                     ••                    │
     │                   •••                     │
     │                  •                        │
     │               •••                         │
-0.91┤              ••                           │
     │           •••                             │
     │                                           │
-1.70┤        ••                                 │
     │       •                                   │
     │    ••                                     │
     │  ••                                       │
-2.49┤                                           │
     └┬──────────┬─────────┬──────────┬─────────┬┘
    -2.5       -1.3      -0.1        1.1      2.3 


Calibrating with image-text data...

┌───────────── Image-Text Quantization Error ──────────
│ FP output shape    : (1, 16, 64)
│ Quant output shape : (1, 16, 64)
│ Mean |diff|        : 0.000280
│ PEIR               : 0.028300 %
└──────────────────────────────────────────────────────

Exporting to Circle format...
Export output shape: (1, 16, 64)
Converting to Circle format...
[QuantCheck] WARNING: 2 nodes without qparam detected (see logs).
Circle model saved as 'gemma4_model.q.circle'

@dvsav dvsav marked this pull request as ready for review June 30, 2026 13:20
@dvsav dvsav force-pushed the model branch 2 times, most recently from 1a91bb5 to a44dcbc Compare June 30, 2026 13:34
…lacement and PLE

Add multimodal placeholder token replacement, PLE computation, and input validation to QuantGemma4Model.forward.

Co-authored-by: Cline

TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
@dvsav dvsav requested review from Torrero and mhs4670go June 30, 2026 13:41
@dvsav dvsav changed the title [quantization] Complete QuantGemma4Model forward with placeholder replaceme… [quantization] Complete QuantGemma4Model PTQ Wrapper Jun 30, 2026