[quantization] Complete QuantGemma4Model PTQ Wrapper by dvsav · Pull Request #797 · Samsung/TICO

dvsav · 2026-06-30T09:21:24Z

What

This PR completes the QuantGemma4Model PTQ wrapper — the top-level multimodal (image-text) model for Gemma4 E2B. It implements the full calibration forward path with multimodal placeholder replacement, PLE computation, and fixed-slot fusion, plus an export-friendly forward_export() and as_export_module() for Circle conversion.
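Fixed-slot fusion can be illustrated without any framework. Because the image features always occupy the same position range, fusion reduces to static slicing and concatenation, with no data-dependent indexing. The sketch below is a minimal pure-Python illustration only. The function name `fixed_slot_fuse_sketch`, its arguments, and the list-of-vectors representation are assumptions and are not the signature of TICO's fixed_slot_fuse.

```python
from typing import List, Sequence

Vector = List[float]


def fixed_slot_fuse_sketch(
    text_embeds: Sequence[Vector],
    image_feats: Sequence[Vector],
    slot_start: int,
) -> List[Vector]:
    """Overwrite the static slot [slot_start, slot_start + len(image_feats)).

    Hypothetical illustration: slot position and length are fixed ahead of time,
    so fusion is plain slicing plus concatenation (no masked_scatter-style indexing).
    """
    slot_end = slot_start + len(image_feats)
    if slot_start < 0 or slot_end > len(text_embeds):
        raise ValueError("image slot does not fit inside the sequence")
    if any(len(f) != len(text_embeds[0]) for f in image_feats):
        raise ValueError("hidden size mismatch between image features and text embeddings")
    # Prefix and suffix keep their (placeholder-replaced) text embeddings.
    return (
        [list(v) for v in text_embeds[:slot_start]]
        + [list(v) for v in image_feats]
        + [list(v) for v in text_embeds[slot_end:]]
    )


seq = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
img = [[9.0, 9.0], [8.0, 8.0]]
print(fixed_slot_fuse_sketch(seq, img, slot_start=1))
# [[0.0, 0.0], [9.0, 9.0], [8.0, 8.0], [3.0, 3.0]]
```

Since the output shape never depends on the token values, the same operation on tensors traces to a fixed graph. That property is what makes this fusion compatible with a static NPU subgraph.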

Why

The QuantGemma4Model wrapper previously had a skeleton forward() that only handled the simplest text-only path — multimodal placeholder tokens (image/video/audio) were not replaced before embedding, PLE was not computed, and the wrapper was not registered for automatic discovery by prepare(). Without these, calibration through the full model was incorrect (wrong embeddings at image positions, no PLE statistics) and the wrapper was never instantiated. Additionally, there was no export path to convert the quantized model to Circle format, which is the final deliverable of the TICO pipeline.

Key Design Decisions

Placeholder replacement via torch.where instead of masked_scatter: The original HuggingFace Gemma4Model.forward uses masked_scatter to insert image features, which is not export-friendly. We replace placeholder token IDs with pad_token_id before embedding and use fixed_slot_fuse to insert image features at a static position range. This matches the StaticGemma4Runtime design where CPU owns dynamic operations and NPU owns static compute.
Two forward methods — forward() for calibration, forward_export() for export: The calibration forward() contains dynamic control flow (if pixel_values is not None, if multimodal_mask.any(), conditional PLE) that cannot be exported. The export forward_export() takes pre-fused inputs_embeds and precomputed masks/RoPE/PLE, with no dynamic control flow — following the pattern established by QuantGemma4VisionModel.
CPU/NPU split aligned with StaticGemma4Runtime: The export path assumes the CPU runtime has already performed token embedding, vision tower, MM fusion, PLE computation, and mask/RoPE generation. The NPU subgraph runs only the text decoder layers and final norm.
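The placeholder replacement in the first decision is an element-wise select. First, build a boolean mask of multimodal placeholder positions. Then substitute pad_token_id wherever the mask is set, which is what torch.where(mask, pad_token_id, input_ids) does on tensors. The pure-Python version below is an illustration under assumptions. The token-ID values, the helper names, and the convention of skipping placeholder IDs that are None (a modality the config does not define) are hypothetical and are not taken from _get_placeholder_mask itself.

```python
from typing import Iterable, List, Optional


def placeholder_mask_sketch(
    input_ids: List[int], placeholder_ids: Iterable[Optional[int]]
) -> List[bool]:
    """True where a token is any configured multimodal placeholder (hypothetical helper)."""
    # Skip modalities whose placeholder ID is not configured (None).
    active = {tid for tid in placeholder_ids if tid is not None}
    return [tok in active for tok in input_ids]


def replace_placeholders_sketch(
    input_ids: List[int], mask: List[bool], pad_token_id: int
) -> List[int]:
    """Element-wise select, mirroring torch.where(mask, pad_token_id, input_ids)."""
    if len(mask) != len(input_ids):
        raise ValueError("mask and input_ids must have the same length")
    return [pad_token_id if m else tok for tok, m in zip(input_ids, mask)]


# Hypothetical IDs: image=900, video not configured, audio=901, pad=0.
ids = [5, 900, 900, 7, 901]
mask = placeholder_mask_sketch(ids, [900, None, 901])
print(mask)                                       # [False, True, True, False, True]
print(replace_placeholders_sketch(ids, mask, 0))  # [5, 0, 0, 7, 0]
```

After this step the embedding lookup only sees in-vocabulary IDs. The image features then overwrite their static position range during fixed-slot fusion, so no masked_scatter is needed.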

Changes

tico/quantization/wrapq/wrappers/gemma4/quant_model.py — Added _get_placeholder_mask() helper; completed forward() with placeholder replacement, PLE computation, and input validation; added forward_export() (static-shape export path) and as_export_module() (export preparation)
tico/quantization/wrapq/wrappers/gemma4/export_adapters.py — Added Gemma4ModelPrefillExportAdapter that delegates to forward_export()
tico/quantization/wrapq/wrappers/registry.py — Enabled quant_model in _CORE_MODULES for automatic discovery
tico/quantization/wrapq/examples/gemma4/quantize_model.py — New example script demonstrating full PTQ flow (text-only + image-text calibration, PEIR evaluation, Circle export)
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py — New unit tests for _get_placeholder_mask and QuantGemma4Model wrapper
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py — New smoke tests for prepare-calibrate-convert flow (text-only, image-text, export adapter)
tico/quantization/recipes/debug/wrapper_smoke/cases/gemma4.py — Added Gemma4ModelCase and registered it in GEMMA4_CASES
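forward_export() expects the CPU runtime to supply the attention mask and RoPE tables, precomputed for a fixed sequence length. The sketch below shows what that kind of precomputation typically looks like for a single static prefill. It builds an additive causal mask (0 on and below the diagonal, a large negative value above it) and HuggingFace-style cos/sin tables, with the inverse frequencies duplicated across both halves of the head dimension. This is a generic illustration under stated assumptions. The function names, the rope_theta default, and the finite mask value are hypothetical, and any model-specific attention or RoPE variants used by Gemma4 are not reproduced.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def causal_mask_sketch(seq_len: int, mask_value: float = -1e9) -> Matrix:
    """Additive causal mask: query i may attend to keys j <= i.

    A finite mask_value (assumption) avoids NaNs from fully masked rows;
    the value the real runtime uses is not stated here.
    """
    return [[0.0 if j <= i else mask_value for j in range(seq_len)] for i in range(seq_len)]


def rope_tables_sketch(
    seq_len: int, head_dim: int, rope_theta: float = 10000.0
) -> Tuple[Matrix, Matrix]:
    """cos/sin tables of shape [seq_len][head_dim] in the HF 'rotate_half' layout."""
    if head_dim % 2:
        raise ValueError("head_dim must be even")
    inv_freq = [1.0 / (rope_theta ** (2 * k / head_dim)) for k in range(head_dim // 2)]
    cos_table: Matrix = []
    sin_table: Matrix = []
    for pos in range(seq_len):
        angles = [pos * f for f in inv_freq]
        angles = angles + angles  # duplicate for the two halves of the head dim
        cos_table.append([math.cos(a) for a in angles])
        sin_table.append([math.sin(a) for a in angles])
    return cos_table, sin_table


print(causal_mask_sketch(3))
# [[0.0, -1000000000.0, -1000000000.0], [0.0, 0.0, -1000000000.0], [0.0, 0.0, 0.0]]
```

Both tables depend only on the sequence length and the configuration, so the CPU can compute them once and pass them to the static NPU graph. This keeps data-dependent work out of the exported subgraph.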

Tests

test_quantize_model.py: 5 smoke tests — no-quant parity, prepare-convert text-only flow, prepare-convert image-text flow, as_export_module flow, _get_placeholder_mask unit test
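test_quant_model.py: 12 unit tests covering _get_placeholder_mask, forward() input validation, and wrapper smoke behavior (text-only forward, placeholder replacement, prepare-convert flow)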
Wrapper smoke runner: Gemma4ModelCase passes with Mean |diff| = 0.006353, PEIR = 0.028361, and successful Circle export
All 149 tests in test/quantization/wrapq/wrappers/gemma4/ pass
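The Mean |diff| and PEIR figures reported above compare the FP and fake-quantized outputs element-wise. The sketch below assumes the common definition of PEIR (peak error to interval ratio): the peak absolute error divided by the value range (max minus min) of the FP reference. This PR does not state the exact formula. The smoke summary and the example script also format PEIR differently (as a bare number and with a % sign, respectively), so the scale of the printed value is not certain either.

```python
from typing import Sequence, Tuple


def quant_error_sketch(
    fp: Sequence[float], quant: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (mean |diff|, max |diff|, PEIR) for flattened FP and quantized outputs."""
    if len(fp) != len(quant) or not fp:
        raise ValueError("outputs must be non-empty and the same length")
    diffs = [abs(a - b) for a, b in zip(fp, quant)]
    mean_abs = sum(diffs) / len(diffs)
    peak = max(diffs)
    interval = max(fp) - min(fp)
    # Peak error relative to the dynamic range of the reference output.
    if interval == 0:
        peir = 0.0 if peak == 0 else float("inf")
    else:
        peir = peak / interval
    return mean_abs, peak, peir


fp = [-2.0, 0.0, 1.0, 2.0]
q = [-2.0, 0.1, 1.0, 1.8]
print(tuple(round(v, 4) for v in quant_error_sketch(fp, q)))  # (0.075, 0.2, 0.05)
```

If this definition holds, the smoke summary (Max |diff| 0.169459, PEIR 0.028361) implies an FP output range of about 5.98.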

Unit Tests

$ python -m pytest test/quantization/wrapq/wrappers/gemma4/test_quant_model.py -v 2>&1
========================================================= test session starts ==========================================================
platform linux -- Python 3.10.12, pytest-9.0.3, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.13.0
collected 12 items                                                                                                                     

test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestGetPlaceholderMask::test_image_placeholders                                    PASSED [  8%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestGetPlaceholderMask::test_missing_token_ids                                     PASSED [ 16%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestGetPlaceholderMask::test_mixed_placeholders                                    PASSED [ 25%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestGetPlaceholderMask::test_no_placeholders                                       PASSED [ 33%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelValidation::test_reject_both_input_ids_and_inputs_embeds       PASSED [ 41%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelValidation::test_reject_input_ids_with_per_layer_inputs        PASSED [ 50%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelValidation::test_reject_neither_input_ids_nor_inputs_embeds    PASSED [ 58%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_inputs_embeds_with_per_layer_inputs_works          PASSED [ 66%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_inputs_embeds_without_per_layer_inputs_raises      PASSED [ 75%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_placeholder_replacement_produces_pad_embedding     PASSED [ 83%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_prepare_convert_flow                               PASSED [ 91%]
test/quantization/wrapq/wrappers/gemma4/test_quant_model.py::TestQuantGemma4ModelSmoke::test_text_only_forward_is_finite                        PASSED [100%]

=================================================================== 12 passed in 48.35s ===================================================================
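The validation and smoke test names above suggest an input contract for forward(). Exactly one of input_ids and inputs_embeds must be given. per_layer_inputs must accompany inputs_embeds, presumably because PLE needs the token IDs and cannot be rebuilt from embeddings alone. Conversely, per_layer_inputs must not be passed together with input_ids, since PLE is then computed internally. The function below restates that contract as a standalone check. It is inferred from the test names only, and the function name, return values, and error messages are hypothetical.

```python
from typing import Any, Optional


def check_forward_inputs_sketch(
    input_ids: Optional[Any] = None,
    inputs_embeds: Optional[Any] = None,
    per_layer_inputs: Optional[Any] = None,
) -> str:
    """Return which input path would be taken, or raise ValueError (hypothetical helper)."""
    if (input_ids is None) == (inputs_embeds is None):
        raise ValueError("specify exactly one of input_ids or inputs_embeds")
    if input_ids is not None:
        if per_layer_inputs is not None:
            raise ValueError("per_layer_inputs are computed from input_ids; do not pass both")
        return "ids"  # embed tokens, replace placeholders, compute PLE internally
    if per_layer_inputs is None:
        raise ValueError("inputs_embeds requires precomputed per_layer_inputs")
    return "embeds"  # caller supplies both embeddings and PLE


print(check_forward_inputs_sketch(input_ids=[1, 2, 3]))  # ids
```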

Internal Tests

$ RUN_INTERNAL_TESTS=1 python -m pytest test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py -v
=================================================================== test session starts ===================================================================
platform linux -- Python 3.10.12, pytest-9.0.3, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.13.0
collected 5 items                                                                                                                                         

test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_as_export_module_flow                    PASSED [ 20%]
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_get_placeholder_mask                     PASSED [ 40%]
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_no_quant_model_matches_reference         PASSED [ 60%]
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_prepare_convert_model_flow               PASSED [ 80%]
test/quantization/wrapq/wrappers/gemma4/test_quantize_model.py::TestGemma4ModelSmoke::test_prepare_convert_model_flow_with_image    PASSED [100%]

=================================================================== 5 passed in 34.18s ====================================================================

Smoke Test

$ python -m tico.quantization.examples.inspect \
    --config tico/quantization/examples/configs/wrapper_smoke.yaml \
    --mode wrapper-smoke \
    --case gemma4_model \
    --export circle \
    --output-dir ./out/wrapper_smoke
┌───────────── Wrapper Smoke Summary ─────────────
│ Case             : gemma4_model
│ Status           : PASS
│ Mean |diff|      : 0.006353
│ Max |diff|       : 0.169459
│ PEIR             : 0.028361
│ Shape match      : True
│ Quant finite     : True
└─────────────────────────────────────────────────
Artifacts:
  - circle: out/wrapper_smoke/gemma4_model.q.circle
    ┌────────────────────────────────────────────┐
 3.1┤                                            │
    │                                        ••  │
    │                                      •••   │
 2.0┤                                    •••     │
    │                                  •••       │
    │                               ••••         │
    │                             ••••           │
 0.9┤                            •••             │
    │                          •••               │
    │                      • •••                 │
-0.2┤                      •••                   │
    │                    ••                      │
    │                 •••                        │
    │               •••                          │
-1.3┤             •••                            │
    │           •••                              │
    │          ••                                │
-2.4┤       •••                                  │
    │      •                                     │
    │    ••                                      │
    │  •                                         │
-3.5┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -3.5       -1.8       -0.2       1.5       3.1

Example Script

tico/quantization/wrapq/examples/gemma4/quantize_model.py demonstrates the complete workflow:

Creates a tiny Gemma4Model with random weights (no download)
Prepares with build_gemma4_e2b_ptq_config()
Calibrates with text-only data (20 samples) and image-text data (20 samples)
Converts to fake-quantized model and compares FP vs. quantized outputs (PEIR)
Exports via as_export_module("prefill") and converts to Circle format (gemma4_model.q.circle)
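Calibrating on both text-only and image-text batches matters because observers only record the activation ranges that the calibration data actually produces. Values that occur mainly at image positions can fall outside a text-only range and be clipped after conversion. The sketch below illustrates this with a running min/max observer and asymmetric 8-bit affine fake quantization. It is an assumption for illustration. The real observers, bit widths, and granularity are whatever build_gemma4_e2b_ptq_config() selects, and the class and function names are hypothetical.

```python
from typing import Iterable, Tuple


class MinMaxObserverSketch:
    """Tracks the running min/max of every value it observes (hypothetical)."""

    def __init__(self) -> None:
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, values: Iterable[float]) -> None:
        for v in values:
            self.lo = min(self.lo, v)
            self.hi = max(self.hi, v)

    def qparams(self, qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
        """Asymmetric affine scale/zero-point; the range is widened to include 0."""
        lo, hi = min(self.lo, 0.0), max(self.hi, 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for an empty range
        zero_point = round(qmin - lo / scale)
        return scale, max(qmin, min(qmax, zero_point))


def fake_quant(x: float, scale: float, zero_point: int, qmin: int = 0, qmax: int = 255) -> float:
    """Quantize then dequantize; values outside the calibrated range are clipped."""
    q = max(qmin, min(qmax, round(x / scale) + zero_point))
    return (q - zero_point) * scale


obs = MinMaxObserverSketch()
obs.observe([-1.0, 0.5, 1.0])  # text-only calibration batch
print(round(fake_quant(3.0, *obs.qparams()), 2))  # 1.0 (clipped)
obs.observe([3.0, -0.5])  # image-text calibration batch widens the range
print(round(fake_quant(3.0, *obs.qparams()), 2))  # 3.0
```

Widening the range to include zero keeps 0.0 exactly representable, which is a common choice in affine schemes so that padded or masked values survive quantization unchanged.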

$ python tico/quantization/wrapq/examples/gemma4/quantize_model.py
Preparing model for quantization...
Calibrating (text-only)...
Converting to quantized model...

┌───────────── Quantization Error Summary ─────────────
│ FP output shape    : (1, 16, 64)
│ Quant output shape : (1, 16, 64)
│ Mean |diff|        : 0.000134
│ PEIR               : 0.009668 %
└──────────────────────────────────────────────────────
     ┌───────────────────────────────────────────┐
 2.25┤                                           │
     │                                        •  │
     │                                      •    │
 1.46┤                                    •      │
     │                                  •        │
     │                               ••          │
     │                             •••           │
 0.67┤                           •••             │
     │                         ••                │
     │                       •••                 │
-0.12┤                     ••                    │
     │                   •••                     │
     │                  •                        │
     │               •••                         │
-0.91┤              ••                           │
     │           •••                             │
     │                                           │
-1.70┤        ••                                 │
     │       •                                   │
     │    ••                                     │
     │  ••                                       │
-2.49┤                                           │
     └┬──────────┬─────────┬──────────┬─────────┬┘
    -2.5       -1.3      -0.1        1.1      2.3 


Calibrating with image-text data...

┌───────────── Image-Text Quantization Error ──────────
│ FP output shape    : (1, 16, 64)
│ Quant output shape : (1, 16, 64)
│ Mean |diff|        : 0.000280
│ PEIR               : 0.028300 %
└──────────────────────────────────────────────────────

Exporting to Circle format...
Export output shape: (1, 16, 64)
Converting to Circle format...
[QuantCheck] WARNING: 2 nodes without qparam detected (see logs).
Circle model saved as 'gemma4_model.q.circle'

…lacement and PLE

Add multimodal placeholder token replacement, PLE computation, and input validation to QuantGemma4Model.forward.

Co-authored-by: Cline
TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>

dvsav mentioned this pull request Jun 30, 2026

[quantization] Support Gemma4 #768

Open

dvsav force-pushed the model branch from 159becf to 6356c05 June 30, 2026 11:01

dvsav marked this pull request as ready for review June 30, 2026 13:20

dvsav force-pushed the model branch 2 times, most recently from 1a91bb5 to a44dcbc June 30, 2026 13:34

dvsav force-pushed the model branch from a44dcbc to 6b006d2 June 30, 2026 13:35

dvsav requested review from Torrero and mhs4670go June 30, 2026 13:41

dvsav changed the title ~~[quantization] Complete QuantGemma4Model forward with placeholder replaceme…~~ [quantization] Complete QuantGemma4Model PTQ Wrapper Jun 30, 2026

dvsav wants to merge 1 commit into Samsung:main from dvsav:model
