
Add CUDA-Windows quantization support to Parakeet README#18895

Open
seyeong-han wants to merge 1 commit into main from
remote/younghan/parakeet-cuda-windows-quantization
Conversation

@seyeong-han
Contributor

Document that the cuda-windows backend supports 4w/8w/8da4w/8da8w quantization configs for the Parakeet TDT model export, and add an export example.

Changes

  • Updated the Quantization Configs table to list CUDA-Windows alongside CUDA as a supported backend for 4w, 8w, 8da4w, and 8da8w quantization
  • Added a new "Example: 4-bit Weight Quantization for CUDA-Windows" section with the export command (--dtype bf16 --qlinear_encoder 4w --qlinear 4w --qembedding 8w)
  • Added a note that --qlinear_packing_format tile_packed_to_4d is not supported for cuda-windows because the cross-compilation path keeps tensors on CPU during export and aten::_convert_weight_to_int4pack only has a CUDA kernel
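Putting the pieces of the change list together, the flags can be combined into a single export invocation. This is a sketch only: the entry-point name below is a placeholder, since the PR description does not quote the full command; only the quantization flags (`--dtype bf16 --qlinear_encoder 4w --qlinear 4w --qembedding 8w`) come from the PR.

```shell
# Sketch of the documented 4-bit weight quantization export for CUDA-Windows.
# "export_parakeet.py" is a PLACEHOLDER -- substitute the actual Parakeet TDT
# export script named in the README.
python export_parakeet.py \
  --dtype bf16 \
  --qlinear_encoder 4w \
  --qlinear 4w \
  --qembedding 8w
# Note: do NOT pass --qlinear_packing_format tile_packed_to_4d here; the
# cuda-windows cross-compilation path keeps tensors on CPU during export,
# and aten::_convert_weight_to_int4pack only has a CUDA kernel.
```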

Benchmark Results (RTX 5080, ~20s audio)

| Metric                 | Non-Quantized (fp32) | Quantized (bf16 + 4w/8w) | Improvement  |
| ---------------------- | -------------------- | ------------------------ | ------------ |
| Model size             | 2,445 MB             | 763 MB                   | 3.2× smaller |
| Prefill speed          | 1,301 tok/s          | 3,218 tok/s              | 2.5× faster  |
| Decode speed           | 1,325 tok/s          | 1,545 tok/s              | 1.2× faster  |
| Model load time        | 4.1 s                | 1.6 s                    | 2.5× faster  |
| Time to first token    | 202 ms               | 86 ms                    | 2.4× faster  |
| Transcription accuracy | Baseline             | Identical words          | Tie          |

Document that CUDA-Windows backend supports 4w/8w/8da4w/8da8w quantization
configs (without tile_packed_to_4d packing format). Add export example
showing bf16 + 4w linear + 8w embedding quantization for CUDA-Windows.

Note: tile_packed_to_4d is not supported for cuda-windows because the
cross-compilation path keeps tensors on CPU during export, and the
_convert_weight_to_int4pack op only has a CUDA kernel.
@seyeong-han seyeong-han requested a review from Gasoonjia April 14, 2026 21:28
@seyeong-han seyeong-han requested a review from lucylq as a code owner April 14, 2026 21:28
@pytorch-bot

pytorch-bot bot commented Apr 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18895

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 7f25285 with merge base 38c5ca3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 14, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
