Add Qwen 3.5 MoE to cuda-perf CI and add prefill throughput tracking #18903
Gasoonjia commented Apr 15, 2026
- Add to the qwen3_5_moe runner: PyTorchObserver stats output (so cuda_benchmark.py can parse it), a --prompt_file flag, and GPU memory stats
- Add prefill_throughput metric to cuda_benchmark.py (prefill tok/s alongside existing decode tok/s)
- Add Qwen3.5-35B-A3B-HQQ-INT4 to cuda-perf.yml with >1000 token prompt and 512 output tokens, on linux.aws.a100
- Align cuda-perf.yml triggers with cuda.yml (push main/release, ciflow/cuda tags, PR on backends/cuda and backends/aoti paths)
- Remove random model selection and schedule trigger; always run all models when triggered
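
As a rough illustration of the prefill and decode throughput metrics described above, here is a minimal Python sketch. It assumes the runner prints a single line of the form `PyTorchObserver {json}` and that the JSON carries the fields `prompt_tokens`, `generated_tokens`, `inference_start_ms`, `prompt_eval_end_ms`, and `inference_end_ms`; these field names, and the helper names `parse_observer_stats` and `compute_throughput`, are assumptions for illustration and may not match what cuda_benchmark.py actually does.

```python
import json
from typing import Optional

# Assumed prefix of the stats line printed by the runner.
OBSERVER_PREFIX = "PyTorchObserver "


def parse_observer_stats(output: str) -> Optional[dict]:
    """Return the JSON payload of the last observer line in `output`, or None."""
    stats = None
    for line in output.splitlines():
        idx = line.find(OBSERVER_PREFIX)
        if idx == -1:
            continue
        try:
            stats = json.loads(line[idx + len(OBSERVER_PREFIX):])
        except json.JSONDecodeError:
            # Skip lines that mention the prefix but carry no valid JSON.
            continue
    return stats


def compute_throughput(stats: dict) -> dict:
    """Compute prefill and decode tokens per second from millisecond timestamps."""
    # Prefill covers prompt processing; decode covers token generation after it.
    prefill_s = (stats["prompt_eval_end_ms"] - stats["inference_start_ms"]) / 1000.0
    decode_s = (stats["inference_end_ms"] - stats["prompt_eval_end_ms"]) / 1000.0
    return {
        "prefill_throughput": stats["prompt_tokens"] / prefill_s if prefill_s > 0 else 0.0,
        "decode_throughput": stats["generated_tokens"] / decode_s if decode_s > 0 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical runner output, not taken from a real CI run.
    sample = (
        "loading model...\n"
        'PyTorchObserver {"prompt_tokens": 1024, "generated_tokens": 512, '
        '"inference_start_ms": 0, "prompt_eval_end_ms": 512, "inference_end_ms": 4608}\n'
    )
    print(compute_throughput(parse_observer_stats(sample)))
```

Reporting the two rates separately matters for a long prompt like the >1000 token one used here, since prefill time would otherwise be folded into a single end-to-end number and hide decode regressions.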
Dr. CI status (automatically generated, updates every 15 minutes):
- Artifacts and rendered test results: hud.pytorch.org/pr/pytorch/executorch/18903
- Note: Links to docs will display an error until the docs builds have been completed.
- 1 active SEV. If your PR is affected, please view it.
- As of commit 7336416 with merge base 28c56fe: 7 new failures, 4 cancelled jobs, 1 unrelated failure.
- CANCELLED JOBS: the cancelled jobs should be retried.
- BROKEN TRUNK: one job failed but was also failing on the merge base. Rebase onto the `viable/strict` branch to avoid these failures.
a23b5b1 to 49d0aa1
e50e3fa to 36213f9