[AMD] Add DSv4 FP4 MI355X vLLM benchmark#1282
Conversation
> # triton_unfused MoE backend consumes the FP4 expert weights from the
> # DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; KV cache stays fp8.
> dsv4-fp4-mi355x-vllm:
>   image: rocm/vllm-dev:deepseek-v4-latest
@Oseltamivir can you add the specific SHA like we do for all the other images?
c898aeb#diff-1bd56add5a769a3c46eaffc11bbbccc4d8546cf7c1a0877b735dc46581ac8405R2637
> # triton_unfused MoE backend consumes the FP4 expert weights from the
> # DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; KV cache stays fp8.
> dsv4-fp4-mi355x-vllm:
>   image: rocm/vllm-dev:deepseek-v4-latest
🔴 The new dsv4-fp4-mi355x-vllm entry uses image: rocm/vllm-dev:deepseek-v4-latest — the only -latest-style tag in any master config. Every other vLLM image is version-pinned (v0.16.0–v0.19.1), and the sibling dsv4-fp4-mi355x-sglang entry right above is SHA+date-pinned (rocm/sgl-dev:rocm720-mi35x-a8410de-20260502-DSv4). A mutable tag breaks the perf-changelog attribution model — upstream rebuilds will silently shift benchmark results without a corresponding config commit. Suggest pinning to a dated/SHA tag (e.g., rocm/vllm-dev:deepseek-v4-<sha-or-date>) mirroring the sibling sglang convention.
Extended reasoning...
What the bug is. The new dsv4-fp4-mi355x-vllm block at amd-master.yaml:1645 sets image: rocm/vllm-dev:deepseek-v4-latest. The trailing -latest makes this a mutable tag — the registry can re-publish the same tag pointing at a new image SHA at any time without any change to this repo.

Why this is out of pattern. Grepping every other vLLM image in amd-master.yaml shows pinned tags exclusively: vllm/vllm-openai-rocm:v0.16.0, v0.17.0, v0.18.0, v0.19.0, v0.19.1, plus rocm/atom:…atom0.1.2.post for dsv4-fp8-mi355x-vllm. More tellingly, the directly adjacent dsv4-fp4-mi355x-sglang entry uses rocm/sgl-dev:rocm720-mi35x-a8410de-20260502-DSv4, and its YAML comment explicitly says: "the SHA is encoded in the image tag, so bumping sglang is just an image tag bump here." That is the convention this PR is breaking.

Why existing code does not prevent it. Nothing in the harness validates tag immutability — the YAML is fed to the sweep runner as-is. The perf-changelog.yaml mechanism (this PR adds an entry at lines 2210–2220) attributes performance changes to config-key bumps, i.e. to git commits that change the image tag. With :deepseek-v4-latest there is no future commit when the upstream image is rebuilt; the changelog records numbers against a tag whose contents can drift.

Impact. The YAML comment claims "the image already ships the validated build" and the PR description plans a gsm8k parity check (~0.954 flexible-extract). Both claims are anchored to whatever image happens to live behind :deepseek-v4-latest at the moment of validation. After that, any upstream re-push silently invalidates both the recorded gsm8k number and the per-sweep latency numbers, with no commit/PR to attribute the regression (or improvement) to. Mid-sweep regressions become unattributable, which is precisely the failure mode the perf-changelog workflow was designed to prevent.

Step-by-step proof.
1. PR #1282 lands; :deepseek-v4-latest resolves to image SHA A. Sweep runs, perf-changelog records numbers N_A.
2. Upstream rebuilds rocm/vllm-dev:deepseek-v4-latest to fix an unrelated CVE. Tag now resolves to SHA B. No commit lands in this repo.
3. Next nightly sweep pulls SHA B, produces numbers N_B ≠ N_A. Bisecting attribution against git log finds no corresponding config-key change — the perf delta is invisible to the changelog model.
4. Contrast with the sibling dsv4-fp4-mi355x-sglang entry: bumping its image requires changing the a8410de-20260502 portion of the tag, which is a commit in this repo and a perf-changelog entry.

How to fix. Replace rocm/vllm-dev:deepseek-v4-latest with a SHA- or date-pinned tag that the recipe authors publish (e.g., rocm/vllm-dev:deepseek-v4-<sha-or-date>), mirroring the sibling sglang convention. Future image bumps then arrive as explicit config-key edits, preserving perf-changelog attribution. Severity is normal — reproducibility/attribution is a stated goal of this directory (per AGENTS.md), and there is precedent immediately above for the right pattern.
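A minimal sketch of the suggested fix, assuming the recipe authors publish a date-pinned tag (the exact tag name below is hypothetical, shown only to illustrate the sibling sglang convention):

```yaml
# Hypothetical pinned entry mirroring the adjacent dsv4-fp4-mi355x-sglang
# convention; the real tag must come from whatever the recipe authors publish.
dsv4-fp4-mi355x-vllm:
  image: rocm/vllm-dev:deepseek-v4-20260502   # date/SHA in the tag => bumps are repo commits
```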
> - "FP4 path: same DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; --moe-backend triton_unfused consumes the FP4 expert weights directly. KV cache stays fp8"
> - "Recipe flags from the docs: VLLM_ROCM_USE_AITER=1, VLLM_ROCM_USE_AITER_LINEAR=1, --gpu-memory-utilization 0.6, --max-num-seqs 128 (auto-grows past CONC), --max-num-batched-tokens 8192, --distributed-executor-backend mp, --async-scheduling, --enforce-eager, --tokenizer-mode/reasoning-parser/tool-call-parser deepseek_v4"
> - "Search space: TP=8, CONC 1-128 for both 1k1k and 8k1k"
> pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD
🟡 The new perf-changelog.yaml entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD (line 2219), which will resolve to a 404 once merged. Every other entry in this file uses a real numeric PR id; please replace TBD with 1282 (this PR's number) before merging.
Extended reasoning...
The bug
In perf-changelog.yaml line 2219, the new entry for dsv4-fp4-mi355x-vllm is given a placeholder pr-link:

`pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD`

TBD is not a valid PR number, so this URL produces a 404 in any tooling or UI that follows it. Every other entry in the file (for example the immediately preceding entries on lines 2189, 2199, 2209) uses a real numeric PR id like 1256, 1065, or 1222.
Why existing checks don't catch it
The PR description itself flags this as a known TODO ("pr-link is TBD — fill in when this number is known"), so the author plans to fix it manually before merge. However, there is no CI gate that enforces this: the changelog schema in utils/matrix_logic/validation.py only validates that pr-link is a string, not that it is a valid URL or that it resolves to an existing PR. AGENTS.md documents using XXX/TBD as draft placeholders, but with no automated check the placeholder can silently slip through and become permanent in the merged changelog.
Impact
Low blast radius — the changelog is metadata, so no benchmark execution is affected. The cost is purely a broken hyperlink baked into git history, which any tool that walks the changelog (release notes generation, UI dashboards, manual auditors clicking through entries) will hit as a 404.
Step-by-step proof
- Open perf-changelog.yaml at line 2219 — the value is the literal string https://github.com/SemiAnalysisAI/InferenceX/pull/TBD.
- Compare to the most recent prior entry at line 2209: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1222 — a valid numeric id.
- Resolving /pull/TBD against any GitHub repo returns HTTP 404 because GitHub PR routes accept only integers.
- Grep utils/matrix_logic/validation.py for the pr-link schema entry — the type check is str, with no regex/URL validator. So yaml.safe_load succeeds and validate succeeds even with TBD.
- Hence merging the PR as-is permanently encodes the broken link.
Fix
Replace TBD with 1282 (this PR's number) on line 2219:

`pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1282`

Optionally, a follow-up could harden validation.py to require pr-link to match `https://github\.com/.+/pull/\d+$` so future placeholders fail CI.
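The suggested hardening can be sketched in a few lines. This is a hypothetical validator, not the current code in utils/matrix_logic/validation.py (which, per the review above, only checks that pr-link is a string):

```python
import re

# Anchored regex from the review suggestion: owner/repo followed by a
# numeric pull-request id. "TBD" and other placeholders fail \d+.
PR_LINK_RE = re.compile(r"^https://github\.com/.+/pull/\d+$")

def validate_pr_link(value: str) -> bool:
    """Return True only for numeric GitHub pull-request URLs."""
    return bool(PR_LINK_RE.match(value))
```

Wiring this into the changelog schema would make a lingering TBD placeholder a CI failure instead of a silent 404 in git history.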
New benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh follows the recipes.vllm.ai DeepSeek-V4-Pro AMD ROCm page (vllm-project/vllm#40871 base + #41217 MLA Indexer). Uses rocm/vllm-dev:deepseek-v4-latest directly, so no runtime PR clone or pip rebuild is required, unlike the existing dsv4-fp8-mi355x-vllm overlay path.

FP4 path: same DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; --moe-backend triton_unfused consumes FP4 expert weights; KV cache fp8.

Recipe flags: VLLM_ROCM_USE_AITER=1, VLLM_ROCM_USE_AITER_LINEAR=1, --gpu-memory-utilization 0.6, --max-num-seqs 128 (auto-grows past CONC), --max-num-batched-tokens 8192, --distributed-executor-backend mp, --async-scheduling, --enforce-eager, deepseek_v4 tokenizer / reasoning / tool-call parsers.

Adds:
- benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh
- .github/configs/amd-master.yaml: dsv4-fp4-mi355x-vllm (TP=8, conc 1-128 for 1k1k and 8k1k)
- perf-changelog.yaml entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rocm/vllm-dev:deepseek-v4-latest is a moving tag; pin to the digest we intend to validate against so reruns are reproducible. Matches the @sha256: format already used by the nvidia-master.yaml DSv4 entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
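One way to resolve the moving tag to the digest being pinned, sketched with standard Docker CLI commands (this is an illustration, not part of the repo's tooling):

```shell
# Pull the mutable tag once, then read back the content-addressed digest
# that the tag currently resolves to.
docker pull rocm/vllm-dev:deepseek-v4-latest
docker inspect --format '{{index .RepoDigests 0}}' rocm/vllm-dev:deepseek-v4-latest
# prints rocm/vllm-dev@sha256:<digest> — the immutable reference to pin in amd-master.yaml
```

Pinning the `@sha256:` digest guarantees byte-identical image contents on every rerun, even if upstream later re-publishes the tag.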
Force-pushed df168f4 to a13d6a6
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25359905080
triton_unfused is not in the allowed list for FP4/FP8 MoE (ValueError: moe_backend='triton_unfused' is not supported). Switch to --moe-backend aiter which handles MXFP4 expert weights natively on ROCm gfx950 via the AITER stack already enabled by VLLM_ROCM_USE_AITER=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25363980691

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25363984774
Summary
- Adds benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh following the recipes.vllm.ai DeepSeek-V4-Pro AMD ROCm page ([New Model][ROCm] Add AMD support for DeepSeek V4, vllm-project/vllm#40871 base + #41217 MLA Indexer optimization).
- Uses rocm/vllm-dev:deepseek-v4-latest directly — no runtime PR clone / patch / pip rebuild (unlike the existing dsv4-fp8-mi355x-vllm overlay path, which is built off PR #40889 stacked on #40871).
- FP4 path: same deepseek-ai/DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; --moe-backend triton_unfused consumes FP4 expert weights directly. KV cache stays fp8.
- Recipe flags (from the docs, MI355X TP=8 validated):
  - VLLM_ROCM_USE_AITER=1, VLLM_ROCM_USE_AITER_LINEAR=1
  - --gpu-memory-utilization 0.6
  - --max-num-seqs 128 (auto-grows past CONC), --max-num-batched-tokens 8192
  - --distributed-executor-backend mp, --async-scheduling, --enforce-eager
  - --tokenizer-mode deepseek_v4, --tool-call-parser deepseek_v4 --enable-auto-tool-choice, --reasoning-parser deepseek_v4
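The flags above imply a server launch roughly like the following sketch. The model path and exact invocation are assumptions; the actual dsv4_fp4_mi355x_vllm.sh wrapper may wire these differently, and a later commit in this PR switches --moe-backend from triton_unfused to aiter:

```shell
# Sketch only — flags taken verbatim from the recipe list above.
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_LINEAR=1
vllm serve deepseek-ai/DeepSeek-V4-Pro \
  --tensor-parallel-size 8 \
  --moe-backend triton_unfused \
  --gpu-memory-utilization 0.6 \
  --max-num-seqs 128 \
  --max-num-batched-tokens 8192 \
  --distributed-executor-backend mp \
  --async-scheduling \
  --enforce-eager \
  --tokenizer-mode deepseek_v4 \
  --reasoning-parser deepseek_v4 \
  --tool-call-parser deepseek_v4 --enable-auto-tool-choice
```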
dsv4-fp4-mi355x-vllmin.github/configs/amd-master.yamlwith TP=8, CONC 1-128 for both 1k1k and 8k1k. Adds aperf-changelog.yamlentry (pr-link isTBD— fill in when this number is known).Test plan
- python -c "import yaml; yaml.safe_load(open('.github/configs/amd-master.yaml'))" and the same on perf-changelog.yaml (already verified locally).
- bash -n benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh (already verified locally).
- wait_for_server_ready succeeds against rocm/vllm-dev:deepseek-v4-latest.
- Run with RUN_EVAL=true and confirm exact-match parity with the docs (~0.954 flexible-extract).

🤖 Generated with Claude Code