
[AMD] Add DSv4 FP4 MI355X vLLM benchmark#1282

Open
Oseltamivir wants to merge 3 commits into main from dsv4-fp4-mi355x-vllm

Conversation

@Oseltamivir
Collaborator

Summary

  • Add benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh following the recipes.vllm.ai DeepSeek-V4-Pro AMD ROCm page ([New Model][ROCm] Add AMD support for DeepSeek V4 vllm-project/vllm#40871 base + #41217 MLA Indexer optimization).
  • Uses rocm/vllm-dev:deepseek-v4-latest directly — no runtime PR clone / patch / pip rebuild (unlike the existing dsv4-fp8-mi355x-vllm overlay path, which is built off PR #40889 stacked on #40871).
  • FP4 path: same deepseek-ai/DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; --moe-backend triton_unfused consumes FP4 expert weights directly. KV cache stays fp8.

Recipe flags (from the docs, MI355X TP=8 validated):

  • VLLM_ROCM_USE_AITER=1, VLLM_ROCM_USE_AITER_LINEAR=1
  • --gpu-memory-utilization 0.6
  • --max-num-seqs 128 (auto-grows past CONC), --max-num-batched-tokens 8192
  • --distributed-executor-backend mp, --async-scheduling, --enforce-eager
  • --tokenizer-mode deepseek_v4, --tool-call-parser deepseek_v4 --enable-auto-tool-choice, --reasoning-parser deepseek_v4
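Assembled, the flags above imply a launch command along these lines (a sketch only; the actual benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh may order, wrap, or parameterize these differently, and the model path comes from the checkpoint named above):

```shell
# Sketch of the serve invocation implied by the recipe flags; illustrative,
# not a copy of the real benchmark script.
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_LINEAR=1

vllm serve deepseek-ai/DeepSeek-V4-Pro \
  --tensor-parallel-size 8 \
  --moe-backend triton_unfused \
  --gpu-memory-utilization 0.6 \
  --max-num-seqs 128 \
  --max-num-batched-tokens 8192 \
  --distributed-executor-backend mp \
  --async-scheduling \
  --enforce-eager \
  --tokenizer-mode deepseek_v4 \
  --reasoning-parser deepseek_v4 \
  --tool-call-parser deepseek_v4 \
  --enable-auto-tool-choice
```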

Wires up dsv4-fp4-mi355x-vllm in .github/configs/amd-master.yaml with TP=8, CONC 1-128 for both 1k1k and 8k1k. Adds a perf-changelog.yaml entry (pr-link is TBD — fill in when this number is known).

Test plan

  • CI lints / yaml-load: python -c "import yaml; yaml.safe_load(open('.github/configs/amd-master.yaml'))" and same on perf-changelog.yaml (already verified locally).
  • bash -n benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh (already verified locally).
  • Trigger the new sweep on MI355X (TP=8, 1k1k conc=1) and confirm wait_for_server_ready succeeds against rocm/vllm-dev:deepseek-v4-latest.
  • Run gsm8k with RUN_EVAL=true and confirm exact-match parity with the docs (~0.954 flexible-extract).
  • Expand to conc=128 once conc=1 is healthy.
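The wait_for_server_ready gate in the third step amounts to polling until a probe succeeds or a deadline passes; a generic sketch (wait_for_ready and its probe argument are hypothetical stand-ins, not the harness's actual helper):

```shell
# Hypothetical readiness poll: run a probe command once per second until it
# succeeds, or give up after the timeout (seconds) elapses.
wait_for_ready() {
  local probe="$1" timeout="${2:-300}" waited=0
  until eval "$probe"; do
    sleep 1
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then
      echo "server not ready after ${timeout}s" >&2
      return 1
    fi
  done
}

# e.g. wait_for_ready 'curl -sf http://localhost:8000/health' 600
```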

🤖 Generated with Claude Code

Comment thread .github/configs/amd-master.yaml Outdated
# triton_unfused MoE backend consumes the FP4 expert weights from the
# DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; KV cache stays fp8.
dsv4-fp4-mi355x-vllm:
image: rocm/vllm-dev:deepseek-v4-latest
Contributor

@Oseltamivir can you add the specific SHA like we do for all the other images?

c898aeb#diff-1bd56add5a769a3c46eaffc11bbbccc4d8546cf7c1a0877b735dc46581ac8405R2637

Comment thread .github/configs/amd-master.yaml Outdated
# triton_unfused MoE backend consumes the FP4 expert weights from the
# DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; KV cache stays fp8.
dsv4-fp4-mi355x-vllm:
image: rocm/vllm-dev:deepseek-v4-latest
Contributor

🔴 The new dsv4-fp4-mi355x-vllm entry uses image: rocm/vllm-dev:deepseek-v4-latest — the only -latest-style tag in any master config. Every other vLLM image is version-pinned (v0.16.0–v0.19.1), and the sibling dsv4-fp4-mi355x-sglang entry right above is SHA+date-pinned (rocm/sgl-dev:rocm720-mi35x-a8410de-20260502-DSv4). A mutable tag breaks the perf-changelog attribution model — upstream rebuilds will silently shift benchmark results without a corresponding config commit. Suggest pinning to a dated/SHA tag (e.g., rocm/vllm-dev:deepseek-v4-<sha-or-date>) mirroring the sibling sglang convention.

Extended reasoning...

What the bug is. The new dsv4-fp4-mi355x-vllm block at amd-master.yaml:1645 sets image: rocm/vllm-dev:deepseek-v4-latest. The trailing -latest makes this a mutable tag — the registry can re-publish the same tag pointing at a new image SHA at any time without any change to this repo.

Why this is out of pattern. Grepping every other vLLM image in amd-master.yaml shows pinned tags exclusively: vllm/vllm-openai-rocm:v0.16.0, v0.17.0, v0.18.0, v0.19.0, v0.19.1, plus rocm/atom:…atom0.1.2.post for dsv4-fp8-mi355x-vllm. More tellingly, the directly-adjacent dsv4-fp4-mi355x-sglang entry uses rocm/sgl-dev:rocm720-mi35x-a8410de-20260502-DSv4 and its YAML comment explicitly says: "the SHA is encoded in the image tag, so bumping sglang is just an image tag bump here." That is the convention this PR is breaking.

Why existing code does not prevent it. Nothing in the harness validates tag immutability — the YAML is fed to the sweep runner as-is. The perf-changelog.yaml mechanism (this PR adds an entry at lines 2210–2220) attributes performance changes to config-key bumps — i.e., to git commits that change the image tag. With :deepseek-v4-latest there is no future commit when the upstream image is rebuilt; the changelog records numbers against a tag whose contents can drift.

Impact. The YAML comment claims "the image already ships the validated build" and the PR description plans a gsm8k parity check (~0.954 flexible-extract). Both claims are anchored to whatever image happens to live behind :deepseek-v4-latest at the moment of validation. After that, any upstream re-push silently invalidates both the recorded gsm8k number and the per-sweep latency numbers, with no commit/PR to attribute the regression (or improvement) to. Mid-sweep regressions become unattributable, which is precisely the failure mode the perf-changelog workflow was designed to prevent.

Step-by-step proof.

  1. PR #1282 lands; :deepseek-v4-latest resolves to image SHA A. Sweep runs, perf-changelog records numbers N_A.
  2. Upstream rebuilds rocm/vllm-dev:deepseek-v4-latest to fix an unrelated CVE. Tag now resolves to SHA B. No commit lands in this repo.
  3. Next nightly sweep pulls SHA B, produces numbers N_B ≠ N_A. Bisecting attribution against git log finds no corresponding config-key change — the perf delta is invisible to the changelog model.
  4. Contrast with the sibling dsv4-fp4-mi355x-sglang entry: bumping its image requires changing the a8410de-20260502 portion of the tag, which is a commit in this repo and a perf-changelog entry.

How to fix. Replace rocm/vllm-dev:deepseek-v4-latest with a SHA- or date-pinned tag that the recipe authors publish (e.g., rocm/vllm-dev:deepseek-v4-<sha> or …-<YYYYMMDD>), mirroring the sibling sglang convention. Future image bumps then arrive as explicit config-key edits, preserving perf-changelog attribution. Severity is normal — reproducibility/attribution is a stated goal of this directory (per AGENTS.md), and there is precedent immediately above for the right pattern.
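The two pinning styles at stake can be put side by side (a sketch; the digest is a placeholder, not a real value, and the digest-pinned form matches the @sha256: convention the follow-up commit adopts):

```yaml
# Mutable tag: an upstream re-push silently changes what the sweep runs.
dsv4-fp4-mi355x-vllm:
  image: rocm/vllm-dev:deepseek-v4-latest

# Digest-pinned: reruns resolve to exactly one image; bumps require a
# commit in this repo. <digest> is a placeholder for the real value.
dsv4-fp4-mi355x-vllm:
  image: rocm/vllm-dev:deepseek-v4-latest@sha256:<digest>
```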

Comment thread perf-changelog.yaml Outdated
- "FP4 path: same DeepSeek-V4-Pro FP4+FP8 mixed checkpoint; --moe-backend triton_unfused consumes the FP4 expert weights directly. KV cache stays fp8"
- "Recipe flags from the docs: VLLM_ROCM_USE_AITER=1, VLLM_ROCM_USE_AITER_LINEAR=1, --gpu-memory-utilization 0.6, --max-num-seqs 128 (auto-grows past CONC), --max-num-batched-tokens 8192, --distributed-executor-backend mp, --async-scheduling, --enforce-eager, --tokenizer-mode/reasoning-parser/tool-call-parser deepseek_v4"
- "Search space: TP=8, CONC 1-128 for both 1k1k and 8k1k"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD
Contributor

🟡 The new perf-changelog.yaml entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD (line 2219), which will resolve to a 404 once merged. Every other entry in this file uses a real numeric PR id; please replace TBD with 1282 (this PR's number) before merging.

Extended reasoning...

The bug

In perf-changelog.yaml line 2219, the new entry for dsv4-fp4-mi355x-vllm is given a placeholder pr-link:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD

TBD is not a valid PR number, so this URL produces a 404 in any tooling or UI that follows it. Every other entry in the file (for example the immediately-preceding entries on lines 2189, 2199, 2209) uses a real numeric PR id like 1256, 1065, or 1222.

Why existing checks don't catch it

The PR description itself flags this as a known TODO ("pr-link is TBD — fill in when this number is known"), so the author plans to fix it manually before merge. However, there is no CI gate that enforces this: the changelog schema in utils/matrix_logic/validation.py only validates that pr-link is a string, not that it is a valid URL or that it resolves to an existing PR. AGENTS.md documents using XXX/TBD as draft placeholders, but with no automated check the placeholder can silently slip through and become permanent in the merged changelog.

Impact

Low blast radius — the changelog is metadata, so no benchmark execution is affected. The cost is purely a broken hyperlink baked into git history, which any tool that walks the changelog (release notes generation, UI dashboards, manual auditors clicking through entries) will hit as a 404.

Step-by-step proof

  1. Open perf-changelog.yaml at line 2219 — the value is the literal string https://github.com/SemiAnalysisAI/InferenceX/pull/TBD.
  2. Compare to the most recent prior entry at line 2209: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1222 — a valid numeric id.
  3. Resolving /pull/TBD against any GitHub repo returns HTTP 404 because GitHub PR routes accept only integers.
  4. Grep utils/matrix_logic/validation.py for the pr-link schema entry — the type check is str, with no regex/URL validator. So yaml.safe_load succeeds and validate succeeds even with TBD.
  5. Hence merging the PR as-is permanently encodes the broken link.

Fix

Replace TBD with 1282 (this PR's number) on line 2219:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1282

Optionally, a follow-up could harden validation.py to require pr-link to match https://github\.com/.+/pull/\d+$ so future placeholders fail CI.
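A lighter-weight variant of that guard can be done with grep alone (a sketch assuming the pr-link line format shown above; check_pr_links is a hypothetical helper name, not part of the repo):

```shell
# Hypothetical pre-merge guard: print any pr-link whose trailing path
# segment is not a numeric PR id (GitHub PR routes accept only integers)
# and fail when one exists, so placeholders like pull/TBD block the merge.
check_pr_links() {
  local file="$1"
  # First grep selects pr-link lines; the second (-v) keeps only lines
  # that do NOT end in pull/<digits>. Succeed only when nothing remains.
  ! grep 'pr-link:' "$file" | grep -vE 'pull/[0-9]+[[:space:]]*$'
}
```

Run as `check_pr_links perf-changelog.yaml`; a non-zero exit flags offending entries and prints them.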

Oseltamivir and others added 2 commits May 5, 2026 01:40
New benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh follows the
recipes.vllm.ai DeepSeek-V4-Pro AMD ROCm page (vllm-project/vllm#40871
base + #41217 MLA Indexer). Uses rocm/vllm-dev:deepseek-v4-latest
directly so no runtime PR clone or pip rebuild is required, unlike the
existing dsv4-fp8-mi355x-vllm overlay path.

FP4 path: same DeepSeek-V4-Pro FP4+FP8 mixed checkpoint;
--moe-backend triton_unfused consumes FP4 expert weights; KV cache fp8.
Recipe flags: VLLM_ROCM_USE_AITER=1, VLLM_ROCM_USE_AITER_LINEAR=1,
--gpu-memory-utilization 0.6, --max-num-seqs 128 (auto-grows past CONC),
--max-num-batched-tokens 8192, --distributed-executor-backend mp,
--async-scheduling, --enforce-eager, deepseek_v4 tokenizer / reasoning /
tool-call parsers.

Adds:
- benchmarks/single_node/dsv4_fp4_mi355x_vllm.sh
- .github/configs/amd-master.yaml: dsv4-fp4-mi355x-vllm (TP=8, conc 1-128
  for 1k1k and 8k1k)
- perf-changelog.yaml entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rocm/vllm-dev:deepseek-v4-latest is a moving tag; pin to the digest we
intend to validate against so reruns are reproducible. Matches the
@sha256: format already used by the nvidia-master.yaml DSv4 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 5, 2026

triton_unfused is not in the allowed list for FP4/FP8 MoE
(ValueError: moe_backend='triton_unfused' is not supported).
Switch to --moe-backend aiter which handles MXFP4 expert weights
natively on ROCm gfx950 via the AITER stack already enabled by
VLLM_ROCM_USE_AITER=1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
