Tune MiniMax MI355X vLLM scheduling thresholds#1276
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Force-pushed 17bc2cc to c2b7d37
Force-pushed c2b7d37 to 98bc84c
Force-pushed 98bc84c to a9a3cef
/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys minimaxm2.5-fp8-mi355x-vllm
@jiacao-amd Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25346292897
chunfangamd left a comment:
This is a data-driven tuning change, and the approach is quite promising.
Please fix the CI failure and restructure the logic slightly for readability. Thanks @jiacao-amd for the work!
chunfangamd left a comment:
LGTM. Thanks @jiacao-amd!
/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys minimaxm2.5-fp8-mi355x-vllm
@chunfangamd Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25395615583
/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys minimaxm2.5-fp8-mi355x-vllm
@jiacao-amd Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25449459268
Follow-up (#1293): PR #1276 ("Tune MiniMax MI355X vLLM scheduling thresholds") landed without a perf-changelog entry; the prepared entry was dropped in commit 8d8b1e0 ("Remove MiniMax perf changelog entry") before merge, so the tuned recipe never re-ran on push-to-main and the dashboard still reflects the old launch policy. Re-add the entry so that a sweep is triggered for the new policy and the change is documented chronologically. The entry references the original PR #1276, matching the convention used for prior changelog re-appends (e.g. #1269).

Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Tune the MiniMax-M2.5 FP8 MI355X vLLM launch policy for better throughput and stability across the 1k/1k and 8k/1k sweep points.
Previous policy (all cases): `block-size=32`, shuffled KV cache disabled, async scheduling enabled. Changes:

- 1k/1k TP8/EP8: keep `block-size=32` and shuffled KV cache disabled, and disable async scheduling.
- 1k/1k non-TP8/EP8: use `block-size=16` with shuffled KV cache; disable async scheduling through c128.
- 8k/1k TP8/EP8: keep `block-size=32`, shuffled KV cache disabled, disable AITER MoE with `VLLM_ROCM_USE_AITER_MOE=0`, and disable async scheduling.
- 8k/1k non-TP8/EP8: disable async scheduling through c64; use shuffled KV cache with `block-size=16` at c64 and above.

Throughput Comparison

Metric: `tput_per_gpu` only.

Testing

- `bash -n benchmarks/single_node/minimaxm2.5_fp8_mi355x.sh`
- `git diff --check`
- `results_bmk` artifacts from the validation and baseline runs above.
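The per-case launch policy in the summary can be sketched as a small bash helper. This is a simplification under stated assumptions: the `shuffle_kv=` and `async=` tokens below are illustrative placeholders, not real vLLM flags or env vars; only `--block-size` and `VLLM_ROCM_USE_AITER_MOE=0` are quoted from the PR summary, and the concurrency thresholds (c64/c128) follow the bullets above.

```shell
#!/usr/bin/env bash
# Sketch of the per-case launch policy from the PR summary.
# NOTE: shuffle_kv=/async= are illustrative placeholders, not real vLLM options;
# --block-size and VLLM_ROCM_USE_AITER_MOE=0 are taken from the summary.
select_policy() {
  local isl=$1 tp=$2 conc=$3   # input seq len, tensor-parallel degree, concurrency
  if [ "$tp" -eq 8 ]; then
    # TP8/EP8 (both 1k/1k and 8k/1k): block-size=32, shuffled KV off, async off
    local p="--block-size 32 shuffle_kv=off async=off"
    # 8k/1k TP8/EP8 additionally disables AITER MoE
    [ "$isl" -ge 8192 ] && p="$p VLLM_ROCM_USE_AITER_MOE=0"
    echo "$p"
  elif [ "$isl" -lt 8192 ]; then
    # 1k/1k non-TP8/EP8: block-size=16 with shuffled KV; async off through c128
    local a="on"; [ "$conc" -le 128 ] && a="off"
    echo "--block-size 16 shuffle_kv=on async=$a"
  else
    # 8k/1k non-TP8/EP8: async off through c64; shuffled KV + block-size=16 at c64+
    local a="on"; [ "$conc" -le 64 ] && a="off"
    local p="async=$a"
    [ "$conc" -ge 64 ] && p="$p --block-size 16 shuffle_kv=on"
    echo "$p"
  fi
}

# Example: 8k/1k at TP8, c64
select_policy 8192 8 64
# prints: --block-size 32 shuffle_kv=off async=off VLLM_ROCM_USE_AITER_MOE=0
```

The actual recipe presumably expresses this branching inside `benchmarks/single_node/minimaxm2.5_fp8_mi355x.sh`; the helper above only makes the decision table in the bullets explicit.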