
[Draft] Redmule platform#67

Draft
runwangdl wants to merge 9 commits into pulp-platform:devel from runwangdl:redmule_platform

Conversation


@runwangdl runwangdl commented May 8, 2025

Redmule Platform
(Rebased on Picolib imf PR and CCT optim PR)

Added

  • Redmule Platform, Engine, Tiler, Deployer, Binding
  • Matmul with Redmule tileConstraint, template, kernel
  • Conv Im2col with Redmule tileConstraint, template, kernel
  • Pass for Conv im2col weight transpose
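Both the Conv lowering and the weight-transpose pass hinge on the im2col-as-GEMM trick. A minimal numpy sketch of the idea (illustrative only: `im2col_hwc` is a hypothetical name, not Deeploy's API; valid padding and stride 1 assumed):

```python
import numpy as np

def im2col_hwc(x, P, Q):
    # x: [H, W, C] input (HWC layout); P, Q: kernel height/width.
    # Valid padding, stride 1: result is [H_out * W_out, P * Q * C].
    H, W, C = x.shape
    H_out, W_out = H - P + 1, W - Q + 1
    cols = np.empty((H_out * W_out, P * Q * C), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            # Each row is one flattened receptive field.
            cols[i * W_out + j] = x[i:i + P, j:j + Q, :].ravel()
    return cols

# Weights arrive as [F, P, Q, C]; the transpose pass permutes them to
# [P, Q, C, F] so that, flattened to [P*Q*C, F], one GEMM
# cols @ w_flat produces the [H_out * W_out, F] conv output.
x = np.random.rand(8, 8, 3).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)        # [F, P, Q, C]
w_flat = w.transpose(1, 2, 3, 0).reshape(-1, w.shape[0]) # [P*Q*C, F]
out = im2col_hwc(x, 3, 3) @ w_flat                       # [36, 4]
```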

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR has been reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, restore its original link after review.

@runwangdl runwangdl force-pushed the redmule_platform branch 3 times, most recently from 985a645 to 9ef9cc2 Compare May 9, 2025 14:05
@Victor-Jung Victor-Jung added the Feature (Addition of new features) and Milestone labels May 18, 2025
@Victor-Jung Victor-Jung removed this from Deeploy May 22, 2025
@Xeratec Xeratec added this to Deeploy May 22, 2025
@Xeratec Xeratec added this to the Release 0.2.0 milestone May 22, 2025
@Xeratec Xeratec removed the Milestone label May 22, 2025
@Victor-Jung Victor-Jung moved this to In review in Deeploy May 22, 2025
@Victor-Jung Victor-Jung moved this from In review to In progress in Deeploy Jun 19, 2025
@Xeratec Xeratec modified the milestones: Release xxx, Release 0.3.0 Nov 19, 2025

Minimal port of RedMulE-platform code from the user's redmule_platform
branch (which had accumulated unrelated CCT_Optim merges) onto a clean
devel base.

What landed:
- New target Deeploy/Targets/Redmule/ (Platform, Engine, Deployer,
  Bindings, Parsers, Tiler, Templates, TileConstraints,
  TopologyOptimizationPasses).
- FP32 RedMulE matmul kernel: TargetLibraries/PULPOpen/src/Matmul_fp32_Redmule.c.
- Test runner DeeployTest/testRunner_tiled_siracusa_w_redmule.py plus
  Float test fixtures (testFloat{Matmul,MatmulLarge,MatmulLarge256,2DConvolution,2dConvLarge,GEMM,GEMMtransB}).
- Wiring in platformMapping.py, top-level CMakeLists.txt,
  DeeployTest/CMakeLists.txt, TargetLibraries/PULPOpen/CMakeLists.txt.
- Makefile: GVSOC_COMMIT_HASH points at runwangdl/gvsoc fork 35d00d1
  (carries the light_redmule vendored copy + Siracusa cluster wiring).

Fixes / portings required for devel compatibility:
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py: define
  float32_tPtr locally (unresolved import left on devel).
- Deeploy/Targets/Redmule/TopologyOptimizationPasses/Passes.py: switch
  from the retired _permuteLastTwoDims / _appendTransposeNode helpers
  to upstream's _appendTranspose.
- Add empty __init__.py to Targets/{Chimera,Redmule,SoftHier}.

What intentionally did NOT land:
- CCT_Optim-era edits to PULPOpen Templates (Add/Conv/GELU/Layernorm/
  MatMul/MaxPool/Relu/Softmax), Generic Layers.py computeOps, CCT test
  suites, parallel/unroll rewrites.
- Buggy -march=rv32imc inside meson-build-script-rv32imf.txt.
- Hard-to-merge edits to DeeployTest/Platforms/Siracusa/src/deeploytest.c.
- The old-style .github/workflows/TestRunnerTiledSiracusaWithRedmule.yml;
  new-style ci-platform-siracusa-redmule-tiled.yml TBD.

Verified end-to-end: testFloatMatmul on GVSoC (runwangdl/gvsoc@35d00d1,
pulp submodule @ 371772c) passes with 'Errors: 0 out of 256'.

The Tests/ directory layout on devel was reorganized into Kernels/,
Models/, Others/ subdirectories. Drop the flat-path Float test inputs
ported from redmule_platform; they'll be re-added under the new
structure in a follow-up.

Mirrors the neureka-tiled pattern:
- DeeployTest/test_siracusa_redmule_tiled_config.py with empty
  L2_{SINGLE,DOUBLE}BUFFER_KERNELS dicts (to be populated once Float
  kernel test fixtures land under Tests/Kernels/Float/).
- conftest.py: register 'siracusa_redmule_tiled' pytest marker.
- test_platforms.py: two parametrized test functions (L2 single- and
  double-buffer) for the redmule platform.
- .github/workflows/_runner-siracusa-redmule-tiled.yml: reusable runner
  mirroring _runner-siracusa-neureka-tiled.yml.
- .github/workflows/ci-platform-siracusa-redmule-tiled.yml: top-level
  trigger, defaults to ghcr.io/runwangdl/deeploy:redmule Docker image.

With empty configs the tests collect and skip cleanly (pytest 'got
empty parameter set'). No wmem variants since RedMulE does not use
Neureka weight memory.

- yapf / isort / autoflake / trailing-whitespace across the Redmule
  Python target and platformMapping wiring.
- clang-format over TargetLibraries/PULPOpen/src/Matmul_fp32_Redmule.c.
- Add SPDX/license header to Matmul_fp32_Redmule.c (reuse hook).

The GAP9 CI uses ghcr.io/pulp-platform/deeploy-gap9:devel, which is
only pullable with pulp-platform org credentials. On a fork the job
fails at 'Initialize containers'. Add github.repository_owner guard
so forks skip the jobs cleanly.
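Such a guard is a single `if` on the affected job; a minimal sketch using the standard GitHub Actions syntax (job name and steps are illustrative, not the actual workflow):

```yaml
jobs:
  gap9-test:
    # Skip on forks: the deeploy-gap9 image is only pullable with
    # pulp-platform org credentials.
    if: github.repository_owner == 'pulp-platform'
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/pulp-platform/deeploy-gap9:devel
    steps:
      - uses: actions/checkout@v4
```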

The docs workflow publishes to gh-pages, which on a fork races with
external pushes and lacks origin remote setup. Gate on
github.repository_owner == 'pulp-platform' so only upstream publishes.

Point the redmule tiled CI config at existing upstream FP32 kernel
test fixtures under Tests/Kernels/FP32/GEMM (Regular, TransB). Both
single-buffer and double-buffer variants verified locally end-to-end
on GVSoC (Errors: 0 / 256, runtime ~4k cycles).

Without this fallback _select-env.yml resolves to the upstream
pulp-platform/deeploy:devel image, which ships a GVSoC build that
does not include the light_redmule model — the redmule test runner
then hangs. Point the default at the fork's custom image so push
events get the correct GVSoC build.

ghcr.io/runwangdl/deeploy:redmule is a private package; add
credentials block using the workflow's GITHUB_TOKEN so the runner
container step can pull it.
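The credentials block follows the standard `jobs.<job_id>.container.credentials` workflow syntax; a sketch (the surrounding job wiring is illustrative):

```yaml
container:
  image: ghcr.io/runwangdl/deeploy:redmule
  credentials:
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
```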
runwangdl added a commit to runwangdl/TrainDeeploy that referenced this pull request May 10, 2026
The Siracusa+RedMulE training CI on 1782a88 got past Python codegen but
failed at link time:

    ld.lld: error: undefined symbol:
        Conv2d_Im2Col_fp32_fp32_fp32_HWC_8_Redmule
    >>> referenced by TrainingNetwork.c:5386 in
        _node_1_tokenizer_..._Conv_cluster_fork

The original RedMulE PR (pulp-platform/Deeploy#67) shipped only the
matmul kernel TargetLibraries/PULPOpen/src/Matmul_fp32_Redmule.c.  The
ConvTemplate references a `Conv2d_Im2Col_..._8_Redmule` kernel that has
no corresponding source in the tree, and 67b754b already deleted the
testFloat2DConvolution / testFloat2dConvLarge fixtures that would have
exercised the Redmule Conv path.  So the Conv binding has always been
load-bearing only for non-test models like CCT_train, and on those it
breaks the link.

Two coupled changes route Conv through the existing PULPClusterEngine
(which has a working PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC):

- Drop 'Conv' from RedmuleMapping.  Without it Conv falls through to
  the second engine in RedmulePlatform's engine list (PULPCluster).
- Drop RedMuleAdjustWeightMemoryLayoutPass from the lowering passes.
  That pass transposed Conv weights from [F,H,W,Cin] to [H,W,Cin,F]
  for the RedMulE accelerator's expected layout; once Conv is on the
  PULPCluster engine, PULP expects [F,H,W,Cin] and the pre-applied
  transpose makes Tiling produce out-of-bounds tile rectangles
  (locally repro'd: AssertionError "Rectangle offset should be zero
  when the dimensions are the same. Received rectangle
  HyperRectangle(offset=(3, 0, 0, 0), dims=(3, 3, 3, 32))" in
  TilingCodegen.minimizeRectangle).
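The layout mismatch behind that assertion can be sketched in numpy (shapes taken from the error message above; purely illustrative, not Deeploy code):

```python
import numpy as np

F, P, Q, C = 3, 3, 3, 32                      # per the failing rectangle
w = np.zeros((F, P, Q, C), dtype=np.float32)  # layout PULPCluster expects

# RedMuleAdjustWeightMemoryLayoutPass permutes to the accelerator layout:
w_redmule = w.transpose(1, 2, 3, 0)           # [P, Q, C, F] = (3, 3, 32, 3)

# A consumer still assuming [F, P, Q, C] now sees the wrong geometry,
# so tile rectangles computed against one layout index out of bounds
# in the other.
assert w.shape == (3, 3, 3, 32)
assert w_redmule.shape == (3, 3, 32, 3)
assert w_redmule.shape != w.shape
```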

Both are clearly marked in-source as "restore when the RedMulE Conv
kernel lands."  Locally validated end-to-end:
- testMVPTraining.py    -> exit 0 (TrainingNetwork.c emits
  PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC for the tokenizer Conv).
- testMVPOptimizer.py   -> exit 0.

Matmul / Gemm continue to bind to RedMulE as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
runwangdl added a commit to runwangdl/TrainDeeploy that referenced this pull request May 10, 2026
Implements the kernel symbol that
Deeploy/Targets/Redmule/Templates/ConvTemplate.py has been pointing at
since the original pulp-platform/Deeploy#67 port -- it was a
declared-but-never-defined dangling reference, which is why 78a05d4 had to
unmap Conv from RedmuleMapping and route it through PULPCluster.

- TargetLibraries/PULPOpen/src/Conv2d_Im2Col_fp32_Redmule.c
  All 8 cluster cores cooperatively build the [N_out, P*Q*C] im2col
  matrix in the hoisted L1 transient buffer (contiguous slices of
  output positions, zero-pad when h_in/w_in fall outside the input).
  Core 0 then triggers a single RedMulE GEMM
      [N_out, K] @ [K, F]  ->  [N_out, F]
  via MatMul_*_Redmule / Gemm_*_Redmule from Matmul_fp32_Redmule.c.
  When has_bias is true the [F] bias is broadcast in-place into pOut
  and Gemm runs with y_addr = z_addr = pOut (same pattern the existing
  MatMul kernel already uses for its Y=Z=pDstY zero-init).

- Conv.h declares the new symbol.

- ConvTemplate.py:
  * forwards ${bias} and ${has_bias} (PULPFPConv2DParser already
    populates them) -- the previous template silently dropped bias.
  * sizes the im2col transient buffer to the full per-tile
    H_out * W_out * (C*P*Q) footprint instead of the prior 8-row
    scratch; one big GEMM amortises RedMulE's MMIO setup cost.

- Engine.RedmuleMapping restores 'Conv': ConvLayer([Conv2DRedmuleMapper]).

- Deployer.py restores RedMuleAdjustWeightMemoryLayoutPass -- it
  permutes Conv weights from [F,P,Q,C] to [P,Q,C,F] = flat [P*Q*C, F],
  exactly the right operand the im2col GEMM consumes.  Both Conv and
  the layout pass were disabled together in 78a05d4 (PULPCluster
  fallback expects [F,P,Q,C]); both come back together now.
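The broadcast-bias, accumulate-into-output GEMM pattern described above can be modeled in numpy (a data-flow sketch, not the C kernel; `conv_via_im2col_gemm` is a hypothetical name):

```python
import numpy as np

def conv_via_im2col_gemm(cols, w_flat, bias=None):
    # cols:   [N_out, K] im2col matrix (K = P*Q*C)
    # w_flat: [K, F] weights, permuted to [P, Q, C, F] and flattened
    # bias:   optional [F] vector
    N_out, F = cols.shape[0], w_flat.shape[1]
    out = np.zeros((N_out, F), dtype=cols.dtype)
    if bias is not None:
        out[:] = bias    # broadcast the bias in-place into the output
    # Z = X @ W + Y with Y and Z aliased to the same buffer, mirroring
    # the kernel's y_addr = z_addr = pOut accumulate-into-output GEMM.
    out += cols @ w_flat
    return out
```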

Locally validated: testMVPTraining.py + testMVPOptimizer.py both exit 0
on Models/Training/CCT/cct_train @ Siracusa_w_redmule; generated
TrainingNetwork.c now emits Conv2d_Im2Col_fp32_fp32_fp32_HWC_8_Redmule
for the tokenizer Conv (was PULP_Conv2d_Im2Col_*_HWC).

GVSoC numerical tolerance still has to be checked on CI -- this is a
new kernel, not a wrapper around an existing one, and the broadcasted-
bias path was never exercised before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

Feature (Addition of new features)

Projects

Status: In progress


3 participants