[Draft] Redmule platform #67
runwangdl wants to merge 9 commits into
Conversation
Force-pushed from 985a645 to 9ef9cc2.
Minimal port of RedMulE-platform code from the user's redmule_platform
branch (which had accumulated unrelated CCT_Optim merges) onto a clean
devel base.
What landed:
- New target Deeploy/Targets/Redmule/ (Platform, Engine, Deployer,
Bindings, Parsers, Tiler, Templates, TileConstraints,
TopologyOptimizationPasses).
- FP32 RedMulE matmul kernel: TargetLibraries/PULPOpen/src/Matmul_fp32_Redmule.c.
- Test runner DeeployTest/testRunner_tiled_siracusa_w_redmule.py plus
Float test fixtures (testFloat{Matmul,MatmulLarge,MatmulLarge256,2DConvolution,2dConvLarge,GEMM,GEMMtransB}).
- Wiring in platformMapping.py, top-level CMakeLists.txt,
DeeployTest/CMakeLists.txt, TargetLibraries/PULPOpen/CMakeLists.txt.
- Makefile: GVSOC_COMMIT_HASH points at runwangdl/gvsoc fork 35d00d1
(carries the light_redmule vendored copy + Siracusa cluster wiring).
Fixes / ports required for devel compatibility:
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py: define
float32_tPtr locally (unresolved import left on devel).
- Deeploy/Targets/Redmule/TopologyOptimizationPasses/Passes.py: switch
from the retired _permuteLastTwoDims / _appendTransposeNode helpers
to upstream's _appendTranspose.
- Add empty __init__.py to Targets/{Chimera,Redmule,SoftHier}.
What intentionally did NOT land:
- CCT_Optim-era edits to PULPOpen Templates (Add/Conv/GELU/Layernorm/
MatMul/MaxPool/Relu/Softmax), Generic Layers.py computeOps, CCT test
suites, parallel/unroll rewrites.
- Buggy -march=rv32imc inside meson-build-script-rv32imf.txt.
- Hard-to-merge edits to DeeployTest/Platforms/Siracusa/src/deeploytest.c.
- The old-style .github/workflows/TestRunnerTiledSiracusaWithRedmule.yml;
new-style ci-platform-siracusa-redmule-tiled.yml TBD.
Verified end-to-end: testFloatMatmul on GVSoC (runwangdl/gvsoc@35d00d1,
pulp submodule @ 371772c) passes with 'Errors: 0 out of 256'.
The Tests/ directory layout on devel was reorganized into Kernels/, Models/, Others/ subdirectories. Drop the flat-path Float test inputs ported from redmule_platform; they'll be re-added under the new structure in a follow-up.
Force-pushed from d998fc3 to 67b754b.
Mirrors the neureka-tiled pattern:
- DeeployTest/test_siracusa_redmule_tiled_config.py with empty
L2_{SINGLE,DOUBLE}BUFFER_KERNELS dicts (to be populated once Float
kernel test fixtures land under Tests/Kernels/Float/).
- conftest.py: register 'siracusa_redmule_tiled' pytest marker.
- test_platforms.py: two parametrized test functions (L2 single- and
double-buffer) for the redmule platform.
- .github/workflows/_runner-siracusa-redmule-tiled.yml: reusable runner
mirroring _runner-siracusa-neureka-tiled.yml.
- .github/workflows/ci-platform-siracusa-redmule-tiled.yml: top-level
trigger, defaults to ghcr.io/runwangdl/deeploy:redmule Docker image.
With empty configs the tests collect and skip cleanly (pytest 'got
empty parameter set'). No wmem variants since RedMulE does not use
Neureka weight memory.
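With the dicts empty, pytest's parametrize machinery generates no cases and marks each function as skipped. A minimal sketch of that behaviour, assuming the configs are parametrized by test name as below; the test body and the exact parametrization in test_platforms.py are assumptions, not the actual code:

```python
import pytest

# Illustrative sketch only; the real test_siracusa_redmule_tiled_config.py /
# test_platforms.py wiring may differ.
L2_SINGLEBUFFER_KERNELS = {}  # to be populated with Tests/Kernels/Float fixtures
L2_DOUBLEBUFFER_KERNELS = {}


@pytest.mark.siracusa_redmule_tiled  # marker registered in conftest.py
@pytest.mark.parametrize("test_name", sorted(L2_SINGLEBUFFER_KERNELS))
def test_siracusa_redmule_tiled_singlebuffer_L2(test_name):
    # With an empty dict, parametrize yields no cases and pytest reports the
    # function as skipped with "got empty parameter set".
    ...
```

Running `pytest -m siracusa_redmule_tiled` against empty dicts therefore collects the two functions and skips them instead of erroring out.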
- yapf / isort / autoflake / trailing-whitespace across the Redmule Python target and platformMapping wiring.
- clang-format over TargetLibraries/PULPOpen/src/Matmul_fp32_Redmule.c.
- Add SPDX/license header to Matmul_fp32_Redmule.c (reuse hook).
The GAP9 CI uses ghcr.io/pulp-platform/deeploy-gap9:devel, which is only pullable with pulp-platform org credentials. On a fork the job fails at 'Initialize containers'. Add github.repository_owner guard so forks skip the jobs cleanly.
The docs workflow publishes to gh-pages, which on a fork races with external pushes and lacks origin remote setup. Gate on github.repository_owner == 'pulp-platform' so only upstream publishes.
Point the redmule tiled CI config at existing upstream FP32 kernel test fixtures under Tests/Kernels/FP32/GEMM (Regular, TransB). Both single-buffer and double-buffer variants verified locally end-to-end on GVSoC (Errors: 0 / 256, runtime ~4k cycles).
Without this fallback _select-env.yml resolves to the upstream pulp-platform/deeploy:devel image, which ships a GVSoC build that does not include the light_redmule model — the redmule test runner then hangs. Point the default at the fork's custom image so push events get the correct GVSoC build.
ghcr.io/runwangdl/deeploy:redmule is a private package; add credentials block using the workflow's GITHUB_TOKEN so the runner container step can pull it.
runwangdl added a commit to runwangdl/TrainDeeploy that referenced this pull request on May 10, 2026:
The Siracusa+RedMulE training CI on 1782a88 got past Python codegen but failed at link time:

    ld.lld: error: undefined symbol: Conv2d_Im2Col_fp32_fp32_fp32_HWC_8_Redmule
    >>> referenced by TrainingNetwork.c:5386 in _node_1_tokenizer_..._Conv_cluster_fork

The original RedMulE PR (pulp-platform/Deeploy#67) shipped only the matmul kernel TargetLibraries/PULPOpen/src/Matmul_fp32_Redmule.c. The ConvTemplate references a `Conv2d_Im2Col_..._8_Redmule` kernel that has no corresponding source in the tree, and 67b754b already deleted the testFloat2DConvolution / testFloat2dConvLarge fixtures that would have exercised the Redmule Conv path. So the Conv binding has always been load-bearing only for non-test models like CCT_train, and on those it breaks the link.

Two coupled changes route Conv through the existing PULPClusterEngine (which has a working PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC):
- Drop 'Conv' from RedmuleMapping. Without it Conv falls through to the second engine in RedmulePlatform's engine list (PULPCluster).
- Drop RedMuleAdjustWeightMemoryLayoutPass from the lowering passes. That pass transposed Conv weights from [F,H,W,Cin] to [H,W,Cin,F] for the RedMulE accelerator's expected layout; once Conv is on the PULPCluster engine, PULP expects [F,H,W,Cin] and the pre-applied transpose makes Tiling produce out-of-bounds tile rectangles (locally repro'd: AssertionError "Rectangle offset should be zero when the dimensions are the same. Received rectangle HyperRectangle(offset=(3, 0, 0, 0), dims=(3, 3, 3, 32))" in TilingCodegen.minimizeRectangle).

Both are clearly marked in-source as "restore when the RedMulE Conv kernel lands."

Locally validated end-to-end:
- testMVPTraining.py -> exit 0 (TrainingNetwork.c emits PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC for the tokenizer Conv).
- testMVPOptimizer.py -> exit 0.

Matmul / Gemm continue to bind to RedMulE as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
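For illustration, a toy sketch of the fall-through the first change relies on: the deployer picks the first engine in the platform's ordered engine list whose mapping covers a node type, so removing 'Conv' from the RedMulE mapping lets Conv land on PULPCluster. Class and attribute names here are simplified stand-ins, not Deeploy's actual API.

```python
from types import SimpleNamespace


def resolve_engine(op_type, engines):
    """Return the first engine in the ordered list whose mapping covers op_type."""
    for engine in engines:
        if op_type in engine.mapping:
            return engine
    raise KeyError(f"no engine maps {op_type}")


# Engine order mirrors RedmulePlatform's list: RedMulE first, PULPCluster second.
redmule = SimpleNamespace(name="RedMulE", mapping={"MatMul", "Gemm"})  # 'Conv' dropped
pulp_cluster = SimpleNamespace(name="PULPCluster", mapping={"Conv", "MatMul", "Gemm"})

# With 'Conv' gone from the RedMulE mapping, Conv binds to PULPCluster and its
# working PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC kernel; MatMul / Gemm stay on RedMulE.
assert resolve_engine("Conv", [redmule, pulp_cluster]).name == "PULPCluster"
assert resolve_engine("MatMul", [redmule, pulp_cluster]).name == "RedMulE"
```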
runwangdl added a commit to runwangdl/TrainDeeploy that referenced this pull request on May 10, 2026:
Implements the kernel symbol that Deeploy/Targets/Redmule/Templates/ConvTemplate.py has been pointing at since the original pulp-platform/Deeploy#67 port -- it was a declared-but-never-defined dangling reference, which is why 78a05d4 had to unmap Conv from RedmuleMapping and route it through PULPCluster.

- TargetLibraries/PULPOpen/src/Conv2d_Im2Col_fp32_Redmule.c:
  All 8 cluster cores cooperatively build the [N_out, P*Q*C] im2col matrix in the hoisted L1 transient buffer (contiguous slices of output positions, zero-pad when h_in/w_in fall outside the input). Core 0 then triggers a single RedMulE GEMM [N_out, K] @ [K, F] -> [N_out, F] via MatMul_*_Redmule / Gemm_*_Redmule from Matmul_fp32_Redmule.c. When has_bias is true the [F] bias is broadcast in-place into pOut and Gemm runs with y_addr = z_addr = pOut (same pattern the existing MatMul kernel already uses for its Y=Z=pDstY zero-init).
- Conv.h declares the new symbol.
- ConvTemplate.py:
  * forwards ${bias} and ${has_bias} (PULPFPConv2DParser already populates them) -- the previous template silently dropped bias.
  * sizes the im2col transient buffer to the full per-tile H_out * W_out * (C*P*Q) footprint instead of the prior 8-row scratch; one big GEMM amortises RedMulE's MMIO setup cost.
- Engine.RedmuleMapping restores 'Conv': ConvLayer([Conv2DRedmuleMapper]).
- Deployer.py restores RedMuleAdjustWeightMemoryLayoutPass -- it permutes Conv weights from [F,P,Q,C] to [P,Q,C,F] = flat [P*Q*C, F], exactly the right operand the im2col GEMM consumes. Both Conv and the layout pass were disabled together in 78a05d4 (PULPCluster fallback expects [F,P,Q,C]); both come back together now.

Locally validated: testMVPTraining.py + testMVPOptimizer.py both exit 0 on Models/Training/CCT/cct_train @ Siracusa_w_redmule; generated TrainingNetwork.c now emits Conv2d_Im2Col_fp32_fp32_fp32_HWC_8_Redmule for the tokenizer Conv (was PULP_Conv2d_Im2Col_*_HWC).

GVSoC numerical tolerance still has to be checked on CI -- this is a new kernel, not a wrapper around an existing one, and the broadcasted-bias path was never exercised before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
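A NumPy reference sketch of the im2col-plus-single-GEMM scheme described above, for one HWC image with unit dilation. The function and argument names are illustrative, not the C kernel's signature, and the split of im2col rows across the 8 cluster cores is omitted.

```python
import numpy as np


def conv2d_im2col_hwc(x, w, bias=None, stride=1, pad=0):
    """Reference for the im2col + single-GEMM Conv scheme.

    x: [H, W, C] input (HWC), w: [F, P, Q, C] weights, bias: [F] or None.
    Returns [H_out, W_out, F].
    """
    H, W, C = x.shape
    F, P, Q, _ = w.shape
    H_out = (H + 2 * pad - P) // stride + 1
    W_out = (W + 2 * pad - Q) // stride + 1

    # Build the [N_out, P*Q*C] im2col matrix; positions falling outside the
    # input are zero-padded, mirroring the kernel's zero-pad path.
    cols = np.zeros((H_out * W_out, P * Q * C), dtype=x.dtype)
    for oh in range(H_out):
        for ow in range(W_out):
            patch = np.zeros((P, Q, C), dtype=x.dtype)
            for p in range(P):
                for q in range(Q):
                    h_in = oh * stride + p - pad
                    w_in = ow * stride + q - pad
                    if 0 <= h_in < H and 0 <= w_in < W:
                        patch[p, q] = x[h_in, w_in]
            cols[oh * W_out + ow] = patch.reshape(-1)

    # Weights permuted [F,P,Q,C] -> [P,Q,C,F] and flattened to [P*Q*C, F]:
    # the layout RedMuleAdjustWeightMemoryLayoutPass pre-applies, i.e. the
    # right-hand operand of one [N_out, K] @ [K, F] GEMM.
    w_mat = w.transpose(1, 2, 3, 0).reshape(P * Q * C, F)

    out = cols @ w_mat      # single GEMM (runs on RedMulE in the real kernel)
    if bias is not None:
        out = out + bias    # the kernel broadcasts bias into pOut and uses Y = Z = pOut
    return out.reshape(H_out, W_out, F)
```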
Redmule Platform
(Rebased on Picolib imf PR and CCT optim PR)
Added
PR Merge Checklist
- Rebased on the latest devel commit and pointing to devel.
- The CHANGELOG.md file has been updated.