Skip to content

refactor: enforce tilebuf-only pipeline#612

Open
Zhendong404 wants to merge 1 commit intohw-native-sys:mainfrom
Zhendong404:squash-tilebuf-planmem
Open

refactor: enforce tilebuf-only pipeline#612
Zhendong404 wants to merge 1 commit intohw-native-sys:mainfrom
Zhendong404:squash-tilebuf-planmem

Conversation

@Zhendong404
Copy link
Copy Markdown
Contributor

No description provided.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request transitions the PTOAS compilation pipeline to a tilebuf-native model, replacing legacy memref dependencies with a unified tile_buf/view/ptr system. This involves introducing a semantic extraction layer and updating PlanMemory and InsertSync to consume these native types. Feedback identifies several critical issues: a unit mismatch between bits and bytes in address planning, incorrect stride calculations for column-major subviews in the sync translator, and the breakage of global workspace planning due to hardcoded local-memory logic. Additionally, the reviewer noted that scalar load/store operations are missing from liveness analysis and pointed out redundant logic for buffer alias extraction that should be unified.

Comment on lines 111 to +112
SmallVector<uint64_t> offsets = getAllocatedOffsets(
op, currentMemRefType, buffer2Offsets, fallbackNextOffset);
op, tileType, buffer2Offsets, fallbackNextOffset);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There is a unit mismatch between PlanMemory and AllocToPointerCast. PlanMemory calculates offsets in bits (using semantics.constBits), while AllocToPointerCast's fallback logic and alignment constants use bytes. If a buffer is planned, its offset in buffer2Offsets will be in bits, but it is used here as a byte address, leading to incorrect hardware addresses. Ensure that offsets are consistently handled in bytes or bits throughout the pipeline, or perform the necessary conversion here.

Comment on lines +66 to 71
SmallVector<int64_t, 4> strides(shape.size(), 1);
for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i) {
if (shape[i + 1] == ShapedType::kDynamic)
return {-1, -1};
strides[i] = strides[i + 1] * shape[i + 1];
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The stride calculation for pto.subview assumes a row-major layout. If the source tile has a column-major layout (specified in its TileBufType config), the calculated totalOffset will be incorrect, leading to wrong hazard detection in sync insertion. The calculation should check the blayout attribute of the source tile type and adjust the stride accumulation accordingly.

Comment on lines +852 to 861
auto memorySpaceAttr = getPlanningBufferSpaceAttr(operand);
if (isLocalMemPlan() && isLocalBuffer(memorySpaceAttr)) {
if (!memorySpaceAttr.has_value())
llvm::report_fatal_error("local buffer must have memory space");
return GetBufferInfo(op, operand,
memorySpaceAttr.value().getAddressSpace());
return GetBufferInfo(op, operand, memorySpaceAttr.value().getAddressSpace(),
out);
}
llvm_unreachable("buffer must has BufferInfo !");
op->emitError("expects local tile buffer result for PlanMemory");
return failure();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

GenerateBufferInfo is currently hardcoded to only support local memory planning. If planMode is GLOBAL_WORKSPACE_PLAN, the function will emit an error and return failure for any buffer, effectively breaking global workspace planning. Additionally, getPlanningBufferSpaceAttr does not handle GM views, which are the primary targets for global workspace planning. The logic should be updated to handle non-local buffers when in global workspace planning mode and to correctly resolve the GM address space for views.

Comment on lines 435 to 437
} else if (auto tprintOp = dyn_cast<pto::TPrintOp>(op)) {
// TPrintOp only reads from buffer, similar to LoadOp
OpKillHandle(curOpInfo, live, op->getBlock());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

pto::LoadScalarOp and pto::StoreScalarOp are not handled in RecursionIR. Since they read/write buffers, they should trigger OpKillHandle or UpdateOpGenInfo to ensure accurate liveness analysis. Skipping them may lead to incorrect memory planning if they are the last users of a buffer.

    } else if (auto tprintOp = dyn_cast<pto::TPrintOp>(op)) {
      // TPrintOp only reads from buffer, similar to LoadOp
      OpKillHandle(curOpInfo, live, op->getBlock());
    } else if (auto loadScalarOp = dyn_cast<pto::LoadScalarOp>(op)) {
      OpKillHandle(curOpInfo, live, op->getBlock());
    } else if (auto storeScalarOp = dyn_cast<pto::StoreScalarOp>(op)) {
      UpdateStoreOpInfo(curOpInfo, storeScalarOp.getPtr(), live);

Comment on lines +32 to +46
// Returns (alias_result, source) for planning aliases, including tile views.
std::optional<std::pair<Value, Value>> getBufferAliasInfo(Operation *op) {
if (auto genericAlias = getOperationAliasInfo(op))
return genericAlias;

if (auto bindOp = dyn_cast<pto::BindTileOp>(op))
return std::make_pair(bindOp.getResult(), bindOp.getSource());
if (auto subviewOp = dyn_cast<pto::SubViewOp>(op))
return std::make_pair(subviewOp.getResult(), subviewOp.getSource());
if (auto bitcastOp = dyn_cast<pto::BitcastOp>(op))
return std::make_pair(bitcastOp.getResult(), bitcastOp.getSrc());
if (auto treshapeOp = dyn_cast<pto::TReshapeOp>(op))
return std::make_pair(treshapeOp.getResult(), treshapeOp.getSrc());
return std::nullopt;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

getBufferAliasInfo is redundant with getOperationAliasInfo defined in Utils.cpp. Both functions now handle the same set of PTO ops (BindTileOp, SubViewOp, BitcastOp, TReshapeOp). These should be unified to avoid duplicate logic and potential inconsistencies.

std::optional<std::pair<Value, Value>> getBufferAliasInfo(Operation *op) {
  return getOperationAliasInfo(op);
}

@reedhecre
Copy link
Copy Markdown

reedhecre commented May 1, 2026

Codex Review

该评论由 review 机器人自动更新。

  • PR: refactor: enforce tilebuf-only pipeline #612 refactor: enforce tilebuf-only pipeline
  • Author: Zhendong404
  • Base/Head: main / squash-tilebuf-planmem
  • Head SHA: c3e1f7cffb2d
  • Trigger: PR 有新提交
  • Generated At: 2026-05-01T14:39:45Z
  • Previous Head SHA: 5daaa8ef1ede
  • Status: completed

Summary

PR #612 introduces memref-compatibility regressions in the pipe pipeline, a silent address-0 aliasing bug in standalone EmitC lowering, and an unversioned C API type-helper break.

Findings

  1. P2 Valid memref-based pipe IR is still accepted but no longer lowers after the bridge removal tools/ptoas/ptoas.cpp:1102

The PR removes the legacy memref bridge from the default ptoas pipeline (createPTOViewToMemrefPass() is gone, and MemRef/Bufferization are no longer registered), but the public IR surface still admits memref-based pipe values (PTOPipeEntryType still includes AnyMemRef, and initialize_l2g2l_pipe.gm_addr is still unconstrained). The EmitC pipe lowerings still only accept already-converted Tile<...> / GlobalTensor<...> operands (getPipeDataTypeToken in lib/PTO/Transforms/PTOToEmitC.cpp). That means memref-based talloc/tpush/tpop/tfree or initialize_l2g2l_pipe inputs that still verify on origin/main now fail late in emit-pto-manual, which is a real compatibility regression for existing frontends and older IR producers.

  1. P2 Unplanned alloc_tile values are all rebound to address 0 in EmitC lowering lib/PTO/Transforms/PTOToEmitC.cpp:3861

PTOAllocTileToPointerCast now fabricates %c0_i64 whenever an alloc_tile reaches EmitC without an explicit addr. That makes every such tile alias the same base address, so running emit-pto-manual without a preceding PlanMemory step silently generates overlapping local buffers and wrong code instead of reporting a hard error. The new comment explicitly claims this is a resilience path for non-PlanMemory pipelines, but the implementation is not safe.

  1. P2 mlirPTOGMTypeGet silently changed semantics from GM memref creation to tensor_view creation lib/CAPI/Dialect/PTO.cpp:657

The exported C API symbol mlirPTOGMTypeGet keeps the same name/signature but now returns !pto.tensor_view instead of the previous GM memref helper. Downstream users will still compile and link, but they now build a completely different IR type from the same API call, and there is already a dedicated mlirPTOTensorViewTypeGet entry point for the new behavior. This is an unversioned API break that will surface as verifier failures or out-of-tree regressions for existing C/Python clients.

@Zhendong404 Zhendong404 marked this pull request as ready for review May 1, 2026 14:22
@Zhendong404 Zhendong404 force-pushed the squash-tilebuf-planmem branch from 5daaa8e to c3e1f7c Compare May 1, 2026 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants