Normalize TPUSH/TPOP usage and add interim onboard lane-offset bridge#1265
Normalize TPUSH/TPOP usage and add interim onboard lane-offset bridge#1265yanghaoran29 wants to merge 1 commit into
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughDocumentation is added/updated to recommend direct TPUSH/TPOP/TFREE usage and describe a get_subblockid() lane-offset workaround. A new tpipe_lane_byte_offset helper is added to two intrinsic headers and applied in kernel_bgemm.cpp and the spmd_paged_attention test kernel, which also drops manual pipe recording/status control and re-enables a previously skipped test. ChangesTPUSH/TPOP guidance and lane-offset helper
Estimated code review effort: 3 (Moderate) | ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request establishes recommended guidelines for using GM FIFO tile-pipes (TPUSH/TPOP/TFREE) and refactors several kernels and runtime files to align with these patterns, including removing manual record/back-pressure calls and introducing the tpipe_lane_byte_offset helper. The review feedback highlights several opportunities to prevent potential out-of-bounds memory accesses and signed integer overflows by validating that retrieved sub-block IDs and size parameters are non-negative before using them in offset calculations or casting them to unsigned types.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/a5/tensormap_and_ringbuffer/bgemm/kernels/mix/kernel_bgemm.cpp (1)
164-187: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winAdd a lane offset to the vector pipe before
TPOPget_sub_block_id(args)fixes the GM row selection, butTPOP<..., TileSplitAxis::TILE_UP_DOWN>still depends on the pipe’s sub-block ID. On MIX,get_subblockid()stays 0 for both AIV lanes, so lane 1 would read the same FIFO half as lane 0; this path needs the samemPipe.cons.setEntryOffset(tpipe_lane_byte_offset(args, VEC_M, N, sizeof(float)))treatment used elsewhere.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/a5/tensormap_and_ringbuffer/bgemm/kernels/mix/kernel_bgemm.cpp` around lines 164 - 187, The vector FIFO pop path in kernel_bgemm.cpp still uses the default pipe lane mapping, so both MIX lanes can read the same half of the FIFO even though c_sub is correctly offset by get_sub_block_id(args). Update the vec pipe setup around the TPOP call in the kernel_bgemm flow to apply the same lane-specific entry offset used elsewhere, using mPipe.cons.setEntryOffset(tpipe_lane_byte_offset(args, VEC_M, N, sizeof(float))) before TPOP, and keep the fix scoped to the VecFifoTileT / TileSplitAxis::TILE_UP_DOWN path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@examples/a5/tensormap_and_ringbuffer/bgemm/kernels/mix/kernel_bgemm.cpp`:
- Around line 164-187: The vector FIFO pop path in kernel_bgemm.cpp still uses
the default pipe lane mapping, so both MIX lanes can read the same half of the
FIFO even though c_sub is correctly offset by get_sub_block_id(args). Update the
vec pipe setup around the TPOP call in the kernel_bgemm flow to apply the same
lane-specific entry offset used elsewhere, using
mPipe.cons.setEntryOffset(tpipe_lane_byte_offset(args, VEC_M, N, sizeof(float)))
before TPOP, and keep the fix scoped to the VecFifoTileT /
TileSplitAxis::TILE_UP_DOWN path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 302db7f3-b8ba-40f6-b6d3-0b7a677f0b82
📒 Files selected for processing (9)
CLAUDE.mddocs/aicore-kernel-programming.mddocs/tpush-tpop.mdexamples/a2a3/tensormap_and_ringbuffer/qwen3_14b_decode/README.mdexamples/a5/tensormap_and_ringbuffer/bgemm/kernels/mix/kernel_bgemm.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/common/intrinsic.hsrc/a5/runtime/tensormap_and_ringbuffer/common/intrinsic.htests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/kernels/mix/paged_attention_parallel.cpptests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py
💤 Files with no reviewable changes (1)
- tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py
667cd6c to
c8525ca
Compare
Converge MIX GM FIFO tile-pipe kernels to pto-isa default semantics: - Remove redundant manual prod.record() and setRecordStatus(false) / setAllocateStatus(false) / setFreeStatus(false) from spmd_paged_attention - Rely on built-in TPUSH/TPOP/TFREE record and back-pressure; keep existing set_flag/wait_flag pipeline sync unchanged - Fix bgemm AIV GM accumulator row addressing via get_sub_block_id(args) instead of stale CCE get_subblockid() Work around a known pinned pto-isa bug on 1C2V MIX where get_subblockid() returns 0 for both AIV lanes, breaking TILE_UP_DOWN lane offsets inside the library. Until pto-isa/dispatch fixes sub-block identity, inject GlobalContext.sub_block_id through intrinsic.h tpipe_lane_byte_offset() + setEntryOffset as an interim bridge. Docs and validation: - Add docs/tpush-tpop.md; update aicore-kernel-programming.md - Drop spmd_paged_attention pytest skip - Onboard SmallCase1 / Case2 / Case1 golden checks pass
Converge MIX GM FIFO tile-pipe kernels to pto-isa default semantics:
Work around a known pinned pto-isa bug on 1C2V MIX where get_subblockid() returns 0 for both AIV lanes, breaking TILE_UP_DOWN lane offsets inside the library. Until pto-isa/dispatch fixes sub-block identity, inject GlobalContext.sub_block_id through intrinsic.h tpipe_lane_byte_offset() + setEntryOffset as an interim bridge.
Docs and validation: