[Example] Add TileKernels Examples in Tilus #155

Open
WilliamZhang20 wants to merge 16 commits into NVIDIA:main from WilliamZhang20:tilekernels-examples

@WilliamZhang20 WilliamZhang20 commented May 8, 2026

Kernel List:

  • swiglu_forward_and_per_token_cast_kernel
  • per_token_cast
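
The PR does not describe what these kernels compute. As a rough reference only, and assuming the names follow the common convention (SwiGLU as `silu(gate) * up`, and per-token cast as scaling each row by its absolute maximum so that it fits the FP8 E4M3 range, whose largest finite value is 448), the math might be sketched in plain Python as below. The function names, the separate `gate`/`up` arguments, the `eps` floor, and the choice of E4M3 are all assumptions made for illustration and are not taken from the PR. No actual FP8 rounding is performed.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value; assumed target format


def silu(x: float) -> float:
    """Numerically stable x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def swiglu_forward(gate, up):
    """Element-wise silu(gate) * up over two equally shaped 2-D lists."""
    return [[silu(g) * u for g, u in zip(g_row, u_row)]
            for g_row, u_row in zip(gate, up)]


def per_token_cast(rows, eps=1e-4):
    """Scale each row (token) so its largest magnitude maps to FP8_E4M3_MAX.

    Returns (scaled_rows, scales); scaled_rows[i][j] * scales[i] recovers
    the input. `eps` (assumed) guards against all-zero rows.
    """
    scaled, scales = [], []
    for row in rows:
        amax = max(max(abs(v) for v in row), eps)
        scale = amax / FP8_E4M3_MAX
        scaled.append([v / scale for v in row])
        scales.append(scale)
    return scaled, scales
```

A fused kernel such as the one named in the list would presumably compute both steps in one pass, `per_token_cast(swiglu_forward(gate, up))`, without materializing the intermediate activation.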

WilliamZhang20 and others added 14 commits April 4, 2026 03:26
Signed-off-by: William Zhang <wzhang20@yahoo.com>

copy-pr-bot Bot commented May 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: William Zhang <wzhang20@yahoo.com>

@yaoyaoding yaoyaoding left a comment

Thanks @WilliamZhang20 , LGTM in general.

Comment on lines +76 to +86
    @property
    def tma_predicate(self):
        # Inside single_thread() only one thread runs the TMA call, so the
        # @pred predicate is the constant 1. At warp scope we still need to
        # select a single lane, so use the elected leader-lane predicate to
        # avoid an if-branch divergence.
        from tilus.hidet.ir.dtypes import uint32 as _u32

        if self.current_num_threads == 1:
            return _u32(1)
        return self.contexts.leader_lane_ctx.leader_lane

If this is TMA-specific, we might move it to the TMA emitter. This class is better kept for common utilities only.
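
One way to read this suggestion is sketched below, with plain Python stand-ins rather than the real Tilus classes. `BaseEmitter`, `TmaEmitter`, and the flat `leader_lane` attribute are hypothetical names introduced here for illustration; the actual class layout in Tilus is not shown in this thread, and the real property returns IR expressions rather than plain values.

```python
class BaseEmitter:
    """Hypothetical stand-in for the shared emitter base class.

    It keeps only state that every emitter needs.
    """

    def __init__(self, current_num_threads, leader_lane):
        self.current_num_threads = current_num_threads
        # Stand-in for self.contexts.leader_lane_ctx.leader_lane in the PR.
        self.leader_lane = leader_lane


class TmaEmitter(BaseEmitter):
    """Hypothetical TMA-specific emitter that owns the predicate logic."""

    @property
    def tma_predicate(self):
        # Single-thread scope: exactly one thread issues the TMA call,
        # so the predicate is the constant 1.
        if self.current_num_threads == 1:
            return 1
        # Wider scope: pick one lane via the elected-leader predicate.
        return self.leader_lane
```

With this split, emitters that never issue TMA instructions do not carry a `tma_predicate` attribute at all.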

@yaoyaoding

/ok to test e6c70ef

@WilliamZhang20 WilliamZhang20 changed the title Add TileKernels Examples in Tilus [Example] Add TileKernels Examples in Tilus May 10, 2026