[feat] Support XQA-based MLA on SM120 #4858

jinyangyuan-nvidia · 2025-06-03T07:59:06Z

PR title

Please write the PR title using the following template:

[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>

For example, for a PR that adds a new cache manager feature tracked by Jira ticket TRTLLM-1000, the title would be:

[TRTLLM-1000][feat] Support a new feature about cache manager

Description

Please briefly explain the issue and the solution.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests. Also runs the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline plus the specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

jinyangyuan-nvidia · 2025-06-03T07:59:37Z

/bot run

tensorrt-cicd · 2025-06-03T08:05:43Z

PR_Github #7305 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T08:25:53Z

PR_Github #7305 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #5293 completed with status: 'FAILURE'

jinyangyuan-nvidia · 2025-06-03T09:20:24Z

/bot run

tensorrt-cicd · 2025-06-03T09:26:24Z

PR_Github #7325 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T12:43:07Z

PR_Github #7325 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5309 completed with status: 'FAILURE'

jinyangyuan-nvidia · 2025-06-03T13:06:59Z

/bot run

tensorrt-cicd · 2025-06-03T13:12:34Z

PR_Github #7351 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T14:51:28Z

PR_Github #7351 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5327 completed with status: 'FAILURE'

jinyangyuan-nvidia · 2025-06-03T15:03:11Z

/bot run

tensorrt-cicd · 2025-06-03T15:19:15Z

PR_Github #7369 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T16:57:02Z

PR_Github #7369 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5344 completed with status: 'FAILURE'

jinyangyuan-nvidia · 2025-06-04T00:56:30Z

/bot run

tensorrt-cicd · 2025-06-04T01:02:06Z

PR_Github #7406 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T01:17:27Z

PR_Github #7406 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #5372 completed with status: 'ABORTED'

jinyangyuan-nvidia · 2025-06-04T01:29:29Z

/bot run

tensorrt-cicd · 2025-06-04T01:41:07Z

PR_Github #7410 [ run ] triggered by Bot

jinyangyuan-nvidia · 2025-06-04T03:14:43Z

/bot run

tensorrt-cicd · 2025-06-04T03:20:15Z

PR_Github #7425 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T03:20:22Z

PR_Github #7410 [ run ] completed with state ABORTED
/LLM/main/L0_MergeRequest_PR pipeline #5375 completed with status: 'FAILURE'

jinyangyuan-nvidia · 2025-06-04T03:58:26Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-04T04:05:26Z

PR_Github #7437 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T04:05:27Z

PR_Github #7425 [ run ] completed with state ABORTED

jinyangyuan-nvidia · 2025-06-05T13:28:36Z

/bot run

tensorrt-cicd · 2025-06-05T13:28:59Z

PR_Github #7744 [ kill ] triggered by Bot

tensorrt-cicd · 2025-06-05T13:29:00Z

PR_Github #7693 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-05T13:29:31Z

PR_Github #7744 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 8bf4200

tensorrt-cicd · 2025-06-05T13:34:50Z

PR_Github #7746 [ run ] triggered by Bot

jinyangyuan-nvidia · 2025-06-05T14:24:20Z

/bot kill

jinyangyuan-nvidia · 2025-06-05T14:24:25Z

/bot run

tensorrt-cicd · 2025-06-05T14:30:06Z

PR_Github #7763 [ kill ] triggered by Bot

tensorrt-cicd · 2025-06-05T14:30:07Z

PR_Github #7746 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-05T14:30:34Z

PR_Github #7764 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-05T14:30:36Z

PR_Github #7763 [ kill ] completed with state ABORTED

tensorrt-cicd · 2025-06-05T17:17:04Z

PR_Github #7764 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5623 completed with status: 'FAILURE'

jinyangyuan-nvidia · 2025-06-06T01:06:15Z

/bot run

tensorrt-cicd · 2025-06-06T01:12:00Z

PR_Github #7803 [ run ] triggered by Bot

cpp/tensorrt_llm/common/attentionOp.cpp

cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplCommon.h

...tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/decoderXQAImplJIT.cpp

.../kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/src/nvrtcWrapper.cpp

tensorrt-cicd · 2025-06-06T04:12:11Z

PR_Github #7803 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5651 completed with status: 'SUCCESS'

Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: jinyangyuan-nvidia <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

jinyangyuan-nvidia · 2025-06-06T07:48:44Z

/bot run

tensorrt-cicd · 2025-06-06T07:54:18Z

PR_Github #7878 [ run ] triggered by Bot

ming-wei

Thanks, approved.

tensorrt-cicd · 2025-06-06T14:32:33Z

PR_Github #7878 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5692 completed with status: 'SUCCESS'

jinyangyuan-nvidia assigned lowsfer, peaceh-nv and jinyangyuan-nvidia Jun 3, 2025

jinyangyuan-nvidia force-pushed the dev/sm120_xqa_mla branch from 806d5ac to 4e0a93b Compare June 3, 2025 09:20

jinyangyuan-nvidia force-pushed the dev/sm120_xqa_mla branch from 4e0a93b to f65d92b Compare June 3, 2025 13:04

jinyangyuan-nvidia force-pushed the dev/sm120_xqa_mla branch from f65d92b to 628fdf1 Compare June 3, 2025 15:03

jinyangyuan-nvidia force-pushed the dev/sm120_xqa_mla branch from 628fdf1 to 1c48c46 Compare June 4, 2025 03:14

jinyangyuan-nvidia requested a review from dongxuy04 June 5, 2025 13:28

jinyangyuan-nvidia force-pushed the dev/sm120_xqa_mla branch from 8bf4200 to d37147f Compare June 5, 2025 14:24

jinyangyuan-nvidia force-pushed the dev/sm120_xqa_mla branch from d37147f to ee27d9f Compare June 6, 2025 01:00

ming-wei reviewed Jun 6, 2025

ming-wei requested a review from lowsfer June 6, 2025 02:28

lowsfer and others added 2 commits June 6, 2025 15:48

Add some comments

c8445a2

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

jinyangyuan-nvidia force-pushed the dev/sm120_xqa_mla branch from d139811 to c8445a2 Compare June 6, 2025 07:48

ming-wei approved these changes Jun 6, 2025

jinyangyuan-nvidia enabled auto-merge (squash) June 6, 2025 09:29

jinyangyuan-nvidia merged commit 20d0649 into NVIDIA:main Jun 6, 2025
3 checks passed

jinyangyuan-nvidia deleted the dev/sm120_xqa_mla branch June 6, 2025 15:26
