Guard large-head nonpad Attention MEA dispatch by Kevin-Li-2025 · Pull Request #29140 · microsoft/onnxruntime

Kevin-Li-2025 · 2026-06-17T18:22:35Z

Description

The ONNX Attention CUDA path currently allows Memory Efficient Attention (MEA) for the nonpad_kv_seqlen external-cache path even with large head sizes. That path uses the CUTLASS custom right-padding variant. For head_size > 256, this variant can exceed the dynamic shared-memory opt-in limit on architectures with less shared memory, and it then crashes instead of falling back to another kernel.
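The failure mode above comes down to a size comparison that has to happen before launch. The following is a minimal sketch, not code from this PR or from onnxruntime. `check_dynamic_smem`, `SmemCheck` and the byte counts in the usage note are hypothetical and illustrative. The sketch assumes the standard CUDA rule that a kernel can use up to 48 KiB of dynamic shared memory per block by default. A request above that requires an explicit opt-in, and a request above the device's opt-in maximum cannot launch at all.

```python
from enum import Enum

# Default per-block dynamic shared memory usable without opting in (48 KiB on CUDA).
DEFAULT_DYNAMIC_SMEM_BYTES = 48 * 1024


class SmemCheck(Enum):
    DEFAULT_OK = "default_ok"          # fits without any opt-in
    NEEDS_OPT_IN = "needs_opt_in"      # fits, but the kernel must raise its limit first
    EXCEEDS_DEVICE = "exceeds_device"  # cannot fit on this device; pick another kernel


def check_dynamic_smem(required_bytes: int, optin_limit_bytes: int) -> SmemCheck:
    """Hypothetical helper: classify a dynamic shared-memory request.

    In real CUDA code, optin_limit_bytes would be queried with
    cudaDeviceGetAttribute(..., cudaDevAttrMaxSharedMemoryPerBlockOptin, device),
    and a request above 48 KiB also needs
    cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, bytes).
    """
    if required_bytes <= DEFAULT_DYNAMIC_SMEM_BYTES:
        return SmemCheck.DEFAULT_OK
    if required_bytes <= optin_limit_bytes:
        return SmemCheck.NEEDS_OPT_IN
    return SmemCheck.EXCEEDS_DEVICE
```

For example, with an illustrative 100 KiB opt-in limit, `check_dynamic_smem(120 * 1024, 100 * 1024)` returns `SmemCheck.EXCEEDS_DEVICE`. A dispatcher that never performs this check launches anyway and fails at runtime, which matches the crash described above.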

This change keeps MEA available for the normal path but makes the nonpad_kv_seqlen != nullptr && head_size > 256 case fall through to the unified unfused path, which already supports large head sizes.
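The guard can be sketched as a small kernel-selection function. This is a hypothetical Python model of the decision, not the actual C++ dispatch code in onnxruntime. `select_attention_kernel`, `AttentionKernel` and `mea_supported` are illustrative names. `mea_supported` stands in for whatever other eligibility checks the real dispatcher performs.

```python
from enum import Enum
from typing import Optional, Sequence


class AttentionKernel(Enum):
    MEMORY_EFFICIENT = "memory_efficient"  # CUTLASS MEA
    UNFUSED = "unfused"                    # unified unfused fallback


# Head-size threshold from the PR description: above this, the right-padding
# MEA variant is not used when nonpad_kv_seqlen is supplied.
MAX_NONPAD_MEA_HEAD_SIZE = 256


def select_attention_kernel(
    head_size: int,
    nonpad_kv_seqlen: Optional[Sequence[int]],
    mea_supported: bool,
) -> AttentionKernel:
    """Hypothetical model of the dispatch guard added by this PR."""
    if not mea_supported:
        return AttentionKernel.UNFUSED
    # The nonpad (external-cache) path uses the right-padding MEA variant,
    # which may exceed the shared-memory opt-in limit for large heads.
    if nonpad_kv_seqlen is not None and head_size > MAX_NONPAD_MEA_HEAD_SIZE:
        return AttentionKernel.UNFUSED
    # Normal path, or a small enough head size: MEA stays available.
    return AttentionKernel.MEMORY_EFFICIENT
```

The guard is keyed on both conditions together, so a large head size without nonpad_kv_seqlen still selects MEA in this model. Only the combination that routes into the right-padding variant is diverted to the unfused path.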

Tests

python3 -m py_compile onnxruntime/test/python/transformers/test_onnx_attention/test_gqa.py
git diff --check

I also attempted the targeted pytest locally:

python3 -m pytest -q onnxruntime/test/python/transformers/test_onnx_attention/test_gqa.py -k large_head_nonpad_seqlen_falls_back_from_mea_fp16

but local test collection fails on a missing parameterized package before reaching ORT/CUDA execution.

Signed-off-by: Kevin-Li-2025 <2242139@qq.com>

Kevin-Li-2025 · 2026-06-17T18:48:33Z

@microsoft-github-policy-service agree

Guard large-head nonpad Attention MEA dispatch

14faed4

Signed-off-by: Kevin-Li-2025 <2242139@qq.com>

Kevin-Li-2025 wants to merge 1 commit into microsoft:main from Kevin-Li-2025:kevin/guard-large-head-nonpad-mea

Kevin-Li-2025 commented Jun 17, 2026

Uh oh!

Kevin-Li-2025 commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Kevin-Li-2025 commented Jun 17, 2026

Description

Tests

Uh oh!

Kevin-Li-2025 commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant