[TRTLLM-13575][feat] Add eplb support for qwen3.5 by nv-guomingz · Pull Request #15543 · NVIDIA/TensorRT-LLM

nv-guomingz · 2026-06-23T11:18:41Z

Summary by CodeRabbit

Release Notes

New Features
- Registered the Qwen3.5 Mixture-of-Experts architecture (`Qwen3_5MoeForCausalLM`) with the MoE expert-parallel load balancer (EPLB).
- Expanded NVFP4 MoE inference coverage with expert-parallel load balancing for Qwen3.5-397B-A17B.
Tests
- Added static and online expert-parallel load balancer configuration test cases with multiple backend options.
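Registering an architecture with the load balancer is, in effect, an allowlist check. The sketch below illustrates that idea in plain Python. It assumes the balancer compares the first entry of a Hugging Face config's `architectures` field against `moe_model_arch_list`; `MOE_MODEL_ARCHS` and `supports_eplb` are hypothetical names, not TensorRT-LLM APIs.

```python
from typing import Sequence

# Hypothetical stand-in for `moe_model_arch_list`. Only the entry added by
# this PR is shown; the architectures already listed upstream are omitted.
MOE_MODEL_ARCHS = frozenset({"Qwen3_5MoeForCausalLM"})


def supports_eplb(architectures: Sequence[str]) -> bool:
    """Return True if the model's primary architecture is allowlisted.

    `architectures` mirrors the `architectures` list in a Hugging Face
    config.json; checking only the first entry is an assumption.
    """
    return bool(architectures) and architectures[0] in MOE_MODEL_ARCHS


print(supports_eplb(["Qwen3_5MoeForCausalLM"]))  # True
print(supports_eplb(["LlamaForCausalLM"]))       # False
```

A set keeps the membership test constant-time, although the upstream list is short enough that a plain list works just as well.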

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions)
- If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
- Any new dependencies have been scanned for license and vulnerabilities
- CODEOWNERS updated if ownership changes
- Documentation updated as needed
- Update tava architecture diagram if there is a significant design change in the PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

coderabbitai · 2026-06-23T11:22:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a3f5300c-55c1-4a03-93cd-ad2929ced053

📥 Commits

Reviewing files that changed from the base of the PR and between 31d4301 and d911902.

📒 Files selected for processing (4)

- `tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py`
- `tests/integration/defs/accuracy/test_llm_api_pytorch.py`
- `tests/integration/test_lists/test-db/l0_dgx_b200.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_gpus.yml`

📝 Walkthrough


Adds `Qwen3_5MoeForCausalLM` to `moe_model_arch_list` so the MoE load balancer recognizes this architecture. Introduces NVFP4 EPLB integration tests for Qwen3.5-397B-A17B covering static (`layer_updates_per_iter=0`) and online (`layer_updates_per_iter=2`) modes, and registers these tests in the DGX B200 and GB200 CI matrices.
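The two test modes differ in `layer_updates_per_iter`: `0` keeps the initial expert placement fixed (static), and a positive value presumably lets the balancer rebalance that many layers per iteration (online). A minimal sketch of that distinction follows. The helper names are hypothetical, and the full-sweep arithmetic assumes layers are revisited in a fixed cyclic order, which this PR does not state.

```python
from typing import Optional


def classify_eplb_mode(layer_updates_per_iter: int) -> str:
    """Map the knob to the mode names used by the two tests."""
    if layer_updates_per_iter < 0:
        raise ValueError("layer_updates_per_iter must be non-negative")
    # 0 -> placement stays as initially assigned; >0 -> rebalanced at runtime.
    return "static" if layer_updates_per_iter == 0 else "online"


def iterations_per_full_sweep(
    num_moe_layers: int, layer_updates_per_iter: int
) -> Optional[int]:
    """Iterations needed to touch every MoE layer once.

    Assumes a fixed cyclic update order (illustrative assumption only).
    """
    if classify_eplb_mode(layer_updates_per_iter) == "static":
        return None  # static mode never rebalances
    # Integer ceiling division: a partial final batch still costs an iteration.
    return (num_moe_layers + layer_updates_per_iter - 1) // layer_updates_per_iter


print(classify_eplb_mode(0))             # static
print(classify_eplb_mode(2))             # online
print(iterations_per_full_sweep(60, 2))  # 30 (60 is a made-up layer count)
```

Under that assumption, a larger `layer_updates_per_iter` adapts placement faster at the cost of more per-iteration rebalancing work.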

Changes

Qwen3.5-MoE EPLB support and NVFP4 integration tests

| Layer / File(s) | Summary |
| --- | --- |
| Arch registration and NVFP4 EPLB test implementation: `tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py`, `tests/integration/defs/accuracy/test_llm_api_pytorch.py` | `Qwen3_5MoeForCausalLM` is added to `moe_model_arch_list`. A shared `_run_nvfp4_4gpus_eplb` helper is added to `TestQwen3_5_397B_A17B`; it configures `KvCacheConfig`, `CudaGraphConfig`, and `MoeConfig`, runs `LLM` with attention DP, asserts NVFP4 quantization, and evaluates GSM8K. `test_nvfp4_4gpus_static_eplb` reads `config.json` to build per-layer `initial_global_assignments` with `layer_updates_per_iter=0`; `test_nvfp4_4gpus_online_eplb` hardcodes `num_experts=512` with `layer_updates_per_iter=2`. |
| CI test list registration: `tests/integration/test_lists/test-db/l0_dgx_b200.yml`, `tests/integration/test_lists/test-db/l0_gb200_multi_gpus.yml` | Static EPLB tests for the CUTEDSL and TRTLLM backends are registered in the DGX B200 4-GPU pre-merge matrix; online EPLB tests for both backends are registered in the GB200 multi-GPU matrix. |
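The static test builds per-layer `initial_global_assignments` from `config.json`. One plausible way to do that is sketched below: slot `s` of layer `i` holds expert `(i + s) % num_experts`, so every expert is placed at least once and any surplus slots hold replicas. The config key names (`num_hidden_layers`, `num_experts`), the dict shape (layer index to a list of expert ids, one per slot), and the helper name are assumptions rather than the PR's code; Qwen3.5 may nest these keys under a sub-config, and a real model may have non-MoE layers to skip.

```python
import json
from typing import Dict, List


def build_initial_global_assignments(
    config: dict, num_slots: int, ep_size: int
) -> Dict[int, List[int]]:
    """Map each layer index to the expert id held by every slot.

    Hypothetical helper; the config key names below are assumptions.
    """
    num_layers = config["num_hidden_layers"]
    num_experts = config["num_experts"]
    if num_slots < num_experts:
        raise ValueError("need at least one slot per expert")
    if num_slots % ep_size != 0:
        raise ValueError("num_slots must split evenly across EP ranks")
    # Offsetting by the layer index shifts the expert-to-slot placement
    # from one layer to the next.
    return {
        layer: [(layer + slot) % num_experts for slot in range(num_slots)]
        for layer in range(num_layers)
    }


cfg = json.loads('{"num_hidden_layers": 2, "num_experts": 4}')
assignments = build_initial_global_assignments(cfg, num_slots=8, ep_size=4)
print(assignments[0])  # [0, 1, 2, 3, 0, 1, 2, 3]
print(assignments[1])  # [1, 2, 3, 0, 1, 2, 3, 0]
```

With 4 EP ranks and 8 slots, each rank would hold 2 slots. In the real test, the slot count and layer count come from the model config and test parameters, which this page does not show.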

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is incomplete. Only the template structure is present, with no actual content in the required 'Description' and 'Test Coverage' sections. | Fill in the Description section explaining what EPLB support entails and why it's being added. Provide a Test Coverage section listing the specific test cases that validate the changes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the feature addition (EPLB support) for the Qwen3.5 model, directly corresponding to the main changes in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


nv-guomingz · 2026-06-23T11:23:37Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-23T11:29:46Z

PR_Github #55235 [ run ] triggered by Bot. Commit: 3eece46

tensorrt-cicd · 2026-06-23T22:31:48Z

PR_Github #55235 [ run ] completed with state SUCCESS. Commit: 3eece46
/LLM/main/L0_MergeRequest_PR pipeline #44193 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis


nv-guomingz requested review from a team as code owners June 23, 2026 11:18

nv-guomingz requested a review from leslie-fang25 June 23, 2026 11:18

github-actions Bot assigned nv-guomingz Jun 23, 2026

[TRTLLM-13575][feat] Add eplb support for qwen3.5

3eece46

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

nv-guomingz force-pushed the user/guomingz/enable_eplb_qwen3_5 branch from d911902 to 3eece46


nv-guomingz wants to merge 1 commit into NVIDIA:main from nv-guomingz:user/guomingz/enable_eplb_qwen3_5
