[TRTLLM-13575][feat] Add eplb support for qwen3.5 #15543

Open
nv-guomingz wants to merge 1 commit into NVIDIA:main from nv-guomingz:user/guomingz/enable_eplb_qwen3_5

@nv-guomingz nv-guomingz commented Jun 23, 2026

Collaborator

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for the Qwen3.5 Mixture-of-Experts model variant to the load balancer.
    • Expanded NVFP4 MoE inference coverage with expert-parallel load balancing for Qwen3.5-397B-A17B.
  • Tests

    • Added static and online expert-parallel load balancer configuration test cases with multiple backend options.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a3f5300c-55c1-4a03-93cd-ad2929ced053

📥 Commits

Reviewing files that changed from the base of the PR and between 31d4301 and d911902.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_gpus.yml

📝 Walkthrough

Adds 'Qwen3_5MoeForCausalLM' to moe_model_arch_list so the MoE load balancer recognizes this architecture. Introduces NVFP4 EPLB integration tests for Qwen3.5-397B-A17B covering static (layer_updates_per_iter=0) and online (layer_updates_per_iter=2) modes, and registers these tests in DGX B200 and GB200 CI matrices.
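
The gating and the two modes described in the walkthrough can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in moe_load_balancer.py: the helper names `supports_eplb` and `eplb_mode` are hypothetical, the first list entry is a placeholder, and treating every positive `layer_updates_per_iter` as online is an assumption extrapolated from the two values (0 and 2) mentioned above.

```python
from typing import Iterable

# Illustrative stand-in for the allow-list named in the walkthrough.
# Only the last entry comes from this PR; the first is a placeholder.
moe_model_arch_list = [
    "ExampleMoeForCausalLM",  # hypothetical placeholder entry
    "Qwen3_5MoeForCausalLM",  # the architecture this PR registers
]


def supports_eplb(architectures: Iterable[str]) -> bool:
    """Return True if any declared architecture is on the allow-list."""
    return any(arch in moe_model_arch_list for arch in architectures)


def eplb_mode(layer_updates_per_iter: int) -> str:
    """Classify the load-balancing mode from the update rate.

    Assumption: 0 means static placement, any positive value means
    online rebalancing.
    """
    if layer_updates_per_iter < 0:
        raise ValueError("layer_updates_per_iter must be non-negative")
    return "static" if layer_updates_per_iter == 0 else "online"
```

With these definitions, `supports_eplb(["Qwen3_5MoeForCausalLM"])` is true, `eplb_mode(0)` returns `"static"`, and `eplb_mode(2)` returns `"online"`.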

Changes

Qwen3.5-MoE EPLB support and NVFP4 integration tests

| Layer / File(s) | Summary |
| --- | --- |
| Arch registration and NVFP4 EPLB test implementation (tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py, tests/integration/defs/accuracy/test_llm_api_pytorch.py) | Qwen3_5MoeForCausalLM is added to moe_model_arch_list. A shared _run_nvfp4_4gpus_eplb helper is added to TestQwen3_5_397B_A17B that configures KvCacheConfig, CudaGraphConfig, MoeConfig, and runs LLM with attention DP, asserts NVFP4 quantization, and evaluates GSM8K. test_nvfp4_4gpus_static_eplb reads config.json to build per-layer initial_global_assignments with layer_updates_per_iter=0; test_nvfp4_4gpus_online_eplb hardcodes num_experts=512 with layer_updates_per_iter=2. |
| CI test list registration (tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_gb200_multi_gpus.yml) | Static EPLB tests for CUTEDSL and TRTLLM backends are registered in the DGX B200 4-GPU pre-merge matrix; online EPLB tests for both backends are registered in the GB200 multi-GPU matrix. |
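
To make the "per-layer initial_global_assignments" step in the static test more concrete, here is one way such a mapping could be built from a parsed config.json. This is only a sketch under stated assumptions: the function name `build_initial_assignments`, the config keys `num_experts` and `num_hidden_layers`, and the rotated round-robin placement are illustrative choices, not taken from the PR's test code, and the real Qwen3.5 config may use different or nested keys.

```python
from typing import Dict, List


def build_initial_assignments(config: dict, num_slots: int) -> Dict[int, List[int]]:
    """Map each layer index to the expert ID placed in each slot.

    Assumptions (not from the PR): the config exposes `num_experts` and
    `num_hidden_layers` at the top level, every layer is an MoE layer,
    and slots are filled round-robin with a per-layer rotation.
    """
    num_experts = config["num_experts"]
    num_layers = config["num_hidden_layers"]
    if num_slots < num_experts:
        # Every expert needs at least one slot to remain reachable.
        raise ValueError("num_slots must be >= num_experts")
    return {
        layer: [(layer + offset) % num_experts for offset in range(num_slots)]
        for layer in range(num_layers)
    }


# Toy sizes for readability; the online test in this PR uses num_experts=512.
toy_config = {"num_experts": 8, "num_hidden_layers": 3}
assignments = build_initial_assignments(toy_config, num_slots=10)
print(assignments[0])  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
print(assignments[1])  # [1, 2, 3, 4, 5, 6, 7, 0, 1, 2]
```

In a static setup (layer_updates_per_iter=0) a mapping like this would stay fixed for the whole run, so the two extra slots in the toy example simply replicate two experts per layer.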

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is incomplete. Only the template structure is present with no actual content in the required 'Description' and 'Test Coverage' sections. | Fill in the Description section explaining what EPLB support entails and why it's being added. Provide Test Coverage section listing the specific test cases that validate the changes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the feature addition (EPLB support) for the Qwen3.5 model, directly corresponding to the main changes in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
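
For anyone who wants to estimate the docstring coverage figure locally before pushing, a rough approximation can be computed with the standard library. This is a sketch only: the bot's exact counting rules are not stated in this thread, so counting every function, async function, and class definition equally is an assumption, and `docstring_coverage` is a hypothetical helper name.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of functions and classes that have a docstring."""
    tree = ast.parse(source)
    nodes = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not nodes:
        # Nothing to document, so treat the source as fully covered.
        return 100.0
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(nodes)


sample = '''
def documented():
    """Has a docstring."""

def undocumented():
    pass
'''
print(docstring_coverage(sample))  # 50.0
```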

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands.

@nv-guomingz nv-guomingz force-pushed the user/guomingz/enable_eplb_qwen3_5 branch from d911902 to 3eece46 Compare June 23, 2026 11:22
@nv-guomingz

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #55235 [ run ] triggered by Bot. Commit: 3eece46 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55235 [ run ] completed with state SUCCESS. Commit: 3eece46
/LLM/main/L0_MergeRequest_PR pipeline #44193 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

