Pull requests: NVIDIA/TensorRT-LLM
Pull requests list
[None][fix] Pass dtype to AllReduce ctor to enable MNNVL all-reduce fo…
#15547
opened Jun 23, 2026 by
nv-guomingz
Collaborator
[https://nvbugs/6293536][fix] Stage KV block offsets through a fresh host buffer
#15546
opened Jun 23, 2026 by
thorjohnsen
Collaborator
1 task done
[None][Fix] Fix passing scaled timestep to time_embedder in Cosmos3
#15545
opened Jun 23, 2026 by
bastefaniak
1 task
[None][test] Refine Qwen3.5 397B test cases
#15544
opened Jun 23, 2026 by
nv-guomingz
Collaborator
1 task done
[TRTLLM-13575][feat] Add eplb support for qwen3.5
#15543
opened Jun 23, 2026 by
nv-guomingz
Collaborator
1 task done
[TRTLLM-13212][refactor] Unify sampling stacks into single facade with leaf backend modules
#15542
opened Jun 23, 2026 by
zhaoyangwang-nvidia
Collaborator
Draft
1 task done
[None][test] Add modularized perf tests (attention + MoE discrete/continuous)
#15541
opened Jun 23, 2026 by
ruodil
Collaborator
1 task done
[None][fix] Allow fail-early when reuse block and legacy mamba cache
#15540
opened Jun 23, 2026 by
Wanli-Jiang
Collaborator
1 task done
[https://nvbugs/6344108][fix] skip TestNemotron3Super120B on pre-blackwell
#15539
opened Jun 23, 2026 by
bo-nv
Collaborator
1 task
[https://nvbugs/6166097][fix] Fix CuteDSL NVFP4 EPLB weight layout
#15538
opened Jun 23, 2026 by
nv-xtf
1 task done
[TRTLLM-13580][test] Add model-derived PyTorch attention backend test suite
#15536
opened Jun 23, 2026 by
yuxianq
Collaborator
[https://nvbugs/6150288][fix] Use persistent per-stream workspace in cublas_mm for CUDA-graph safety
#15534
opened Jun 23, 2026 by
pamelap-nvidia
Collaborator
2 of 4 tasks
[None][chore] Clean deprecated CppMambaCacheManager
#15533
opened Jun 23, 2026 by
bo-nv
Collaborator
1 task done
[None][feat] Qwen-Image: NVFP4 SVDQuant (NVFP4 residual + rank-r BF16 LoRA)
#15532
opened Jun 23, 2026 by
jingyu-ml
[#14874][feat] AutoDeploy : Perf optimization for gpt-oss-120b for low conc
AutoDeploy
<NV> AutoDeploy Backend
#15531
opened Jun 23, 2026 by
taylor-yb-lee
Collaborator
1 task done
[None][chore] Autodeploy disable the pipeline cache by default
#15530
opened Jun 22, 2026 by
nvchenghaoz
Collaborator
1 task
[None][CI] Waive flaky test_vbench_dimension_score_wan (nvbugs/6357628)
#15529
opened Jun 22, 2026 by
chang-l
Collaborator
[https://nvbugs/6276842][test] Loosen rtol/atol on encoder CUDA graph logits parity check
#15527
opened Jun 22, 2026 by
tingyangk
Collaborator
1 task done
[None][feat] Add prefix-aware scheduling config flag to support opt-out
#15526
opened Jun 22, 2026 by
SimengLiu-nv
Collaborator
1 task done
[TRTLLM-13543][feat] WideEP FT: add EPLB mask-only reconfigure (1b.1)
#15525
opened Jun 22, 2026 by
chienchunhung
Collaborator
[TRTLLM-12557][feat] WideEP FT: add AlltoAll watchdog (1a.4)
#15524
opened Jun 22, 2026 by
chienchunhung
Collaborator
[None][fix] Preserve Kimi 2.5 tool call IDs
#15523
opened Jun 22, 2026 by
hvagadia
Contributor
[#14882][fix] Make kv_cache_aware router robust to a missing KV-event stream
#15522
opened Jun 22, 2026 by
GodlyDonuts
[doc] Clarify dtype='auto' resolution for LLM and KvCacheConfig
#15520
opened Jun 22, 2026 by
ojas4414