Issues: NVIDIA/TensorRT-LLM
#3148 [RFC] Feedback collection about TensorRT-LLM 1.0 Release Plann... (Open, 2 comments; opened Mar 29, 2025 by juney-nvidia)
#3124 [RFC] Topics you want to discuss with TensorRT-LLM team in the... (Open, 9 comments; opened Mar 27, 2025 by juney-nvidia)
Issues list
#4995 Cannot find 'setup.py' nor 'pyproject.toml' in TensorRT-LLM/3rdparty/cutlass/python [bug] (opened Jun 6, 2025 by hoangledoan, 2 of 4 tasks)
#4984 [Qwen2.5VL]: When can I use tensorrt_llm to deploy the qwen2.5vl model? (opened Jun 6, 2025 by HPUedCSLearner)
#4974 Scaffolding tests failing on main branch with thread leaks and RuntimeError [bug, triaged] (opened Jun 6, 2025 by ccs96307)
#4947 Feature Request: Enable chunked prefill by default in trtllm-serve or provide CLI flag [feature request] (opened Jun 5, 2025 by Nekofish-L)
#4937 Feature Request: Add Llama_Nemotron_Nano_VL Support [feature request] (opened Jun 5, 2025 by guruprasad-atx)
#4926 Feature Request: Add Prometheus Metrics Endpoint to trtllm-serve [feature request] (opened Jun 5, 2025 by Nekofish-L)
#4917 [Nvidia A10G + _torch flow]: No fused attention + OOM for 2048 context length [bug] (opened Jun 4, 2025 by michaelfeil, 4 tasks)
#4910 CUDA error CUBLAS_STATUS_EXECUTION_FAILED when launching Qwen2.5-VL-72B using quickstart_multimodal.py [bug] (opened Jun 4, 2025 by CpyKing, 2 of 4 tasks)
#4881 [AutoDeploy] Expose logit_softcap in torch attention reference ops [AutoDeploy] (opened Jun 3, 2025 by lucaslie)
#4880 [AutoDeploy] Expose logit_softcap parameter in flashinfer_attention [AutoDeploy] (opened Jun 3, 2025 by lucaslie)
#4841 [AutoDeploy] Investigate DemoLLM Token Generation [AutoDeploy, bug] (opened Jun 2, 2025 by lucaslie)
#4825 KeyError: 'gemma3' in GemmaConfig.from_hugging_face when converting Gemma 3 model [bug, triaged] (opened Jun 2, 2025 by bebilli, 2 of 4 tasks)
#4816 Driver crash during warmup of DeepSeek-R1-FP4 [bug] (opened May 31, 2025 by pathorn, 1 of 4 tasks)
#4815 The output of Gemma 3 4B for TensorRT and Transformers is not the same, even when using float32 [bug, triaged] (opened May 31, 2025 by Alireza3242, 1 of 4 tasks)
#4811 [Bug] Users need to add cuda_graph_max_batch_size=0 to avoid a crash when configuring from extra-llm-api-config.yml [bug] (opened May 30, 2025 by chang-l, 4 tasks); see the config sketch after this list
#4793 Inconsistent output_log_probs with concurrent requests at beam_width and max_batch_size ≥ 2 [bug] (opened May 30, 2025 by wonjkim, 4 tasks)
#4789 Gemma-2 Style Attention Pattern Matching with logit softcap [AutoDeploy] (opened May 30, 2025 by lucaslie)
#4788 llmapi usage: how to add callback after each step and embedding table in LLM.generate_async (opened May 30, 2025 by bnuzhanyu)
#4787 Feature support: eagle multimodal inputs [feature request] (opened May 30, 2025 by liyi-xia)
#4783 Patch for create_causal_mask() function in transformers masking_utils.py [AutoDeploy] (opened May 30, 2025 by sugunav14)
#4770 Retouch cpp executor example cmake to enable or disable multi-device building (opened May 29, 2025 by WilliamTambellini)
#4745 How is the performance of the model with pytorch as the backend [Investigating, Performance, triaged] (opened May 29, 2025 by oppolll)
#4740 Test gemma models after upgrade to latest transformers [AutoDeploy] (opened May 28, 2025 by sugunav14)
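
Regarding #4811 above: a minimal sketch of what the reported workaround might look like in extra-llm-api-config.yml. Only the cuda_graph_max_batch_size=0 setting comes from the issue title; the flag name --extra_llm_api_options and any other schema details are assumptions about the trtllm-serve configuration flow and may differ across TensorRT-LLM versions.

    # extra-llm-api-config.yml (sketch based on the title of issue #4811)
    # Assumption: this is the file handed to trtllm-serve, e.g. via
    # --extra_llm_api_options extra-llm-api-config.yml; the flag name and
    # surrounding schema may vary by TensorRT-LLM version.
    cuda_graph_max_batch_size: 0  # reporter's workaround to avoid the crash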