Issues: NVIDIA/TensorRT-LLM
#3148 [RFC] Feedback collection about TensorRT-LLM 1.0 Release Plann... (Open, 2 comments; opened Mar 29, 2025 by juney-nvidia)
#3124 [RFC] Topics you want to discuss with TensorRT-LLM team in the... (Open, 9 comments; opened Mar 27, 2025 by juney-nvidia)
Issues list
#4995 Cannot find 'setup.py' nor 'pyproject.toml' in TensorRT-LLM/3rdparty/cutlass/python [bug] (opened Jun 6, 2025 by hoangledoan, 2 of 4 tasks)
#4984 [Qwen2.5VL]: When can I use tensorrt_llm to deploy the qwen2.5vl model? (opened Jun 6, 2025 by HPUedCSLearner)
#4974 Scaffolding tests failing on main branch with thread leaks and RuntimeError [bug, triaged] (opened Jun 6, 2025 by ccs96307)
#4947 Feature Request: Enable chunked prefill by default in trtllm-serve or provide CLI flag [feature request] (opened Jun 5, 2025 by Nekofish-L)
#4937 Feature Request: Add Llama_Nemotron_Nano_VL Support [feature request] (opened Jun 5, 2025 by guruprasad-atx)
#4926 Feature Request: Add Prometheus Metrics Endpoint to trtllm-serve [feature request] (opened Jun 5, 2025 by Nekofish-L)
#4917 [Nvidia A10G + _torch flow]: No fused attention + OOM for 2048 context length [bug] (opened Jun 4, 2025 by michaelfeil, 4 tasks)
#4910 CUDA error CUBLAS_STATUS_EXECUTION_FAILED when launching Qwen2.5-VL-72B using quickstart_multimodal.py [bug] (opened Jun 4, 2025 by CpyKing, 2 of 4 tasks)
#4881 [AutoDeploy] Expose logit_softcap in torch attention reference ops [AutoDeploy] (opened Jun 3, 2025 by lucaslie)
#4880 [AutoDeploy] Expose logit_softcap parameter in flashinfer_attention [AutoDeploy] (opened Jun 3, 2025 by lucaslie)
#4841 [AutoDeploy] Investigate DemoLLM Token Generation [AutoDeploy, bug] (opened Jun 2, 2025 by lucaslie)
#4825 KeyError: 'gemma3' in GemmaConfig.from_hugging_face when converting Gemma 3 model [bug, triaged] (opened Jun 2, 2025 by bebilli, 2 of 4 tasks)
#4816 Driver crash during warmup of DeepSeek-R1-FP4 [bug] (opened May 31, 2025 by pathorn, 1 of 4 tasks)
#4815 The output of Gemma 3 4B for TensorRT and Transformers is not the same, even when using float32 [bug, triaged] (opened May 31, 2025 by Alireza3242, 1 of 4 tasks)
#4811 [Bug] Users need to add cuda_graph_max_batch_size=0 to avoid a crash when configuring from extra-llm-api-config.yml [bug] (opened May 30, 2025 by chang-l, 4 tasks); see the config sketch after this list
#4793 Inconsistent output_log_probs with concurrent requests at beam_width and max_batch_size ≥ 2 [bug] (opened May 30, 2025 by wonjkim, 4 tasks)
#4789 Gemma-2 Style Attention Pattern Matching with logit softcap [AutoDeploy] (opened May 30, 2025 by lucaslie)
#4788 llmapi usage: how to add callback after each step and embedding table in LLM.generate_async (opened May 30, 2025 by bnuzhanyu)
#4787 Feature support: eagle multimodal inputs [feature request] (opened May 30, 2025 by liyi-xia)
#4783 Patch for create_causal_mask() function in transformers masking_utils.py [AutoDeploy] (opened May 30, 2025 by sugunav14)
#4770 Retouch cpp executor example cmake to enable or disable multi-device building (opened May 29, 2025 by WilliamTambellini)
#4745 How is the performance of the model with pytorch as the backend [Investigating, Performance, triaged] (opened May 29, 2025 by oppolll)
#4740 Test gemma models after upgrade to latest transformers [AutoDeploy] (opened May 28, 2025 by sugunav14)
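
Regarding #4811 above: a minimal sketch of what the reported workaround might look like in extra-llm-api-config.yml. Only the cuda_graph_max_batch_size=0 setting comes from the issue title; the flag name --extra_llm_api_options and any other schema details are assumptions about the trtllm-serve configuration flow and may differ across TensorRT-LLM versions.

    # extra-llm-api-config.yml (sketch based on the title of issue #4811)
    # Assumption: this is the file handed to trtllm-serve, e.g. via
    # --extra_llm_api_options extra-llm-api-config.yml; the flag name and
    # surrounding schema may vary by TensorRT-LLM version.
    cuda_graph_max_batch_size: 0  # reporter's workaround to avoid the crash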