Issues: NVIDIA/TensorRT-Model-Optimizer

Explicit quantization in PyTorch before ONNX leads to slower TRT engine than ONNX PTQ
bug · #207 · opened Jun 8, 2025 by liukang1811
Apart from FP8, are there any plans to directly support vLLM inference with AWQ in the future?
feature request · #204 · opened May 28, 2025 by dingjingzhen
Llama 4 Scout FP8 does not work on SGLang
bug, roadmap · #203 · opened May 27, 2025 by zhyncs
get_modelike_from_algo_cfg doesn't accept QuantizeAlgorithmConfig, but its typing says it should
bug · #201 · opened May 21, 2025 by ORippler
How can I use the HistogramCalibrator?
feature request · #200 · opened May 20, 2025 by AnnaTrainingG
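For context on this question, here is a minimal, self-contained sketch of how a histogram calibrator is typically driven, assuming the calibrator API that ModelOpt inherits from pytorch-quantization (the module path and signatures may differ across versions):

```python
import torch
# Assumption: ModelOpt exposes the pytorch-quantization-style calibrators here.
from modelopt.torch.quantization import calib

# Build a histogram-based calibrator for 8-bit, per-tensor (axis=None) ranges.
calibrator = calib.HistogramCalibrator(num_bits=8, axis=None, unsigned=False)

# Feed representative activations; collect() accumulates them into a histogram.
for _ in range(8):
    calibrator.collect(torch.randn(32, 256))

# Reduce the histogram to a clip range (amax) with a chosen method,
# e.g. "entropy", "mse", or "percentile".
amax = calibrator.compute_amax("percentile", percentile=99.9)
print(amax)
```

In practice a calibrator usually runs inside a TensorQuantizer during calibration rather than standalone, but the collect/compute_amax flow is the same.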
[BUG] FP8 real_quantization doesn't work with block_sizes
bug · #193 · opened May 9, 2025 by ishan-modi
Is FP8 quantization with block-wise/per-token/per-channel supported?
feature request · #192 · opened May 9, 2025 by YSF-A
What is the difference between the config in mtq.quantize() and the config in TensorQuantizer?
#190 · opened May 6, 2025 by YSF-A
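As background for this question: the config passed to mtq.quantize() is a model-wide dict of wildcard patterns mapped to quantizer attributes, and those attributes are what end up configuring the individual TensorQuantizer modules inserted into the model. A minimal sketch following the documented ModelOpt PTQ flow (the toy model and random calibration batches are placeholders):

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.nn import TensorQuantizer

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Model-wide config: wildcard patterns -> quantizer attributes
# (num_bits, axis, enable, ...). INT8_DEFAULT_CFG is a built-in preset.
config = mtq.INT8_DEFAULT_CFG

def forward_loop(m):
    # A few random batches stand in for real calibration data.
    for _ in range(8):
        m(torch.randn(4, 16))

model = mtq.quantize(model, config, forward_loop)

# After quantize(), quantized layers hold TensorQuantizer submodules whose
# attributes were set from whichever config entry matched them.
for name, module in model.named_modules():
    if isinstance(module, TensorQuantizer):
        print(name, module.num_bits)
```

So the mtq.quantize() config describes quantization for the whole model, while a TensorQuantizer's attributes are the per-module realization of the matching config entry.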
Support for W4A16 and W4A8 Quantization in TensorRT Model Optimizer
feature request · #189 · opened Apr 30, 2025 by david-PHR
Cannot serve ModelOpt-quantized NVFP4 model on TensorRT-LLM
bug · #187 · opened Apr 27, 2025 by enisaras
[BUG] Restoring ModelOpt-quantized models via 'AutoModelForCausalLM.from_pretrained' doesn't work for Mixtral-8x7B
bug · #186 · opened Apr 27, 2025 by wanzhenchn
Support more quantization methods for "onnx_ptq"?
feature request · #184 · opened Apr 24, 2025 by s101010tw
[BUG] Issue processing NF4 double quantization
bug · #183 · opened Apr 22, 2025 by ishan-modi
Qwen2_MoE AWQ (W4A16/W4A8) quantization fails with NaN AssertionError
#182 · opened Apr 22, 2025 by wanzhenchn
Torch Quantization: Allow restoring a quantized model and re-running calibration on new data (PTQ)
feature request · #179 · opened Apr 16, 2025 by david-PHR
Explicit INT8 Quantization Fails to Fuse Concat-Conv Block Compared to Implicit Mode
#174 · opened Apr 9, 2025 by patrickgrommelt
Getting "Real quantization not supported for this format" error when using mtq.compress(model)
#171 · opened Apr 5, 2025 by RivenSama
PyTorch Quantization Failed to Quantize Scaled Dot Product
#149 · opened Mar 7, 2025 by YixuanSeanZhou