[Feature]: MoE LoRA for quantized models #29424

Open

Labels

feature request

opened

on Nov 25, 2025

🚀 The feature, motivation and pitch

This issue tracks and records the LoRA support status for different quantization methods in vLLM.

Current Support Status

Quantization Method	LoRA Support	Test Model	Notes
MXFP4	✅	gpt-oss-20b
FP8	✅	qwen3-30b-a3b-fp8
GPTQ	❌	qwen3-30b-a3b-gptq-int4
AWQ	❌	qwen3-30b-a3b-awq
compressed_tensors	✅	qwen3-30b-a3b-w4a16
Bitsandbytes	❌	qwen3-30b-a3b-bnb	Offline Quantization
Bitsandbytes	❌	qwen3-30b-a3b	Online Quantization
GGUF	❌	qwen3-30b-a3b-gguf

Alternatives

No response

Additional context

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Assignees

No one assigned

Labels

feature request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests