Enable dequant fp8 weights quantized per-channel with compressed-tensor method #621
base: main
Conversation
✅ CI Passed. All checks passed successfully against the following vllm commit:
def get_dequant_weights_func(self) -> Optional[Callable[[torch.nn.Module], torch.Tensor]]:
    return self.dequant_fp8_weight
It would be better to assign get_dequant_weights_func to the layer to stay consistent with the existing implementation, and no changes are required on the INC side.
vllm-gaudi/vllm_gaudi/extension/ops.py
Lines 787 to 789 in e18a075
else:
    # For INC path, we attach the dequant func to the layer
    layer.get_dequant_weights_func = types.MethodType(get_dequant_weights_func, layer)
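For illustration, a minimal sketch of that binding pattern as a standalone snippet; the dequant body follows the diff quoted in this PR, while the helper names here are hypothetical:

```python
import types
from typing import Callable, Optional

import torch


def _dequant_fp8_weight(layer: torch.nn.Module) -> torch.Tensor:
    # Per-channel dequant as in this PR's diff: upcast the fp8 weight, apply
    # the per-output-channel scale, cast to bf16, and transpose back.
    dequant = layer.weight.to(layer.weight_scale.dtype) * layer.weight_scale.squeeze()
    return dequant.to(torch.bfloat16).t()


def get_dequant_weights_func(
        self) -> Optional[Callable[[torch.nn.Module], torch.Tensor]]:
    # INC looks up this accessor on the layer and calls it to obtain the
    # function that recovers high-precision weights.
    return _dequant_fp8_weight


def bind_dequant_func(layer: torch.nn.Module) -> None:
    # Attach the accessor as a bound method of this specific layer instance,
    # mirroring the existing INC path quoted above from ops.py.
    layer.get_dequant_weights_func = types.MethodType(get_dequant_weights_func, layer)
```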
✅ CI Passed. All checks passed successfully against the following vllm commit:
@yiliu30, addressed your comment by binding the dequant function to the linear layer after loading the weights. Please review.
yiliu30 left a comment:
LGTM
@xuechendi Please be aware of this change, thanks!
Force-pushed from 73d8ed6 to b91f94c.
✅ CI Passed. All checks passed successfully against the following vllm commit:
@skavulya @lkk12014402, please help cross-review, since you're working on compressed-tensor.
Commit: Enable dequant fp8 weights quantized per-channel with compressed-tensor method. Signed-off-by: mandy-li <[email protected]>
# bind dequant function to layer for per-channel quantization
if layer.scheme.strategy == QuantizationStrategy.CHANNEL:
    hpu_ops.bind_dequant_func(layer)
If the PR is only for INC dynamic, we should not bind the dequant function for every per-channel layer here, right?
What is the scope of this PR?
For INC. I can check whether the QUANT_CONFIG env var is set, if you think that's necessary.
Please add the check; we can't hijack the non-INC path.
Seems the change will also impact #552.
Please also add a check for the dynamic scheme.
No, this should apply to static quant as well.
If we rename bind_dequant_func() to something like fp8_perchannel_linear_postprocess_weights, to be consistent with fp8_block_linear_postprocess_weights (which is not INC-specific), do I still need to check for the INC path?
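To make the proposed scoping concrete, here is a hedged sketch of the guard being discussed; the QUANT_CONFIG check, the helper name, and the import alias are assumptions for illustration, not the merged implementation:

```python
import os

from compressed_tensors.quantization import QuantizationStrategy

import vllm_gaudi.extension.ops as hpu_ops  # import path assumed from the snippet quoted above


def maybe_bind_dequant_func(layer) -> None:
    # Only act on the INC path: QUANT_CONFIG is the environment variable INC
    # reads for its quantization config, so if it is unset we leave the layer
    # (and therefore the non-INC path) untouched.
    if not os.environ.get("QUANT_CONFIG"):
        return
    # Bind only for per-channel (CHANNEL) weight quantization; per the
    # discussion above this covers both static and dynamic activation schemes.
    if layer.scheme.strategy == QuantizationStrategy.CHANNEL:
        hpu_ops.bind_dequant_func(layer)
```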
def dequant_fp8_weight(self, layer: torch.nn.Module) -> torch.Tensor:
    if layer.scheme.strategy == QuantizationStrategy.CHANNEL:  # weights were quantized per-channel
        dequant_weight = layer.weight.to(layer.weight_scale.dtype) * layer.weight_scale.squeeze()
        return dequant_weight.to(torch.bfloat16).t()
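A quick synthetic example of what this per-channel dequant does, assuming the weight is stored as [in_features, out_features] with one scale per output channel (which is why the result is transposed back with .t()); shapes, dtypes, and values here are made up for illustration:

```python
import torch

in_features, out_features = 4, 3

# Synthetic fp8 weight stored [in, out] and one scale per output channel.
weight = torch.randn(in_features, out_features).to(torch.float8_e4m3fn)
weight_scale = torch.rand(out_features, 1, dtype=torch.bfloat16) + 0.5

# Upcast, apply the per-channel scale, cast to bf16, transpose to [out, in].
dequant = (weight.to(weight_scale.dtype) * weight_scale.squeeze()).to(torch.bfloat16).t()
print(dequant.shape)  # torch.Size([3, 4])
```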
Does this work for Gaudi2? Will it get NaN, since the scale might be out of range?
No, this is for Gaudi3 (G3).
Oh, I checked CI, and it seems Gaudi2 is not getting NaN, which is quite unexpected.
@yiliu30, are there any recent changes that fix the Gaudi2 scale issue? Or is it because "scale_method": "ACT_MAXABS_PCS_POW2_WEIGHT_MAXABS_PTS_POW2_HW" keeps the range under 244?

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, I realized this is handled at create_weights
@yiliu30, please help review. This PR enables INC dynamic for compressed_tensor; I would like to know whether it meets your initial design.
✅ CI Passed. All checks passed successfully against the following vllm commit:
LGTM
    return wrapper


def bind_dequant_func(layer):
Suggest following the same naming pattern as the rest: fp8_perchannel_linear_postprocess_weights.
Yes, that is aligned with what we did for block-wise scaling.
This PR enables dequantization of fp8 weights that were quantized channel-wise with the compressed-tensor method.