Skip to content

Conversation

@Varshith-Yadav
Copy link

Changes

fixes #3494
Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.
Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF’s weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True).

Related tickets

Tests

pytest tests/openvino/native/quantization/test_weights_compression.py -v
(All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)

@Varshith-Yadav Varshith-Yadav requested a review from a team as a code owner November 26, 2025 10:36
@ljaljushkin
Copy link
Contributor

@Varshith-Yadav, thank you for the contribution!
Is it possible to avoid many if conditions and do transpose once to keep original logic for transposed weights?

@ljaljushkin
Copy link
Contributor

ljaljushkin commented Nov 26, 2025

please also add unit tests.
at least you can copy-paste from #3725: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1955-R1979 and make sure exception is not raised for transpose_b=False

@ljaljushkin
Copy link
Contributor

@Varshith-Yadav, thank you for the contribution! Is it possible to avoid many if conditions and do transpose once to keep original logic for transposed weights?

I have reconsidered and now believe that transposing each weight can extend the total compression duration. What about implementing and utilizing a "slice_weight" method with a transpose parameter?

@Varshith-Yadav
Copy link
Author

@ljaljushkin
That makes sense. I agree that explicitly transposing the full weight tensor could introduce unnecessary overhead

I will update the implementation to use a slice_weight helper method . This way, we can fetch the necessary channels dynamically based on the transpose_b parameter without physically reshaping the underlying tensor.

I'll proceed with this approach and update the PR shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Good First Issue][NNCF]: Support not transposed weight for data-aware weight compression methods

2 participants