[NNCF] Enable data-aware weight compression for MatMul with transpose_b=False #3759

Varshith-Yadav · 2025-11-26T10:36:22Z

Changes

Fixes #3494.
Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.
Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF’s weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True).
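To make the layout difference concrete, here is a minimal NumPy sketch (not NNCF code). It assumes the usual MatMul convention that `transpose_b=True` computes `x @ w.T`, so the weight is stored as `[out, in]`, while `transpose_b=False` stores it as `[in, out]`. The helper `input_channel_axis` is a hypothetical name used only for illustration.

```python
import numpy as np


def matmul(x, w, transpose_b):
    """Reference MatMul: x @ w.T when transpose_b is True, else x @ w."""
    return x @ (w.T if transpose_b else w)


def input_channel_axis(transpose_b):
    """Hypothetical helper: axis of the weight that holds input channels."""
    return 1 if transpose_b else 0


in_features, out_features = 4, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((2, in_features))

w_t = rng.standard_normal((out_features, in_features))  # [out, in], transpose_b=True
w = w_t.T                                                # [in, out], transpose_b=False

# Both layouts describe the same linear layer.
assert np.allclose(matmul(x, w_t, True), matmul(x, w, False))

# Code that hard-codes axis 1 as the input-channel axis is only right for w_t.
assert w_t.shape[input_channel_axis(True)] == in_features
assert w.shape[input_channel_axis(False)] == in_features
```

Any data-aware statistic that is reduced or grouped along the input channels has to pick the axis this way rather than assuming axis 1.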

Related tickets

Tests

pytest tests/openvino/native/quantization/test_weights_compression.py -v
(All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)


ljaljushkin · 2025-11-26T14:26:17Z

@Varshith-Yadav, thank you for the contribution!
Is it possible to avoid the many if conditions and transpose once, so that the original logic for transposed weights is kept?

ljaljushkin · 2025-11-26T14:49:21Z

Please also add unit tests.
At the least, you can copy-paste from #3725: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1955-R1979 and make sure the exception is not raised for transpose_b=False.

ljaljushkin · 2025-11-28T15:10:16Z

> @Varshith-Yadav, thank you for the contribution! Is it possible to avoid the many if conditions and transpose once, so that the original logic for transposed weights is kept?

I have reconsidered and now think that transposing each weight could increase the total compression time. What about implementing a "slice_weight" method with a transpose parameter and using that instead?

Varshith-Yadav · 2025-11-28T20:09:32Z

@ljaljushkin
That makes sense. I agree that explicitly transposing the full weight tensor could introduce unnecessary overhead.

I will update the implementation to use a slice_weight helper method. This way, we can fetch the necessary channels dynamically based on the transpose_b parameter without physically transposing the underlying tensor.
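A minimal sketch of the idea, using NumPy arrays as a stand-in for NNCF tensors. The name `slice_weight` comes from the suggestion above, but its signature and behavior here are assumptions for illustration, not the final implementation.

```python
import numpy as np


def slice_weight(weight, start, stop, transpose_b):
    """Hypothetical helper: return input channels [start, stop) of a 2D weight.

    transpose_b=True  -> weight is [out, in], input channels lie on axis 1
    transpose_b=False -> weight is [in, out], input channels lie on axis 0
    Basic slicing returns a view, so no data is copied or transposed.
    """
    if transpose_b:
        return weight[:, start:stop]
    return weight[start:stop, :]


w_t = np.arange(12, dtype=np.float32).reshape(3, 4)  # [out=3, in=4]
w = np.ascontiguousarray(w_t.T)                      # [in=4, out=3]

block_t = slice_weight(w_t, 1, 3, transpose_b=True)   # shape (3, 2)
block = slice_weight(w, 1, 3, transpose_b=False)      # shape (2, 3)

# Both calls select the same input channels, just in different layouts.
assert np.array_equal(block_t, block.T)
# The slice is a view into the original weight, not a copy.
assert np.shares_memory(block, w)
```

With a helper like this, the per-group logic can stay layout-agnostic: callers ask for a range of input channels and the helper chooses the axis.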

I'll proceed with this approach and update the PR shortly.

Commit 13fcd44: [NNCF] Enable data-aware weight compression for MatMul with transpose_b=False

Varshith-Yadav requested a review from a team as a code owner on November 26, 2025, 10:36
