@Ryo-not-rio Ryo-not-rio commented Nov 25, 2025

jit_sve_1x1 conv is slower than ACL for f32, so we move it down the impl list.

This will also fix the regressions in 1x1 convolutions on c8g machines seen here:
https://github.com/uxlfoundation/oneDNN/actions/runs/19658857462

No regressions were observed on c7g or c8g machines on 16 threads with this change.
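As background for why reordering fixes the regression: oneDNN tries the implementations in its impl list in order and dispatches to the first one whose constraints accept the problem, so list position decides which kernel wins. The sketch below illustrates that mechanism only; the names and acceptance predicates are hypothetical stand-ins, not oneDNN's real impl list or constraint checks.

```python
# Illustrative sketch of priority-ordered dispatch. All names and
# predicates here are hypothetical, not oneDNN's actual code: the point
# is only that the first accepting impl in the list wins.

def acl_1x1(prob):
    # Assumed predicate: ACL accepts f32 1x1 convolutions.
    return prob["dt"] == "f32" and prob["kh"] == 1

def jit_sve_1x1(prob):
    # Assumed predicate: the SVE JIT kernel accepts any 1x1 convolution.
    return prob["kh"] == 1

def gemm_fallback(prob):
    # Catch-all reference implementation.
    return True

# Before: jit_sve_1x1 sits above ACL, so it wins for f32 1x1 shapes.
impl_list_before = [jit_sve_1x1, acl_1x1, gemm_fallback]
# After moving jit_sve_1x1 down: ACL is tried first for f32 1x1 shapes.
impl_list_after = [acl_1x1, jit_sve_1x1, gemm_fallback]

def dispatch(impl_list, prob):
    # Return the name of the first implementation that accepts the problem.
    return next(impl.__name__ for impl in impl_list if impl(prob))

prob = {"dt": "f32", "kh": 1}
print(dispatch(impl_list_before, prob))  # jit_sve_1x1
print(dispatch(impl_list_after, prob))   # acl_1x1
```

Note that a non-f32 or non-1x1 problem still dispatches exactly as before, which is why only the f32 1x1 cases change behavior.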

c7g 16 threads speed changes:

| problem | oneDNN (base) time (ms) | oneDNN (131e4b) time (ms) | speedup (>1 is faster) |
| --- | --- | --- | --- |
| `conv --mode=P --max-ms-per-prb=300 --conv ic256ih14oc1024oh14kh1ph0n"resnet50-v1.5:conv10"` | 0.452 | 0.198 | $${\color{green}2.28\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic512ih7oc2048oh7kh1ph0n"resnet50-v1.5:conv14"` | 0.453 | 0.2 | $${\color{green}2.26\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic72ih56oc80oh56kh1ph0n"generated-tails-conv:1"` | 0.147 | 0.0857 | $${\color{green}1.72\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic256ih56iw56oc512oh28ow56kh1kw1sh2sw1ph0pw0n"generated-strided-conv:1"` | 1.58 | 0.841 | $${\color{green}1.88\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic72ih56iw54oc80oh56ow18kh1kw1sh1sw3ph0pw0n"generated-strided-conv:7"` | 0.0515 | 0.0459 | $${\color{green}1.12\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic140ih28iw28oc128oh14ow28kh1kw1sh2sw1ph0pw0n"generated-strided-conv:9"` | 0.0591 | 0.0469 | $${\color{green}1.26\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic193ih36iw54oc162oh36ow18kh1kw1sh1sw3ph0pw0n"generated-strided-conv:10"` | 0.133 | 0.117 | $${\color{green}1.14\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic542ih32iw48oc124oh8ow48kh1kw1sh4sw1ph0pw0n"generated-strided-conv:12"` | 0.213 | 0.138 | $${\color{green}1.55\times}$$ |

c8g 16 threads speed changes:

| problem | oneDNN (base) time (ms) | oneDNN (131e4b) time (ms) | speedup (>1 is faster) |
| --- | --- | --- | --- |
| `conv --mode=P --max-ms-per-prb=300 --conv ic256ih14oc1024oh14kh1ph0n"resnet50-v1.5:conv10"` | 0.452 | 0.198 | $${\color{green}2.28\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic512ih7oc2048oh7kh1ph0n"resnet50-v1.5:conv14"` | 0.453 | 0.2 | $${\color{green}2.26\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic72ih56oc80oh56kh1ph0n"generated-tails-conv:1"` | 0.147 | 0.0857 | $${\color{green}1.72\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic256ih56iw56oc512oh28ow56kh1kw1sh2sw1ph0pw0n"generated-strided-conv:1"` | 1.58 | 0.841 | $${\color{green}1.88\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic72ih56iw54oc80oh56ow18kh1kw1sh1sw3ph0pw0n"generated-strided-conv:7"` | 0.0515 | 0.0459 | $${\color{green}1.12\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic140ih28iw28oc128oh14ow28kh1kw1sh2sw1ph0pw0n"generated-strided-conv:9"` | 0.0591 | 0.0469 | $${\color{green}1.26\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic193ih36iw54oc162oh36ow18kh1kw1sh1sw3ph0pw0n"generated-strided-conv:10"` | 0.133 | 0.117 | $${\color{green}1.14\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic542ih32iw48oc124oh8ow48kh1kw1sh4sw1ph0pw0n"generated-strided-conv:12"` | 0.213 | 0.138 | $${\color{green}1.55\times}$$ |
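For reference, the speedup column is simply the base time divided by the new time. A quick re-derivation from a few of the raw timings above (values rounded to two decimal places):

```python
# Re-derive the speedup column from the raw timings in the tables above:
# speedup = base_time_ms / new_time_ms.
timings = {
    "resnet50-v1.5:conv10": (0.452, 0.198),
    "generated-strided-conv:1": (1.58, 0.841),
    "generated-strided-conv:9": (0.0591, 0.0469),
}
for name, (base_ms, new_ms) in timings.items():
    print(f"{name}: {base_ms / new_ms:.2f}x")
# resnet50-v1.5:conv10: 2.28x
# generated-strided-conv:1: 1.88x
# generated-strided-conv:9: 1.26x
```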

@Ryo-not-rio Ryo-not-rio requested a review from a team as a code owner November 25, 2025 11:23
jondea commented Nov 25, 2025

Is this a temporary fix? brgemm1x1 outperforms ACL in some other cases, so long term do we want to put in a more targeted fix?

@Ryo-not-rio Ryo-not-rio (author) replied:

I tried shapes_1x1 and didn't see any significant speedups for f32 on c8g; maybe I need to try bf16.

@Ryo-not-rio Ryo-not-rio marked this pull request as draft November 26, 2025 12:31
@Ryo-not-rio Ryo-not-rio changed the title from "cpu: aarch64: conv: fix 1x1 regression" to "cpu: aarch64: conv: move f32 jit_sve_1x1 convolution down the list" Nov 26, 2025
@Ryo-not-rio Ryo-not-rio closed this Dec 1, 2025