@Ryo-not-rio Ryo-not-rio commented Nov 25, 2025

jit_sve_1x1 conv is slower than ACL for f32, so we move it down the impl list.

This will also fix the regressions in 1x1 convolutions on c8g machines seen here:
https://github.com/uxlfoundation/oneDNN/actions/runs/19658857462

No regressions were observed on c7g or c8g machines on 16 threads with this change.
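As background for why reordering fixes the regression: oneDNN tries the implementations in its impl list in order and dispatches to the first one whose constraints accept the problem, so list position decides which kernel wins. The sketch below illustrates that mechanism only; the names and acceptance predicates are hypothetical stand-ins, not oneDNN's real impl list or constraint checks.

```python
# Illustrative sketch of priority-ordered dispatch. All names and
# predicates here are hypothetical, not oneDNN's actual code: the point
# is only that the first accepting impl in the list wins.

def acl_1x1(prob):
    # Assumed predicate: ACL accepts f32 1x1 convolutions.
    return prob["dt"] == "f32" and prob["kh"] == 1

def jit_sve_1x1(prob):
    # Assumed predicate: the SVE JIT kernel accepts any 1x1 convolution.
    return prob["kh"] == 1

def gemm_fallback(prob):
    # Catch-all reference implementation.
    return True

# Before: jit_sve_1x1 sits above ACL, so it wins for f32 1x1 shapes.
impl_list_before = [jit_sve_1x1, acl_1x1, gemm_fallback]
# After moving jit_sve_1x1 down: ACL is tried first for f32 1x1 shapes.
impl_list_after = [acl_1x1, jit_sve_1x1, gemm_fallback]

def dispatch(impl_list, prob):
    # Return the name of the first implementation that accepts the problem.
    return next(impl.__name__ for impl in impl_list if impl(prob))

prob = {"dt": "f32", "kh": 1}
print(dispatch(impl_list_before, prob))  # jit_sve_1x1
print(dispatch(impl_list_after, prob))   # acl_1x1
```

Note that a non-f32 or non-1x1 problem still dispatches exactly as before, which is why only the f32 1x1 cases change behavior.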

c7g 16 threads speed changes:

| problem | oneDNN (base) time (ms) | oneDNN (131e4b) time (ms) | speedup (>1 is faster) |
| --- | --- | --- | --- |
| `conv --mode=P --max-ms-per-prb=300 --conv ic256ih14oc1024oh14kh1ph0n"resnet50-v1.5:conv10"` | 0.452 | 0.198 | $${\color{green}2.28\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic512ih7oc2048oh7kh1ph0n"resnet50-v1.5:conv14"` | 0.453 | 0.2 | $${\color{green}2.26\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic72ih56oc80oh56kh1ph0n"generated-tails-conv:1"` | 0.147 | 0.0857 | $${\color{green}1.72\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic256ih56iw56oc512oh28ow56kh1kw1sh2sw1ph0pw0n"generated-strided-conv:1"` | 1.58 | 0.841 | $${\color{green}1.88\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic72ih56iw54oc80oh56ow18kh1kw1sh1sw3ph0pw0n"generated-strided-conv:7"` | 0.0515 | 0.0459 | $${\color{green}1.12\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic140ih28iw28oc128oh14ow28kh1kw1sh2sw1ph0pw0n"generated-strided-conv:9"` | 0.0591 | 0.0469 | $${\color{green}1.26\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic193ih36iw54oc162oh36ow18kh1kw1sh1sw3ph0pw0n"generated-strided-conv:10"` | 0.133 | 0.117 | $${\color{green}1.14\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic542ih32iw48oc124oh8ow48kh1kw1sh4sw1ph0pw0n"generated-strided-conv:12"` | 0.213 | 0.138 | $${\color{green}1.55\times}$$ |

c8g 16 threads speed changes:

| problem | oneDNN (base) time (ms) | oneDNN (131e4b) time (ms) | speedup (>1 is faster) |
| --- | --- | --- | --- |
| `conv --mode=P --max-ms-per-prb=300 --conv ic256ih14oc1024oh14kh1ph0n"resnet50-v1.5:conv10"` | 0.452 | 0.198 | $${\color{green}2.28\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic512ih7oc2048oh7kh1ph0n"resnet50-v1.5:conv14"` | 0.453 | 0.2 | $${\color{green}2.26\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic72ih56oc80oh56kh1ph0n"generated-tails-conv:1"` | 0.147 | 0.0857 | $${\color{green}1.72\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic256ih56iw56oc512oh28ow56kh1kw1sh2sw1ph0pw0n"generated-strided-conv:1"` | 1.58 | 0.841 | $${\color{green}1.88\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic72ih56iw54oc80oh56ow18kh1kw1sh1sw3ph0pw0n"generated-strided-conv:7"` | 0.0515 | 0.0459 | $${\color{green}1.12\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic140ih28iw28oc128oh14ow28kh1kw1sh2sw1ph0pw0n"generated-strided-conv:9"` | 0.0591 | 0.0469 | $${\color{green}1.26\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic193ih36iw54oc162oh36ow18kh1kw1sh1sw3ph0pw0n"generated-strided-conv:10"` | 0.133 | 0.117 | $${\color{green}1.14\times}$$ |
| `conv --mode=P --max-ms-per-prb=300 --conv ic542ih32iw48oc124oh8ow48kh1kw1sh4sw1ph0pw0n"generated-strided-conv:12"` | 0.213 | 0.138 | $${\color{green}1.55\times}$$ |
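For reference, the speedup column is simply the base time divided by the new time. A quick re-derivation from a few of the raw timings above (values rounded to two decimal places):

```python
# Re-derive the speedup column from the raw timings in the tables above:
# speedup = base_time_ms / new_time_ms.
timings = {
    "resnet50-v1.5:conv10": (0.452, 0.198),
    "generated-strided-conv:1": (1.58, 0.841),
    "generated-strided-conv:9": (0.0591, 0.0469),
}
for name, (base_ms, new_ms) in timings.items():
    print(f"{name}: {base_ms / new_ms:.2f}x")
# resnet50-v1.5:conv10: 2.28x
# generated-strided-conv:1: 1.88x
# generated-strided-conv:9: 1.26x
```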

@Ryo-not-rio Ryo-not-rio requested a review from a team as a code owner November 25, 2025 11:23
jondea commented Nov 25, 2025

Is this a temporary fix? brgemm1x1 outperforms ACL in some other cases, so long term do we want to put in a more targeted fix?

@Ryo-not-rio Ryo-not-rio (author) replied:

I tried shapes_1x1 and didn't see any significant speedups for f32 on c8g; maybe I need to try bf16.

@Ryo-not-rio Ryo-not-rio marked this pull request as draft November 26, 2025 12:31
@Ryo-not-rio Ryo-not-rio changed the title from "cpu: aarch64: conv: fix 1x1 regression" to "cpu: aarch64: conv: move f32 jit_sve_1x1 convolution down the list" Nov 26, 2025
@Ryo-not-rio Ryo-not-rio closed this Dec 1, 2025