Conversation

@erman-gurses
Contributor

@erman-gurses erman-gurses commented Nov 24, 2025

Motivation

Progress on: #2209

Test Plan

Test locally and on CI.

Test Result

Local test result:

=== Sanity check: driver / GPU info ===
AMD-SMI version      : 26.1.0+5df6c765
ROCm version         : 7.1.0
amdgpu driver        : 6.16.6
GPU name             : AMD Instinct MI300X
GPU target           : gfx942
=== End of sanity check ===

CI testing in progress...

Submission Checklist

@jayhawk-commits
Contributor

This will address some of the gaps identified in #2107

@erman-gurses
Contributor Author

erman-gurses commented Nov 24, 2025

This will address some of the gaps identified in #2107

Please let me know if you have any suggestions for improvement; this is the initial version.

@jayhawk-commits
Contributor

This will address some of the gaps identified in #2107

Please let me know if you have any suggestions for improvement; this is the initial version.

Are you able to print out the max VRAM on the GPU device?
What about support for systems that have multiple GPUs?
Please also see the behaviour on APU systems (e.g., Strix Halo).

@erman-gurses erman-gurses force-pushed the users/erman-gurses/print-driver-gpu-info branch from b7e0961 to ce89a7c Compare December 4, 2025 03:29
@erman-gurses
Contributor Author

erman-gurses commented Dec 4, 2025

@jayhawk-commits, please see my answers:

Are you able to print out the max VRAM on the GPU device?

I can do that as a follow-up. I know that info is available from rocminfo.

What about support for systems that have multiple GPUs?

Do you mean identical GPUs or different types of GPUs? For different types (if that case exists), I can handle it in a follow-up. Identical GPUs are already supported.

Please also see the behavior on APU systems (e.g., Strix Halo).

The tests are not running yet for gfx1151 on CI. That work is currently in progress (link). I expect the same behavior as below.

Example output from CI:
https://github.com/ROCm/TheRock/actions/runs/19916728165/job/57104976259?pr=2272#step:9:1

=== Sanity check: driver / GPU info ===
AMD-SMI version      : 26.1.0+31f7bb4c
ROCm version         : 7.11.0
amdgpu driver        : 6.12.12
GPU name             : AMD Instinct MI325X
GPU target           : gfx942
amd-smi path         : ./build/bin/amd-smi
rocminfo path        : ./build/bin/rocminfo
=== End of sanity check ===
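
For readers who want to reproduce the layout of the block above, here is a minimal sketch of one way such aligned key/value output could be produced. It is not the script from this PR: the function name `format_sanity_block`, its parameters, and the column width of 21 (inferred from the sample output) are all assumptions for illustration.

```python
def format_sanity_block(info: dict[str, str], width: int = 21) -> str:
    """Render key/value pairs between header and footer banner lines.

    Hypothetical helper: the name and the default width are assumptions,
    chosen to match the spacing seen in the sample output.
    """
    lines = ["=== Sanity check: driver / GPU info ==="]
    for key, value in info.items():
        # Left-justify each key so the colons line up in one column.
        lines.append(f"{key:<{width}}: {value}")
    lines.append("=== End of sanity check ===")
    return "\n".join(lines)


if __name__ == "__main__":
    # Values copied from the CI output quoted in this thread.
    print(format_sanity_block({
        "ROCm version": "7.11.0",
        "GPU name": "AMD Instinct MI325X",
        "GPU target": "gfx942",
    }))
```

Passing the values as a plain dict keeps the formatting separate from however the values are collected (amd-smi, rocminfo, or anything else).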

@erman-gurses erman-gurses marked this pull request as ready for review December 4, 2025 06:25
@jayhawk-commits
Contributor

Do you mean the identical GPUs or different types of GPUs? For different types (if the case exists), I can do it in the follow up. For the identical ones, it is already supported.

I think we need to be able to support both cases. If there are multiple identical GPUs in the system, we need to report how many.
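
As a rough sketch of what reporting both cases could look like: assuming the script already has a list of (GPU name, GPU target) pairs, one per detected device, the pairs can be grouped and counted. The function name `summarize_gpus` and the output format are hypothetical and not taken from the PR, and how the pairs are collected is deliberately left out.

```python
from collections import Counter


def summarize_gpus(gpus: list[tuple[str, str]]) -> list[str]:
    """Group detected GPUs by (name, target) and report a count for each group.

    Hypothetical helper: `gpus` holds one (name, target) pair per device,
    collected by whatever means the script already uses.
    """
    counts = Counter(gpus)
    # Counter keeps first-seen order, so groups appear in detection order.
    return [f"{count} x {name} ({target})" for (name, target), count in counts.items()]


if __name__ == "__main__":
    # Eight identical devices collapse into a single line with a count.
    for line in summarize_gpus([("AMD Instinct MI300X", "gfx942")] * 8):
        print(line)
```

A system with different GPU types would simply yield one line per distinct (name, target) pair.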

@jayhawk-commits
Contributor

Is there a label or change we can add to this PR @geomin12 to make tests run on Linux gfx1151? I want to see the APU behaviour.

@geomin12
Contributor

geomin12 commented Dec 4, 2025

Is there a label or change we can add to this PR @geomin12 to make tests run on Linux gfx1151? I want to see the APU behaviour.

ran here :)
https://github.com/ROCm/TheRock/actions/runs/19916728165/job/57115898586?pr=2272

@jayhawk-commits
Contributor

From what I see, this script is not run during the sanity test step, only during the test project steps.

@geomin12
Contributor

geomin12 commented Dec 4, 2025

ah shoot i didn't even read the code and answered! my bad, yeah unfortunately the 1151 linux is only running sanity checks (due to only 4 machines available)

you could add this check to test_sanity_check as well? this way, this will provide info for machines that only run sanity checks. thoughts @erman-gurses ?

@erman-gurses
Contributor Author

I think calling it from test_sanity_check would be a good idea. Will check it now.
