-
Notifications
You must be signed in to change notification settings - Fork 98
Create Qwen3-Next-AMD.md #93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,30 @@ | ||||||||||||||||||
| #### Step by Step Guide | ||||||||||||||||||
| Please follow the steps here to install and run Qwen3-Next-80B-A3B-Instruct models on AMD MI300X GPU. | ||||||||||||||||||
| #### Step 1 | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
| Pull the latest vllm docker: | ||||||||||||||||||
| ```shell | ||||||||||||||||||
| docker pull rocm/vllm-dev:nightly | ||||||||||||||||||
| ``` | ||||||||||||||||||
| Launch the Rocm-vllm docker: | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
| ```shell | ||||||||||||||||||
| docker run -d -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -e SHELL=/bin/bash --name Qwen3-next rocm/vllm-dev:nightly | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Mounting the host's root directory (
Suggested change
|
||||||||||||||||||
| ``` | ||||||||||||||||||
| #### Step 2 | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
| Huggingface login | ||||||||||||||||||
| ```shell | ||||||||||||||||||
| huggingface-cli login | ||||||||||||||||||
| ``` | ||||||||||||||||||
|
Comment on lines
+13
to
+16
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
| #### Step 3 | ||||||||||||||||||
| ##### FP8 | ||||||||||||||||||
|
Comment on lines
+17
to
+18
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
|
|
||||||||||||||||||
| Run the vllm online serving | ||||||||||||||||||
| Sample Command | ||||||||||||||||||
| ```shell | ||||||||||||||||||
| VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --max-model-len 32768 --no-enable-prefix-caching | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is an extra space in the command that should be removed for correctness.
Suggested change
|
||||||||||||||||||
| ``` | ||||||||||||||||||
| #### Step 4 | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
| Open a new terminal, enter into the running docker and run the following benchmark script. | ||||||||||||||||||
| ```shell | ||||||||||||||||||
| docker exec -it Qwen3-next /bin/bash | ||||||||||||||||||
| python3 /vllm-workspace/benchmarks/benchmark_serving.py --model Qwen/Qwen3-Next-80B-A3B-Instruct --dataset-name random --ignore-eos --num-prompts 500 --max-concurrency 128 --random-input-len 3200 --random-output-len 800 --percentile-metrics ttft,tpot,itl,e2el | ||||||||||||||||||
| ``` | ||||||||||||||||||
|
Comment on lines
+26
to
+30
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The instructions for running the benchmark can be simplified. Instead of opening an interactive shell and then running the script, you can execute the script directly with
Suggested change
|
||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better document structure and consistency with other guides in the repository, it's recommended to use a level 1 heading for the main title of the document.