# Qwen3-Next on AMD GPU

Follow the steps below to install and run the Qwen3-Next-80B-A3B-Instruct model on an AMD MI300X GPU.
## Step 1: Prepare Docker Environment

Pull the latest vLLM docker image:
```shell
docker pull rocm/vllm-dev:nightly
```
Launch the ROCm vLLM docker container:
```shell
docker run -d -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Qwen3-next rocm/vllm-dev:nightly
```
## Step 2: Log in to Hugging Face

Log in with the Hugging Face CLI:
```shell
huggingface-cli login
```
## Step 3: Start the vLLM Server (FP8)

Start the vLLM online serving endpoint. Sample command:
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --max-model-len 32768 --no-enable-prefix-caching
```
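Once the server is up, you can smoke-test it through the OpenAI-compatible completions endpoint. A minimal sketch, assuming the server listens on the vLLM default of `localhost:8000` (the `build_payload` and `query` helpers below are illustrative, not part of vLLM):

```python
import json
import urllib.request

MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct"

def build_payload(prompt: str, max_tokens: int = 64) -> dict:
    """Build a request body for the /v1/completions endpoint."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}

def query(prompt: str, url: str = "http://localhost:8000/v1/completions") -> str:
    """POST a completion request and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```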
## Step 4: Run Benchmark
Open a new terminal and run the following command to execute the benchmark script inside the container:
```shell
docker exec -it Qwen3-next python3 /vllm-workspace/benchmarks/benchmark_serving.py --model Qwen/Qwen3-Next-80B-A3B-Instruct --dataset-name random --ignore-eos --num-prompts 500 --max-concurrency 128 --random-input-len 3200 --random-output-len 800 --percentile-metrics ttft,tpot,itl,e2el
```
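The `--percentile-metrics` flag selects which latency metrics to report: ttft (time to first token), tpot (time per output token), itl (inter-token latency), and e2el (end-to-end latency). As a rough sanity check on the reported numbers, per-request end-to-end latency can be approximated from ttft and tpot (a back-of-envelope sketch, not produced by the benchmark script itself):

```python
def estimate_e2el(ttft_s: float, tpot_s: float, output_tokens: int) -> float:
    """Approximate end-to-end latency: the first token arrives after ttft,
    then each remaining output token adds roughly one tpot."""
    return ttft_s + tpot_s * max(output_tokens - 1, 0)

# e.g. 0.5 s to first token, 20 ms/token, 800 output tokens -> ~16.5 s
```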