# Qwen3-Next on AMD GPU

Follow the steps below to install and run the Qwen3-Next-80B-A3B-Instruct model on an AMD MI300X GPU.
## Step 1: Prepare Docker Environment

Pull the latest vLLM docker image:
```shell
docker pull rocm/vllm-dev:nightly
```
Launch the ROCm vLLM docker container:
```shell
docker run -d -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Qwen3-next rocm/vllm-dev:nightly
```
## Step 2: Log in to Hugging Face

Log in with the Hugging Face CLI:
```shell
huggingface-cli login
```
## Step 3: Start the vLLM Server (FP8)

Start the vLLM online serving endpoint. Sample command:
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --max-model-len 32768 --no-enable-prefix-caching
```
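Once the server is up, you can smoke-test it through the OpenAI-compatible completions endpoint. A minimal sketch, assuming the server listens on the vLLM default of `localhost:8000` (the `build_payload` and `query` helpers below are illustrative, not part of vLLM):

```python
import json
import urllib.request

MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct"

def build_payload(prompt: str, max_tokens: int = 64) -> dict:
    """Build a request body for the /v1/completions endpoint."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}

def query(prompt: str, url: str = "http://localhost:8000/v1/completions") -> str:
    """POST a completion request and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```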
## Step 4: Run Benchmark
Open a new terminal and run the following command to execute the benchmark script inside the container:
```shell
docker exec -it Qwen3-next python3 /vllm-workspace/benchmarks/benchmark_serving.py --model Qwen/Qwen3-Next-80B-A3B-Instruct --dataset-name random --ignore-eos --num-prompts 500 --max-concurrency 128 --random-input-len 3200 --random-output-len 800 --percentile-metrics ttft,tpot,itl,e2el
```
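The `--percentile-metrics` flag selects which latency metrics to report: ttft (time to first token), tpot (time per output token), itl (inter-token latency), and e2el (end-to-end latency). As a rough sanity check on the reported numbers, per-request end-to-end latency can be approximated from ttft and tpot (a back-of-envelope sketch, not produced by the benchmark script itself):

```python
def estimate_e2el(ttft_s: float, tpot_s: float, output_tokens: int) -> float:
    """Approximate end-to-end latency: the first token arrives after ttft,
    then each remaining output token adds roughly one tpot."""
    return ttft_s + tpot_s * max(output_tokens - 1, 0)

# e.g. 0.5 s to first token, 20 ms/token, 800 output tokens -> ~16.5 s
```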