Labels: bug (Something isn't working)
Description
Your current environment

2x GB200 nodes, vLLM 0.11.2 (output of `python collect_env.py` not provided).
🐛 Describe the bug
On the head node:

```shell
ray start --head --num-gpus 4
```

Copy the join command it prints and run it on the worker node:

```shell
ray start --address='<IP>:<PORT>' --num-gpus 4
```

Confirm that all 8 GPUs are visible in `ray status`.
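For scripted setups, the same GPU count reported by `ray status` can be read programmatically from `ray.cluster_resources()`. The helper below is a hypothetical sketch that only inspects such a resources dict, so it runs without a live cluster; the hard-coded dicts stand in for real cluster output:

```python
def enough_gpus(cluster_resources: dict, needed: float = 8) -> bool:
    """Return True if the cluster reports at least `needed` GPUs.

    `cluster_resources` has the shape returned by ray.cluster_resources(),
    e.g. {"GPU": 8.0, "CPU": 256.0, ...}.
    """
    return cluster_resources.get("GPU", 0) >= needed

# Example dicts standing in for ray.cluster_resources() on a live cluster
# (2 nodes x 4 GPUs vs. only the head node joined):
print(enough_gpus({"GPU": 8.0, "CPU": 256.0}))  # True
print(enough_gpus({"GPU": 4.0, "CPU": 128.0}))  # False
```

If this prints `False`, the worker node has not joined the cluster yet and `vllm serve` with `--tensor-parallel-size 8` will fail to place workers.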
Then, on the head node:

```shell
vllm serve meta-llama/Llama-3.3-70B-Instruct --gpu-memory-utilization 0.9 --served-model-name llama3.3-70b --tensor-parallel-size 8 --pipeline-parallel-size 1 --data-parallel-size 1 --max-model-len 2048 --port $port
```
Output log (portions redacted)
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.