Labels: bug (Something isn't working)
Description
Your current environment

2x GB200 nodes, vLLM 0.11.2 (output of `python collect_env.py` not provided).
🐛 Describe the bug
On the head node:

```shell
ray start --head --num-gpus 4
```

Copy the join command it prints and run it on the worker node:

```shell
ray start --address='<IP>:<PORT>' --num-gpus 4
```

Confirm that all 8 GPUs are visible in `ray status`.
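For scripted setups, the same GPU count reported by `ray status` can be read programmatically from `ray.cluster_resources()`. The helper below is a hypothetical sketch that only inspects such a resources dict, so it runs without a live cluster; the hard-coded dicts stand in for real cluster output:

```python
def enough_gpus(cluster_resources: dict, needed: float = 8) -> bool:
    """Return True if the cluster reports at least `needed` GPUs.

    `cluster_resources` has the shape returned by ray.cluster_resources(),
    e.g. {"GPU": 8.0, "CPU": 256.0, ...}.
    """
    return cluster_resources.get("GPU", 0) >= needed

# Example dicts standing in for ray.cluster_resources() on a live cluster
# (2 nodes x 4 GPUs vs. only the head node joined):
print(enough_gpus({"GPU": 8.0, "CPU": 256.0}))  # True
print(enough_gpus({"GPU": 4.0, "CPU": 128.0}))  # False
```

If this prints `False`, the worker node has not joined the cluster yet and `vllm serve` with `--tensor-parallel-size 8` will fail to place workers.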
Then, on the head node:

```shell
vllm serve meta-llama/Llama-3.3-70B-Instruct --gpu-memory-utilization 0.9 --served-model-name llama3.3-70b --tensor-parallel-size 8 --pipeline-parallel-size 1 --data-parallel-size 1 --max-model-len 2048 --port $port
```
Output log (portions redacted)
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.