Commit 50bb656
[Pipelines] infer model device with optional override (#1572)
## Purpose ##
* Fix support for deepseekv2.5
* Make model device inference more robust when calibrating
## Prerequisites ##
* vllm-project/compressed-tensors#363
## Background ##
Normally, starting model inputs on the cpu is not an issue for the
sequential pipeline, since the sequential pipeline offloads models and
offloaded models automatically place inputs on the proper devices.
However, the deepseekv2.5 model is an exception, as this model [performs
an add
operation](https://huggingface.co/deepseek-ai/DeepSeek-V2.5/blob/main/modeling_deepseek.py#L886)
between a module output (`attn_weights`) and a model input
(`attention_mask`) before the model input has a chance to be placed on
the proper device.
## Changes ##
* Use `model_device` when deciding the onload device for model inputs
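
The idea behind this change can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `infer_model_device`, `place_inputs`, `FakeTensor`, and `FakeModel` are hypothetical names, and the stand-in classes only mimic the `.device`, `.to()`, and `.parameters()` attributes of `torch.Tensor` and `torch.nn.Module` so the sketch runs without a GPU. It assumes the model's device can be read from its first parameter unless the caller supplies an override.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor that only tracks its device (hypothetical)."""

    device: str

    def to(self, device: str) -> "FakeTensor":
        # Mimic Tensor.to(): return a copy that lives on the target device.
        return FakeTensor(device)


class FakeModel:
    """Stand-in for torch.nn.Module exposing parameters() (hypothetical)."""

    def __init__(self, devices: Iterable[str]) -> None:
        self._params = [FakeTensor(d) for d in devices]

    def parameters(self) -> Iterator[FakeTensor]:
        return iter(self._params)


def infer_model_device(model: Any, override: Optional[str] = None) -> str:
    """Pick the device that model inputs should be onloaded to.

    Assumption: an explicit override wins; otherwise the device of the first
    parameter is used, falling back to "cpu" for a parameterless model.
    """
    if override is not None:
        return override
    first_param = next(model.parameters(), None)
    return first_param.device if first_param is not None else "cpu"


def place_inputs(batch: Dict[str, Any], device: str) -> Dict[str, Any]:
    """Move every tensor-like value in a batch to `device` before the forward pass."""
    return {
        name: value.to(device) if hasattr(value, "to") else value
        for name, value in batch.items()
    }


model = FakeModel(["cuda:0", "cuda:1"])
batch = {"attention_mask": FakeTensor("cpu"), "labels": None}
moved = place_inputs(batch, infer_model_device(model))
```

Placing inputs up front means an input such as `attention_mask` already sits on the model's device when it is combined with a module output, rather than relying on offloading hooks to move it later.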
## Testing ##
* Ran deepseekv2.5 example to completion
* TODO: run nightly to confirm other models work with new input device
placement
---------
Signed-off-by: Kyle Sayers <[email protected]>

1 parent 6800f81 commit 50bb656
File tree

4 files changed, +9 -5 lines changed

- src/llmcompressor
  - args
  - pipelines
    - layer_sequential
    - sequential