Commit 4bf2c8c · vllm-project/vllm-spyre
📝 docs for local development on CPU (#161)

Add documentation on how to run the `eager` (CPU) examples + tests on `arm64` and `x86_64`.

See it live: https://vllm-spyre--161.org.readthedocs.build/en/161/getting_started/local_development.html

~Will wait for #159 to merge first~

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>

1 parent 7b260d0 · commit 4bf2c8c

File tree: 6 files changed, +200 −9 lines

_local_envs_for_test.sh

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+#!/bin/bash
+
+# Env vars to set for CPU-only testing
+
+# Need to be set for tests to run
+export MASTER_ADDR=localhost
+export MASTER_PORT=29500
+
+# Run on CPU
+export VLLM_SPYRE_DYNAMO_BACKEND=eager
+
+# TODO: Tests don't work on CPU with MP enabled?
+export VLLM_ENABLE_V1_MULTIPROCESSING=0
+
+# Test related
+export VLLM_SPYRE_TEST_BACKEND_LIST=eager
+# Note: Make sure model name aligns with the model that you downloaded
+export VLLM_SPYRE_TEST_MODEL_LIST="JackFram/llama-160m"
+export VLLM_SPYRE_TEST_MODEL_DIR=""
+# We have to use `HF_HUB_OFFLINE=1` otherwise vllm tries to download a
+# different version of the model using HF API which does not work locally
+export HF_HUB_OFFLINE=1
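A quick way to confirm the script took effect after sourcing it (a minimal sketch; the variable names come straight from the script above):

```sh
# Load the CPU-only test environment, then spot-check a couple of the variables
source _local_envs_for_test.sh
echo "backend=${VLLM_SPYRE_DYNAMO_BACKEND} hf_offline=${HF_HUB_OFFLINE}"
```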

docs/source/contributing/overview.md

Lines changed: 105 additions & 0 deletions
@@ -43,3 +43,108 @@ Commits must include a `Signed-off-by:` header which certifies agreement with
 the terms of the DCO.
 
 Using `-s` with `git commit` will automatically add this header.
+
+## Testing
+
+### Running tests locally on CPU (No Spyre card)
+
+1. (arm64 only) Install xgrammar
+
+    :::{tip}
+    It's installed automatically for x86_64.
+    :::
+
+    ```sh
+    uv pip install xgrammar==0.1.19
+    ```
+
+1. (optional) Download the `JackFram/llama-160m` model for tests
+
+    ```sh
+    python -c "from transformers import pipeline; pipeline('text-generation', model='JackFram/llama-160m')"
+    ```
+
+    :::{caution}
+    Downloading the same model using the HF API does not work locally on `arm64`.
+    :::
+
+    :::{tip}
+    :class: dropdown
+    We assume the model lands here:
+
+    ```sh
+    .cache/huggingface/hub/models--JackFram--llama-160m
+    ```
+    :::
+
+1. Source the env variables needed for tests
+
+    ```sh
+    source _local_envs_for_test.sh
+    ```
+
+1. (optional) Install dev dependencies (if vllm-spyre was installed without them)
+
+    ```sh
+    uv pip install --group dev
+    ```
+
+1. Run the tests:
+
+    ```sh
+    python -m pytest -v -x tests -m "v1 and cpu and e2e"
+    ```
+
+### Continuous Batching Tests (CB)
+
+:::{attention}
+Temporary section until the FMS custom branch is merged to main.
+:::
+
+Continuous batching currently requires a custom installation, until the FMS custom branch is merged to main.
+
+To try it out, after following all of the testing setup steps above:
+
+1. Install the custom FMS branch for CB:
+
+    ```sh
+    uv pip install git+https://github.com/foundation-model-stack/foundation-model-stack.git@paged_attn_mock --force-reinstall
+    ```
+
+#### Run only CB tests
+
+```sh
+python -m pytest -v -x tests/e2e -m cb
+```
+
+## Debugging
+
+We can debug using `debugpy` in VS Code.
+
+This is the content of the `launch.json` file needed for debugging in VS Code:
+
+```json
+{
+    "version": "0.2.0",
+    "configurations": [
+        {
+            "name": "Python Debugger: local",
+            "type": "debugpy",
+            "request": "attach",
+            "connect": {
+                "host": "localhost",
+                "port": 5678
+            },
+            "justMyCode": false
+        }
+    ]
+}
+```
+
+Run using
+
+```sh
+python -m debugpy --listen 5678 -m ...
+```
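As a concrete illustration of the `-m ...` placeholder, attaching the debugger to the test suite might look like this (a sketch; `--wait-for-client` is a standard debugpy flag that pauses the process until the debugger attaches, and the pytest arguments are the ones from the testing section above):

```sh
# Start pytest under debugpy; execution pauses until the VS Code
# "attach" configuration above connects on port 5678
python -m debugpy --listen 5678 --wait-for-client -m pytest -v -x tests -m "v1 and cpu and e2e"
```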

docs/source/getting_started/installation.md

Lines changed: 49 additions & 9 deletions
@@ -5,12 +5,52 @@ installation of the plugin and its dependencies. `uv` provides advanced
 dependency resolution which is required to properly install dependencies like
 `vllm` without overwriting critical dependencies like `torch`.
 
-```bash
-# Install uv
-pip install uv
-
-# Install vllm-spyre
-git clone https://github.com/vllm-project/vllm-spyre.git
-cd vllm-spyre
-VLLM_TARGET_DEVICE=empty uv pip install -e .
-```
+1. Clone vllm-spyre
+
+    ```sh
+    git clone https://github.com/vllm-project/vllm-spyre.git
+    cd vllm-spyre
+    ```
+
+1. Install uv
+
+    ```sh
+    pip install uv
+    ```
+
+1. Create a new env
+
+    ```sh
+    uv venv --python 3.12 --seed .venv
+    ```
+
+1. Activate it
+
+    ```sh
+    source .venv/bin/activate
+    ```
+
+1. Install `vllm-spyre` locally with dev (and optionally lint) dependencies
+
+    ```sh
+    uv sync --frozen --active --inexact
+    ```
+
+    or also with lint:
+
+    ```sh
+    uv sync --frozen --active --inexact --group lint
+    ```
+
+    :::{tip}
+    `--group dev` is enabled by default.
+    :::
+
+1. (optional) Install torch through pip, if you don't have it installed already; it is needed for running examples or tests.
+
+    ```sh
+    pip install torch==2.7.0
+    ```
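After these steps, a quick sanity check that the key packages resolve (a minimal sketch, assuming the venv is active; `torch.__version__` and `vllm.__version__` are the usual version attributes):

```sh
# Confirm the core dependencies import cleanly inside the new venv
python -c "import torch; print(torch.__version__)"
python -c "import vllm; print(vllm.__version__)"
```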

examples/offline_inference/offline_inference_multi_spyre.py

Lines changed: 8 additions & 0 deletions
@@ -1,11 +1,19 @@
 import gc
 import os
+import platform
 import time
 
 from vllm import LLM, SamplingParams
 
 max_tokens = 3
 
+if platform.machine() == "arm64":
+    print("Detected arm64 running environment. "
+          "Setting HF_HUB_OFFLINE=1 otherwise vllm tries to download a "
+          "different version of the model using HF API which might not work "
+          "locally on arm64.")
+    os.environ["HF_HUB_OFFLINE"] = "1"
+
 os.environ["VLLM_SPYRE_WARMUP_PROMPT_LENS"] = '64'
 os.environ["VLLM_SPYRE_WARMUP_NEW_TOKENS"] = str(max_tokens)
 os.environ['VLLM_SPYRE_WARMUP_BATCH_SIZES'] = '1'
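To try this example locally on CPU, the environment from `_local_envs_for_test.sh` is assumed to already be in place (a sketch; the script lives at the repo root, as added in this commit):

```sh
# CPU-only run of the multi-Spyre example, reusing the test env vars
source _local_envs_for_test.sh
python examples/offline_inference/offline_inference_multi_spyre.py
```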

examples/offline_inference/offline_inference_spyre.py

Lines changed: 8 additions & 0 deletions
@@ -1,10 +1,18 @@
 import os
+import platform
 import time
 
 from vllm import LLM, SamplingParams
 
 max_tokens = 3
 
+if platform.machine() == "arm64":
+    print("Detected arm64 running environment. "
+          "Setting HF_HUB_OFFLINE=1 otherwise vllm tries to download a "
+          "different version of the model using HF API which might not work "
+          "locally on arm64.")
+    os.environ["HF_HUB_OFFLINE"] = "1"
+
 os.environ["VLLM_SPYRE_WARMUP_PROMPT_LENS"] = '64'
 os.environ["VLLM_SPYRE_WARMUP_NEW_TOKENS"] = str(max_tokens)
 os.environ['VLLM_SPYRE_WARMUP_BATCH_SIZES'] = '1'

examples/offline_inference/offline_inference_spyre_cb_test.py

Lines changed: 8 additions & 0 deletions
@@ -1,4 +1,5 @@
 import os
+import platform
 import time
 
 from vllm import LLM, SamplingParams
@@ -11,6 +12,13 @@
 max_tokens3 = 7
 max_num_seqs = 2 # defines max batch size
 
+if platform.machine() == "arm64":
+    print("Detected arm64 running environment. "
+          "Setting HF_HUB_OFFLINE=1 otherwise vllm tries to download a "
+          "different version of the model using HF API which might not work "
+          "locally on arm64.")
+    os.environ["HF_HUB_OFFLINE"] = "1"
+
 # defining here to be able to run/debug directly from VSC (not via terminal)
 os.environ['VLLM_SPYRE_DYNAMO_BACKEND'] = 'eager'
 os.environ['VLLM_SPYRE_USE_CB'] = '1'
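Running the CB example locally assumes the custom FMS branch from the contributing docs is installed; the script selects the eager backend and enables CB itself (a sketch):

```sh
# Requires the paged_attn_mock FMS branch (see the contributing docs above)
python examples/offline_inference/offline_inference_spyre_cb_test.py
```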
