Notes on setting up a vLLM environment. References:

https://github.com/vllm-project/vllm
https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html
https://docs.astral.sh/uv/#learn-more
curl -LsSf https://astral.sh/uv/install.sh | sh
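A quick sanity check that the installer worked (it typically drops the binary into ~/.local/bin; you may need to re-open the shell first):

uv --version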
# create a Python 3.12 venv; --seed preinstalls pip into the venv,
# and --link-mode=copy copies files instead of hardlinking (useful across filesystems)
uv venv myenv --python 3.12 --seed --link-mode=copy
source myenv/bin/activate
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple
pip config set global.trusted-host mirrors.aliyun.com
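To confirm the Aliyun mirror settings took effect, pip config list should echo both keys back:

pip config list
# global.index-url='https://mirrors.aliyun.com/pypi/simple'
# global.trusted-host='mirrors.aliyun.com'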
pip install vllm
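Once the (large) install finishes, a minimal smoke test, assuming the venv is still active:

python -c "import vllm; print(vllm.__version__)"
# 0.8.5.post1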
# du -sh myenv*
6.5M    myenv     (fresh venv, no packages installed)
7.8G    myenv1    (after installing vllm)

Key packages pulled in by the install (pip list excerpt):
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
safetensors 0.5.3
torch 2.6.0
transformers 4.51.3
vllm 0.8.5.post1
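Note that the torch 2.6.0 wheel bundles its own CUDA 12.4 runtime libraries (the nvidia-*-cu12 packages above), so no system-wide CUDA toolkit is needed; the NVIDIA driver just has to be new enough for CUDA 12.4. The excerpt can be reproduced with something like:

pip list | grep -Ei 'vllm|torch|transformers|safetensors|nvidia-cuda'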
(myenv1) # vllm --help
INFO 05-14 14:55:02 [__init__.py:239] Automatically detected platform cuda.
usage: vllm [-h] [-v] {chat,complete,serve,bench,collect-env} ...

vLLM CLI

positional arguments:
  {chat,complete,serve,bench,collect-env}
    chat                Generate chat completions via the running API server.
    complete            Generate text completions based on the given prompt via the running API server.
    serve               Start the vLLM OpenAI Compatible API server.
    bench               vLLM bench subcommand.
    collect-env         Start collecting environment information.

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
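The collect-env subcommand listed above is handy for confirming that the detected GPU, driver, and CUDA versions line up before serving anything:

vllm collect-env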
# model caches get large; point the default cache dirs at a data disk
mkdir -p /data/modelscope /data/huggingface
ln -s /data/modelscope ~/.cache/modelscope
ln -s /data/huggingface ~/.cache/huggingface
# pull models from ModelScope instead of Hugging Face (faster inside China)
pip install modelscope
export VLLM_USE_MODELSCOPE=True
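Optionally, the weights can be pre-fetched with the modelscope CLI before starting the server (a sketch only; the download subcommand and its --model flag are assumptions here and may differ across modelscope versions):

# hypothetical pre-download; vllm serve will otherwise download on first start
modelscope download --model Qwen/Qwen3-0.6B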
vllm serve Qwen/Qwen3-0.6B \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --port 12345
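Once up, the server speaks the OpenAI chat-completions protocol on the chosen port; a minimal request (model name and port taken from the serve command above):

curl http://localhost:12345/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "hello"}]}'

With reasoning enabled, the parsed chain of thought comes back in a separate reasoning_content field alongside the regular content.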
# alternative: clone the weights from a mirror if direct downloads are slow
git clone https://aifasthub.com/Qwen/Qwen3-0.6B
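vllm serve also accepts a local directory in place of a model ID, so the clone can be served directly (the path below assumes the clone landed in the current directory):

vllm serve ./Qwen3-0.6B --port 12345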
Topics covered:

- the uv tool and China pip mirrors
- vllm and CUDA versions
- ModelScope and Hugging Face
Originally published on the WeChat public account 生有可恋: "vLLM 环境安装" (vLLM Environment Setup).