Downloading models for vLLM from Hugging Face

Go to Hugging Face and find the model you need.

For example:
To download this model: https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct
Copy the model name: Qwen/Qwen3-VL-4B-Instruct
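
The model name is simply the part of the page URL after the site address. As a minimal sketch, it can be derived in the shell with parameter expansion. The variable names MODEL_URL and REPO_ID are hypothetical, chosen only for this illustration.

```shell
# Hypothetical variable names; the URL is the example model page from above.
MODEL_URL="https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct"

# Strip the site prefix, leaving "<organization>/<model>".
REPO_ID="${MODEL_URL#https://huggingface.co/}"

# Drop a trailing slash in case one was copied along with the URL.
REPO_ID="${REPO_ID%/}"

echo "$REPO_ID"   # prints Qwen/Qwen3-VL-4B-Instruct
```

The resulting value is what the download command expects as its first argument.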

Download the model

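Use the Hugging Face command-line tool (recommended). Recent huggingface_hub releases install it as hf; older releases call it huggingface-cli.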

pip install huggingface_hub

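Log in to Hugging Face (needed for private models or to avoid rate limits; public models can skip this step)

hf auth login

Then enter your Access Token (https://huggingface.co/settings/tokens). Older huggingface_hub releases that lack the hf command use huggingface-cli login instead.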

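Download the model (HF_ENDPOINT points the client at the hf-mirror.com mirror; omit that line to download directly from huggingface.co):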

export HF_ENDPOINT=https://hf-mirror.com
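hf download Qwen/Qwen3-VL-4B-Instruct --local-dir Qwen3-VL-4B

By default vLLM reserves about 90% of GPU memory (its gpu_memory_utilization setting defaults to 0.9), so the regular, unquantized model may run out of VRAM. If that happens, consider downloading a quantized version of the model instead.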
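
Besides switching to a quantized model, the share of GPU memory that vLLM reserves can be lowered at launch. The sketch below assumes vLLM is installed and that the model was downloaded to ./Qwen3-VL-4B as in the command above. The variable names and the value 0.70 are illustrative assumptions, not settings from the original notes.

```shell
# Assumed values for illustration: adjust both to your setup.
MODEL_DIR="./Qwen3-VL-4B"
GPU_MEM_UTIL="0.70"   # fraction of GPU memory vLLM may reserve (default is 0.9)

# Only launch when both vLLM and the downloaded model are present.
if command -v vllm >/dev/null 2>&1 && [ -d "$MODEL_DIR" ]; then
    vllm serve "$MODEL_DIR" --gpu-memory-utilization "$GPU_MEM_UTIL"
else
    echo "vllm or $MODEL_DIR not found; skipping launch"
fi
```

A lower value mainly helps when other processes share the GPU. It does not help if the model weights themselves do not fit in memory, in which case a quantized model is the better option.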

Model download workflow when installing vLLM on a Mac

# Create a virtual environment
conda create --name vllm_dev python=3.12
# Activate the environment
conda activate vllm_dev
# Install huggingface_hub
pip install huggingface_hub
# Download the model
export HF_ENDPOINT=https://hf-mirror.com
hf download Qwen/Qwen3-Embedding-0.6B
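
Because this download omits --local-dir, the files go into the Hugging Face cache rather than the current directory. The sketch below works out where that cache folder should be, assuming the standard huggingface_hub layout (HF_HUB_CACHE if set, otherwise the hub folder under HF_HOME, which defaults to ~/.cache/huggingface). The variable names HUB_CACHE and CACHE_DIR are hypothetical.

```shell
REPO_ID="Qwen/Qwen3-Embedding-0.6B"

# Resolve the cache root the same way huggingface_hub does by default.
HUB_CACHE="${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"

# Cached repositories are stored as models--<organization>--<model>.
CACHE_DIR="$HUB_CACHE/models--$(echo "$REPO_ID" | sed 's#/#--#g')"
echo "$CACHE_DIR"

# List downloaded snapshots, if any.
ls "$CACHE_DIR/snapshots" 2>/dev/null || echo "not downloaded yet"
```

Passing the model name Qwen/Qwen3-Embedding-0.6B to tools that use huggingface_hub should pick up this cached copy without downloading it again.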