2 posts tagged “llama2”

Building a Local Chat Service with Ollama

Ollama

Deployment

ollama run llama2

Access via the API

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
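The curl call above hard-codes the prompt inside the JSON body. A minimal sketch that builds the same non-streaming request from shell arguments follows; the function names `json_escape_line`, `generate_payload` and `ollama_generate` are hypothetical, and it assumes a single-line prompt (only backslashes and double quotes are escaped).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build a /api/generate body and POST it to a local Ollama server.
# Assumption: the prompt is a single line; only \ and " are escaped for JSON.

# json_escape_line TEXT: escape backslashes, then double quotes.
json_escape_line() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# generate_payload MODEL PROMPT: print a non-streaming request body.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape_line "$1")" "$(json_escape_line "$2")"
}

# ollama_generate MODEL PROMPT [HOST]: send the body read from stdin (-d @-).
ollama_generate() {
  local host=${3:-http://localhost:11434}
  generate_payload "$1" "$2" |
    curl -sS "$host/api/generate" -H 'Content-Type: application/json' -d @-
}
```

Usage: `ollama_generate llama2 'Why is the sky blue?'`. With `"stream": false` the server replies with one JSON object whose `response` field holds the whole answer, instead of a stream of partial objects.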

Ollama help

ollama --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command
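The `list` and `pull` commands above can be combined so a model is only downloaded when it is missing. This is a sketch with a hypothetical helper `model_installed`; it assumes `ollama list` prints a header line followed by rows whose first column is the model name with its tag (for example `llama2:latest`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ollama list` output on stdin and check for a model.
# Assumption: line 1 is a header; column 1 of each later row is NAME:TAG.

# model_installed NAME: exit 0 if NAME (or NAME:latest when no tag is given) is listed.
model_installed() {
  local want=$1
  [[ $want == *:* ]] || want="$want:latest"
  awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

Usage: `ollama list | model_installed llama2 || ollama pull llama2`. Parsing the text output keeps the sketch dependency-free, but it breaks if the column layout changes.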

Building a Local Chat Service with llama.cpp

llama.cpp

  • Pure C/C++ implementation
  • Apple silicon: ARM NEON, Accelerate, Metal
  • x86: AVX, AVX2, AVX512
  • Mixed F16/F32 precision
  • Integer quantization: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
  • GPU backends: CUDA, Metal, OpenCL
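To see why the integer quantization levels matter for a local model, a back-of-the-envelope estimate of weight storage is parameters × bits / 8 bytes. The sketch below assumes a round 7 × 10^9 parameters and the nominal bit widths from the list; it ignores per-block scales, layers kept at higher precision and file metadata, so real GGUF files come out somewhat larger. `weight_mib` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rough weight-size estimate: params * bits / 8 bytes, reported in whole MiB.
# Assumptions: nominal bit widths only; no scales, metadata or mixed-precision layers.

# weight_mib PARAMS BITS: integer MiB (truncated).
weight_mib() {
  echo $(( $1 * $2 / 8 / 1024 / 1024 ))
}

# Compare the quantized widths with 16-bit (F16) weights.
for bits in 2 3 4 5 6 8 16; do
  printf '%2d-bit: %5d MiB\n' "$bits" "$(weight_mib 7000000000 "$bits")"
done
```

The 4-bit row (3337 MiB) is a quarter of the 16-bit row (13351 MiB), which is roughly the difference between fitting in a laptop's memory or not.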

Build

❶ Clone the llama.cpp repository

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

❷ Build with make

make -j

❸ Install dependencies

pip install -r requirements.txt
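Steps ❶ to ❸ can be wrapped in one re-runnable function. This is a sketch with hypothetical names (`cpu_count`, `build_llama_cpp`); it assumes git, make, a C/C++ toolchain and pip are on PATH and that the checkout provides a Makefile, as the steps above do (newer checkouts may expect CMake instead).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for clone, build and dependency install.
# Assumption: the repository ships a Makefile and requirements.txt.

# cpu_count: CPU count on Linux (nproc) or macOS (sysctl), else 4.
cpu_count() {
  nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4
}

# build_llama_cpp [DIR]: clone only if DIR is not already a git checkout.
build_llama_cpp() {
  local dir=${1:-llama.cpp}
  if [ ! -d "$dir/.git" ]; then
    git clone https://github.com/ggerganov/llama.cpp.git "$dir" || return 1
  fi
  # Subshell so the caller's working directory is unchanged.
  (
    cd "$dir" || exit 1
    make -j "$(cpu_count)" || exit 1
    pip install -r requirements.txt
  )
}
```

Plain `make -j` places no limit on parallel jobs, which can exhaust memory on small machines; passing the CPU count bounds it.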

Getting the Facebook LLaMA2 Model

Models that are already converted and quantized can be downloaded from TheBloke on Hugging Face.

Download the GGUF model

huggingface-cli

pip install huggingface_hub
REPO_ID=TheBloke/Llama-2-7B-chat-GGUF
FILENAME=llama-2-7b-chat.Q4_K_M.
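With `huggingface_hub` installed, `huggingface-cli download` can fetch a single file from the repository. The sketch below uses a hypothetical helper `fetch_gguf` that skips the download when the file already exists; pass the full `.gguf` file name exactly as it appears in the repository's file list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download one GGUF file into DIR unless it is already there.
# Assumption: huggingface-cli is on PATH (pip install huggingface_hub).

# fetch_gguf REPO_ID FILENAME [DIR]
fetch_gguf() {
  local repo=$1 file=$2 dir=${3:-models}
  if [ -f "$dir/$file" ]; then
    echo "already present: $dir/$file"
    return 0
  fi
  mkdir -p "$dir" || return 1
  huggingface-cli download "$repo" "$file" --local-dir "$dir"
}
```

Usage: `fetch_gguf "$REPO_ID" "$FILENAME" models`. Downloading one quantized file avoids pulling every quantization variant in the repository.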