4 posts tagged “metal”

Building a Local Chat Service with llama.cpp

llama.cpp

  • Pure C/C++ implementation
  • Apple silicon: ARM NEON, Accelerate, Metal
  • x86 architectures: AVX, AVX2, AVX512
  • Mixed F16/F32 precision
  • Integer quantization: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
  • Backend support: CUDA, Metal, OpenCL GPU
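
To make the "integer quantization" item above more concrete, here is a minimal sketch of symmetric block quantization in plain Python. It is a generic illustration only: the function names `quantize_block` and `dequantize_block` are hypothetical, and the scheme is an assumption chosen for clarity, not the storage format llama.cpp actually uses.

```python
def quantize_block(values, bits=4):
    """Map one block of floats to signed integers that share a single scale.

    Generic symmetric scheme for illustration; NOT llama.cpp's real layout.
    """
    qmax = 2 ** (bits - 1) - 1                 # largest magnitude, e.g. 7 for 4-bit
    amax = max(abs(v) for v in values)         # largest absolute value in the block
    scale = amax / qmax if amax > 0 else 1.0   # avoid dividing by zero for all-zero blocks
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quantized


def dequantize_block(scale, quantized):
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * q for q in quantized]


if __name__ == "__main__":
    weights = [0.0, 0.5, -1.0, 0.25]
    scale, q = quantize_block(weights, bits=4)
    print("scale:", scale)
    print("quantized:", q)
    print("restored:", dequantize_block(scale, q))
```

Fewer bits mean a coarser grid of representable values, so the restored weights drift further from the originals while the model takes less memory.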

Build

❶ Clone the llama.cpp repository

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

❷ make

make -j

❸ Install dependencies

pip install -r requirements.txt

Get the Facebook LLaMA2 model

Models that have already been converted and quantized can be downloaded from TheBloke.

Download the GGUF model

huggingface-cli

pip install huggingface_hub
REPO_ID=TheBloke/Llama-2-7B-chat-GGUF
FILENAME=llama-2-7b-chat.Q4_K_M.
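
The same download can also be scripted from Python. The sketch below assumes the `huggingface_hub` package installed above and uses its `hf_hub_download` function. The helper name `download_gguf` and the `models` target directory are hypothetical. The file name in the excerpt is cut off, so it is left as a parameter rather than guessed.

```python
from pathlib import Path

REPO_ID = "TheBloke/Llama-2-7B-chat-GGUF"  # repository named in the post


def download_gguf(filename: str, repo_id: str = REPO_ID, local_dir: str = "models") -> Path:
    """Download a single GGUF file from the Hugging Face Hub and return its local path."""
    # Reject incomplete names early instead of failing halfway through a request.
    if not filename.endswith(".gguf"):
        raise ValueError(f"expected a .gguf file name, got {filename!r}")
    # Imported lazily so this module still loads where huggingface_hub is missing.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir))
```

Call `download_gguf(...)` with the complete file name shown on the repository's file listing. The returned path can then be passed to llama.cpp as the model file.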

Installing TensorFlow on a MacBook Pro M2 Max

Install TensorFlow

sudo conda create --name tensorflow python
conda activate tensorflow

# If no environment is given with -n, packages are installed into the base environment
sudo conda install -c apple -n tensorflow tensorflow-deps
pip install tensorflow-macos
pip install tensorflow-metal
sudo conda install notebook -y

pip install numpy --upgrade
pip install pandas --upgrade
pip install matplotlib --upgrade
pip install scikit-learn --upgrade
pip install scipy --upgrade
pip install plotly --upgrade
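
After the upgrades, it can help to confirm which versions actually landed in the active environment. The following is a small sketch using only the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the package list simply mirrors the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands in the post.
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "scipy", "plotly"]


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it inside the activated `tensorflow` environment. A `not installed` entry usually means pip wrote to a different environment than the one currently active.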

Verify

import sys
import tensorflow.keras
import tensorflow as tf
import platform
print(f"Python Platform: {platform.
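
The verification excerpt above is cut off. As a stand-in, here is a minimal sketch of a comparable check, not the post's original script. The function name `environment_report` is hypothetical. It assumes that, with `tensorflow-metal` installed, the Metal device shows up in `tf.config.list_physical_devices("GPU")`.

```python
import platform
import sys


def environment_report():
    """Collect platform, Python, and (if importable) TensorFlow details."""
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "tensorflow": None,  # stays None when TensorFlow cannot be imported
        "gpus": [],
    }
    try:
        import tensorflow as tf
    except ImportError:
        return report
    report["tensorflow"] = tf.__version__
    # An empty list here means TensorFlow sees no GPU device.
    report["gpus"] = [device.name for device in tf.config.list_physical_devices("GPU")]
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `gpus` comes back empty on Apple silicon, the usual suspects are a missing `tensorflow-metal` package or a Python interpreter from a different environment.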