---
layout: single
title:  "Private GPT"
date:   2023-09-02 08:00:00 +0800
categories: [AI 与大模型, 操作系统]
tags: [Chroma, Faiss, Gunicorn, Uvicorn, SQLite3, GPT, Shell, PrivateGPT]
---

## 模型
### [Taiyi-CLIP-Roberta-102M-Chinese 中文CLIP模型](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-Roberta-102M-Chinese)
使用电网的图片测试了一下，效果不理想。

### HuggingFace 下载
* [The pipeline API](https://huggingface.co/docs/transformers.js/pipelines)
* [using pipelines with a local model](https://stackoverflow.com/questions/68536546/using-pipelines-with-a-local-model)
* [How to download hugging face sentiment-analysis pipeline to use it offline?](https://stackoverflow.com/questions/66906652/how-to-download-hugging-face-sentiment-analysis-pipeline-to-use-it-offline)
* [How to Download Hugging Face Sentiment-Analysis Pipeline for Offline Use](https://saturncloud.io/blog/how-to-download-hugging-face-sentimentanalysis-pipeline-for-offline-use/)
* [pipeline does not load from local folder, instead, it always downloads models from the internet.](https://github.com/huggingface/transformers/issues/21613)
* [Download files from the Hub](https://huggingface.co/docs/huggingface_hub/guides/download)
* [Download models for local loading](https://discuss.huggingface.co/t/download-models-for-local-loading/1963)
* [How to download Huggingface Transformers model?](https://androidkt.com/how-to-download-huggingface-transformers-model/)
* [Download Huggingface models](https://medium.com/@irene-zhou/download-huggingface-models-b1a196f83c65)
* [huggingface transformers预训练模型如何下载至本地，并使用？](https://zhuanlan.zhihu.com/p/147144376)

### Qwen-7B
#### chat(..., stream=True)
* [Chat 函数有bug](https://github.com/QwenLM/Qwen-7B/issues/100)

* [LLM/阿里：通义千问QWen-7b与Qwen-7B-Chat](https://zhuanlan.zhihu.com/p/647873194)
* [通义千问-7B-Chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary)

## 向量数据库
### [Chroma](https://www.trychroma.com/)

`Chroma` 向量数据库依赖于 `sqlite3`，而且需要 sqlite3 >= 3.35.0。

制作镜像时，使用的是 `python:3.10` 和 `python:3.10-slim`，使用 `apt install sqlite3` 能够安装的 `sqlite3` 最高版本是 3.34.1，所以会出现下面的错误。

```shell
    import chromadb
  File "/usr/local/lib/python3.10/site-packages/chromadb/__init__.py", line 69, in <module>
    raise RuntimeError(
RuntimeError: Your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0.
Please visit https://docs.trychroma.com/troubleshooting#sqlite to learn how to upgrade.
```

* [Troubleshooting](https://docs.trychroma.com/troubleshooting#sqlite)
* [LangChain Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma)
* [LangChain Chroma - load data from Vector Database](https://stackoverflow.com/questions/76232375/langchain-chroma-load-data-from-vector-database)

这里采用编译源码的方式安装 `sqlite3`。

```shell
wget https://www.sqlite.org/2023/sqlite-autoconf-3430000.tar.gz
tar -zxvf sqlite-autoconf-3430000.tar.gz
cd sqlite-autoconf-3430000
./configure --prefix=/usr/local
make install
```
```
make[1]: Entering directory '/sqlite-autoconf-3430000'
 /bin/mkdir -p '/usr/local/lib'
 /bin/bash ./libtool   --mode=install /usr/bin/install -c   libsqlite3.la '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/libsqlite3.so.0.8.6 /usr/local/lib/libsqlite3.so.0.8.6
libtool: install: (cd /usr/local/lib && { ln -s -f libsqlite3.so.0.8.6 libsqlite3.so.0 || { rm -f libsqlite3.so.0 && ln -s libsqlite3.so.0.8.6 libsqlite3.so.0; }; })
libtool: install: (cd /usr/local/lib && { ln -s -f libsqlite3.so.0.8.6 libsqlite3.so || { rm -f libsqlite3.so && ln -s libsqlite3.so.0.8.6 libsqlite3.so; }; })
libtool: install: /usr/bin/install -c .libs/libsqlite3.lai /usr/local/lib/libsqlite3.la
libtool: install: /usr/bin/install -c .libs/libsqlite3.a /usr/local/lib/libsqlite3.a
libtool: install: chmod 644 /usr/local/lib/libsqlite3.a
libtool: install: ranlib /usr/local/lib/libsqlite3.a
libtool: finish: PATH="/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/sbin" ldconfig -n /usr/local/lib
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
 /bin/mkdir -p '/usr/local/bin'
  /bin/bash ./libtool   --mode=install /usr/bin/install -c sqlite3 '/usr/local/bin'
libtool: install: /usr/bin/install -c sqlite3 /usr/local/bin/sqlite3
 /bin/mkdir -p '/usr/local/include'
 /usr/bin/install -c -m 644 sqlite3.h sqlite3ext.h '/usr/local/include'
 /bin/mkdir -p '/usr/local/share/man/man1'
 /usr/bin/install -c -m 644 sqlite3.1 '/usr/local/share/man/man1'
 /bin/mkdir -p '/usr/local/lib/pkgconfig'
 /usr/bin/install -c -m 644 sqlite3.pc '/usr/local/lib/pkgconfig'
make[1]: Leaving directory '/sqlite-autoconf-3430000'
```

* x86
```shell
cp /usr/local/lib/libsqlite3.so.0.8.6 /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
```

* arm64
```shell
cp /usr/local/lib/libsqlite3.so.0.8.6 /usr/lib/aarch64-linux-gnu/libsqlite3.so.0
```

* [SQLite Download Page](https://www.sqlite.org/download.html)
* [How to Install SQLite3 from Source on Linux (With a Sample Database)](https://www.thegeekstuff.com/2011/07/install-sqlite3/)
* [Load embedding from disk - Langchain Chroma DB](https://community.openai.com/t/load-embedding-from-disk-langchain-chroma-db/290297)

### [Milvus](https://milvus.io)
#### 图片搜索
```shell
import os
import numpy as np
from pymilvus import FieldSchema, CollectionSchema, Collection, DataType, connections, utility

from app.models.search_image import SearchImageModel
from app.config import Config

config = Config()
model = SearchImageModel()
model.load(config)


def get_images(path):
    image_paths = []

    ext_names = ['.png', '.jpg', '.jpeg']

    for filename in os.listdir(path):
        _, ext_name = os.path.splitext(filename.lower())
        if ext_name.lower() in ext_names:
            image_paths.append(filename)

    return image_paths


image_path = 'data/images/20190128155421222575013.jpg'
image_features = model.get_image_features_with_path(image_path)
print('*'*100, image_features.shape)
images = get_images('data/images')
for image in images:
    file_path = f'data/images/{image}'
    image_features = model.get_image_features(file_path)
    collection.insert([[file_path], [image_features]])


COLLECTION_NAME = 'PrivateGPTImage'  # Collection name
connections.connect(host='localhost', port=19530)

if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)

fields = [
    FieldSchema(name='path', dtype=DataType.VARCHAR, description='Image path', is_primary=True, auto_id=False, max_length=1024),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='Embedding vectors', dim=512)
]
schema = CollectionSchema(fields=fields, description='Image Collection')
collection = Collection(name=COLLECTION_NAME, schema=schema)

images = get_images('data/images')
for image in images:
    file_path = f'data/images/{image}'
    image_features = model.get_image_features_with_path(file_path)
    image_features /= np.linalg.norm(image_features)
    # image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    collection.insert([[file_path], image_features.numpy()])

index_params = {
    'index_type': 'IVF_FLAT',
    'metric_type': 'L2',
    'params': {'nlist': 512}
}
collection.create_index(field_name="embedding", index_params=index_params)
collection.load()


collection = Collection(COLLECTION_NAME)
collection.load()

def search_image(text):
    # Search parameters for the index
    search_params={
        "metric_type": "L2"
    }

    data = model.get_text_features(text)
    data /= np.linalg.norm(data)
    # data /= data.norm(dim=-1, keepdim=True)
    search_param = {
        "data": data.numpy(),
        "anns_field": "embedding",
        "param": {"metric_type": "L2", "offset": 1},
        "limit": 10,
        "output_fields": ["path"],
    }
    results=collection.search(**search_param)

    ret=[]
    for hit in results[0]:
        row=[]
        row.extend([hit.id, hit.score, hit.entity.get('path')])  # Get the id, distance, and title for the results
        ret.append(row)
    return ret

results = search_image('Working at heights wearing a helmet') # 戴着安全帽高空作业
for result in results:
    print(result)

utility.drop_collection(COLLECTION_NAME)
```

#### 文本搜索
```shell
from langchain.vectorstores import Milvus
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

EMBEDDING_MODEL_NAME='BAAI/bge-base-zh'
EMBEDDING_MODEL_CACHE_DIRECTORY='models/embeddings'

from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME, 
                                   cache_folder=EMBEDDING_MODEL_CACHE_DIRECTORY)
vector_store = Milvus(embedding_function=embeddings, 
                      collection_name="PrivateGPT", 
                      connection_args={"host": 'localhost', "port": 19530},
                      drop_old = True)

loader = TextLoader('data/docs/test.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

vector_store.add_documents(texts)
docs = vector_store.similarity_search("有多少张图片")
for i, doc in enumerate(docs):
    print(f'{i} - {len(doc.page_content)}', doc.page_content[:100])
```

* [Milvus Docs](https://milvus.io/docs)
* [Question Answering Using Milvus and Hugging Face](https://milvus.io/docs/integrate_with_hugging-face.md)
* [Similarity Search with Milvus and OpenAI](https://milvus.io/docs/integrate_with_openai.md)
* [Build a Milvus Powered Text-Image Search Engine in Minutes](https://github.com/towhee-io/examples/blob/main/image/text_image_search/1_build_text_image_search_engine.ipynb)
* [Deep Dive into Text-Image Search Engine with Towhee](https://github.com/towhee-io/examples/blob/main/image/text_image_search/2_deep_dive_text_image_search.ipynb)
* [基于Milvus的向量搜索实践（一）](https://mp.weixin.qq.com/s?__biz=MzU3OTY2MjQ2NQ==&mid=2247485325&idx=1&sn=f62c471d2e7cf9051f17602c12a364c3)
* [基于Milvus的向量搜索实践（二）](https://mp.weixin.qq.com/s?__biz=MzU3OTY2MjQ2NQ==&mid=2247485384&idx=1&sn=fee83ac6c9d5d2b6ed76a5699f137c50)
* [基于Milvus的向量搜索实践（三）](https://mp.weixin.qq.com/s?__biz=MzU3OTY2MjQ2NQ==&mid=2247485412&idx=1&sn=9f0f790355e4867f26822d1a7e86fffa)
* [向量检索：如何取舍 Milvus 索引实现搜索优化？](https://time.geekbang.org/dailylesson/detail/100075742)
* [笔记︱几款多模态向量检索引擎：Faiss 、milvus、Proxima、vearch、Jina等](https://zhuanlan.zhihu.com/p/364923722)
* [PyMilvus](https://pymilvus.readthedocs.io/en/latest/)
* [A purposeful rendezvous with Milvus — the vector database](https://medium.com/@indirakrigan/a-purposeful-rendezvous-with-milvus-the-vector-database-2acee4da25e2)
* [Install Milvus Standalone with Docker Compose (CPU)](https://milvus.io/docs/install_standalone-docker.md)

### [Faiss](https://faiss.ai/index.html)


## 文件重复检测
### Redis

运行 Redis 服务。

```shell
docker run --name redis -it -p 6379:6379 -v $(pwd)/data/redis:/data redis redis-server --save 60 1
```

安装 `redis` Python 包。

```shell
pip install redis
```

测试。

```shell
import redis
from redis.exceptions import ConnectionError

try:
    r = redis.Redis(host='localhost', port=6379, decode_responses=True)
    r.ping()
    print('Connected!')
except ConnectionError as ex:
    print('Error:', ex)
    raise Exception

r.set('hello', 'world')
r.get('hello')
```

### 使用 Redis 保存图片的 MD5 值

```shell
import os, hashlib

dir = 'images'
for filename in os.listdir(dir):
    file_path = f'{dir}/{filename}'
    file_hash = hashlib.md5(open(file_path, 'rb').read()).hexdigest()
    if not r.get(file_hash):
        r.set(file_hash, file_path)
        print(file_hash, file_path)
```

* [Redis](https://redis.io/)
* [Redis Docker Hub](https://hub.docker.com/_/redis)
* [Finding duplicate files and removing them](https://stackoverflow.com/questions/748675/finding-duplicate-files-and-removing-them)
* [Finding Duplicate Files with Python](https://www.geeksforgeeks.org/finding-duplicate-files-with-python/)

## Gunicorn
### 服务多进程运行，出现竞争问题。

执行下面的命令出现错误信息。

```shell
gunicorn --worker-class uvicorn.workers.UvicornWorker --config app/gunicorn_conf.pyc app.main:app
```

错误信息

```shell
Traceback (most recent call last):
  File "/usr/local/bin/gunicorn", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/app/base.py", line 236, in run
    super().run()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 229, in run
    self.halt(reason=inst.reason, exit_status=inst.exit_status)
  File "/usr/local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 342, in halt
    self.stop()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 396, in stop
    time.sleep(0.1)
  File "/usr/local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
    self.reap_workers()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 530, in reap_workers
    raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
```

使用 `--preload` 参数可以解决这个问题。

```shell
gunicorn --worker-class uvicorn.workers.UvicornWorker --config app/gunicorn_conf.pyc --preload app.main:app
```

* [gunicorn 报错 Worker failed to boot. 解决办法](https://blog.csdn.net/m0_38007695/article/details/88780594)

### 进程运行数量过多，出现内存不足的错误。

```shell
MAX_WORKERS=10 gunicorn --worker-class uvicorn.workers.UvicornWorker --config app/gunicorn_conf.pyc --preload app.main:app
```

错误信息

```shell
[2023-09-02 09:14:07 +0000] [1] [ERROR] Worker (pid:212) was sent SIGKILL! Perhaps out of memory?
```

使用 `MAX_WORKERS` 参数减少进程数量。

```shell
MAX_WORKERS=3 gunicorn --worker-class uvicorn.workers.UvicornWorker --config app/gunicorn_conf.pyc --preload app.main:app
```

* [Gunicorn worker terminated with signal 9](https://stackoverflow.com/questions/67637004/gunicorn-worker-terminated-with-signal-9)
* [tiangolo/uvicorn-gunicorn-fastapi-docker](https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker)

## Python
### 文本转换为 bool 类型

```python
>>> eval('True')
True
>>> eval('False')
False
```

直接使用 `bool` 函数会出现下面的错误。

```python
>>> bool('True')
True
>>> bool('False')
True
```

* [Converting from a string to boolean in Python](https://stackoverflow.com/questions/715417/converting-from-a-string-to-boolean-in-python)

## Gradio
### Gallery
* [Cannot drag and drop image from Gallery to Image](https://github.com/gradio-app/gradio/issues/4377)
* [Specifying Gallery's height causes unexpected display of images](https://github.com/gradio-app/gradio/issues/3515)
* [Adjust width / height of image preview in the Image Component?](https://github.com/gradio-app/gradio/issues/654)

### mount_gradio_app
* [gradio/demo/custom_path/run.py](https://github.com/gradio-app/gradio/blob/main/demo/custom_path/run.py)
* [mount_gradio_app causing reload loop](https://github.com/gradio-app/gradio/issues/2427)
* [gradio HTML component with javascript code don't work](https://stackoverflow.com/questions/76071586/gradio-html-component-with-javascript-code-dont-work)

* [Build a demo with Gradio](https://huggingface.co/learn/audio-course/chapter5/demo)
* [Gradio Controlling Layout](https://www.gradio.app/guides/controlling-layout)
* [LoRA the Explorer](https://huggingface.co/spaces/multimodalart/LoraTheExplorer)
* [LoraTheExplorer/app.py](https://huggingface.co/spaces/multimodalart/LoraTheExplorer/blob/main/app.py)
* [LoraTheExplorer/custom.css](https://huggingface.co/spaces/multimodalart/LoraTheExplorer/blob/main/custom.css)
* [Gradio tutorial (Build machine learning applications)](https://www.machinelearningnuggets.com/gradio-tutorial/)
* [Gradio File](https://www.gradio.app/docs/file)

## Shell
### 查找文件软链接的绝对路径

```shell
ls -l /usr/lib/aarch64-linux-gnu/libsqlite3.so.0
```
```
lrwxrwxrwx 1 root root 19 Feb 24  2021 /usr/lib/aarch64-linux-gnu/libsqlite3.so.0 -> libsqlite3.so.0.8.6
```

```shell
readlink -f /usr/lib/aarch64-linux-gnu/libsqlite3.so.0
```
```
/usr/lib/aarch64-linux-gnu/libsqlite3.so.0.8.6
```

* [Find out symbolic link target via command line](https://serverfault.com/questions/76042/find-out-symbolic-link-target-via-command-line)

## 构建镜像
### Dockerfile
```dockerfile
FROM python:3.10-slim

ARG SQLITE3_PATH
ENV APP_HOME=/private-gpt

WORKDIR ${APP_HOME}

# 编译Sqlite3
RUN wget https://www.sqlite.org/2023/sqlite-autoconf-3430000.tar.gz \
    && tar -zxvf sqlite-autoconf-3430000.tar.gz \
    && cd sqlite-autoconf-3430000 \
    && ./configure --prefix=/usr/local \
    && make \
    && make install \
    && cd .. \
    && rm -rf sqlite-autoconf-3430000 \
    && rm -rf sqlite-autoconf-3430000.tar.gz

# 拷贝Sqlite3
COPY --from=builder /usr/local/lib/libsqlite3.so.0.8.6 ${SQLITE3_PATH}
```

### 构建多平台镜像的脚本
```shell
build_image() {
    local dockerfile=$1
    local app_name=$2
    local platforms=($3)
    local platform_sqlite3_paths=($4)

    for ((i=0; i<${#platforms[@]}; ++i))
    do
        echo "🐳 Building $app_name:${platforms[i]}, Sqlite3 Path: ${platform_sqlite3_paths[i]}"
        docker buildx build --progress=plain --platform=linux/${platforms[i]} --rm -f $dockerfile \
            --build-arg SQLITE3_PATH=${platform_sqlite3_paths[i]} \
            -t wangjunjian/$app_name:${platforms[i]} "."
        echo "💯\n"
    done

}

APP_NAME=private-gpt
PLATFORMS=(amd64 arm64)
PLATFORM_SQLITE3_PATHS=(/usr/lib/x86_64-linux-gnu/libsqlite3.so.0 /usr/lib/aarch64-linux-gnu/libsqlite3.so.0)
build_image Dockerfile $APP_NAME "${PLATFORMS[*]}" "${PLATFORM_SQLITE3_PATHS[*]}"
```

### 测试镜像
```shell
docker run --rm -it -p 8000:80 -e MAX_WORKERS=1 wangjunjian/private-gpt:arm64
```

### 上传镜像
```shell
docker push wangjunjian/private-gpt:amd64
```

### 下载镜像
```shell
docker pull wangjunjian/private-gpt:amd64
``` 

### 运行镜像
```shell
docker run -d --name private-gpt -p 8888:80 -v $(pwd)/storage:/private-gpt/storage -e MAX_WORKERS=1 wangjunjian/private-gpt:amd64
``` 

* [bash shell script two variables in for loop](https://stackoverflow.com/questions/11215088/bash-shell-script-two-variables-in-for-loop)
* [wangjunjian/ultralytics-serving](https://hub.docker.com/r/wangjunjian/ultralytics-serving/tags)


## 参考资料
* [privateGPT walkthrough: Creating your own offline GPT Q&A system](https://medium.com/@aayushmnit/privategpt-walkthrough-creating-your-own-offline-gpt-q-a-system-4bd7586cebd1)
* [OpenAI CLIP](https://github.com/openai/CLIP)
* [Hugging Face CLIP](https://huggingface.co/docs/transformers/model_doc/clip)
* [How do I persist to disk a temporary file using Python?](https://stackoverflow.com/questions/94153/how-do-i-persist-to-disk-a-temporary-file-using-python)
* [Making Neural Search Queries Accessible to Everyone with Gradio — Deploying Haystack’s Semantic Document Search with Hugging Face models in Gradio in Three Easy Steps](https://medium.com/@duerr.sebastian/making-neural-search-queries-accessible-to-everyone-with-gradio-haystack-726e77aca047)
* [Towhee](https://github.com/towhee-io/towhee)
* [Visualize nearest neighbor search on reverse image search](https://codelabs.towhee.io/visualize-nearest-neighbor-search-on-reverse-image-search/index)
* [Fine-Grained Image Similarity Detection Using Facebook AI Similarity Search(FAISS)](https://medium.com/swlh/fine-grained-image-similarity-detection-using-facebook-ai-similarity-search-faiss-b357da4f1644)
* [Building an image search engine with Python and Faiss](https://thetisdev.hashnode.dev/building-an-image-search-engine-with-python-and-faiss)
* [Fast and Simple Image Search with Foundation Models](https://www.ivanzhou.me/blog/2023/3/19/fast-and-simple-image-search-with-foundation-models)
* [250+ Free Machine Learning Datasets for Instant Download](https://datasets.activeloop.ai/docs/ml/datasets/)
* [Image search with 🤗 datasets](https://huggingface.co/blog/image-search-datasets)
* [FAISS (Facebook AI Similarity Search)](https://www.activeloop.ai/resources/glossary/faiss-facebook-ai-similarity-search/)
* [Deep Lake Docs](https://docs.activeloop.ai/)
* [Weaviate](https://weaviate.io/)
* [Weaviate GitHub](https://github.com/weaviate/weaviate)
* [Milvus makes it easy to add similarity search to your applications](https://milvus.io/milvus-demos/)
* [Milvus](https://github.com/milvus-io/milvus)
* [8 Best Vector Databases to Unleash the True Potential of AI](https://geekflare.com/best-vector-databases/)
* [12 Vector Databases For 2023: A Review](https://lakefs.io/blog/12-vector-databases-2023/)
* [HuggingFaceEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.huggingface.HuggingFaceEmbeddings.html)
* [How to Use FAISS to Build Your First Similarity Search](https://medium.com/loopio-tech/how-to-use-faiss-to-build-your-first-similarity-search-bf0f708aa772)
* [LangChain Vector stores](https://python.langchain.com/docs/integrations/vectorstores/)
* [Introduction to Facebook AI Similarity Search (Faiss)](https://www.pinecone.io/learn/series/faiss/faiss-tutorial/)
* [Faiss: A library for efficient similarity search](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/)
* [DocArray](https://github.com/docarray/docarray)
* [Welcome to DocArray!](https://docarray.jina.ai/index.html)
* [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
* [Private GPT](https://github.com/imartinez/privateGPT/blob/main/ingest.py)
* [基于localGPT 和 streamlit 打造个人知识库问答机器人](https://zhuanlan.zhihu.com/p/649697654)
* [gpt4-pdf-chatbot-langchain](https://github.com/mayooear/gpt4-pdf-chatbot-langchain)
* [Knowledge QA LLM](https://github.com/RapidAI/Knowledge-QA-LLM)
* [Knowledge-QA-LLM: 基于本地知识库+LLM的开源问答系统](https://zhuanlan.zhihu.com/p/646768641)
* [闻达：一个大规模语言模型调用平台](https://github.com/wenda-LLM/wenda)
* [Building a FastAPI App with the Gradio Python Client](https://www.gradio.app/guides/fastapi-app-with-the-gradio-client)
* [How to use your own data with Dolly](https://huggingface.co/databricks/dolly-v2-12b/discussions/88)
* [Using Langchain, Chroma, and GPT for document-based retrieval-augmented generation](https://developer.dataiku.com/latest/tutorials/machine-learning/genai/nlp/gpt-lc-chroma-rag/index.html)
* [face_recognition](https://face-recognition.readthedocs.io/en/latest/face_recognition.html)
* [Chat completions API](https://platform.openai.com/docs/guides/gpt/chat-completions-api)
* [ImageSearcher/image_searcher/embedders/face_embedder.py](https://github.com/ManuelFay/ImageSearcher/blob/master/image_searcher/embedders/face_embedder.py)
* [GPT best practices](https://platform.openai.com/docs/guides/gpt-best-practices)
* [Jina](https://jina.ai/)
* [PromptPerfect 专业一流的提示词工程开发工具](https://promptperfect.jina.ai/)
* [Implement unified text and image search with a CLIP model using Amazon SageMaker and Amazon OpenSearch Service](https://aws.amazon.com/cn/blogs/machine-learning/implement-unified-text-and-image-search-with-a-clip-model-using-amazon-sagemaker-and-amazon-opensearch-service/)
* [中文CLIP模型开源](https://zhuanlan.zhihu.com/p/546245070)
* [LangChain Tutorial in Python - Crash Course](https://www.python-engineer.com/posts/langchain-crash-course/)
* [Qwen-7B ReAct Prompting 示例](https://github.com/QwenLM/Qwen-7B/blob/main/examples/react_prompt.md)
* [LangChain - 打造自己的GPT（五）拥有本地高效、安全的Sentence Embeddings For Chinese & English](https://zhuanlan.zhihu.com/p/622017658)
* [想自己利用OpenAI做一个文档问答的话](https://zhuanlan.zhihu.com/p/614334596)
* [LangChain及LangFlow使用指南](https://www.eula.club/blogs/LangChain%E5%8F%8ALangFlow%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97.html)
* [Query Your Own Documents with LlamaIndex and LangChain](https://alphasec.io/query-your-own-documents-with-llamaindex-and-langchain/)
* [分词 -- 从源码解读LangChain-ChatGLM(二)](https://zhuanlan.zhihu.com/p/638929185)
* [LlamaIndex Node Parser](https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/node_parsers/root.html)
