Docs - Page 26 - 军舰的日志

Tuesday, August 8, 2023

MinIO for Kubernetes

DirectPV

Install the DirectPV plugin

Krew

kubectl krew update
kubectl krew install directpv

Release binary

release=$(curl -sfL "https://api.github.com/repos/minio/directpv/releases/latest" | awk '/tag_name/ { print substr($2, 3, length($2)-4) }')
curl -fLo kubectl-directpv https://github.com/minio/directpv/releases/download/v${release}/kubectl-directpv_${release}_linux_amd64
sudo chmod a+x kubectl-directpv
sudo mv kubectl-directpv /usr/local/bin/
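The version extraction in the shell snippet above can be hard to read, so here is a minimal Python sketch of the same idea: read `tag_name` from the GitHub release JSON, strip the leading `v`, and build the download URL. The function name `directpv_download_url` and the sample tag `v4.0.6` are hypothetical and used only for illustration; the URL pattern is the one from the commands above.

```python
import json


def directpv_download_url(release_json: str, os_name: str = "linux", arch: str = "amd64") -> str:
    """Build the kubectl-directpv download URL from a GitHub release JSON body."""
    tag = json.loads(release_json)["tag_name"]
    # Strip the leading "v", which is what the awk substr() call does in the shell version.
    version = tag[1:] if tag.startswith("v") else tag
    return (
        "https://github.com/minio/directpv/releases/download/"
        f"v{version}/kubectl-directpv_{version}_{os_name}_{arch}"
    )


# Hypothetical API response, trimmed to the single field the shell snippet reads.
sample = '{"tag_name": "v4.0.6"}'
url = directpv_download_url(sample)
print(url)
```

Parsing the JSON properly avoids depending on the exact whitespace and quoting of the API response, which the awk one-liner relies on.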

Install the DirectPV CSI driver

kubectl directpv install

2023-08-08 08:00

minio kubernetes object-storage s3 directpv storage minio-operator

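Monday, July 24, 2023

Large AI Models

🔥 Large models

🔥 Andrej Karpathy

🔥 Mu Li's paper deep-dive series
How to read a paper
AlexNet
ResNet
A from-scratch, illustrated guide to graph neural networks (GNN/GCN)
GAN
Transformer
BERT Pre-training
ViT: the two inductive biases of convolutional neural networks are (1) locality (neighboring regions have similar features) and (2) translation equivariance
local neighborhoods
MAE Autoencoder
Survey of contrastive learning papers: for data augmentation, combining Crop and Color works best
MoCo
CLIP
How to Train Really Large Models on Many GPUs?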

2023-07-24 08:00

llm gpt chatgpt openai ai generative-ai machine-learning deep-learning nlp computer-vision

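Friday, July 21, 2023

Velero: Backing Up and Migrating Kubernetes Resources and Persistent Volumes

Introduction

Velero is an open-source tool for safely backing up and restoring, performing disaster recovery on, and migrating Kubernetes cluster resources and persistent volumes.

Features

Disaster recovery: reduces time to recovery after infrastructure loss, data corruption, and/or service outages.

Data migration: enables cluster portability by making it easy to migrate Kubernetes resources from one cluster to another.

Data protection: offers key data protection features such as scheduled backups, retention schedules, and pre- or post-backup hooks for custom actions.

What makes Velero stand out?

Unlike other tools that access the Kubernetes etcd database directly to perform backups and restores, Velero uses the Kubernetes API to capture the state of cluster resources and to restore them when necessary. This API-driven approach has several key benefits:

Backups can capture a subset of the cluster's resources, filtered by namespace, resource type, and/or label selector, which gives a high degree of flexibility over what is backed up and restored.
Users of managed Kubernetes offerings often have no access to the underlying etcd database, so they cannot back it up or restore it directly.
Resources exposed through aggregated API servers can easily be backed up and restored even if they are stored in a separate etcd database.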
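The filtering described above maps onto fields of Velero's Backup custom resource. The following is a minimal sketch that builds such a manifest as a Python dict, assuming Velero is installed in the `velero` namespace; the helper name `velero_backup`, the backup name, the label `app: mysql`, and the 30-day TTL are illustrative assumptions, not values from the original post.

```python
import json


def velero_backup(name, namespaces, resources=None, labels=None, ttl="720h0m0s"):
    """Return a Velero Backup custom resource as a plain dict."""
    spec = {"includedNamespaces": list(namespaces), "ttl": ttl}
    if resources:
        # Restrict the backup to specific resource types.
        spec["includedResources"] = list(resources)
    if labels:
        # Restrict the backup to objects carrying these labels.
        spec["labelSelector"] = {"matchLabels": dict(labels)}
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        # Assumption: Velero runs in the "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": spec,
    }


backup = velero_backup(
    "mysql-backup",                       # hypothetical backup name
    namespaces=["default"],
    resources=["deployments", "persistentvolumeclaims"],
    labels={"app": "mysql"},              # hypothetical label
)
print(json.dumps(backup, indent=2))
```

Saved to a file, the printed JSON could be applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.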

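In addition, Velero lets you back up and restore your applications' persistent data alongside their configuration, using either your storage platform's native snapshot capability or an integrated file-level backup tool called Restic.

How Velero works

References
Velero
Kubernetes Backup and Disaster Recovery: Velero
Using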

2023-07-21 08:00

kubernetes backups velero migration storage disaster-recovery

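Tuesday, July 18, 2023

Testing ChatGLM2-6B on a MacBook Pro M2 Max

ChatGLM2-6B

ChatGLM2-6B is the second-generation version of the open-source Chinese-English bilingual chat model ChatGLM-6B. It keeps the strengths of the first-generation model, such as fluent conversation and a low deployment threshold, and introduces the following new features:

Stronger performance: building on the experience of developing the first-generation ChatGLM model, we fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses GLM's hybrid objective function and was pre-trained on 1.4T Chinese and English tokens, followed by human preference alignment training. Evaluation shows substantial improvements over the first-generation model on datasets such as MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), making it highly competitive among open-source models of the same size.
Longer context: using FlashAttention, we extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with an 8K context length during the dialogue stage, which allows more rounds of conversation. The current version of ChatGLM2-6B still has limited understanding of very long single-turn documents; we will focus on this in later iterations.
More efficient inference: using Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage. With the official model implementation, inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6G of GPU memory increases from 1K to 8K.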
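To put the quantization figures in context, here is a rough back-of-the-envelope sketch of how much memory the model weights alone need at different precisions. It assumes a round figure of 6 billion parameters and ignores activations, the KV cache, and framework overhead, so real usage will be higher; the function name `weight_memory_gib` is made up for this example.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights only, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 6e9  # assumption: a round 6 billion parameters

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the INT4 weights take roughly a quarter of the FP16 footprint, which is consistent with a 6G card having room left over for a longer dialogue context.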

2023-07-18 08:00

chatglm glm macos macbookpro apple hugging-face transformers pytorch apple-silicon quantization

Thursday, July 13, 2023

Deploying MySQL on Kubernetes

Deploying a single-instance MySQL

Create the PVC (NFS)

mysql-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

Create the Deployment

mysql-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - image: mysql:5.

2023-07-13 08:00

kubernetes mysql statefulset deployment service nfs storage persistent-storage databases

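Tuesday, July 11, 2023

Deploying Prometheus and Grafana on Kubernetes with the Prometheus Operator

Monitoring components

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit.

Architecture diagram

Grafana

Grafana is used to analyze and interactively visualize the metrics collected and stored in the Prometheus database. With Prometheus as the data source, you can create custom charts, graphs, and alerts for a Kubernetes cluster.

Prometheus Operator

Overview

The Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components. The project's goal is to simplify and automate the configuration of a Prometheus-based monitoring stack for Kubernetes clusters.

Architecture diagram

Deploying the Prometheus and Grafana monitoring stack

Clone the kube-prometheus project

git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus/

Create the monitoring namespace, CustomResourceDefinitions, and operator pod

Create the namespace and CustomResourceDefinitions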

2023-07-11 08:00

prometheus grafana kubernetes monitoring observability visualization dashboard alertmanager prometheus-operator

Thursday, July 6, 2023

Elastic Cloud on Kubernetes Quick Start

Deploying ECK (Elastic Cloud on Kubernetes) in a Kubernetes cluster

Install the custom resource definitions (CRDs)

2023-07-06 08:00

elasticsearch kibana filebeat kubernetes operator observability logging monitoring tutorial

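Monday, July 3, 2023

Dynamically Provisioning NFS Persistent Volumes with a StorageClass

PVC workflow

Volume

At its core, a volume is a directory, possibly containing data, that the containers in a Pod can access. The volume type in use determines how that directory comes to be, what medium backs it, and what it contains.

To use a volume, declare the volumes provided to the Pod in the .spec.volumes field, and declare where each volume is mounted inside a container in the .spec.containers[*].volumeMounts field.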
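The relationship between the two fields can be sketched as a small Pod manifest built in Python: each entry in `volumeMounts` refers by name to an entry in `volumes`. The Pod name, image, and mount path here are illustrative assumptions, and `undeclared_mounts` is a hypothetical helper that checks the names line up.

```python
import json

# A minimal Pod manifest; names, image, and paths are placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "cache-demo"},
    "spec": {
        # Volumes made available to the Pod as a whole.
        "volumes": [{"name": "cache", "emptyDir": {}}],
        "containers": [
            {
                "name": "app",
                "image": "busybox",
                "command": ["sleep", "3600"],
                # Where the named volume appears inside this container.
                "volumeMounts": [{"name": "cache", "mountPath": "/cache"}],
            }
        ],
    },
}


def undeclared_mounts(manifest):
    """Return the names of volumeMounts that have no matching entry in .spec.volumes."""
    declared = {v["name"] for v in manifest["spec"].get("volumes", [])}
    return [
        m["name"]
        for c in manifest["spec"]["containers"]
        for m in c.get("volumeMounts", [])
        if m["name"] not in declared
    ]


print(json.dumps(pod, indent=2))
print("undeclared mounts:", undeclared_mounts(pod))
```

A mount whose name does not match any declared volume is rejected by the API server, so a check like this only mirrors a validation Kubernetes already performs.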

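The storage medium of an emptyDir volume (disk, SSD, and so on) is determined by the medium of the filesystem that holds the kubelet root directory (usually /var/lib/kubelet). Kubernetes places no limit on how much space an emptyDir or hostPath volume can consume, and there is no isolation between containers or between Pods.

PersistentVolume

A PersistentVolume (PV) is a piece of storage in the cluster that is either created in advance by an administrator or provisioned dynamically through a StorageClass. A PV is a cluster resource, just as a node is a cluster resource. Like ordinary volumes, PVs are implemented with volume plugins, but their lifecycle is independent of any Pod that uses them.

With static provisioning, PV objects are created in the Kubernetes cluster ahead of time by operations staff and wait there until they are claimed.

PersistentVolumeClaim

A persistent volume claim (PersistentVolumeClaim, PVC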

2023-07-03 08:00

kubernetes nfs storageclass persistentvolume persistentvolumeclaim mongodb storage ubuntu persistent-storage

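Wednesday, June 14, 2023

How Diffusion Models Work

Intuition

Making images useful to a neural network

Noise processing: add different levels of noise to the training data.

The inspiration comes from physics. Imagine a drop of ink falling into a glass of water: at first you know exactly where it landed, but over time you watch it diffuse into the water until it disappears.

What the neural network should really be thinking about at each noise level, as you gradually add noise to the image:

Bob the Sprite!: if the image clearly is Bob the Sprite, you want the network to say so and leave Bob as he is.
Probable Bob: if the image is probably Bob, you want the network to recognize that there is some noise and suggest likely details to fill in so that it looks like Bob the Sprite.
Well, Bob or Fred...: if the image is only the outline of a sprite, you want the network to suggest general details for the sprites it might be.
No Idea: if the image looks like nothing at all, you still want the network to suggest an outline so that it looks more like a sprite.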
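The noise levels above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard DDPM forward process, not necessarily the exact code from the course: the number of levels `T = 500` and the linear beta schedule from 1e-4 to 0.02 are assumptions, and the four-value "sprite" is a toy stand-in for an image.

```python
import math
import random

T = 500                              # assumed number of noise levels
BETA_START, BETA_END = 1e-4, 0.02    # assumed linear noise schedule

# Cumulative product of (1 - beta): how much of the original signal survives at level t.
betas = [BETA_START + (BETA_END - BETA_START) * t / T for t in range(T + 1)]
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)


def add_noise(x0, t, rng):
    """Return x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps, with eps ~ N(0, 1)."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
sprite = [1.0, -1.0, 0.5, -0.5]      # toy "image": four pixel values in [-1, 1]

for t in (0, 100, 250, 500):
    noisy = add_noise(sprite, t, rng)
    print(t, round(math.sqrt(alpha_bar[t]), 3), [round(v, 2) for v in noisy])
```

At `t = 0` the sprite is almost untouched ("Bob the Sprite!"), while at `t = T` the signal weight is close to zero and the pixels are essentially pure Gaussian noise ("No Idea").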

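Training a neural network to make sprites

The neural network learns to take different noisy images and turn them back into sprites.

It learns to remove the noise you added.

The "No Idea" noise level matters because it is normally distributed: every pixel is sampled from a normal distribution.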

2023-06-14 08:00

diffusion-models ddpm stable-diffusion generative-ai deep-learning machine-learning computer-vision pytorch text-to-image

Tuesday, June 13, 2023

Building Systems with the ChatGPT API

Language Models, the Chat Format and Tokens

Load OpenAI API key

import os
import openai
import tiktoken
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

2023-06-13 08:00

chatgpt openai api deeplearning-ai python llm prompt-engineering machine-learning generative-ai moderation

Friday, June 9, 2023

LangChain for LLM Application Development

LangChain is an open-source framework for building LLM applications.

LangChain: Models, Prompts and Output Parsers

Install dependencies

pip install python-dotenv
pip install openai

ChatCompletion

import os
import openai
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.

2023-06-09 08:00

langchain llm chatgpt openai deeplearning-ai python prompt-engineering tools api generative-ai

Tuesday, May 30, 2023

State of GPT - Andrej Karpathy

Introduction

Learn about the training pipeline of GPT assistants like ChatGPT, from tokenization to pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). Dive deeper into practical techniques and mental models for the effective use of these models, including prompting strategies, finetuning, the rapidly growing ecosystem of tools, and their future extensions.

2023-05-30 08:00

llm gpt fine-tuning andrej-karpathy tokenization machine-learning deep-learning generative-ai ai chatgpt

Monday, May 29, 2023

OpenAI Fine Tuning

References

2023-05-29 08:00

fine-tuning openai llm machine-learning python api training-data datasets

Sunday, May 28, 2023

ChatGPT Prompt Engineering for Developers

ChatGPT Prompt Engineering for Developers, a course taught by Isa Fulford (OpenAI) and Andrew Ng (DeepLearning.AI), describes how LLMs work, provides best practices for prompt engineering, and shows how LLM APIs can be used in applications for a variety of tasks.

Introduction

Guidelines

Helper function

import openai
import os

openai.api_key = os.getenv('OPENAI_API_KEY')

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].

2023-05-28 08:00

chatgpt prompt-engineering openai deeplearning-ai python api llm generative-ai ai

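Friday, May 26, 2023

Whisper Speech Recognition

Whisper

Features

Transcribe audio into whatever language the audio is in.
Translate and transcribe the audio into English.

File uploads are currently limited to 25 MB, and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, webm.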
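The limits above are easy to check locally before uploading. The following is a minimal sketch with a hypothetical helper, `check_upload`; it assumes "25 MB" means 25 × 2^20 bytes and judges the file type by extension only, which may not match exactly how the service validates uploads.

```python
import os
import tempfile

MAX_BYTES = 25 * 1024 * 1024  # assumption: 25 MB is interpreted as 25 * 2**20 bytes
ALLOWED = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def check_upload(path):
    """Return a list of problems with the file; an empty list means it looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext not in ALLOWED:
        problems.append(f"unsupported file type: .{ext}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 25 MB")
    return problems


# Demonstration on small throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "test.m4a")
    bad_path = os.path.join(tmp, "test.flac")
    for p in (ok_path, bad_path):
        with open(p, "wb") as f:
            f.write(b"\0" * 1024)
    ok_result = check_upload(ok_path)
    bad_result = check_upload(bad_path)

print(ok_result, bad_result)
```

Files over the limit would need to be split or re-encoded at a lower bitrate before they can be sent.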

Speech content

Mira Murati 是一位对人工智能技术充满热情的科技领袖，她的理念和影响对人工智能技术的发展和应用产生了深远的影响。

她认为人工智能技术应该是以人为本的，强调人工智能技术应该是一种能够服务于人类的工具，而不是取代人类的工具。

她指出，人工智能技术的最终目的是为人类服务，因此人工智能技术应该以人类的利益和需求为中心，以解决人类面临的实际问题。人工智能技术的应用需要深入了解人类社会的需要和价值，将其应用到真正有意义的领域中。

OpenAI Whisper

Install OpenAI

!pip install -U openai

Test

Speech recognition

import openai

# Open the audio file in binary mode and close it again once transcription is done
with open("data/audios/test.m4a", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript["text"])

Miramurati是一位对人工智能技术充满热情的科技领袖他的

2023-05-26 08:00

whisper openai audio python speech-to-text pydub transcription

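Monday, May 22, 2023

Stable Diffusion

Model: Stable Diffusion v1-5

Dataset: LAION-5B

A dataset of 5.85 billion CLIP-filtered image-text pairs.

CLIP Retrieval

It works by converting the text query into a CLIP embedding and then using that embedding to query a kNN index of CLIP image embeddings; the search demo uses this to search the dataset.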
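The retrieval step can be illustrated with a toy nearest-neighbor search. In this sketch the three-dimensional vectors are made-up stand-ins for real CLIP embeddings, and a brute-force cosine-similarity scan stands in for the approximate kNN index a real system would use; the file names and the `knn` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query, index, k=2):
    """Return the k image names whose embeddings are most similar to the query."""
    ranked = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up image embeddings; a real index would hold CLIP image embeddings.
image_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.6, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for the CLIP text embedding of a query such as "a cat".
text_query = [1.0, 0.2, 0.0]

result = knn(text_query, image_index, k=2)
print(result)
```

Because text and images share one embedding space in CLIP, the same similarity measure works for text-to-image and image-to-image search.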

Stable Diffusion GUI

Diffusion Bee - Stable Diffusion GUI App for MacOS

Diffusers

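🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models that generate images, audio, and even 3D structures of molecules. Whether you are looking for a simple inference solution or want to train your own diffusion model, 🤗 Diffusers is a modular toolbox that supports both.

Background

Latent Representation

A representation of the latent features in data: features that may not be directly observable but matter for tasks such as model learning and prediction. In image recognition, for example, an image's color, shape, and texture can be treated as latent feature representations.

Latent Space

The latent feature space learned automatically by a model, in which each point represents one possible combination of features. In deep learning and AI, techniques such as autoencoders are often used to learn and explore the latent space of data, with the aim of deeper understanding and better downstream results.

References
AutoFaiss
Multimodal search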

2023-05-22 08:00

stable-diffusion clip laion diffusers generative-ai computer-vision text-to-image hugging-face

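Sunday, May 21, 2023

Building Multi-Platform Images with Docker

Multi-platform builder

When the current builder instance uses the docker-container driver, you can specify multiple platforms at once. In that case it builds a manifest list containing images for all of the specified architectures, and the images for the different architectures can be built in parallel.

When you use this image with docker run or docker service, Docker picks the correct image based on the node's platform.

One drawback: the result must be pushed to Docker Hub or a private registry, because Docker does not support multi-architecture images locally.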
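To make the selection step concrete, here is a simplified sketch of picking an image from a manifest list by platform. The manifest list below is a hand-written example with placeholder digests, and `select_manifest` is a hypothetical helper; Docker's real matching logic handles more cases (for example variant normalization and fallbacks).

```python
def select_manifest(manifest_list, os_name, architecture, variant=None):
    """Return the digest of the first entry matching the requested platform."""
    for entry in manifest_list["manifests"]:
        platform = entry["platform"]
        if platform["os"] != os_name or platform["architecture"] != architecture:
            continue
        if variant is not None and platform.get("variant") != variant:
            continue
        return entry["digest"]
    raise LookupError(f"no image for {os_name}/{architecture}")


# Hand-written example; the digests are placeholders, not real image digests.
manifest_list = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"digest": "sha256:amd64-placeholder",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:arm64-placeholder",
         "platform": {"architecture": "arm64", "os": "linux", "variant": "v8"}},
    ],
}

arm_digest = select_manifest(manifest_list, "linux", "arm64")
amd_digest = select_manifest(manifest_list, "linux", "amd64")
print(arm_digest, amd_digest)
```

The single tag that users pull points at the manifest list, so the same image name works unchanged on both amd64 and arm64 nodes.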

List the builders

docker buildx ls

NAME/NODE       DRIVER/ENDPOINT STATUS  BUILDKIT             PLATFORMS
default         docker
  default       default         running v0.11.6+616c3f613b54 linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64, linux/arm/v7, linux/arm/v6, linux/amd64, linux/amd64/v2
desktop-linux * docker
  desktop-linux desktop-linux   running v0.11.

2023-05-21 08:00

docker buildx multi-platform qemu arm64 amd64 containers

Thursday, May 18, 2023

macOS Docker

While building an image with Docker today, the build suddenly died. After restarting Docker, I found that it would no longer start.

The error

🐳 Building platen-switch:arm64
[+] Building 0.0s (2/2) FINISHED                                                                                                                                                            
 => [internal] load build definition from Dockerfile                                                                                                                                   0.0s
 => => transferring dockerfile: 69B                                                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                        0.0s
ERROR: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: failed to create temp dir: mkdir /var/lib/docker/tmp/buildkit-mount1477620899: no space left on device

Analyzing the problem

Run the diagnostic tool

com.docker.

2023-05-18 08:00

macos docker troubleshooting virtualization containers storage docker-desktop mirror

Tuesday, May 16, 2023

Ultralytics YOLOv8 Inference Speed Comparison

CPU

Server information

lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          40
On-line CPU(s) list:             0-39
Thread(s) per core:              2
Core(s) per socket:              10
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Stepping:                        1
CPU MHz:                         1201.687
CPU max MHz:                     3400.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4788.86
Virtualization:                  VT-x
L1d cache:                       640 KiB
L1i cache:                       640 KiB
L2 cache:                        5 MiB
L3 cache:                        50 MiB
NUMA node0 CPU(s):               0-9,20-29
NUMA node1 CPU(s):               10-19,30-39
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
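As a quick sanity check on the lscpu output above, the CPU count follows from the topology fields. This small sketch only restates the arithmetic using the values shown (2 sockets, 10 cores per socket, 2 threads per core); the variable names are illustrative.

```python
# Values copied from the lscpu output above.
sockets = 2
cores_per_socket = 10
threads_per_core = 2

physical_cores = sockets * cores_per_socket
logical_cpus = physical_cores * threads_per_core

print(f"physical cores: {physical_cores}, logical CPUs: {logical_cpus}")
```

The distinction can matter when choosing a thread count for CPU inference, since hyper-threads do not always add throughput for compute-bound workloads.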

2023-05-16 08:00

yolo yolov8 pytorch onnx inference benchmarks computer-vision object-detection cpu gpu

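Monday, May 15, 2023

The __MACOSX Directory and .DS_Store Files on macOS

.DS_Store files

.DS_Store is short for Desktop Services Store.

.DS_Store is a hidden file in macOS that stores metadata about a folder, such as the positions of the files in it, its display options, and custom icons. The metadata helps the operating system display folder contents faster and remember the user's preferences. These files are local only and are not transferred over network file sharing, so they do not affect users of other operating systems.

The __MACOSX directory

When you create a compressed archive on a Mac, the system automatically adds a directory named __MACOSX to the archive. It contains macOS-specific files such as .DS_Store. These files do not interfere with extraction, but they can be a nuisance on other operating systems: when the archive is extracted on Windows, for example, the __MACOSX directory or .DS_Store files may show up. To avoid this, either exclude the Mac-specific files when creating the archive, or delete the __MACOSX directory and .DS_Store files manually after extracting.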

Deleting the __MACOSX directory and .DS_Store files

Method 1

find . -name '__MACOSX' -exec rm -rf {} \; -o -name '.
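The same cleanup can also be sketched in Python, which may be handy on systems without `find`. This is an illustrative alternative rather than the method from the original post; `clean_macos_metadata` is a hypothetical helper, and the demonstration runs on a throwaway temporary directory so nothing real is deleted.

```python
import os
import shutil
import tempfile


def clean_macos_metadata(root):
    """Delete every __MACOSX directory and .DS_Store file under root; return the removed paths."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "__MACOSX" in dirnames:
            target = os.path.join(dirpath, "__MACOSX")
            shutil.rmtree(target)
            dirnames.remove("__MACOSX")  # do not descend into the deleted directory
            removed.append(target)
        for name in filenames:
            if name == ".DS_Store":
                target = os.path.join(dirpath, name)
                os.remove(target)
                removed.append(target)
    return removed


# Demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "__MACOSX"))
    os.makedirs(os.path.join(tmp, "photos"))
    open(os.path.join(tmp, "photos", ".DS_Store"), "w").close()
    open(os.path.join(tmp, "photos", "a.jpg"), "w").close()

    removed = clean_macos_metadata(tmp)
    remaining = sorted(os.listdir(os.path.join(tmp, "photos")))
    macosx_left = os.path.exists(os.path.join(tmp, "__MACOSX"))

print(len(removed), remaining, macosx_left)
```

Returning the list of removed paths makes it easy to log what was deleted or to turn the function into a dry run first.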

2023-05-15 08:00

macos find shell metadata compression zip archive ds-store
