Deploying ECK (Elastic Cloud on Kubernetes) in a Kubernetes Cluster

Install the custom resource definitions (CRDs)

kubectl create -f https://download.elastic.co/downloads/eck/2.8.0/crds.yaml
customresourcedefinition.apiextensions.k8s.io/agents.agent.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/apmservers.apm.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/beats.beat.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticmapsservers.maps.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticsearchautoscalers.autoscaling.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticsearches.elasticsearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/enterprisesearches.enterprisesearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/kibanas.kibana.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/logstashes.logstash.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/stackconfigpolicies.stackconfigpolicy.k8s.elastic.co created

View the installed custom resource definitions

kubectl get crd
NAME                                                   CREATED AT
agents.agent.k8s.elastic.co                            2023-07-06T04:00:51Z
apmservers.apm.k8s.elastic.co                          2023-07-06T04:00:51Z
beats.beat.k8s.elastic.co                              2023-07-06T04:00:51Z
elasticmapsservers.maps.k8s.elastic.co                 2023-07-06T04:00:51Z
elasticsearchautoscalers.autoscaling.k8s.elastic.co    2023-07-06T04:00:51Z
elasticsearches.elasticsearch.k8s.elastic.co           2023-07-06T04:00:51Z
enterprisesearches.enterprisesearch.k8s.elastic.co     2023-07-06T04:00:51Z
kibanas.kibana.k8s.elastic.co                          2023-07-06T04:00:51Z
logstashes.logstash.k8s.elastic.co                     2023-07-06T04:00:51Z
stackconfigpolicies.stackconfigpolicy.k8s.elastic.co   2023-07-06T04:00:51Z

Install the operator and its RBAC rules

kubectl apply -f https://download.elastic.co/downloads/eck/2.8.0/operator.yaml
namespace/elastic-system created
serviceaccount/elastic-operator created
secret/elastic-webhook-server-cert created
configmap/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator-view created
clusterrole.rbac.authorization.k8s.io/elastic-operator-edit created
clusterrolebinding.rbac.authorization.k8s.io/elastic-operator created
service/elastic-webhook-server created
statefulset.apps/elastic-operator created
validatingwebhookconfiguration.admissionregistration.k8s.io/elastic-webhook.k8s.elastic.co created

Monitor the operator logs

kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
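
Before tailing the logs, it is worth confirming that the operator Pod itself is up and running in the elastic-system namespace:

kubectl get pods -n elastic-system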

Deploy an Elasticsearch Cluster

Create an Elasticsearch cluster

Create a simple Elasticsearch cluster with a single node.

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.8.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
EOF
elasticsearch.elasticsearch.k8s.elastic.co/quickstart created

Monitor cluster health and creation progress

Get an overview of the Elasticsearch clusters currently running in the Kubernetes cluster, including health, version, and number of nodes.

kubectl get elasticsearch
NAME         HEALTH    NODES   VERSION   PHASE             AGE
quickstart   unknown           8.8.2     ApplyingChanges   2m22s

The health shows as unknown. This value normally comes from the Elasticsearch cluster health API, so unknown means the operator cannot reach the cluster yet. Looking at the events reveals that the PVC has no storage class set.

kubectl events
LAST SEEN               TYPE      REASON                  OBJECT                                                             MESSAGE
2m38s                   Normal    SuccessfulCreate        StatefulSet/quickstart-es-default                                  create Claim elasticsearch-data-quickstart-es-default-0 Pod quickstart-es-default-0 in StatefulSet quickstart-es-default success
2m38s                   Normal    SuccessfulCreate        StatefulSet/quickstart-es-default                                  create Pod quickstart-es-default-0 in StatefulSet quickstart-es-default successful
2m36s (x2 over 2m38s)   Warning   FailedScheduling        Pod/quickstart-es-default-0                                        0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
6s (x13 over 2m38s)     Normal    FailedBinding           PersistentVolumeClaim/elasticsearch-data-quickstart-es-default-0   no persistent volumes available for this claim and no storage class is set

Edit the PVC object and set storageClassName: nfs-client in its spec.

kubectl edit pvc elasticsearch-data-quickstart-es-default-0
spec:
  storageClassName: nfs-client
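
Instead of patching each PVC after the fact, the storage class can also be declared up front in the Elasticsearch manifest via volumeClaimTemplates, so that new claims are created correctly from the start; this is easiest to set when the cluster is first created. A minimal sketch, assuming the same nfs-client storage class (the 10Gi size is only an example):

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.8.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data   # the claim must keep this name
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: nfs-client
EOF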

After waiting a few minutes, check the Elasticsearch cluster overview again.

kubectl get elasticsearch
NAME         HEALTH    NODES   VERSION   PHASE             AGE
quickstart   unknown           8.8.2     ApplyingChanges   7m44s

Still not ready. The events now show a different error: Readiness probe failed: {"timestamp": "2023-07-06T04:20:17+00:00", "message": "readiness probe failed", "curl_rc": "7"}

kubectl events
LAST SEEN                TYPE      REASON                  OBJECT                                                             MESSAGE
9m27s                    Normal    SuccessfulCreate        StatefulSet/quickstart-es-default                                  create Pod quickstart-es-default-0 in StatefulSet quickstart-es-default successful
9m27s                    Normal    SuccessfulCreate        StatefulSet/quickstart-es-default                                  create Claim elasticsearch-data-quickstart-es-default-0 Pod quickstart-es-default-0 in StatefulSet quickstart-es-default success
4m25s (x23 over 9m27s)   Normal    FailedBinding           PersistentVolumeClaim/elasticsearch-data-quickstart-es-default-0   no persistent volumes available for this claim and no storage class is set
4m13s                    Normal    ProvisioningSucceeded   PersistentVolumeClaim/elasticsearch-data-quickstart-es-default-0   Successfully provisioned volume pvc-c3408e99-c65b-49aa-b39f-5ed3285d5815
4m13s (x4 over 9m27s)    Warning   FailedScheduling        Pod/quickstart-es-default-0                                        0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
4m13s                    Normal    ExternalProvisioning    PersistentVolumeClaim/elasticsearch-data-quickstart-es-default-0   waiting for a volume to be created, either by external provisioner "k8s-sigs.io/nfs-subdir-external-provisioner" or manually created by system administrator
4m13s                    Normal    Provisioning            PersistentVolumeClaim/elasticsearch-data-quickstart-es-default-0   External provisioner is provisioning volume for claim "default/elasticsearch-data-quickstart-es-default-0"
4m4s                     Normal    Scheduled               Pod/quickstart-es-default-0                                        Successfully assigned default/quickstart-es-default-0 to cpu2
4m3s                     Normal    Pulling                 Pod/quickstart-es-default-0                                        Pulling image "docker.elastic.co/elasticsearch/elasticsearch:8.8.2"
112s                     Normal    Pulled                  Pod/quickstart-es-default-0                                        Successfully pulled image "docker.elastic.co/elasticsearch/elasticsearch:8.8.2" in 2m11.360542724s (2m11.360557388s including waiting)
112s                     Normal    Started                 Pod/quickstart-es-default-0                                        Started container elastic-internal-init-filesystem
112s                     Normal    Created                 Pod/quickstart-es-default-0                                        Created container elastic-internal-init-filesystem
109s                     Normal    Created                 Pod/quickstart-es-default-0                                        Created container elastic-internal-suspend
109s                     Normal    Pulled                  Pod/quickstart-es-default-0                                        Container image "docker.elastic.co/elasticsearch/elasticsearch:8.8.2" already present on machine
108s                     Normal    Created                 Pod/quickstart-es-default-0                                        Created container elasticsearch
108s                     Normal    Pulled                  Pod/quickstart-es-default-0                                        Container image "docker.elastic.co/elasticsearch/elasticsearch:8.8.2" already present on machine
108s                     Normal    Started                 Pod/quickstart-es-default-0                                        Started container elastic-internal-suspend
107s                     Normal    Started                 Pod/quickstart-es-default-0                                        Started container elasticsearch
93s                      Warning   Unhealthy               Pod/quickstart-es-default-0                                        Readiness probe failed: {"timestamp": "2023-07-06T04:20:17+00:00", "message": "readiness probe failed", "curl_rc": "7"}

From the discussion "Elasticsearch pod readiness probe fails with “message”: “readiness probe failed”, “curl rc”: “7”" I learned that this can be caused by clock synchronization problems. I checked the cluster servers and found that time synchronization had not been set up.

Set up time synchronization

Time synchronization needs to be configured on every server in the cluster.

  • Install ntp
    sudo apt update
    sudo apt install ntp -y
    
  • Configure the NTP servers
    sudo vim /etc/ntp.conf
    server pool.ntp.org
    
  • Start the NTP service
    sudo systemctl start ntp     # systemd systems
    sudo service ntp start       # SysV systems
    
  • Enable the NTP service at boot, so that it starts automatically after a reboot
    sudo systemctl enable ntp    # systemd systems
    sudo chkconfig ntp on        # SysV systems (RHEL-style; use update-rc.d on Debian/Ubuntu)
    
  • Wait for the NTP client to synchronize its time with the configured NTP servers (the peer status can be checked as shown below)
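
Once the service is running, ntpq can be used to check the peer status; an asterisk in front of a server indicates that the local clock is synchronized to it:

ntpq -p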

Checking the status of the ntp service reveals an error: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized

sudo systemctl status ntp
● ntp.service - Network Time Service
     Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-07-06 12:33:15 CST; 9min ago
       Docs: man:ntpd(8)
   Main PID: 899729 (ntpd)
      Tasks: 2 (limit: 309287)
     Memory: 2.2M
     CGroup: /system.slice/ntp.service
             └─899729 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 116:119

Jul 06 12:33:19 cpu1 ntpd[899729]: Soliciting pool server 119.28.206.193
Jul 06 12:33:20 cpu1 ntpd[899729]: Soliciting pool server 162.159.200.123
Jul 06 12:33:20 cpu1 ntpd[899729]: Soliciting pool server 202.112.29.82
Jul 06 12:33:20 cpu1 ntpd[899729]: Soliciting pool server 185.125.190.58
Jul 06 12:33:21 cpu1 ntpd[899729]: Soliciting pool server 91.189.94.4
Jul 06 12:33:22 cpu1 ntpd[899729]: Soliciting pool server 185.125.190.57
Jul 06 12:33:23 cpu1 ntpd[899729]: Soliciting pool server 185.125.190.56
Jul 06 12:33:24 cpu1 ntpd[899729]: Soliciting pool server 91.189.91.157
Jul 06 12:33:25 cpu1 ntpd[899729]: Soliciting pool server 2620:2d:4000:1::40
Jul 06 12:39:02 cpu1 ntpd[899729]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized

This was resolved by synchronizing the clocks manually.

sudo hwclock --hctosys     # set the system clock from the hardware clock
sudo hwclock --systohc     # set the hardware clock from the system clock
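
On systemd hosts, timedatectl provides a quick check; once synchronization has caught up, it should report the system clock as synchronized:

timedatectl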

Check the Elasticsearch cluster overview again.

kubectl get elasticsearch
NAME         HEALTH   NODES   VERSION   PHASE   AGE
quickstart   green    1       8.8.2     Ready   36m

It finally succeeded.

List the Pods in the Elasticsearch cluster.

kubectl get pods --selector='elasticsearch.k8s.elastic.co/cluster-name=quickstart'
NAME                      READY   STATUS    RESTARTS   AGE
quickstart-es-default-0   1/1     Running   0          144m

View the Pod logs

kubectl logs -f quickstart-es-default-0

Request Elasticsearch access

A ClusterIP Service is automatically created for your cluster.

kubectl get service quickstart-es-http
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
quickstart-es-http   ClusterIP   10.97.81.206   <none>        9200/TCP   155m

Get the credentials.

A default user named elastic is created automatically, and its password is stored in a Kubernetes secret.

PASSWORD=$(kubectl get secret quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')

Request the Elasticsearch endpoint.

Option 1: from inside the Kubernetes cluster:

kubectl exec -it quickstart-es-default-0 -- curl -u "elastic:$PASSWORD" -k "https://quickstart-es-http:9200"
{
  "name" : "quickstart-es-default-0",
  "cluster_name" : "quickstart",
  "cluster_uuid" : "4_JCkqNDRxiHpWcEQq05DQ",
  "version" : {
    "number" : "8.8.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "98e1271edf932a480e4262a471281f1ee295ce6b",
    "build_date" : "2023-06-26T05:16:16.196344851Z",
    "build_snapshot" : false,
    "lucene_version" : "9.6.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

Option 2: access it locally via kubectl port-forward.

kubectl port-forward service/quickstart-es-http 9200
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"

Delete the Elasticsearch cluster

kubectl delete elasticsearch quickstart
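
The ECK custom resources are grouped under the elastic category, so everything the operator manages in the current namespace can be listed in one go before (or after) deleting:

kubectl get elastic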

Deploy a Kibana Instance

Create a Kibana instance and associate it with your Elasticsearch cluster.

cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: quickstart
spec:
  version: 8.8.2
  count: 1
  config:
    i18n.locale: zh-CN
  elasticsearchRef:
    name: quickstart
EOF
kibana.kibana.k8s.elastic.co/quickstart created

Monitor Kibana health

kubectl get kibana
NAME         HEALTH   NODES   VERSION   AGE
quickstart   green    1       8.8.2     2m18s

List the associated Pods

kubectl get pod --selector='kibana.k8s.elastic.co/name=quickstart'
NAME                             READY   STATUS    RESTARTS   AGE
quickstart-kb-847cb7879d-8pgcv   1/1     Running   0          3m6s

Access Kibana

A ClusterIP Service is automatically created for Kibana.

kubectl get service quickstart-kb-http
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
quickstart-kb-http   ClusterIP   10.101.36.136   <none>        5601/TCP   5m20s

Access it locally via kubectl port-forward.

kubectl port-forward service/quickstart-kb-http 5601

Open https://localhost:5601 in your browser.

Log in as the elastic user, with the password obtained from the command below.

kubectl get secret quickstart-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode; echo

Deploy a Filebeat Instance

cat <<EOF | kubectl apply -f -
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: quickstart
spec:
  type: filebeat
  version: 8.8.2
  elasticsearchRef:
    name: quickstart
  kibanaRef:
    name: quickstart
  config:
    # enable when troubleshooting
    # logging.level: debug
    filebeat.inputs:
    - type: container
      paths:
      - /var/log/containers/*.log
  daemonSet:
    podTemplate:
      spec:
        # affects /etc/resolv.conf inside the Pod:
        # DNS queries are resolved against the cluster domain first, even with hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
        - name: filebeat
          resources:
            # example values (the original left these blank); tune for your workload
            requests:
              memory: 200Mi
              cpu: 100m
            limits:
              memory: 500Mi
              cpu: 500m
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
EOF
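
As with Elasticsearch and Kibana, the health of the Beat resource itself can be checked; the DaemonSet runs one Filebeat Pod per eligible node:

kubectl get beat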

View the deployed Filebeat instances

kubectl get pods -o wide --selector='beat.k8s.elastic.co/name=quickstart'
NAME                             READY   STATUS    RESTARTS      AGE   IP              NODE   NOMINATED NODE   READINESS GATES
quickstart-beat-filebeat-56z5s   1/1     Running   5 (31m ago)   32m   172.16.33.159   cpu3   <none>           <none>
quickstart-beat-filebeat-wqtq8   1/1     Running   4 (31m ago)   32m   172.16.33.158   cpu2   <none>           <none>

Create an application that produces one log line per second.

apiVersion: v1
kind: Pod
metadata:
  name: time-pod
spec:
  containers:
  - name: time-container
    image: python:3
    command: ["python"]
    args: ["-u", "-c", "exec(\"import time\\nwhile True:\\n  print('[custom_field_name] Current time:', time.strftime('%Y-%m-%d %H:%M:%S'))\\n  time.sleep(1)\")"]
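
Assuming the manifest above is saved as time-pod.yaml (the file name is arbitrary), apply it and tail its output to confirm that one line is written per second:

kubectl apply -f time-pod.yaml
kubectl logs -f time-pod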

Open https://localhost:5601 in your browser to browse the logs and configure policies.

Upgrade Your Deployment

You can add to and modify most elements of the original cluster specification, provided they translate into valid transformations of the underlying Kubernetes resources (for example, existing volume claims cannot be shrunk). The operator will attempt to apply your changes with minimal disruption to the existing cluster. You should make sure the Kubernetes cluster has enough resources to accommodate the changes (additional storage, and enough memory and CPU to temporarily spin up new Pods, and so on).

For example, you can scale the cluster out to three Elasticsearch nodes:

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.8.2
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
EOF
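
The rollout can be followed by watching the Elasticsearch resource and its Pods; the NODES column should grow from 1 to 3 as the new Pods join the cluster:

kubectl get elasticsearch quickstart -w
kubectl get pods --selector='elasticsearch.k8s.elastic.co/cluster-name=quickstart'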

View the Elasticsearch CRD specification

kubectl describe crd elasticsearch
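
If the CRD publishes its schema (as the ECK CRDs do), kubectl explain can also drill into individual fields, for example:

kubectl explain elasticsearch.spec.nodeSets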

References