Troubleshooting Kubernetes etcd Startup Failures

etcd

etcd keeps restarting. Start by checking the DB size on the local member.

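etcd's default DB size quota is 2 GB, and the size the boltdb file actually reaches is also affected by the compaction period and compaction mode.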
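As a rough sketch of how the DB size could be compared against the quota: the helpers below parse the JSON printed by `etcdctl endpoint status --write-out=json` and compute how much of the quota is used. The function names `db_size_bytes` and `quota_used_percent` are hypothetical, the `dbSize` field name is assumed from etcd v3 output, and the certificate paths stay elided as in this post.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl endpoint status --write-out=json` on stdin
# and print the first dbSize value (bytes). Assumes the field is named "dbSize".
db_size_bytes() {
  grep -o '"dbSize":[0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Hypothetical helper: integer percentage of the quota in use.
# $1 = DB size in bytes, $2 = quota in bytes.
quota_used_percent() {
  echo $(( $1 * 100 / $2 ))
}

# Usage against a live member (not run here; 2147483648 is the 2 GiB default quota):
# size=$(etcdctl --cacert xxx --cert xxx --key xxx \
#          --endpoints=https://127.0.0.1:2379 \
#          endpoint status --write-out=json | db_size_bytes)
# quota_used_percent "$size" 2147483648
```

A value close to 100 suggests the member is at or near its quota, which matches the symptom described here.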

Resolution steps

Defragmentation

$ etcdctl --cacert xxx --cert xxx --key xxx --endpoints=https://127.0.0.1:2379 defrag

Finished defragmenting etcd member[127.0.0.1:2379]
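Defragmentation only returns space that compaction has already freed, so it is often run right after a compaction. The sketch below follows the sequence described in the etcd maintenance guide (compact, defrag, then clear any NOSPACE alarm). The function names and the `ETCDCTL_FLAGS` variable are hypothetical, and compacting to the current revision discards all older history, so treat this as an illustration rather than a drop-in procedure.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl endpoint status --write-out=json` on stdin
# and print the first revision number found.
current_revision() {
  grep -o '"revision":[0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Hypothetical wrapper. ETCDCTL_FLAGS is an assumed variable holding the TLS and
# endpoint flags, e.g. "--cacert xxx --cert xxx --key xxx --endpoints=https://127.0.0.1:2379".
compact_and_defrag() {
  rev=$(etcdctl $ETCDCTL_FLAGS endpoint status --write-out=json | current_revision) || return 1
  etcdctl $ETCDCTL_FLAGS compact "$rev" &&   # drop history older than $rev
  etcdctl $ETCDCTL_FLAGS defrag &&           # hand freed pages back to the filesystem
  etcdctl $ETCDCTL_FLAGS alarm disarm        # clear a NOSPACE alarm if one was raised
}
```

Nothing runs until `compact_and_defrag` is called explicitly, so the definitions are safe to source and inspect first.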

Modify the etcd startup parameters

$ cat /apps/conf/kubernetes/manifest/etcd.yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://xxx.xxx.xxx.xxx:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://xxx.xxx.xxx.xxx:2379
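    - --auto-compaction-retention=1000000 # number of revisions to retain: keep the latest 1,000,000; size this as roughly the cluster's resource count * 5
    - --auto-compaction-mode=revision # revision-based auto compaction: revisions older than the retained window are compacted
    - --quota-backend-bytes=8589934592 # raise the DB size quota to 8 GB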
...
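The two numeric flag values above can be derived with simple shell arithmetic. This is a small sketch with hypothetical helper names; the "resource count * 5" rule is the rule of thumb from this post, and the resource count of 200000 is an illustrative input, not a measured value.

```shell
#!/bin/sh
# Hypothetical helper: convert GiB to bytes for --quota-backend-bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Hypothetical helper: retention sized as resource count * 5,
# for use with --auto-compaction-mode=revision.
retention_for_resources() {
  echo $(( $1 * 5 ))
}

gib_to_bytes 8                  # prints 8589934592
retention_for_resources 200000  # prints 1000000
```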

Restart etcd
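If etcd runs as a static Pod, which the kubeadm annotation in the manifest suggests but this post does not state outright, the kubelet normally recreates the Pod once the manifest file changes. Where a forced restart is still wanted, one common approach is to move the manifest out of the watched directory and back. The helper below is a hypothetical sketch of that approach; the 20-second default wait is an assumption, not a documented value.

```shell
#!/bin/sh
# Hypothetical helper: bounce a static Pod by moving its manifest out of the
# kubelet's manifest directory and back again.
# $1 = manifest path, $2 = seconds to wait before restoring (assumed default: 20).
bounce_static_pod() {
  manifest=$1
  wait_secs=${2:-20}
  backup="${TMPDIR:-/tmp}/$(basename "$manifest").bak.$$"
  mv "$manifest" "$backup" || return 1   # kubelet stops the Pod once the file is gone
  sleep "$wait_secs"
  mv "$backup" "$manifest"               # kubelet recreates the Pod from the manifest
}

# Usage with the manifest path from this post (not run here):
# bounce_static_pod /apps/conf/kubernetes/manifest/etcd.yaml
```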
