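Automatically Cleaning Up Failed/Succeeded Pods in Kubernetes
Background
For various reasons, pods whose deployment failed (Failed) or that ran to completion (Succeeded) are not deleted automatically. These historical pods pile up in the output of kubectl get pod -A,
and a large number of abnormal pods affects the stability of the whole system.
For example: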
[root@master-xxxx~]# kubectl get pod -A | grep -v "Running"
NAMESPACE             NAME                                READY   STATUS              RESTARTS           AGE
5824-xxxx-xx-sy-jxq   aaa-life-service-77f5dd5c58-qqjmh   0/1     CrashLoopBackOff    2016 (2m12s ago)   6d23h
xxxx-xx               server-6c65b7ffb4-62mqv             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-9rhhh             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-ct6fd             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-l4chk             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-p8fmr             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-pn29g             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-qd4wz             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-qvrbq             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-w5wm4             0/1     SMTAlignmentError   0                  3h46m
xxxx-xx               server-6c65b7ffb4-zjzp7             0/1     SMTAlignmentError   0                  3h46m
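Root cause analysis
Pod garbage collection (PodGC) is important for keeping enough resources available and for avoiding degraded performance and availability. In the worst case, the system may crash or stay unusable for a long time. The garbage collection threshold currently defaults to 12,500 terminated pods, which may be too high for your system to sustain.
Default implementation in the Kubernetes kube-controller-manager: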
// RecommendedDefaultPodGCControllerConfiguration defaults a pointer to a
// PodGCControllerConfiguration struct. This will set the recommended default
// values, but they may be subject to change between API versions. This function
// is intentionally not registered in the scheme as a "normal" `SetDefaults_Foo`
// function to allow consumers of this type to set whatever defaults for their
// embedded configs. Forcing consumers to use these defaults would be problematic
// as defaulting in the scheme is done as part of the conversion, and there would
// be no easy way to opt-out. Instead, if you want to use this defaulting method
// run it in your wrapper struct of this type in its `SetDefaults_` method.
func RecommendedDefaultPodGCControllerConfiguration(obj *kubectrlmgrconfigv1alpha1.PodGCControllerConfiguration) {
	if obj.TerminatedPodGCThreshold == 0 {
		obj.TerminatedPodGCThreshold = 12500
	}
}
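PodGC only starts deleting terminated pods (phase Succeeded or Failed) once their count exceeds the threshold, so it helps to know how far the cluster is from it. The following is a minimal sketch for counting them; count_terminated is a hypothetical helper name, not part of kubectl, and it only does the counting on a list of phases read from standard input.

```shell
# Count pods in a terminal phase (Succeeded or Failed).
# Input: one pod phase per line on stdin.
# Output: the number of terminated pods (0 if there are none).
count_terminated() {
  awk '$1 == "Succeeded" || $1 == "Failed" { n++ } END { print n + 0 }'
}
```

Against a live cluster, assuming kubectl is configured, the phases can be piped in with: kubectl get pods -A --no-headers -o custom-columns=PHASE:.status.phase | count_terminated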
Solutions
Method 1:
On the control-plane node, edit the controller manager pod spec file /etc/kubernetes/manifests/kube-controller-manager.yaml and set --terminated-pod-gc-threshold to a suitable threshold, for example:
- --terminated-pod-gc-threshold=10
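Because this file is a static pod manifest, the kubelet recreates the controller manager pod once the file is saved. As a quick check that the flag is set as intended, the sketch below extracts its value from manifest text; get_gc_threshold is a hypothetical helper name, and it assumes the flag is written in the --flag=value form shown above.

```shell
# Print the value of --terminated-pod-gc-threshold from a manifest on stdin.
# Prints nothing if the flag is absent (the built-in default then applies).
get_gc_threshold() {
  sed -n 's/.*--terminated-pod-gc-threshold=\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage on the control-plane node: get_gc_threshold < /etc/kubernetes/manifests/kube-controller-manager.yaml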
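A more thorough method: CronJob
Run once a day and automatically delete all pods in the Failed phase that are more than 24 hours old (the script filters on each pod's creation timestamp).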
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cleaner
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cleaner
subjects:
  - kind: ServiceAccount
    name: cleaner
    # Must match the namespace of the ServiceAccount defined below.
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cleaner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cleaner
  namespace: kube-system
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleaner
  namespace: kube-system
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleaner
          containers:
            - name: cleaner
              image: portainer/kubectl-shell:latest
              imagePullPolicy: IfNotPresent
              command:
                - /bin/bash
                - -c
                - /tmp/cleanup.sh
              volumeMounts:
                - name: scriptconfig
                  subPath: cleanup.sh
                  mountPath: /tmp/cleanup.sh
          restartPolicy: OnFailure
          volumes:
            - name: scriptconfig
              configMap:
                name: cleaner-config
                defaultMode: 0777
---
# To run the cleanup immediately, create a one-off Job from the CronJob:
# kubectl create job --from=cronjob/cleaner cleaner-one -n kube-system
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: kube-system
  name: cleaner-config
data:
  cleanup.sh: |
    #!/bin/bash
    # Delete Failed pods created more than 24 hours ago, in every namespace.

    # Cutoff timestamp (UTC), 24 hours before now, in the same format as
    # .metadata.creationTimestamp so the two can be compared as strings.
    cutoff=$(date -u -d "@$(( $(date +%s) - 86400 ))" +%Y-%m-%dT%H:%M:%SZ)

    # Force-delete the given pods. $1 is the namespace, the rest are pod names.
    delete_pods() {
      local namespace=$1
      shift
      if [[ $# -eq 0 ]]; then
        echo "No Pod Scheduled for Deletion"
        return
      fi
      local pod
      for pod in "$@"; do
        echo "Pod: $pod Namespace: $namespace"
        kubectl delete pod "$pod" --force --grace-period=0 -n "$namespace" || echo "Unable to Delete."
      done
    }

    for namespace in $(kubectl get namespace | awk '{print $1}' | tail -n +2); do
      echo "Namespace: $namespace"
      pod_names=$(kubectl get pods -n "$namespace" --field-selector status.phase=Failed \
        -o go-template --template '{{range .items}}{{.metadata.name}} {{.metadata.creationTimestamp}}{{"\n"}}{{end}}' \
        | awk -v cutoff="$cutoff" '$2 <= cutoff { print $1 }')
      # Left unquoted on purpose so the names split into separate arguments.
      delete_pods "$namespace" $pod_names
    done
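The script selects pods by comparing creation timestamps as plain strings in awk. That works because Kubernetes reports creationTimestamp in a fixed-width UTC format (for example 2024-01-02T03:04:05Z), so string order matches chronological order. The sketch below isolates that filter so it can be tried without a cluster; filter_older_than and cutoff_24h_ago are hypothetical helper names, and the date invocation assumes a date command that accepts -d @SECONDS (GNU coreutils and BusyBox both do).

```shell
# Print a UTC timestamp for 24 hours ago, formatted like creationTimestamp.
cutoff_24h_ago() {
  date -u -d "@$(( $(date +%s) - 86400 ))" +%Y-%m-%dT%H:%M:%SZ
}

# Print the names of pods created at or before the cutoff.
# $1: cutoff timestamp; stdin: "name creationTimestamp" pairs, one per line.
filter_older_than() {
  awk -v cutoff="$1" '$2 <= cutoff { print $1 }'
}
```

For example, printf 'old-pod 2024-01-01T00:00:00Z\nnew-pod 2024-01-03T00:00:00Z\n' | filter_older_than 2024-01-02T00:00:00Z keeps only old-pod. In the cleanup script the cutoff would come from cutoff_24h_ago instead of a fixed value.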