TIPS: Controller Manager congestion causing slow startup of pods bound to PVCs


Posted by 董江 on Monday, July 17, 2023

1. Symptoms

An alert fired on a production Kubernetes cluster: pods using PV/PVC took more than 4 minutes from deployment to finish starting. This includes the time to create the PVC, create the PV, and move the PVC to the Bound state.

2. Troubleshooting

Check the cluster scale first:

apps@WXJD-PSC-xxxx-xxx ~$ kubectl get pv | wc -l
2688

apps@WXJD-PSC-xxxx-xxx ~$ kubectl get pvc -A | wc -l
1171

apps@WXJD-PSC-xxxx-xxx ~$ kubectl get node | wc -l
217

The whole cluster has 216 nodes, 2687 PVs, and 1170 PVCs.

2.1 Hypothesis: errors or retries from the underlying CSI driver

The underlying volumes are ceph-csi in RBD mode, so the first suspicion was that the storage layer was returning results late. However, the logs of the CSI plugin and the CSI provisioner show no obvious errors:

I0704 08:28:57.895407       1 controller.go:1471] delete "pvc-9f24361e-9d18-4025-96a4-b6c5c07b85b1": started
I0704 08:28:57.951941       1 connection.go:183] GRPC call: /csi.v1.Controller/DeleteVolume
I0704 08:28:57.951953       1 connection.go:184] GRPC request: {"secrets":"***stripped***","volume_id":"0001-0024-75e3f33f-c0cb-4053-a8eb-1cc9a5676f87-0000000000000003-8a8dba6e-1813-11ee-b3bf-eaf9d27ef1c9"}
I0704 08:28:57.981481       1 connection.go:186] GRPC response: {}
I0704 08:28:57.981516       1 connection.go:187] GRPC error: rpc error: code = Internal desc = rbd csi-vol-8a8dba6e-1813-11ee-b3bf-eaf9d27ef1c9 is still being used
E0704 08:28:57.981541       1 controller.go:1481] delete "pvc-9f24361e-9d18-4025-96a4-b6c5c07b85b1": volume deletion failed: rpc error: code = Internal desc = rbd csi-vol-8a8dba6e-1813-11ee-b3bf-eaf9d27ef1c9 is still being used
W0704 08:28:57.981569       1 controller.go:989] Retrying syncing volume "pvc-9f24361e-9d18-4025-96a4-b6c5c07b85b1", failure 116
E0704 08:28:57.981590       1 controller.go:1007] error syncing volume "pvc-9f24361e-9d18-4025-96a4-b6c5c07b85b1": rpc error: code = Internal desc = rbd csi-vol-8a8dba6e-1813-11ee-b3bf-eaf9d27ef1c9 is still being used
I0704 08:28:57.981664       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-9f24361e-9d18-4025-96a4-b6c5c07b85b1", UID:"4c72c3b5-e646-4939-b742-569723501a19", APIVersion:"v1", ResourceVersion:"4454875318", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = Internal desc = rbd csi-vol-8a8dba6e-1813-11ee-b3bf-eaf9d27ef1c9 is still being used
I0704 08:28:58.367237       1 leaderelection.go:278] successfully renewed lease kube-system/rbd-csi-ceph-com

There is no AD (attach/detach) controller binding error here; only a single volume cannot be deleted, which is a separate issue.

2.2 Hypothesis: PV controller bindings inside Controller Manager are queued and waiting

2023-07-06T14:34:59.939392489+08:00 stderr F I0706 14:34:59.939362       1 pv_controller_base.go:640] storeObjectUpdate updating volume "pvc-e61e0904-a224-4507-9e42-8c15a0d72884" with version 680511035
2023-07-06T14:34:59.939397062+08:00 stderr F I0706 14:34:59.939371       1 pv_controller.go:543] synchronizing PersistentVolume[pvc-e61e0904-a224-4507-9e42-8c15a0d72884]: phase: Released, bound to: "80cd9e90/data-ae8a1da801b-1702e00-0 (uid: e61e0904-a224-4507-9e42-8c15a0d72884)", boundByController: false
2023-07-06T14:34:59.939406204+08:00 stderr F I0706 14:34:59.939380       1 pv_controller.go:578] synchronizing PersistentVolume[pvc-e61e0904-a224-4507-9e42-8c15a0d72884]: volume is bound to claim 80cd9e90/data-ae8a1da801b-1702e00-0
2023-07-06T14:34:59.939408161+08:00 stderr F I0706 14:34:59.939317       1 pv_controller.go:520] synchronizing bound PersistentVolumeClaim[67828ece/data-a72c7567ce3-8477300-0]: claim is already correctly bound
2023-07-06T14:34:59.939414644+08:00 stderr F I0706 14:34:59.939388       1 pv_controller.go:1012] binding volume "data-a72c7567ce3-8477300-0-wxjd-psc-p15f1-spod11-pm-os01-mysql-lk8s-06-5d574457" to claim "67828ece/data-a72c7567ce3-8477300-0"
2023-07-06T14:34:59.939417139+08:00 stderr F I0706 14:34:59.939390       1 pv_controller.go:612] synchronizing PersistentVolume[pvc-e61e0904-a224-4507-9e42-8c15a0d72884]: claim 80cd9e90/data-ae8a1da801b-1702e00-0 not found
2023-07-06T14:34:59.939419053+08:00 stderr F I0706 14:34:59.939394       1 pv_controller.go:910] updating PersistentVolume[data-a72c7567ce3-8477300-0-wxjd-psc-p15f1-spod11-pm-os01-mysql-lk8s-06-5d574457]: binding to "67828ece/data-a72c7567ce3-8477300-0"
2023-07-06T14:34:59.939420961+08:00 stderr F I0706 14:34:59.939395       1 pv_controller.go:1108] reclaimVolume[pvc-e61e0904-a224-4507-9e42-8c15a0d72884]: policy is Delete
2023-07-06T14:34:59.939430994+08:00 stderr F I0706 14:34:59.939402       1 pv_controller.go:1753] scheduleOperation[delete-pvc-e61e0904-a224-4507-9e42-8c15a0d72884[67d5e9c4-57e3-4343-8747-6b7ae83bac5e]]
2023-07-06T14:34:59.93943311+08:00 stderr F I0706 14:34:59.939404       1 pv_controller.go:922] updating PersistentVolume[data-a72c7567ce3-8477300-0-wxjd-psc-p15f1-spod11-pm-os01-mysql-lk8s-06-5d574457]: already bound to "67828ece/data-a72c7567ce3-8477300-0"
2023-07-06T14:34:59.939434888+08:00 stderr F I0706 14:34:59.939406       1 pv_controller.go:1764] operation "delete-pvc-e61e0904-a224-4507-9e42-8c15a0d72884[67d5e9c4-57e3-4343-8747-6b7ae83bac5e]" is already running, skipping

2023-07-06T14:35:01.09077583+08:00 stderr F I0706 14:35:01.087787       1 request.go:600] Waited for 1m1.146385902s due to client-side throttling, not priority and fairness, request: GET:https://[2409:8c85:2021:3000::ae2:cd08]:6443/api/v1/persistentvolumes/pvc-e61e0904-a224-4507-9e42-8c15a0d72884
2023-07-06T14:35:01.121686014+08:00 stderr F I0706 14:35:01.121578       1 request.go:600] Waited for 291.938256ms due to client-side throttling, not priority and fairness, request: PUT:https://[2409:8c85:2021:3000::ae2:cd08]:6443/apis/batch/v1/namespaces/90552e00/cronjobs/a239855acd7-238ff-4e041-clean-h45bc/status
2023-07-06T14:35:01.130820524+08:00 stderr F I0706 14:35:01.130752       1 pv_controller.go:1244] Volume "pvc-e61e0904-a224-4507-9e42-8c15a0d72884" is already being deleted

Tracing the volume ending in 8c15a0d72884, we can see that controller-manager's outbound requests via client-go were being throttled once they exceeded the burst limit.

Essentially the delay comes from the client-side burst rate limit: requests are queued and processed late.
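
This throttling is client-go's own token-bucket limiter, configured per client through rest.Config.QPS and rest.Config.Burst; in kube-controller-manager these are exposed as the --kube-api-qps and --kube-api-burst flags used below. A minimal sketch of how a client picks up these limits (the values shown are the controller-manager defaults; the function name is only for illustration):

package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newThrottledClient builds a client with an explicit client-side rate
// limit. client-go uses a token bucket with refill rate QPS and capacity
// Burst; once the bucket is empty, requests wait in line and client-go
// logs the "Waited for ... due to client-side throttling" message seen above.
func newThrottledClient() (*kubernetes.Clientset, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	cfg.QPS = 20   // sustained requests per second (controller-manager default)
	cfg.Burst = 30 // short-term burst capacity (controller-manager default)
	return kubernetes.NewForConfig(cfg)
}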

3. Root cause

Looking closer, we found custom CronJobs that periodically scan PVs/PVCs: the business runs a large number of CronJobs that look for unbound PVs that should be removed and delete them.

This is mainly a problem with how the CronJobs are used. In the normal, ideal case each run finishes before the next one is scheduled.

As the number of PVs and PVCs in the cluster grows, a single CronJob run takes longer and starts to overlap with the next run, which drives up the burst of API requests.

Worse still, several runs can overlap at once, creating a large backlog.
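
To see why each run is so API-heavy, here is a rough sketch of what one such cleanup job does (an assumed reconstruction, not the actual business code; cleanupReleasedPVs is a hypothetical name). It lists every PV and deletes the released ones, so with roughly 2700 PVs a single run issues one LIST plus potentially many DELETE calls, and every deletion in turn creates follow-up work for the PV controller inside controller-manager:

package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupReleasedPVs deletes PVs whose claim is gone (phase Released).
// Each run walks the full PV list, so the request volume grows with the
// number of PVs in the cluster.
func cleanupReleasedPVs(ctx context.Context, cs kubernetes.Interface) error {
	pvs, err := cs.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, pv := range pvs.Items {
		if pv.Status.Phase != corev1.VolumeReleased {
			continue
		}
		if err := cs.CoreV1().PersistentVolumes().Delete(ctx, pv.Name, metav1.DeleteOptions{}); err != nil {
			log.Printf("delete pv %s: %v", pv.Name, err)
		}
	}
	return nil
}

When several runs overlap, this whole list-and-delete pass repeats concurrently, and the resulting PV deletions plus CronJob/Job status updates push controller-manager's own request rate past its burst budget, which is exactly the throttling seen in the log above.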

4. Solution

Short term (restore production)

The controller-manager defaults are:

 --concurrent-cron-job-syncs=5 # this flag requires k8s version >= 1.27.x
 --kube-api-qps=20
 --kube-api-burst=30

Change them to:

 --concurrent-cron-job-syncs=50 # this flag requires k8s version >= 1.27.x
 --kube-api-qps=100
 --kube-api-burst=500

Long term

Convert the custom CronJob into a long-running service.

Use a loop that starts the next check only after the previous one has completed, so requests never pile up the way overlapping CronJob runs do.

For example:


package main

import "sync"

// CheckAll runs the PV/PVC cleanup check in an endless loop. A buffered
// channel of capacity 1 acts as a semaphore, so at most one check runs
// at a time and the next iteration blocks until the previous one finishes.
func CheckAll() {
	var wg sync.WaitGroup
	concurrent := make(chan bool, 1)
	for {
		concurrent <- true // blocks while a previous check is still running
		wg.Add(1)

		go func() {
			defer wg.Done()
			defer func() { <-concurrent }() // release the slot when done

			// do the PV/PVC cleanup check here
		}()
	}
	// Unreachable while the loop runs forever; only matters if an
	// exit condition is ever added.
	wg.Wait()
}
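
The buffered channel of capacity 1 is the key design choice: the next iteration blocks on sending into the channel until the running check has drained it, so check rounds execute strictly one after another instead of piling up the way overlapping CronJob runs did.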
