TIPS: After Kubernetes kubelet Loses Its Connection to apiserver, It Keeps Using the Closed Connection

Posted by 董江 on Tuesday, September 19, 2023

Symptoms

kubelet starts logging errors, and after a while the node becomes NotReady.

Logs:

E0906 02:03:08.585672  392662 reflector.go:123] object-"089f93c2"/"a697eaa005a-4b60b0": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Da697eaa005a-4b60b0&limit=500&resourceVersion=0: read tcp 127.0.0.1:62060->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585749  392662 reflector.go:123] object-"089f93c2"/"a697eaa005a-4b60b-mysqlprobe": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Da697eaa005a-4b60b-mysqlprobe&limit=500&resourceVersion=0: read tcp 127.0.0.1:62218->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585746  392662 reflector.go:123] object-"089f93c2"/"timezone": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Dtimezone&limit=500&resourceVersion=0: read tcp 127.0.0.1:62202->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585803  392662 reflector.go:123] object-"bf8e4f55"/"aa859eba67d-80a840-master-suffix": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/bf8e4f55/secrets?fieldSelector=metadata.name%3Daa859eba67d-80a840-master-suffix&limit=500&resourceVersion=0: read tcp 127.0.0.1:62228->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585889  392662 reflector.go:123] object-"bf8e4f55"/"aa859eba67d-80a840-root-suffix": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/bf8e4f55/secrets?fieldSelector=metadata.name%3Daa859eba67d-80a840-root-suffix&limit=500&resourceVersion=0: read tcp 127.0.0.1:62270->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585897  392662 desired_state_of_world_populator.go:312] Error processing volume "data" for pod "adc17e9833e-2987010-0_192f4074(73ab714a-2fb2-4232-a7bc-2b56ed1f23e0)": error processing PVC 192f4074/data-adc17e9833e-2987010-0: failed to fetch PVC from API server: Get https://127.0.0.1:6443/api/v1/namespaces/192f4074/persistentvolumeclaims/data-adc17e9833e-2987010-0: read tcp 127.0.0.1:62284->127.0.0.1:6443: use of closed network connection

E0906 02:03:25.527514  392662 reflector.go:123] object-"kube-system"/"default-token-zlltw": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/kube-system/secrets?fieldSelector=metadata.name%3Ddefault-token-zlltw&limit=500&resourceVersion=0: EOF
E0906 02:03:25.581593  392662 reflector.go:123] object-"089f93c2"/"a697eaa005a-4b60b-mysqlprobe": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Da697eaa005a-4b60b-mysqlprobe&limit=500&resourceVersion=0: EOF
E0906 02:03:25.781629  392662 reflector.go:123] object-"089f93c2"/"timezone": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Dtimezone&limit=500&resourceVersion=0: EOF
E0906 02:03:25.981829  392662 reflector.go:123] object-"bf8e4f55"/"aa859eba67d-80a840-master-suffix": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/bf8e4f55/secrets?fieldSelector=metadata.name%3Daa859eba67d-80a840-master-suffix&limit=500&resourceVersion=0: EOF
E0906 02:03:26.181634  392662 reflector.go:123] object-"bf8e4f55"/"aa859eba67d-80a840-root-suffix": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/bf8e4f55/secrets?fieldSelector=metadata.name%3Daa859eba67d-80a840-root-suffix&limit=500&resourceVersion=0: EOF

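After kubelet loses its connection to apiserver, it keeps using the old, already closed connection, so every request fails.

Root cause investigation

Compare with the client-go code: /util/net/http.go#L89-L99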

	case err == io.EOF:
		return true
	case err == io.ErrUnexpectedEOF:
		return true
	case msg == "http: can't write HTTP request on broken connection":
		return true
	case strings.Contains(msg, "http2: server sent GOAWAY and closed the connection"):
		return true
	case strings.Contains(msg, "connection reset by peer"):
		return true
	case strings.Contains(strings.ToLower(msg), "use of closed network connection"):
		return true

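The kubelet errors match these cases, but over HTTP/2 the unusable connection was not closed promptly and kept being reused. The fix landed in Go's net/http: "net/http: don't cache http2.erringRoundTripper connections"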

 		if err == nil {
 			return resp, nil
 		}
-		if http2isNoCachedConnError(err) {
+
+		// Failed. Clean up and determine whether to retry.
+
+		_, isH2DialError := pconn.alt.(http2erringRoundTripper)
+		if http2isNoCachedConnError(err) || isH2DialError {
 			t.removeIdleConn(pconn)
 			t.decConnsPerHost(pconn.cacheKey)
-		} else if !pconn.shouldRetryRequest(req, err) {
+		}
+		if !pconn.shouldRetryRequest(req, err) {
 			// Issue 16465: return underlying net.Conn.Read error from peek,
 			// as we've historically done.
 			if e, ok := err.(transportReadFromServerError); ok {

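Solutions

Short term

Restart kubelet periodically: schedule the following check script with crontab so that kubelet is restarted whenever the error shows up in its log.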

#!/bin/bash
# Restart kubelet when its most recent journal line reports the stale-connection error.
if journalctl -u kubelet -n 1 | grep -q "use of closed network connection"; then
  echo "Restart kubelet"
  systemctl restart kubelet
else
  echo "Error not found in logs"
fi

Long term

Upgrade Kubernetes to 1.19+

Affected versions

Kubernetes 1.16 through 1.18.8, including anything built on client-go 0.16.0 through 0.18.8
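As a rough aid for auditing nodes, the range above can be turned into a small check. This is only a sketch: `inAffectedRange` is a hypothetical helper, it assumes version strings of the form `v1.18.6` (optionally with a `-` or `+` suffix), and it encodes nothing beyond the range stated in this post.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// inAffectedRange reports whether a Kubernetes version string such as
// "v1.18.6" falls inside the range stated above (1.16 through 1.18.8).
// Hypothetical helper; suffixes such as "-rc.1" or "+build" are ignored.
func inAffectedRange(version string) (bool, error) {
	v := strings.TrimPrefix(version, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i] // drop pre-release / build metadata
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return false, fmt.Errorf("unexpected version format: %q", version)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return false, fmt.Errorf("unexpected version format: %q", version)
		}
		nums[i] = n
	}
	major, minor, patch := nums[0], nums[1], nums[2]
	if major != 1 {
		return false, nil
	}
	switch minor {
	case 16, 17:
		return true, nil
	case 18:
		return patch <= 8, nil
	}
	return false, nil
}

func main() {
	for _, v := range []string{"v1.16.0", "v1.18.8", "v1.18.9", "v1.19.0"} {
		affected, err := inAffectedRange(v)
		if err != nil {
			fmt.Println(v, "error:", err)
			continue
		}
		fmt.Println(v, "affected:", affected)
	}
}
```

The input would typically be the kubelet version reported for each node; how that value is collected is left out of the sketch.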
