Kubernetes: kubelet keeps using a closed connection after disconnecting from the apiserver
Symptoms
The kubelet logs errors, and after a while the node becomes NotReady.
Logs:
E0906 02:03:08.585672 392662 reflector.go:123] object-"089f93c2"/"a697eaa005a-4b60b0": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Da697eaa005a-4b60b0&limit=500&resourceVersion=0: read tcp 127.0.0.1:62060->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585749 392662 reflector.go:123] object-"089f93c2"/"a697eaa005a-4b60b-mysqlprobe": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Da697eaa005a-4b60b-mysqlprobe&limit=500&resourceVersion=0: read tcp 127.0.0.1:62218->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585746 392662 reflector.go:123] object-"089f93c2"/"timezone": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Dtimezone&limit=500&resourceVersion=0: read tcp 127.0.0.1:62202->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585803 392662 reflector.go:123] object-"bf8e4f55"/"aa859eba67d-80a840-master-suffix": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/bf8e4f55/secrets?fieldSelector=metadata.name%3Daa859eba67d-80a840-master-suffix&limit=500&resourceVersion=0: read tcp 127.0.0.1:62228->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585889 392662 reflector.go:123] object-"bf8e4f55"/"aa859eba67d-80a840-root-suffix": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/bf8e4f55/secrets?fieldSelector=metadata.name%3Daa859eba67d-80a840-root-suffix&limit=500&resourceVersion=0: read tcp 127.0.0.1:62270->127.0.0.1:6443: use of closed network connection
E0906 02:03:08.585897 392662 desired_state_of_world_populator.go:312] Error processing volume "data" for pod "adc17e9833e-2987010-0_192f4074(73ab714a-2fb2-4232-a7bc-2b56ed1f23e0)": error processing PVC 192f4074/data-adc17e9833e-2987010-0: failed to fetch PVC from API server: Get https://127.0.0.1:6443/api/v1/namespaces/192f4074/persistentvolumeclaims/data-adc17e9833e-2987010-0: read tcp 127.0.0.1:62284->127.0.0.1:6443: use of closed network connection
E0906 02:03:25.527514 392662 reflector.go:123] object-"kube-system"/"default-token-zlltw": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/kube-system/secrets?fieldSelector=metadata.name%3Ddefault-token-zlltw&limit=500&resourceVersion=0: EOF
E0906 02:03:25.581593 392662 reflector.go:123] object-"089f93c2"/"a697eaa005a-4b60b-mysqlprobe": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Da697eaa005a-4b60b-mysqlprobe&limit=500&resourceVersion=0: EOF
E0906 02:03:25.781629 392662 reflector.go:123] object-"089f93c2"/"timezone": Failed to list *v1.ConfigMap: Get https://127.0.0.1:6443/api/v1/namespaces/089f93c2/configmaps?fieldSelector=metadata.name%3Dtimezone&limit=500&resourceVersion=0: EOF
E0906 02:03:25.981829 392662 reflector.go:123] object-"bf8e4f55"/"aa859eba67d-80a840-master-suffix": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/bf8e4f55/secrets?fieldSelector=metadata.name%3Daa859eba67d-80a840-master-suffix&limit=500&resourceVersion=0: EOF
E0906 02:03:26.181634 392662 reflector.go:123] object-"bf8e4f55"/"aa859eba67d-80a840-root-suffix": Failed to list *v1.Secret: Get https://127.0.0.1:6443/api/v1/namespaces/bf8e4f55/secrets?fieldSelector=metadata.name%3Daa859eba67d-80a840-root-suffix&limit=500&resourceVersion=0: EOF
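After the kubelet loses its connection to the apiserver, it keeps using the old connection, so its requests fail.
Investigation
Compare with the client-go code: /util/net/http.go#L89-L99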
case err == io.EOF:
	return true
case err == io.ErrUnexpectedEOF:
	return true
case msg == "http: can't write HTTP request on broken connection":
	return true
case strings.Contains(msg, "http2: server sent GOAWAY and closed the connection"):
	return true
case strings.Contains(msg, "connection reset by peer"):
	return true
case strings.Contains(strings.ToLower(msg), "use of closed network connection"):
	return true
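The logged error is matched by this check, and on the HTTP/2 path the unusable connection was not closed promptly. The fix in Go is "net/http: don't cache http2.erringRoundTripper connections":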
if err == nil {
return resp, nil
}
- if http2isNoCachedConnError(err) {
+
+ // Failed. Clean up and determine whether to retry.
+
+ _, isH2DialError := pconn.alt.(http2erringRoundTripper)
+ if http2isNoCachedConnError(err) || isH2DialError {
t.removeIdleConn(pconn)
t.decConnsPerHost(pconn.cacheKey)
- } else if !pconn.shouldRetryRequest(req, err) {
+ }
+ if !pconn.shouldRetryRequest(req, err) {
// Issue 16465: return underlying net.Conn.Read error from peek,
// as we've historically done.
if e, ok := err.(transportReadFromServerError); ok {
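To make the control-flow change in the diff easier to see, the sketch below models the old and new error paths as two plain functions over booleans. It is a toy model, not the net/http source: the type decision, the functions oldErrorPath and newErrorPath, and the boolean inputs are all hypothetical stand-ins for http2isNoCachedConnError(err), the http2erringRoundTripper type assertion, and pconn.shouldRetryRequest(req, err).

```go
package main

// decision records what one pass through the error path does.
type decision struct {
	RemovedFromPool bool // connection dropped from the idle pool
	ReturnedError   bool // error returned to the caller instead of retrying
}

// oldErrorPath models the removed lines: the pool is cleaned only for a
// "no cached connection" error, and the retry check sits in the else branch.
func oldErrorPath(noCachedConn, h2DialError, shouldRetry bool) decision {
	var d decision
	if noCachedConn {
		d.RemovedFromPool = true
	} else if !shouldRetry {
		d.ReturnedError = true
	}
	return d
}

// newErrorPath models the added lines: an HTTP/2 dial error also removes the
// connection from the pool, and the retry check always runs afterwards.
func newErrorPath(noCachedConn, h2DialError, shouldRetry bool) decision {
	var d decision
	if noCachedConn || h2DialError {
		d.RemovedFromPool = true
	}
	if !shouldRetry {
		d.ReturnedError = true
	}
	return d
}
```

For the inputs (false, true, true), which is an HTTP/2 dial error that is still retryable, oldErrorPath leaves the connection in the pool while newErrorPath removes it. That difference is what the commit title describes.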
Solutions
Short term
Restart the kubelet periodically: use crontab to run the following check on a schedule.
#!/bin/bash
# Restart the kubelet when its most recent log line shows the stale-connection error.
if journalctl -u kubelet -n 1 | grep -q "use of closed network connection"; then
    echo "Restart kubelet"
    systemctl restart kubelet
else
    echo "Error not found in logs"
fi
Long term
Upgrade Kubernetes to 1.19+
Affected versions
Kubernetes 1.16 through 1.18.8, including anything that uses client-go 0.16.0 through 0.18.8.
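When auditing a fleet, the Kubernetes range above can be checked mechanically. The following is a minimal sketch under stated assumptions: the helper names parseVersion, less and isAffected are hypothetical, "1.16" is read as 1.16.0, both ends of the range are treated as inclusive, and version strings with suffixes such as "-rc.1" are rejected rather than parsed.

```go
package main

import (
	"strconv"
	"strings"
)

// parseVersion turns "v1.18.8" or "1.16" into [major, minor, patch].
// A missing patch component is treated as 0 (an assumption).
// Strings with pre-release or build suffixes are rejected.
func parseVersion(v string) ([3]int, bool) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 || len(parts) > 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// less reports whether version a sorts strictly before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// isAffected reports whether v lies in 1.16.0 through 1.18.8, inclusive.
// Unparseable input yields false.
func isAffected(v string) bool {
	ver, ok := parseVersion(v)
	if !ok {
		return false
	}
	lo, hi := [3]int{1, 16, 0}, [3]int{1, 18, 8}
	return !less(ver, lo) && !less(hi, ver)
}
```

With these assumptions, isAffected("v1.17.3") and isAffected("1.18.8") are true, while isAffected("1.18.9") and isAffected("1.19.0") are false.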
Other
- https://github.com/kubernetes/kubernetes/issues/87615
- https://github.com/golang/go/issues/34978
- https://github.com/golang/go/issues/40213
