Troubleshooting Kubernetes NUMA CPU Affinity Failures
Background
After configuring NUMA topology management and CPU management (Kubernetes Pod/Container NUMA affinity management), the workload's CPU affinity did not take effect. The Pod spec in question:
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    resources:
      limits:
        cpu: 2
        memory: 1Gi
      requests:
        cpu: 2
        memory: 1Gi
  initContainers:
  - name: init-myservice
    image: busybox:1.28
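For context, this kind of NUMA-aligned pinning requires the kubelet's static CPU manager together with the topology manager. A minimal sketch for confirming the setup on the node, assuming the default config path /var/lib/kubelet/config.yaml (the expected values shown are illustrative of a single-NUMA-node policy):

# Confirm the CPU/topology manager settings on the node:
grep -E 'cpuManagerPolicy|topologyManagerPolicy' /var/lib/kubelet/config.yaml
# Expected for NUMA-aligned exclusive CPUs (values illustrative):
#   cpuManagerPolicy: static
#   topologyManagerPolicy: single-numa-node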
Analysis
Kubelet log analysis
Problem 1: errors in the kubelet log
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2024-05-09 17:13:41 CST; 6 days ago
Docs: http://kubernetes.io/docs/
Process: 55204 ExecStartPre=/usr/bin/kubelet-pre-start.sh (code=exited, status=0/SUCCESS)
Main PID: 55226 (kubelet)
Tasks: 104
Memory: 222.3M
CGroup: /system.slice/kubelet.service
└─55226 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra...
May 16 12:39:33 node-10-107-65-32 kubelet[55226]: rpc error: code = Unknown desc = failed to update resources: failed to update resources: runc did not terminate successfully: exit status 1: unable to freeze
May 16 12:39:33 node-10-107-65-32 kubelet[55226]: : unknown
May 16 12:39:33 node-10-107-65-32 kubelet[55226]: > containerID="f9c8dabff74b210ce64987abed0d6e3a4104b2f0cfb7904d42011cb23521918f"
May 16 12:39:33 node-10-107-65-32 kubelet[55226]: E0516 12:39:33.998533 55226 cpu_manager.go:476] "ReconcileState: failed to update container" err=<
May 16 12:39:33 node-10-107-65-32 kubelet[55226]: rpc error: code = Unknown desc = failed to update resources: failed to update resources: runc did not terminate successfully: exit status 1: unable to freeze
May 16 12:39:33 node-10-107-65-32 kubelet[55226]: : unknown
May 16 12:39:33 node-10-107-65-32 kubelet[55226]: > pod="autocar-baic-chaos-hd/esboost-7b9d6f55fd-wwjcz" containerName="app" containerID="f9c8dabff74b210ce64987abed0d6e3a4104b2f0cfb7904d42011cb23521918f" cpuSet="0-1,18-49,66-95"
May 16 12:39:39 node-10-107-65-32 kubelet[55226]: I0516 12:39:39.075187 55226 state_mem.go:80] "Updated desired CPUSet" podUID="6e48e804-ce18-4658-a11c-c01f90ed8723" containerName="app" cpuSet="0-1,18-49,66-95"
May 16 13:34:35 node-10-107-65-32 kubelet[55226]: I0516 13:34:35.600214 55226 scope.go:117] "RemoveContainer" containerID="19802f2886c28e4a36460f4e1a0874e907b11871be5bcb49637e71fab6ad87da"
May 16 13:34:35 node-10-107-65-32 kubelet[55226]: I0516 13:34:35.600535 55226 scope.go:117] "RemoveContainer" containerID="2b3f55dc806bb42768bf86b139892369c5404d11e6aeccc453cd5ebc224973d4"
There is one error here: cpu_manager failed to update the container's cpuset state. Looking at the relevant cpu_manager code in the Kubernetes source:
// Once we make it here we know we have a running container.
// Idempotently add it to the containerMap incase it is missing.
// This can happen after a kubelet restart, for example.
m.containerMap.Add(string(pod.UID), container.Name, containerID)
m.Unlock()

cset := m.state.GetCPUSetOrDefault(string(pod.UID), container.Name)
if cset.IsEmpty() {
    // NOTE: This should not happen outside of tests.
    klog.V(4).InfoS("ReconcileState: skipping container; assigned cpuset is empty", "pod", klog.KObj(pod), "containerName", container.Name)
    failure = append(failure, reconciledContainer{pod.Name, container.Name, containerID})
    continue
}

lcset := m.lastUpdateState.GetCPUSetOrDefault(string(pod.UID), container.Name)
if !cset.Equals(lcset) {
    klog.V(4).InfoS("ReconcileState: updating container", "pod", klog.KObj(pod), "containerName", container.Name, "containerID", containerID, "cpuSet", cset)
    err = m.updateContainerCPUSet(ctx, containerID, cset)
    if err != nil {
        klog.ErrorS(err, "ReconcileState: failed to update container", "pod", klog.KObj(pod), "containerName", container.Name, "containerID", containerID, "cpuSet", cset)
        failure = append(failure, reconciledContainer{pod.Name, container.Name, containerID})
        continue
    }
    m.lastUpdateState.SetCPUSet(string(pod.UID), container.Name, cset)
}
success = append(success, reconciledContainer{pod.Name, container.Name, containerID})
After kubelet starts, it reconciles the state under /var/lib/kubelet/cpu_manager_state and writes the latest CPUSet assignment for each container. The failure here happens during that update: the old cpu_manager_state file was not removed before kubelet was restarted, so merging the stale checkpoint with the new assignments fails.
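The checkpoint itself can be inspected directly on the node; the JSON shape shown in the comments below is illustrative:

# Inspect the CPU manager checkpoint:
cat /var/lib/kubelet/cpu_manager_state
# Illustrative shape of the checkpoint:
# {"policyName":"static","defaultCpuSet":"0-1,50-65",
#  "entries":{"<pod-uid>":{"app":"2-17"}},"checksum":1052828399}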
Fix:
- Drain the node.
- Stop kubelet.
- Delete the old CPU manager state file. Its default path is /var/lib/kubelet/cpu_manager_state. This clears the state maintained by the CPUManager so that the cpu-sets set up by the new policy do not conflict with it.
- Edit the kubelet configuration to change the CPU manager policy to the desired value.
- Start kubelet:
systemctl restart kubelet
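Put together, a sketch of the full remediation on this node (the node name is taken from the logs above; adjust the drain flags to your environment):

kubectl drain node-10-107-65-32 --ignore-daemonsets --delete-emptydir-data
systemctl stop kubelet
rm -f /var/lib/kubelet/cpu_manager_state
systemctl start kubelet
kubectl uncordon node-10-107-65-32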
Problem 2: kubelet fails to restart
[root@node-10-107-65-32 kubelet]# systemctl stop kubelet && systemctl status kubelet && systemctl start kubelet
Error: Too many open files
The workload logs also showed:
failed to create fsnotify watcher: too many open files
This happens because the number of file descriptors a single process may open is limited, so the relevant system parameters need to be raised.
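Before changing anything, it helps to confirm the current limits and fd usage on the node; a quick sketch (assuming a single kubelet process, so pidof returns one PID):

# Current per-process limits of the running kubelet:
cat /proc/$(pidof kubelet)/limits | grep -i 'open files'
# How many fds the kubelet currently holds:
ls /proc/$(pidof kubelet)/fd | wc -l
# Current per-user inotify limits:
sysctl fs.inotify.max_user_instances fs.inotify.max_queued_events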
Fix:
- Raise the number of file handles a single process may open on the node by adjusting the Linux nofile & nproc limits:
cat >> /etc/security/limits.conf <<EOF
# per-user open file limit
* soft nofile 655360
* hard nofile 655360
# per-user max processes
* soft nproc 15412410
* hard nproc 15412410
EOF
- Update the kernel parameters governing the system-wide file-handle limit and the per-user inotify limits:
$ cat >> /etc/sysctl.conf <<EOF
# system-wide limit on open file handles
fs.file-max=2097152
# max inotify instances per real user ID (default 128)
fs.inotify.max_user_instances=8192
# max events that can be queued per inotify instance (default 16384)
fs.inotify.max_queued_events=16384
EOF
$ sysctl -p
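After applying the changes, verify that they took effect. Note that a systemd-managed kubelet takes its fd limit from the unit's LimitNOFILE setting rather than from limits.conf, so it is worth checking that as well:

ulimit -n        # per-process limit in a fresh login session; expect 655360
sysctl fs.file-max fs.inotify.max_user_instances fs.inotify.max_queued_events
systemctl show kubelet -p LimitNOFILE   # limit actually applied to the kubelet unit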
Problem 3: CPUSet not applied for a Deployment with an initContainer
Looking at the cpumanager/policy_static.go source:
func (p *staticPolicy) updateCPUsToReuse(pod *v1.Pod, container *v1.Container, cset cpuset.CPUSet) {
    // If pod entries to m.cpusToReuse other than the current pod exist, delete them.
    for podUID := range p.cpusToReuse {
        if podUID != string(pod.UID) {
            delete(p.cpusToReuse, podUID)
        }
    }
    // If no cpuset exists for cpusToReuse by this pod yet, create one.
    if _, ok := p.cpusToReuse[string(pod.UID)]; !ok {
        p.cpusToReuse[string(pod.UID)] = cpuset.New()
    }
    // Check if the container is an init container.
    // If so, add its cpuset to the cpuset of reusable CPUs for any new allocations.
    for _, initContainer := range pod.Spec.InitContainers {
        if container.Name == initContainer.Name {
            // This checks whether the initContainer's CPUSet may be reused by regular containers.
            if types.IsRestartableInitContainer(&initContainer) {
                // If the container is a restartable init container, we should not
                // reuse its cpuset, as a restartable init container can run with
                // regular containers.
                break
            }
            p.cpusToReuse[string(pod.UID)] = p.cpusToReuse[string(pod.UID)].Union(cset)
            return
        }
    }
}
This reveals the cause: when the initContainer's resources are not set consistently with the Container's, the logic above breaks out, and neither /var/lib/kubelet/cpu_manager_state nor the container's CPUSet gets updated. Note also that under the static policy, exclusive CPUs are only granted to pods in the Guaranteed QoS class, and the QoS class is computed over every container in the pod, init containers included. The initContainer's resources settings therefore need to be identical to the Container's resources settings.
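A quick way to confirm this on a live pod is to check its QoS class, which must be Guaranteed for static CPU pinning to apply (the pod name below is illustrative):

kubectl get pod myapp-pod -o jsonpath='{.status.qosClass}{"\n"}'
# Expect: Guaranteed. Burstable and BestEffort pods never receive exclusive CPUs.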
Fix
Update the workload's deployment YAML:
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    resources:       # this resources block ...
      limits:
        cpu: 2
        memory: 1Gi
      requests:
        cpu: 2
        memory: 1Gi
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    resources:       # ... and this one must be identical
      limits:
        cpu: 2
        memory: 1Gi
      requests:
        cpu: 2
        memory: 1Gi
That is, give the initContainers the same resources settings as the containers.
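After redeploying, the pinning can be verified on the node. A sketch, assuming a containerd runtime reachable via crictl (the exact field layout of the inspect output varies by runtime version):

# The pod should now have an entry in the static policy's checkpoint:
cat /var/lib/kubelet/cpu_manager_state
# And the running container's cgroup should carry the exclusive cpuset:
crictl ps --name myapp-container -q      # look up the container ID
crictl inspect <container-id> | grep -i '"cpus"'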