Tech Share: Implementing Pod Resource View Isolation

Implementing Pod resource view isolation with a FUSE filesystem

Posted by 董江 on Tuesday, April 27, 2021

Resource visibility inside Pod containers: giving a Pod a true and accurate view of its resources

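Have you ever noticed this? A Pod has CPU and memory limits set, yet when you log in to the Pod and query its resources, you still see the Node's total resources.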

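For workloads moving to the cloud, this is a frequent cause of OOM and GC problems in languages such as Java (which sizes its memory from the amount it detects) and Go (which sets the number of runtime threads from the CPU count it detects).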

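By default, the CPU, memory, disk, and swap information read inside a container is the host's, not the amounts actually allocated to and limited for the container. With lxcfs, what the container reads matches its allocation and limits, so resource information obtained at a low level (for example through os.syscall) stays consistent.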
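
To make the mismatch concrete, here is a minimal sketch of how a runtime could size itself from cgroup limits instead of host totals. It assumes cgroup v1 semantics (a CFS quota and period in microseconds, with a non-positive quota meaning "no limit"); the function names are hypothetical and not part of lxcfs or any runtime.

```python
import math


def effective_cpus(quota_us: int, period_us: int, host_cpus: int) -> int:
    """CPU count to size for, given a CFS quota/period (cgroup v1 assumption)."""
    # A non-positive quota (e.g. -1) means the cgroup is not CPU-limited.
    if quota_us <= 0 or period_us <= 0:
        return host_cpus
    # Round up so a 500m limit still yields one usable CPU.
    return max(1, min(host_cpus, math.ceil(quota_us / period_us)))


def effective_memory(limit_bytes: int, host_bytes: int) -> int:
    """Memory to size for: the cgroup limit, capped by what the host has."""
    # An unlimited cgroup reports a very large number, so min() falls back to the host.
    return min(limit_bytes, host_bytes)
```

With the demo Pod below (limit of 500m CPU and 128Mi memory on a 2-CPU node), a quota of 50000 over a period of 100000 gives `effective_cpus(50000, 100000, 2) == 1`, whereas a process that only looks at the host view would size itself for 2 CPUs and about 2.9G of memory.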

Steps to reproduce

Deploy an lxcfs-demo application Pod:

apiVersion: v1
kind: Pod
metadata:
  name: lxcfs-demo
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
    resources:     # limits the Pod's resources
      requests:
        memory: "64Mi"  
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  restartPolicy: Always

Log in to the Pod and check what resources it actually sees:

dongjiangdeMacBook-Pro:kubernetes $ kubectl exec -it lxcfs-demo  "/bin/sh"
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # free -h
              total        used        free      shared  buff/cache   available
Mem:           2.9G      802.9M      117.0M      333.4M        2.0G        1.6G
Swap:        512.0M        1.3M      510.7M
/ # cat /proc/cpuinfo| grep "cpu cores"| uniq   # number of physical cores
cpu cores	: 1
/ # cat /proc/cpuinfo| grep "processor"| wc -l  # number of logical cores
2
/ # 

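The Pod's view of its resources is completely different from the limits set in the deployment spec: the system information seen inside the Pod is entirely the Node's.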

Introduction to lxcfs

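lxcfs is a FUSE filesystem that makes a Linux container's filesystem look more like a virtual machine's. It runs as a resident process on the host and automatically maintains the mapping between the container's real resource information in the host's cgroups and the files under /proc inside the container.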

How it works

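The basic principle of lxcfs is file mounting: it reads the container's information from the Pod's OCI cgroup, exposes it under lxcfs's own directory, and maps that directory onto /proc inside the container. Commands such as top and free, which read /proc inside the container, then get the CPU and memory figures that the cgroup actually allocates to the container.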

[Figure: lxcfs]
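
The idea of "render /proc content from cgroup values" can be sketched as follows. This is only an illustration, not how lxcfs is implemented (lxcfs is written in C and also accounts for cache, swap, and more); the function name and the two-line output are assumptions made for the example.

```python
def render_meminfo(limit_bytes: int, usage_bytes: int, host_total_bytes: int) -> str:
    """Build a minimal /proc/meminfo-style view from cgroup memory values."""
    # The container can never see more memory than the host really has.
    total = min(limit_bytes, host_total_bytes)
    used = min(usage_bytes, total)
    total_kb = total // 1024
    free_kb = (total - used) // 1024
    # Same "Key:   value kB" layout that tools such as free and top parse.
    return (
        f"{'MemTotal:':<16}{total_kb:>8} kB\n"
        f"{'MemFree:':<16}{free_kb:>8} kB\n"
    )
```

For a container limited to 256Mi on a 16G node, `render_meminfo(256 * 1024**2, 8 * 1024**2, 16 * 1024**3)` reports a MemTotal of 262144 kB instead of the host's total.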

Using lxcfs in Kubernetes

So that every Pod on a Node gets the lxcfs resource view, lxcfs is started on every worker node through a DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: lxcfs
  labels:
    app: lxcfs
spec:
  selector:
    matchLabels:
      app: lxcfs
  template:
    metadata:
      labels:
        app: lxcfs
    spec:
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: lxcfs
        image: dongjiang1989/lxcfs:v4.0.12
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: cgroup
          mountPath: /sys/fs/cgroup
        - name: lxcfs
          mountPath: /var/lib/lxcfs
          mountPropagation: Bidirectional
        - name: usr-local
          mountPath: /usr/local
      volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: usr-local
        hostPath:
          path: /usr/local
      - name: lxcfs
        hostPath:
          path: /var/lib/lxcfs
          type: DirectoryOrCreate

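lxcfs-admission-webhook implements a dynamic admission webhook, or more precisely a mutating webhook: it watches Pod creation and patches the Pod, injecting the mapping between lxcfs and the directories inside the container into the Pod's YAML so that the mounts are added automatically.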

apiVersion: apps/v1
kind: Deployment
metadata:
  name: lxcfs-admission-webhook-deployment
  namespace: kube-system
  labels:
    app: lxcfs-admission-webhook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lxcfs-admission-webhook
  template:
    metadata:
      labels:
        app: lxcfs-admission-webhook
    spec:
      serviceAccountName: lxcfs-webhook-serviceaccount
      containers:
        - name: lxcfs-admission-webhook
          image: dongjiang1989/lxcfs-webhook:latest
          imagePullPolicy: Always
          args:
            - -tlsCertFile=/etc/webhook/certs/tls.crt
            - -tlsKeyFile=/etc/webhook/certs/tls.key
            - -alsologtostderr
            - -v=4
          resources:
            limits:
              cpu: 500m
              memory: 128Mi
            requests:
              cpu: 10m
              memory: 64Mi
          volumeMounts:
            - mountPath: /etc/webhook/certs/
              name: cert
              readOnly: true
      volumes:
      - name: cert
        secret:
          defaultMode: 420
          secretName: lxcfs-webhook-server-cert
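
The patch step of such a mutating webhook can be sketched as below. This is a hedged illustration of building a JSON Patch (RFC 6902), not the webhook's actual code: the list of /proc files, the volume naming scheme, and the function name are assumptions for the example, based on the /var/lib/lxcfs mount path used by the DaemonSet above.

```python
LXCFS_ROOT = "/var/lib/lxcfs"  # host path mounted by the DaemonSet above
# Assumed set of /proc files to override; adjust to what your lxcfs build serves.
PROC_FILES = ["cpuinfo", "meminfo", "diskstats", "stat", "swaps", "uptime"]


def build_lxcfs_patch(pod: dict) -> list:
    """Return JSON Patch operations that mount lxcfs files over /proc."""
    spec = pod.get("spec", {})
    volumes = [
        {"name": f"lxcfs-proc-{n}", "hostPath": {"path": f"{LXCFS_ROOT}/proc/{n}"}}
        for n in PROC_FILES
    ]
    mounts = [
        {"name": f"lxcfs-proc-{n}", "mountPath": f"/proc/{n}"} for n in PROC_FILES
    ]
    ops = []
    # "add" to ".../-" appends to an existing array; otherwise create the array.
    if "volumes" in spec:
        ops += [{"op": "add", "path": "/spec/volumes/-", "value": v} for v in volumes]
    else:
        ops.append({"op": "add", "path": "/spec/volumes", "value": volumes})
    for i, container in enumerate(spec.get("containers", [])):
        base = f"/spec/containers/{i}/volumeMounts"
        if "volumeMounts" in container:
            ops += [{"op": "add", "path": f"{base}/-", "value": m} for m in mounts]
        else:
            ops.append({"op": "add", "path": base, "value": mounts})
    return ops
```

A webhook would serialize this list to JSON and return it in its admission response. The branch on whether the array already exists matters because a JSON Patch "add" fails when the parent of the target path is missing.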

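This requires a Linux host with the FUSE module enabled. macOS is not supported: it has no native FUSE support, and lxcfs depends on Linux cgroups.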

Verification

Test demo

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd-test
  template:
    metadata:
      labels:
        app: httpd-test
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
        - name: httpd
          image: httpd:2.4.32
          imagePullPolicy: Always
          resources:
            requests:
              memory: "256Mi"
              cpu: "1"
            limits:
              memory: "256Mi"
              cpu: "1"

Verification details

  1. Node resources:

16 GB of memory; 2 CPUs (sockets) with 4 cores in total

[root@kcs-cpu-test-m-qm2dd /]#  free -h      
              total        used        free      shared  buff/cache   available
Mem:            15G        2.0G        2.5G        4.4M         10G         12G
Swap:            0B          0B          0B
[root@kcs-cpu-test-m-qm2dd /]#  cat /proc/cpuinfo | grep "physical id"
physical id     : 0
physical id     : 0
physical id     : 1
physical id     : 1
[root@kcs-cpu-test-m-qm2dd /]#  cat /proc/cpuinfo | grep processor
processor       : 0
processor       : 1
processor       : 2
processor       : 3

  2. Information inside the Pod:

Inside the Pod, memory is 256M and CPU is 1 core.

[root@kcs-cpu-test-m-qm2dd /]#  kubectl  get pod | grep "httpd-test"
httpd-test-68b9b9d74f-5tmnh       1/1     Running   0          11m

[root@kcs-cpu-test-m-qm2dd /]#  kubectl exec -it httpd-test-68b9b9d74f-5tmnh "/bin/bash"
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.

root@httpd-test-68b9b9d74f-5tmnh:/usr/local/apache2# free -h
             total       used       free     shared    buffers     cached
Mem:          256M       8.4M       247M       268K         0B       272K
-/+ buffers/cache:       8.1M       247M
Swap:           0B         0B         0B

root@httpd-test-68b9b9d74f-5tmnh:/usr/local/apache2# top
top - 09:31:23 up 15 min,  0 users,  load average: 0.00, 0.00, 0.00
Tasks:   6 total,   1 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu0  :  4.5 us,  2.1 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    262144 total,     9072 used,   253072 free,        0 buffers
KiB Swap:        0 total,        0 used,        0 free.      276 cached Mem
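
The check above can also be automated. The sketch below compares the MemTotal seen inside the container with the Pod's memory limit; it handles only integer quantities with a small subset of Kubernetes suffixes, and all function names are hypothetical.

```python
import re

# Subset of Kubernetes quantity suffixes; enough for limits such as "256Mi".
_UNITS = {
    "": 1, "k": 1000, "M": 1000**2, "G": 1000**3,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
}


def parse_quantity(quantity: str) -> int:
    """Convert an integer quantity such as '256Mi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G)?", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or ""]


def mem_total_bytes(meminfo_text: str) -> int:
    """Extract MemTotal from /proc/meminfo content, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024  # value is reported in kB
    raise ValueError("MemTotal not found")


def view_matches_limit(meminfo_text: str, limit: str) -> bool:
    """True if the container's memory view equals its configured limit."""
    return mem_total_bytes(meminfo_text) == parse_quantity(limit)
```

Inside the Pod one could call `view_matches_limit(open("/proc/meminfo").read(), "256Mi")`. With lxcfs mounted, the 262144 kB shown by top above equals 256Mi, so the check passes; with the host's view it would not.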

Kubeservice Blog