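Technical Design: Network Bandwidth Traffic Control for Kubernetes Pods
Background
In scenarios such as hybrid cloud, where business Pods interfere with one another directly, and online/offline co-location (online and offline services serving users on the same machine), isolating CPU, memory, fd, inode, and pid is not enough. Network bandwidth, disk IOPS, NBD IO, L3 cache, and memory bandwidth (MBA) also need to be isolated and limited.
This section therefore covers how the network bandwidth limit is used and implemented.
Usage and implementation in Kubernetes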

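When a container starts, the runtime calls the underlying CNI network plugin to create a virtual network and bind it to the container. Limiting a container's network therefore has to be done by the CNI plugin, which submits the limit configuration to the Linux traffic control (tc) subsystem. tc comprises a set of mechanisms and operations by which packets are queued for transmission or reception on a network interface (here a token bucket filter, TBF), and this is how traffic control is achieved.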
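To make the token bucket idea concrete, here is a minimal sketch in Go. It is a simplified model rather than the kernel's TBF implementation: the type `TokenBucket`, its fields, and the use of plain seconds for time are all assumptions made for illustration.

```go
package main

import "fmt"

// TokenBucket is a simplified model of the idea behind a token bucket filter.
// All names are illustrative; this is not how the kernel qdisc is written.
type TokenBucket struct {
	rate   float64 // refill rate, in bits per second
	burst  float64 // bucket capacity, in bits
	tokens float64 // tokens currently available, in bits
	last   float64 // time of the last refill, in seconds
}

// NewTokenBucket returns a bucket that starts full at time now.
func NewTokenBucket(rate, burst, now float64) *TokenBucket {
	return &TokenBucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// Allow refills the bucket for the time elapsed since the last call, then
// reports whether a packet of the given size may pass and consumes tokens if so.
func (b *TokenBucket) Allow(bits, now float64) bool {
	if elapsed := now - b.last; elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst // never store more than one burst
		}
		b.last = now
	}
	if bits > b.tokens {
		return false // not enough tokens: the packet must wait or be dropped
	}
	b.tokens -= bits
	return true
}

func main() {
	tb := NewTokenBucket(1_000_000, 12_000, 0) // 1 Mbit/s rate, 12 kbit burst
	fmt.Println(tb.Allow(12_000, 0))     // the bucket starts full
	fmt.Println(tb.Allow(12_000, 0))     // the bucket is now empty
	fmt.Println(tb.Allow(12_000, 0.025)) // enough time has passed to refill
}
```

The rate bounds the long-run average, while the burst bounds how much can be sent back to back after an idle period.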
How CNI drives Linux TC
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0", # must be 0.3.0, currently the highest version supported by the containernetworking plugins
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "127.0.0.1",
      "ipam": { "type": "host-local", "subnet": "usePodCidr" },
      "policy": { "type": "k8s" },
      "kubernetes": { "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" }
    },
    {
      "type": "bandwidth",
      "capabilities": {
        "bandwidth": true # lets the runtime (e.g. cri-o) submit the limits as JSON
      },
      /* The settings below rate-limit directly in the CNI plugin configuration; use either capabilities or these four settings, not both
      "ingressRate": 123,
      "ingressBurst": 456,
      "egressRate": 123,
      "egressBurst": 456
      */
    }
  ]
}
The CNI plugin accepts this static configuration, and it also accepts limits submitted as JSON by runtimes such as cri-o, containerd, and dockershim.
func cmdAdd(args *skel.CmdArgs) error {
	// Parse the CNI configuration.
	conf, err := parseConfig(args.StdinData)
	if err != nil {
		return err
	}
	//...
	// Read the ingress rate and burst from the configuration.
	if bandwidth.IngressRate > 0 && bandwidth.IngressBurst > 0 {
		// Create the rate-limiting rule as a tc TBF qdisc.
		err = CreateIngressQdisc(bandwidth.IngressRate, bandwidth.IngressBurst, hostInterface.Name)
		if err != nil {
			return err
		}
	}
	// Read the egress rate and burst from the configuration.
	if bandwidth.EgressRate > 0 && bandwidth.EgressBurst > 0 {
		// ...
		// Set the egress rate-limiting rule on the specific local device.
		err = CreateEgressQdisc(bandwidth.EgressRate, bandwidth.EgressBurst, hostInterface.Name, ifbDeviceName)
		if err != nil {
			return err
		}
	}
	return types.PrintResult(result, conf.CNIVersion)
}
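The two configuration paths (static settings in the plugin entry, or values submitted by the runtime) can be sketched with the standard library alone. The struct `pluginConf`, the helper names, the `runtimeConfig.bandwidth` location, and the choice to prefer runtime-supplied values are assumptions of this sketch, not a copy of the plugin's code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// BandwidthEntry holds the four settings from the configuration above.
// Treating rates as bits per second and bursts as bits is an assumption here.
type BandwidthEntry struct {
	IngressRate  uint64 `json:"ingressRate"`
	IngressBurst uint64 `json:"ingressBurst"`
	EgressRate   uint64 `json:"egressRate"`
	EgressBurst  uint64 `json:"egressBurst"`
}

// pluginConf is a hypothetical, reduced view of what the plugin reads on stdin.
type pluginConf struct {
	// Static values written directly in the plugin entry.
	BandwidthEntry
	// Values injected by the runtime when the capability is enabled.
	RuntimeConfig struct {
		Bandwidth *BandwidthEntry `json:"bandwidth"`
	} `json:"runtimeConfig"`
}

// checkPair requires a rate and its burst to be set together.
func checkPair(name string, rate, burst uint64) error {
	if (rate == 0) != (burst == 0) {
		return errors.New(name + ": rate and burst must be set together")
	}
	return nil
}

// effectiveBandwidth parses the config and picks the limits to apply.
// Preferring runtime-supplied values over static ones is this sketch's choice.
func effectiveBandwidth(stdin []byte) (*BandwidthEntry, error) {
	var conf pluginConf
	if err := json.Unmarshal(stdin, &conf); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	bw := &conf.BandwidthEntry
	if conf.RuntimeConfig.Bandwidth != nil {
		bw = conf.RuntimeConfig.Bandwidth
	}
	if err := checkPair("ingress", bw.IngressRate, bw.IngressBurst); err != nil {
		return nil, err
	}
	if err := checkPair("egress", bw.EgressRate, bw.EgressBurst); err != nil {
		return nil, err
	}
	return bw, nil
}

func main() {
	bw, err := effectiveBandwidth([]byte(`{"type":"bandwidth","ingressRate":123,"ingressBurst":456}`))
	fmt.Println(bw, err)
}
```

A direction whose rate and burst are both zero is simply left unlimited, which matches the `> 0` checks in `cmdAdd` above.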
OCR traffic control configuration
The limits are configured through Pod annotations:
apiVersion: v1
kind: Pod
metadata:
  name: iperf-slow
  annotations:
    kubernetes.io/ingress-bandwidth: 10M
    kubernetes.io/egress-bandwidth: 10M
...
The Kubernetes code parses and applies these pod annotations.
The values of kubernetes.io/ingress-bandwidth and kubernetes.io/egress-bandwidth must lie between 1k and 1P; values above 32G require tuning kernel parameters.
// Configured values must lie between 1k and 1P.
var minRsrc = resource.MustParse("1k")
var maxRsrc = resource.MustParse("1P")

// Extract the limits from the pod annotations so they can be passed on to runc.
func ExtractPodBandwidthResources(podAnnotations map[string]string) (ingress, egress *resource.Quantity, err error) {
	if podAnnotations == nil {
		return nil, nil, nil
	}
	str, found := podAnnotations["kubernetes.io/ingress-bandwidth"]
	if found {
		ingressValue, err := resource.ParseQuantity(str)
		if err != nil {
			return nil, nil, err
		}
		ingress = &ingressValue
		if err := validateBandwidthIsReasonable(ingress); err != nil {
			return nil, nil, err
		}
	}
	str, found = podAnnotations["kubernetes.io/egress-bandwidth"]
	if found {
		egressValue, err := resource.ParseQuantity(str)
		if err != nil {
			return nil, nil, err
		}
		egress = &egressValue
		if err := validateBandwidthIsReasonable(egress); err != nil {
			return nil, nil, err
		}
	}
	return ingress, egress, nil
}
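The range check that `validateBandwidthIsReasonable` performs is not shown above. The following is a standard-library stand-in that illustrates the idea; it is not the Kubernetes implementation. The function `parseBandwidth` is hypothetical and handles only whole numbers with the decimal suffixes k, M, G, T, and P, whereas `resource.ParseQuantity` accepts more forms.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	minBitsPerSec int64 = 1_000                     // 1k
	maxBitsPerSec int64 = 1_000_000_000_000_000     // 1P
)

// Decimal multipliers, matching the 1M = 1000k convention of resource quantities.
var suffixes = map[string]int64{"k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15}

// parseBandwidth converts a value such as "10M" into bits per second and
// rejects anything outside the [1k, 1P] range.
func parseBandwidth(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, fmt.Errorf("empty bandwidth value")
	}
	mult := int64(1)
	if m, ok := suffixes[s[len(s)-1:]]; ok {
		mult = m
		s = s[:len(s)-1]
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid bandwidth %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("bandwidth is unreasonably small (< 1k)")
	}
	// Compare before multiplying so the product cannot overflow int64.
	if n > maxBitsPerSec/mult {
		return 0, fmt.Errorf("bandwidth is unreasonably large (> 1P)")
	}
	v := n * mult
	if v < minBitsPerSec {
		return 0, fmt.Errorf("bandwidth is unreasonably small (< 1k)")
	}
	return v, nil
}

func main() {
	for _, s := range []string{"10M", "1k", "999", "2P"} {
		v, err := parseBandwidth(s)
		fmt.Println(s, v, err)
	}
}
```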
Taking containerd as an example: after kubelet obtains the pod YAML information, it passes it to the containerd runtime, which in turn passes it on to the CNI plugin.
func cniNamespaceOpts(id string, config *runtime.PodSandboxConfig) ([]cni.NamespaceOpts, error) {
	opts := []cni.NamespaceOpts{
		cni.WithLabels(toCNILabels(id, config)),
		cni.WithCapability(annotations.PodAnnotations, config.Annotations),
	}
	portMappings := toCNIPortMappings(config.GetPortMappings())
	if len(portMappings) > 0 {
		opts = append(opts, cni.WithCapabilityPortMap(portMappings))
	}
	// Read the limits from the pod annotations and finally hand them to CNI.
	bandWidth, err := toCNIBandWidth(config.Annotations)
	if err != nil {
		return nil, err
	}
	if bandWidth != nil {
		opts = append(opts, cni.WithCapabilityBandWidth(*bandWidth))
	}
	// ...
}
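As a rough illustration of this hand-off, the sketch below turns already-parsed rates into the JSON fragment a runtime could place in the plugin's input. The local type `bandWidth`, the helper names, and the burst value used in `main` are assumptions for illustration; the only convention relied on is that capability arguments are delivered under a `runtimeConfig` key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// bandWidth is a local, hypothetical stand-in for the runtime's bandwidth type.
type bandWidth struct {
	IngressRate  uint64 `json:"ingressRate,omitempty"`
	IngressBurst uint64 `json:"ingressBurst,omitempty"`
	EgressRate   uint64 `json:"egressRate,omitempty"`
	EgressBurst  uint64 `json:"egressBurst,omitempty"`
}

// toBandWidth returns nil when neither direction is limited, mirroring the
// "if bandWidth != nil" check above. Rates are in bits per second; 0 means unset.
func toBandWidth(ingress, egress, burst uint64) *bandWidth {
	if ingress == 0 && egress == 0 {
		return nil
	}
	bw := &bandWidth{}
	if ingress > 0 {
		bw.IngressRate, bw.IngressBurst = ingress, burst
	}
	if egress > 0 {
		bw.EgressRate, bw.EgressBurst = egress, burst
	}
	return bw
}

// runtimeConfigJSON wraps the limits the way capability arguments are delivered.
func runtimeConfigJSON(bw *bandWidth) (string, error) {
	doc := map[string]map[string]*bandWidth{"runtimeConfig": {"bandwidth": bw}}
	out, err := json.Marshal(doc)
	return string(out), err
}

func main() {
	// The burst value is an assumed placeholder, not a recommendation.
	bw := toBandWidth(1_000_000, 1_000_000, math.MaxUint32)
	out, err := runtimeConfigJSON(bw)
	fmt.Println(out, err)
}
```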
Verification and testing
Traffic control depends on the Linux TC subsystem, so only Linux Kubernetes clusters are supported at present.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iperf-server-deployment
  labels:
    app: iperf-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iperf-server
  template:
    metadata:
      labels:
        app: iperf-server
      # add the annotations
      annotations:
        kubernetes.io/ingress-bandwidth: 1M
        kubernetes.io/egress-bandwidth: 1M
    spec:
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      containers:
        - name: iperf3-server
          image: dongjiang1989/iperf
          args: ['-s', '-p', '5001']
          ports:
            - containerPort: 5001
              name: server
      terminationGracePeriodSeconds: 0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iperf-client
  labels:
    app: iperf-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iperf-client
  template:
    metadata:
      labels:
        app: iperf-client
    spec:
      containers:
        - name: iperf-client
          image: dongjiang1989/iperf
          command: ['/bin/sh', '-c', 'sleep 1d']
      terminationGracePeriodSeconds: 0
Without the bandwidth-limit annotations:
$ kubectl get pod | grep iperf
iperf-client-7874c47d95-t7hph 1/1 Running 0 5m58s
iperf-server-deployment-74d94bdd59-dzdl4 1/1 Running 0 5m58s
$ kubectl exec iperf-client-7874c47d95-t7hph -- iperf -c 10.1.0.173 -p 5001 -i 10 -t 100
------------------------------------------------------------
Client connecting to 10.1.0.173, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 1] local 10.1.0.172 port 56296 connected with 10.1.0.173 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.00 sec 19.7 GBytes 16.9 Gbits/sec
[ 1] 10.00-20.00 sec 18.9 GBytes 16.2 Gbits/sec
[ 1] 20.00-30.00 sec 20.0 GBytes 17.2 Gbits/sec
[ 1] 30.00-40.00 sec 20.4 GBytes 17.5 Gbits/sec
[ 1] 40.00-50.00 sec 18.5 GBytes 15.9 Gbits/sec
[ 1] 50.00-60.00 sec 19.3 GBytes 16.5 Gbits/sec
[ 1] 60.00-70.00 sec 17.6 GBytes 15.1 Gbits/sec
[ 1] 70.00-80.00 sec 17.1 GBytes 14.7 Gbits/sec
[ 1] 80.00-90.00 sec 18.4 GBytes 15.8 Gbits/sec
[ 1] 90.00-100.00 sec 15.1 GBytes 13.0 Gbits/sec
[ 1] 0.00-100.00 sec 185 GBytes 15.9 Gbits/sec
With no limit applied, bandwidth reaches 15.9 Gbits/sec.
With the bandwidth-limit annotations:
$ kubectl get pod | grep iperf
iperf-clients-rcsh6 1/1 Running 0 7h7m
iperf-server-deployment-59675c8f78-g52pm 1/1 Running 0 6h52m
$ kubectl exec iperf-clients-rcsh6 -- iperf -c 10.1.0.170 -p 5001 -i 10 -t 100
------------------------------------------------------------
Client connecting to 10.1.0.170, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[ 1] local 10.1.0.170 port 54652 connected with 10.1.0.170 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.00 sec 3.50 MBytes 2.94 Mbits/sec
[ 1] 10.00-20.00 sec 2.25 MBytes 1.89 Mbits/sec
[ 1] 20.00-30.00 sec 2.04 MBytes 1.71 Mbits/sec
[ 1] 30.00-40.00 sec 892 KBytes 731 Kbits/sec
[ 1] 40.00-50.00 sec 954 KBytes 781 Kbits/sec
[ 1] 50.00-60.00 sec 1.36 MBytes 1.14 Mbits/sec
[ 1] 60.00-70.00 sec 1.18 MBytes 993 Kbits/sec
[ 1] 70.00-80.00 sec 87.1 KBytes 71.4 Kbits/sec
[ 1] 80.00-90.00 sec 0.000 Bytes 0.000 bits/sec
[ 1] 90.00-100.00 sec 2.97 MBytes 2.50 Mbits/sec
[ 1] 0.00-100.69 sec 15.5 MBytes 1.29 Mbits/sec
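The limit is 1 Mbits/sec, while the measured rate is 1.29 Mbits/sec.
Why is the measured rate slightly above 1 Mbits/sec when the limit is 1 Mbits/sec?
Reason: on Linux, 1M = 1024k, whereas the Resource object used by K8s defines 1M = 1000k.
A setting of 1 Mbits/sec should therefore show up on Linux as 1024*1024 (bits/sec) / (1000*1000) ≈ 1.049 Mbits/sec.
In addition, within the first 0-1 s tc control is imprecise, which pushes the average up.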
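The arithmetic can be checked with a few lines of Go. This sketch recomputes the 1024-versus-1000 ratio and the average rate from the final iperf summary line (15.5 MBytes in 100.69 s). It assumes that iperf counts a MByte as 1024*1024 bytes and an Mbit as 10^6 bits; the function names are made up for the example.

```go
package main

import "fmt"

// binaryToDecimalRatio returns how much larger a binary "M" (1024*1024)
// is than a decimal "M" (1000*1000).
func binaryToDecimalRatio() float64 {
	return float64(1024*1024) / float64(1000*1000)
}

// averageMbits converts an iperf summary (MBytes transferred over a number of
// seconds) into Mbits/sec, assuming MByte = 2^20 bytes and Mbit = 10^6 bits.
func averageMbits(mbytes, seconds float64) float64 {
	bits := mbytes * 1024 * 1024 * 8
	return bits / seconds / 1e6
}

func main() {
	fmt.Printf("binary/decimal ratio: %.6f\n", binaryToDecimalRatio())
	fmt.Printf("measured average:     %.2f Mbits/sec\n", averageMbits(15.5, 100.69))
}
```

The unit difference accounts for only part of the gap between 1 and 1.29 Mbits/sec, which is consistent with the remark about the imprecise first second.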
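Summary
- docker 1.18 supports passing the JSON to the runc runtime; with containerd as the runtime, version 1.4 or later is required.
- calico requires version 2.1; cilium requires version 1.12.90; kube-ovn requires version 1.9.0, and its own annotations must be supported: `ovn.kubernetes.io/ingress_rate`: rate limit for ingress traffic, in Mbits/s; `ovn.kubernetes.io/egress_rate`: rate limit for egress traffic, in Mbits/s.
- The limits in the annotations cannot be updated dynamically; after changing them, the pod must be deleted and recreated.
A webhook is therefore needed to align these varied settings with the semantics of a namespace-level LimitRange and to fill in defaults.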
Implementation
First, a CRD describes the extended LimitRange restrictions for a namespace.
The design is as follows:
apiVersion: custom.xxx.com/v1
kind: CustomLimitRange
metadata:
  name: test-rangelimit
spec:
  limitrange:
    type: pod # limits apply to the pod type; container, ingress, and service types may be added later
    max: # max and min are the upper and lower bounds; if a pod's own value falls outside them, the ValidatingAdmissionWebhook rejects it
      ingress-bandwidth: "1G"
      egress-bandwidth: "1G"
    min:
      ingress-bandwidth: "10M"
      egress-bandwidth: "10M"
    default: # if default is defined and the pod annotation is empty, the MutatingAdmissionWebhook injects these values; if it is not defined, nothing is injected
      ingress-bandwidth: "128M"
      egress-bandwidth: "128M"
A pod can set customlimitrange.kubernetes.io/limited: disable to be exempted from the CustomLimitRange restrictions of its namespace.
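The default-injection behavior described above can be sketched as a pure function over annotation maps. This is an illustration under assumptions, not the webhook itself: the function `injectDefaults` is hypothetical, and it assumes the short keys in the CustomLimitRange (`ingress-bandwidth`, `egress-bandwidth`) map onto the `kubernetes.io/` annotations by simple prefixing.

```go
package main

import "fmt"

const (
	annotationPrefix = "kubernetes.io/"
	limitedKey       = "customlimitrange.kubernetes.io/limited"
)

// injectDefaults returns a copy of the pod annotations with missing bandwidth
// annotations filled in from the defaults. Pods that opt out with
// "limited: disable" are returned unchanged, as are annotations already set.
func injectDefaults(annotations, defaults map[string]string) map[string]string {
	out := make(map[string]string, len(annotations)+len(defaults))
	for k, v := range annotations {
		out[k] = v
	}
	if out[limitedKey] == "disable" {
		return out
	}
	for short, value := range defaults {
		key := annotationPrefix + short
		if _, set := out[key]; !set {
			out[key] = value
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"ingress-bandwidth": "128M", "egress-bandwidth": "128M"}
	pod := map[string]string{"kubernetes.io/ingress-bandwidth": "10M"}
	fmt.Println(injectDefaults(pod, defaults))
}
```

Returning a copy keeps the function free of side effects, which makes it straightforward to turn the difference into a patch in a mutating webhook.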
Note
Validation of the CustomLimitRange object itself is essential:
- max value >= default value >= min value
- values lie in [1k, 1P], with units of Kbits/sec, Mbits/sec, Gbits/sec, Tbits/sec, and Pbits/sec
- type is an enum
- max, min, and default may each be omitted
- internal adaptation: kube-ovn annotations
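The ordering and range rules in the list above can be sketched as follows for a single direction. The type `Range`, the helper `ptr`, and the use of already-parsed bits-per-second values are assumptions of this example; omitted fields are represented as nil pointers.

```go
package main

import "fmt"

const (
	lowest  int64 = 1_000                     // 1k
	highest int64 = 1_000_000_000_000_000     // 1P
)

// Range holds the optional min, default, and max for one direction,
// in bits per second. A nil pointer means the field was omitted.
type Range struct {
	Min, Default, Max *int64
}

// ptr is a small helper for building Range literals.
func ptr(v int64) *int64 { return &v }

// validateRange checks that every value that is set lies in [1k, 1P] and that
// the values that are set satisfy min <= default <= max.
func validateRange(r Range) error {
	fields := []struct {
		name string
		v    *int64
	}{{"min", r.Min}, {"default", r.Default}, {"max", r.Max}}

	prevName, prev := "lower bound", lowest
	for _, f := range fields {
		if f.v == nil {
			continue // omitted fields are allowed
		}
		if *f.v < lowest || *f.v > highest {
			return fmt.Errorf("%s=%d is outside [1k, 1P]", f.name, *f.v)
		}
		if *f.v < prev {
			return fmt.Errorf("%s=%d must be >= %s=%d", f.name, *f.v, prevName, prev)
		}
		prevName, prev = f.name, *f.v
	}
	return nil
}

func main() {
	ok := Range{Min: ptr(10_000_000), Default: ptr(128_000_000), Max: ptr(1_000_000_000)}
	bad := Range{Default: ptr(2_000_000_000), Max: ptr(1_000_000_000)}
	fmt.Println(validateRange(ok))
	fmt.Println(validateRange(bad))
}
```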
Usage
- Add the annotations to a Pod or Deployment
# Pod
apiVersion: v1
kind: Pod
metadata:
  name: xxxx
  annotations:
    kubernetes.io/ingress-bandwidth: 1M
    kubernetes.io/egress-bandwidth: 1M
...
# Deployment
...
spec:
  template:
    metadata:
      # add the annotations
      annotations:
        kubernetes.io/ingress-bandwidth: 1M
        kubernetes.io/egress-bandwidth: 1M
...
- Define a CustomLimitRange so that the annotations are added automatically, as described above.