Kubernetes Kueue: a native job queueing scheduler

Background

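Kueue is a Kubernetes-native job queueing scheduler. It extends the Job/CronJob scheduling layer along several dimensions, including Job dependencies, priority, and fair sharing of resources. In effect, it is a general-purpose management component layered on top of the underlying Job scheduling machinery.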

Scenarios it addresses:

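Starting and stopping jobs by policy: consider a cloud provider whose servers are cheaper at night. My job takes several days to finish, so I want to suspend it early every morning and resume it after dark to save money. However, I do not want to delete and recreate the Job every day, because I would lose track of completed Pods, logs, and other metadata.
Prioritized jobs: many users submit Jobs to my cluster, and every Job is created with suspend: true. Resources are limited, so I have to resume these suspended Jobs at the right time and in the right order.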

Dependencies

SuspendJob

SuspendJob is an enhancement to Job startup that allows a Job to be suspended and resumed.

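Suspending a Job: all active Pods owned by the Job are deleted, and the controller manager is told not to create new Pods until the Job is resumed. A Job can also be created in the suspended state, which delays Pod creation indefinitely.
Resuming a Job: once .spec.suspend is set back to false, the Job controller in the controller manager creates Pods for the Job again, and the Job's .status.startTime is reset to the time of resumption.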

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  suspend: true # Setting .spec.suspend to true suspends the Job; set it back to false when needed to resume execution
  template:
    spec:
      containers:
        - name: example
          image: busybox:1.28
          command: ["sleep", "30"]
      restartPolicy: Never
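
Toggling the field on an existing Job only requires changing `.spec.suspend`. As a minimal sketch, the patch body can be built with the Go standard library alone; `SuspendPatch` is a hypothetical helper name, and how the patch is delivered (for example with `kubectl patch --type=merge`, or with client-go's `Patch` call using `types.MergePatchType`) is left out and assumed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SuspendPatch builds a JSON merge patch that sets .spec.suspend on a Job.
// Passing true suspends the Job; passing false resumes it.
func SuspendPatch(suspend bool) ([]byte, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"suspend": suspend,
		},
	}
	return json.Marshal(patch)
}

func main() {
	resume, err := SuspendPatch(false)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(resume)) // prints {"spec":{"suspend":false}}
}
```

Because the patch touches only one field, the Job object, its completed Pods, and its metadata stay in place, which is the point of the first scenario above.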

// These are built-in conditions of a job.
const (
	// JobSuspended means the job has been suspended.
	JobSuspended JobConditionType = "Suspended"
	// JobComplete means the job has completed its execution.
	JobComplete JobConditionType = "Complete"
	// JobFailed means the job has failed its execution.
	JobFailed JobConditionType = "Failed"
	// FailureTarget means the job is about to fail its execution.
	JobFailureTarget JobConditionType = "FailureTarget"
)
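
A controller typically reads these conditions from `.status.conditions` to decide what state a Job is in. The sketch below uses local stand-in types rather than importing `k8s.io/api/batch/v1`, so that it compiles on its own; `HasCondition` and `IsFinished` are hypothetical helper names, and the condition status is modelled as a plain string.

```go
package main

import "fmt"

// JobConditionType and JobCondition are local stand-ins that mirror the
// shape of the batch/v1 types; they are not the real API types.
type JobConditionType string

const (
	JobSuspended JobConditionType = "Suspended"
	JobComplete  JobConditionType = "Complete"
	JobFailed    JobConditionType = "Failed"
)

type JobCondition struct {
	Type   JobConditionType
	Status string // "True", "False" or "Unknown"
}

// HasCondition reports whether a condition of type t is present with status "True".
// A Job that was suspended and later resumed keeps a Suspended condition with
// status "False", so checking the type alone is not enough.
func HasCondition(conds []JobCondition, t JobConditionType) bool {
	for _, c := range conds {
		if c.Type == t && c.Status == "True" {
			return true
		}
	}
	return false
}

// IsFinished reports whether the Job has either completed or failed.
func IsFinished(conds []JobCondition) bool {
	return HasCondition(conds, JobComplete) || HasCondition(conds, JobFailed)
}

func main() {
	conds := []JobCondition{{Type: JobSuspended, Status: "True"}}
	fmt.Println(HasCondition(conds, JobSuspended), IsFinished(conds)) // prints true false
}
```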

`JobMutableNodeSchedulingDirectives` feature gate

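This gate applies to a Job's Pod template. For a Job that has never been unsuspended (Job.Spec.Suspend=true && Job.Status.StartTime=nil), the node affinity (NodeAffinity), node selector (NodeSelector), tolerations (Tolerations), annotations (Annotations), and labels (Labels) of the Pod template may be changed. This relaxes the restrictions on Pod scheduling, because the scheduling directives can still be adjusted before any Pod is created.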

	// Updating node affinity, node selector and tolerations is allowed
	// only for suspended jobs that never started before.
	suspended := oldJob.Spec.Suspend != nil && *oldJob.Spec.Suspend
	notStarted := oldJob.Status.StartTime == nil
	opts.AllowMutableSchedulingDirectives = suspended && notStarted

Architecture and workflow

Other
