Kubernetes Container Runtime Interface (CRI)

Posted by admin, 2023-10-27 10:17:42

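I am writing this article to fill in a gap I left open a long time ago[1].

The source code versions of the components covered in this article are: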

  • Kubernetes 1.24
  • CRI 0.25.0
  • Containerd 1.6

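A container runtime is the component responsible for managing and running containers. It turns container images into actual container processes running on the host, and provides image management, container lifecycle management, resource isolation, filesystem setup, network configuration, and related functions.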

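The common container runtimes are listed below. They differ in features and performance, but all of them can be driven through the Container Runtime Interface (CRI), either natively or through an adapter (cri-dockerd in the case of Docker Engine), which lets them integrate with Kubernetes or other container orchestration systems for container scheduling and management.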

  • containerd[2]
  • CRI-O[3]
  • Docker Engine[4]
  • Mirantis Container Runtime[5]

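With CRI, we can also switch between container runtimes "freely" without recompiling Kubernetes. Put simply, CRI defines every operation on containers and serves as the standard interface between the container orchestration system and the container runtime.

The Past and Present of CRI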

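CRI was first introduced in Kubernetes 1.5[6], with v1alpha1 as its initial version. Before that, Kubernetes had to maintain support for each container runtime inside the kubelet source code.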

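With CRI, kubelet only needs to support CRI itself. Because the container runtimes of the time did not yet implement CRI, kubelet talked to them through an intermediate layer, the CRI shim (a gRPC server).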

Kubernetes 1.24, released last year, officially removed Dockershim[7], which simplified the interaction with container runtimes.

Kubernetes currently supports two CRI versions, v1alpha2 and v1. The v1 version was introduced in Kubernetes 1.23.

Each time kubelet starts, it first tries to connect to the container runtime using the v1 API. Only if that fails does it fall back to v1alpha2.
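The fallback order can be sketched as follows. This is a minimal illustration, not kubelet's code: `versionProbe`, `negotiateCRIVersion`, and `errUnsupported` are hypothetical names, and the probe merely simulates "connect and call Version with this API version".

```go
package main

import (
	"errors"
	"fmt"
)

// errUnsupported simulates a runtime rejecting an API version it does not serve.
var errUnsupported = errors.New("API version not supported by runtime")

// versionProbe is a hypothetical stand-in for "connect to the runtime and
// call Version using this CRI API version"; it is not a kubelet type.
type versionProbe func(apiVersion string) error

// negotiateCRIVersion tries v1 first and only then falls back to v1alpha2.
// It returns the first version that works, or the last error seen.
func negotiateCRIVersion(probe versionProbe) (string, error) {
	var lastErr error
	for _, v := range []string{"v1", "v1alpha2"} {
		if err := probe(v); err != nil {
			lastErr = fmt.Errorf("CRI %s unavailable: %w", v, err)
			continue
		}
		return v, nil
	}
	return "", lastErr
}

func main() {
	// Simulated runtime that only serves v1alpha2.
	legacy := func(v string) error {
		if v == "v1alpha2" {
			return nil
		}
		return errUnsupported
	}
	v, err := negotiateCRIVersion(legacy)
	fmt.Println(v, err) // prints: v1alpha2 <nil>
}
```

Trying the newer version first means an upgraded runtime is picked up automatically on the next kubelet restart, while older runtimes keep working through the fallback.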

kubelet and CRI

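My earlier kubelet source code analysis[8] showed that kubelet continuously watches for changes from three sources (file, apiserver, and http) in order to update pod state. That article stopped at this point, because everything after it is handed over to the container runtime[9], which creates and runs the sandbox and the containers; see `kubeGenericRuntimeManager#SyncPod()`[10].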

When kubelet starts, it initializes the CRI client[11], connects to the container runtime, and confirms the CRI version.

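While creating a pod, kubelet interacts with the container runtime through CRI at each of these steps:

  • Creating the sandbox
  • Creating containers
  • Pulling images

Source references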

  • pkg/kubelet/kuberuntime/kuberuntime_sandbox.go#L39[12]
  • pkg/kubelet/kuberuntime/kuberuntime_container.go#L176[13]
  • pkg/kubelet/images/image_manager.go#L89[14]
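
The order of these CRI calls during pod startup can be sketched with a fake runtime that only records what was called. The `podRuntime` interface below is a hypothetical, heavily reduced view (the real CRI methods take request and response structs), and the per-container order of pull, create, start is my reading of the referenced kubelet code rather than a quote from it.

```go
package main

import (
	"fmt"
	"strings"
)

// podRuntime is a hypothetical, simplified slice of the CRI surface.
type podRuntime interface {
	RunPodSandbox(pod string) (string, error)
	PullImage(image string) error
	CreateContainer(sandboxID, name, image string) (string, error)
	StartContainer(containerID string) error
}

type containerSpec struct{ Name, Image string }

// startPod creates the sandbox once, then pulls, creates, and starts
// each container inside it. Any failure aborts the remaining steps.
func startPod(rt podRuntime, pod string, specs []containerSpec) error {
	sandboxID, err := rt.RunPodSandbox(pod)
	if err != nil {
		return fmt.Errorf("run sandbox for pod %q: %w", pod, err)
	}
	for _, c := range specs {
		if err := rt.PullImage(c.Image); err != nil {
			return fmt.Errorf("pull image %q: %w", c.Image, err)
		}
		id, err := rt.CreateContainer(sandboxID, c.Name, c.Image)
		if err != nil {
			return fmt.Errorf("create container %q: %w", c.Name, err)
		}
		if err := rt.StartContainer(id); err != nil {
			return fmt.Errorf("start container %q: %w", c.Name, err)
		}
	}
	return nil
}

// fakeRuntime records the call order instead of running anything.
type fakeRuntime struct{ calls []string }

func (f *fakeRuntime) RunPodSandbox(pod string) (string, error) {
	f.calls = append(f.calls, "RunPodSandbox")
	return "sb-" + pod, nil
}

func (f *fakeRuntime) PullImage(image string) error {
	f.calls = append(f.calls, "PullImage")
	return nil
}

func (f *fakeRuntime) CreateContainer(sandboxID, name, image string) (string, error) {
	f.calls = append(f.calls, "CreateContainer")
	return sandboxID + "/" + name, nil
}

func (f *fakeRuntime) StartContainer(containerID string) error {
	f.calls = append(f.calls, "StartContainer")
	return nil
}

func main() {
	rt := &fakeRuntime{}
	specs := []containerSpec{{Name: "app", Image: "example.com/app:1.0"}}
	if err := startPod(rt, "web", specs); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(rt.calls, " -> "))
}
```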

Next, taking containerd as an example, let's look at how kubelet's requests are handled.

Containerd and CRI

Containerd's `criService`[15] implements the server side of the CRI interfaces `RuntimeService`[16] and `ImageService`[17], that is, RuntimeServiceServer and ImageServiceServer.

criService is further wrapped in `instrumentedService`[18], which ensures that all operations run in the k8s.io namespace.
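
The wrapping idea can be sketched as a decorator that stamps the namespace onto the request context before delegating. This is only an illustration using the standard library: `sandboxRunner`, `coreService`, `namespacedService`, and the context helpers are hypothetical names, not containerd's types or its namespaces package.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// namespaceKey is an unexported context key, so only this package can set it.
type namespaceKey struct{}

const criNamespace = "k8s.io"

func withNamespace(ctx context.Context, ns string) context.Context {
	return context.WithValue(ctx, namespaceKey{}, ns)
}

func namespaceFrom(ctx context.Context) (string, bool) {
	ns, ok := ctx.Value(namespaceKey{}).(string)
	return ns, ok
}

// sandboxRunner is a hypothetical one-method slice of the CRI service.
type sandboxRunner interface {
	RunPodSandbox(ctx context.Context, name string) (string, error)
}

// coreService does the work and refuses to run without a namespace.
type coreService struct{}

func (coreService) RunPodSandbox(ctx context.Context, name string) (string, error) {
	ns, ok := namespaceFrom(ctx)
	if !ok {
		return "", errors.New("namespace is required")
	}
	return ns + "/" + name, nil
}

// namespacedService wraps another service and forces every call
// into the fixed CRI namespace.
type namespacedService struct{ next sandboxRunner }

func (s namespacedService) RunPodSandbox(ctx context.Context, name string) (string, error) {
	return s.next.RunPodSandbox(withNamespace(ctx, criNamespace), name)
}

// runWith calls the service with a bare background context.
func runWith(svc sandboxRunner, name string) (string, error) {
	return svc.RunPodSandbox(context.Background(), name)
}

func main() {
	id, err := runWith(namespacedService{next: coreService{}}, "web")
	fmt.Println(id, err) // prints: k8s.io/web <nil>

	_, err = runWith(coreService{}, "web")
	fmt.Println(err) // prints: namespace is required
}
```

Putting the namespace in one wrapper keeps each handler free of that concern and makes it hard to forget on a newly added method.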

RuntimeServiceServer

RuntimeServiceServer[19]

type RuntimeServiceServer interface {
 // Version returns the runtime name, runtime version, and runtime API version.
 Version(context.Context, *VersionRequest) (*VersionResponse, error)
 // RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
 // the sandbox is in the ready state on success.
 RunPodSandbox(context.Context, *RunPodSandboxRequest) (*RunPodSandboxResponse, error)
 // StopPodSandbox stops any running process that is part of the sandbox and
 // reclaims network resources (e.g., IP addresses) allocated to the sandbox.
 // If there are any running containers in the sandbox, they must be forcibly
 // terminated.
 // This call is idempotent, and must not return an error if all relevant
 // resources have already been reclaimed. kubelet will call StopPodSandbox
 // at least once before calling RemovePodSandbox. It will also attempt to
 // reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
 // multiple StopPodSandbox calls are expected.
 StopPodSandbox(context.Context, *StopPodSandboxRequest) (*StopPodSandboxResponse, error)
 // RemovePodSandbox removes the sandbox. If there are any running containers
 // in the sandbox, they must be forcibly terminated and removed.
 // This call is idempotent, and must not return an error if the sandbox has
 // already been removed.
 RemovePodSandbox(context.Context, *RemovePodSandboxRequest) (*RemovePodSandboxResponse, error)
 // PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
 // present, returns an error.
 PodSandboxStatus(context.Context, *PodSandboxStatusRequest) (*PodSandboxStatusResponse, error)
 // ListPodSandbox returns a list of PodSandboxes.
 ListPodSandbox(context.Context, *ListPodSandboxRequest) (*ListPodSandboxResponse, error)
 // CreateContainer creates a new container in specified PodSandbox
 CreateContainer(context.Context, *CreateContainerRequest) (*CreateContainerResponse, error)
 // StartContainer starts the container.
 StartContainer(context.Context, *StartContainerRequest) (*StartContainerResponse, error)
 // StopContainer stops a running container with a grace period (i.e., timeout).
 // This call is idempotent, and must not return an error if the container has
 // already been stopped.
 // The runtime must forcibly kill the container after the grace period is
 // reached.
 StopContainer(context.Context, *StopContainerRequest) (*StopContainerResponse, error)
 // RemoveContainer removes the container. If the container is running, the
 // container must be forcibly removed.
 // This call is idempotent, and must not return an error if the container has
 // already been removed.
 RemoveContainer(context.Context, *RemoveContainerRequest) (*RemoveContainerResponse, error)
 // ListContainers lists all containers by filters.
 ListContainers(context.Context, *ListContainersRequest) (*ListContainersResponse, error)
 // ContainerStatus returns status of the container. If the container is not
 // present, returns an error.
 ContainerStatus(context.Context, *ContainerStatusRequest) (*ContainerStatusResponse, error)
 // UpdateContainerResources updates ContainerConfig of the container synchronously.
 // If runtime fails to transactionally update the requested resources, an error is returned.
 UpdateContainerResources(context.Context, *UpdateContainerResourcesRequest) (*UpdateContainerResourcesResponse, error)
 // ReopenContainerLog asks runtime to reopen the stdout/stderr log file
 // for the container. This is often called after the log file has been
 // rotated. If the container is not running, container runtime can choose
 // to either create a new log file and return nil, or return an error.
 // Once it returns error, new container log file MUST NOT be created.
 ReopenContainerLog(context.Context, *ReopenContainerLogRequest) (*ReopenContainerLogResponse, error)
 // ExecSync runs a command in a container synchronously.
 ExecSync(context.Context, *ExecSyncRequest) (*ExecSyncResponse, error)
 // Exec prepares a streaming endpoint to execute a command in the container.
 Exec(context.Context, *ExecRequest) (*ExecResponse, error)
 // Attach prepares a streaming endpoint to attach to a running container.
 Attach(context.Context, *AttachRequest) (*AttachResponse, error)
 // PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
 PortForward(context.Context, *PortForwardRequest) (*PortForwardResponse, error)
 // ContainerStats returns stats of the container. If the container does not
 // exist, the call returns an error.
 ContainerStats(context.Context, *ContainerStatsRequest) (*ContainerStatsResponse, error)
 // ListContainerStats returns stats of all running containers.
 ListContainerStats(context.Context, *ListContainerStatsRequest) (*ListContainerStatsResponse, error)
 // PodSandboxStats returns stats of the pod sandbox. If the pod sandbox does not
 // exist, the call returns an error.
 PodSandboxStats(context.Context, *PodSandboxStatsRequest) (*PodSandboxStatsResponse, error)
 // ListPodSandboxStats returns stats of the pod sandboxes matching a filter.
 ListPodSandboxStats(context.Context, *ListPodSandboxStatsRequest) (*ListPodSandboxStatsResponse, error)
 // UpdateRuntimeConfig updates the runtime configuration based on the given request.
 UpdateRuntimeConfig(context.Context, *UpdateRuntimeConfigRequest) (*UpdateRuntimeConfigResponse, error)
 // Status returns the status of the runtime.
 Status(context.Context, *StatusRequest) (*StatusResponse, error)
 // CheckpointContainer checkpoints a container
 CheckpointContainer(context.Context, *CheckpointContainerRequest) (*CheckpointContainerResponse, error)
 // GetContainerEvents gets container events from the CRI runtime
 GetContainerEvents(*GetEventsRequest, RuntimeService_GetContainerEventsServer) error
}
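
Several comments in this interface spell out a contract: StopPodSandbox and RemovePodSandbox are idempotent, while PodSandboxStatus returns an error for a missing sandbox. A minimal in-memory sketch of that contract follows. `sandboxStore` and its methods are hypothetical and not safe for concurrent use, and treating an unknown ID in Stop as "already reclaimed" is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

var errSandboxNotFound = errors.New("sandbox not found")

type sandboxState string

const (
	sandboxReady    sandboxState = "SANDBOX_READY"
	sandboxNotReady sandboxState = "SANDBOX_NOTREADY"
)

// sandboxStore is a hypothetical in-memory stand-in for a runtime's state.
type sandboxStore struct{ states map[string]sandboxState }

func newSandboxStore() *sandboxStore {
	return &sandboxStore{states: map[string]sandboxState{}}
}

// Run marks the sandbox ready, as RunPodSandbox must on success.
func (s *sandboxStore) Run(id string) { s.states[id] = sandboxReady }

// Stop is idempotent: stopping a stopped or unknown sandbox succeeds.
func (s *sandboxStore) Stop(id string) error {
	if _, ok := s.states[id]; ok {
		s.states[id] = sandboxNotReady
	}
	return nil
}

// Remove is idempotent: removing a sandbox twice is not an error.
func (s *sandboxStore) Remove(id string) error {
	delete(s.states, id)
	return nil
}

// Status returns an error when the sandbox is not present.
func (s *sandboxStore) Status(id string) (sandboxState, error) {
	st, ok := s.states[id]
	if !ok {
		return "", fmt.Errorf("%w: %s", errSandboxNotFound, id)
	}
	return st, nil
}

func main() {
	store := newSandboxStore()
	store.Run("sb-1")
	fmt.Println(store.Stop("sb-1"), store.Stop("sb-1"))     // prints: <nil> <nil>
	fmt.Println(store.Remove("sb-1"), store.Remove("sb-1")) // prints: <nil> <nil>
	_, err := store.Status("sb-1")
	fmt.Println(errors.Is(err, errSandboxNotFound)) // prints: true
}
```

Idempotency matters because, as the interface comments note, kubelet reclaims resources eagerly and is expected to call StopPodSandbox more than once.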

ImageServiceServer

ImageServiceServer[20]

type ImageServiceServer interface {
 // ListImages lists existing images.
 ListImages(context.Context, *ListImagesRequest) (*ListImagesResponse, error)
 // ImageStatus returns the status of the image. If the image is not
 // present, returns a response with ImageStatusResponse.Image set to
 // nil.
 ImageStatus(context.Context, *ImageStatusRequest) (*ImageStatusResponse, error)
 // PullImage pulls an image with authentication config.
 PullImage(context.Context, *PullImageRequest) (*PullImageResponse, error)
 // RemoveImage removes the image.
 // This call is idempotent, and must not return an error if the image has
 // already been removed.
 RemoveImage(context.Context, *RemoveImageRequest) (*RemoveImageResponse, error)
 // ImageFSInfo returns information of the filesystem that is used to store images.
 ImageFsInfo(context.Context, *ImageFsInfoRequest) (*ImageFsInfoResponse, error)
}
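
Note that ImageStatus signals a missing image with a nil Image rather than an error, which lets a caller decide whether to pull. A minimal sketch of such a caller is shown here. `imageService`, `ensureImage`, and `memoryImages` are hypothetical and simplified (no pull policy, no authentication), and the image reference is a placeholder.

```go
package main

import "fmt"

type imageInfo struct{ Ref string }

// imageService is a hypothetical, reduced form of the image half of CRI.
type imageService interface {
	// ImageStatus returns (nil, nil) when the image is not present.
	ImageStatus(ref string) (*imageInfo, error)
	PullImage(ref string) error
}

// ensureImage pulls ref only if it is not already present.
// It reports whether a pull actually happened.
func ensureImage(svc imageService, ref string) (bool, error) {
	img, err := svc.ImageStatus(ref)
	if err != nil {
		return false, fmt.Errorf("image status %q: %w", ref, err)
	}
	if img != nil {
		return false, nil // already present, nothing to do
	}
	if err := svc.PullImage(ref); err != nil {
		return false, fmt.Errorf("pull image %q: %w", ref, err)
	}
	return true, nil
}

// memoryImages is an in-memory fake used only for this example.
type memoryImages struct{ images map[string]bool }

func (m *memoryImages) ImageStatus(ref string) (*imageInfo, error) {
	if !m.images[ref] {
		return nil, nil
	}
	return &imageInfo{Ref: ref}, nil
}

func (m *memoryImages) PullImage(ref string) error {
	m.images[ref] = true
	return nil
}

func main() {
	svc := &memoryImages{images: map[string]bool{}}
	first, _ := ensureImage(svc, "example.com/app:1.0")
	second, _ := ensureImage(svc, "example.com/app:1.0")
	fmt.Println(first, second) // prints: true false
}
```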

Below, we take sandbox creation as an example and look at the containerd source code.

Containerd Source Code Analysis

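The request to create a sandbox container arrives over the CRI UDS (Unix domain socket)[21] endpoint /runtime.v1.RuntimeService/RunPodSandbox and enters the criService handling flow. criService#RunPodSandbox() is responsible for creating and running the sandbox container and for making sure it ends up in a healthy state:

  • Pull the sandbox container image
  • Initialize the container metadata
  • Initialize the pod network namespace; for details, see my earlier article Source Code Analysis: How kubelet and the Container Runtime Use CNI[22]
  • Update the container metadata
  • Write to the filesystem

Source references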

  • pkg/cri/server/sandbox_run.go#L61[23]
  • services/tasks/local.go#L156[24]
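
A multi-step flow like this one has to undo earlier work when a later step fails. The sketch below shows one common Go pattern for that: a deferred cleanup that inspects the named return error. It is an illustration under assumptions, not containerd's implementation. The `step` type, `runSteps`, and `sandboxSteps` are hypothetical, the step names simply mirror the list above, and treating the image pull as having nothing to undo is a choice made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errInjected is the failure injected by the demo.
var errInjected = errors.New("injected failure")

type step struct {
	name string
	do   func() error
	undo func() // nil when there is nothing to roll back
}

// runSteps runs steps in order. If one fails, the deferred function
// undoes the completed steps in reverse order.
func runSteps(steps []step) (retErr error) {
	var done []step
	defer func() {
		if retErr == nil {
			return
		}
		for i := len(done) - 1; i >= 0; i-- {
			if done[i].undo != nil {
				done[i].undo()
			}
		}
	}()
	for _, s := range steps {
		if err := s.do(); err != nil {
			return fmt.Errorf("%s: %w", s.name, err)
		}
		done = append(done, s)
	}
	return nil
}

// sandboxSteps builds logging-only steps named after the list above.
// failAt names the step that should fail (empty for none).
func sandboxSteps(log *[]string, failAt string) []step {
	plan := []struct {
		name       string
		reversible bool
	}{
		{"pull sandbox image", false},
		{"init metadata", true},
		{"setup network namespace", true},
		{"update metadata", true},
		{"write to filesystem", true},
	}
	steps := make([]step, 0, len(plan))
	for _, p := range plan {
		p := p // capture a per-iteration copy for the closures
		s := step{name: p.name, do: func() error {
			if p.name == failAt {
				return errInjected
			}
			*log = append(*log, "do "+p.name)
			return nil
		}}
		if p.reversible {
			s.undo = func() { *log = append(*log, "undo "+p.name) }
		}
		steps = append(steps, s)
	}
	return steps
}

func main() {
	var log []string
	err := runSteps(sandboxSteps(&log, "setup network namespace"))
	fmt.Println("error:", err)
	for _, line := range log {
		fmt.Println(line)
	}
}
```

With the network step failing, the log shows the two completed steps followed by a single undo for the metadata step, since the image pull registers no rollback.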

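Summary

CRI provides a standardized interface for interacting with the underlying container runtime. This matters a great deal for growing and strengthening the Kubernetes ecosystem:

  • The Kubernetes control plane is decoupled from the concrete container management implementation, so the container runtime can be upgraded or swapped independently, which makes extension and optimization easier.
  • Kubernetes is a cross-cloud, cross-platform, multi-environment container orchestration system that uses different container platforms in different environments and scenarios. CRI preserves that platform diversity and flexibility.

References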

[1]

A gap I left open a long time ago: https://atbug.com/how-kubelete-container-runtime-work-with-cni/#创建-pod

[2]

containerd: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd

[3]

CRI-O: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cri-o

[4]

Docker Engine: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker

[5]

Mirantis Container Runtime: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#mcr

[6]

Kubernetes 1.5: https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/

[7]

Officially removed Dockershim: https://kubernetes.io/blog/2022/05/03/dockershim-historical-context/

[8]

kubelet source code analysis: https://mp.weixin.qq.com/s/O7k3MlgyonNtOUxNPrN8lg

[9]

Container runtime: https://kubernetes.io/docs/setup/production-environment/container-runtimes/

[10]

kubeGenericRuntimeManager#SyncPod(): https://github.com/kubernetes/kubernetes/blob/023d6fb8f4a7d130bf5c8e725ca310df9e663cd0/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L711

[11]

Initializing the CRI client: https://github.com/kubernetes/kubernetes/blob/14fcab83adf319b8ef8e82e1054412309c46f535/pkg/kubelet/kubelet.go#L285

[12]

pkg/kubelet/kuberuntime/kuberuntime_sandbox.go#L39: https://github.com/kubernetes/kubernetes/blob/ea929715339da4553589df61c8638bac3bcae618/pkg/kubelet/kuberuntime/kuberuntime_sandbox.go#L39

[13]

pkg/kubelet/kuberuntime/kuberuntime_container.go#L176: https://github.com/kubernetes/kubernetes/blob/3946d99904fe37ea04b231a8d101085b9b80b221/pkg/kubelet/kuberuntime/kuberuntime_container.go#L176

[14]

pkg/kubelet/images/image_manager.go#L89: https://github.com/kubernetes/kubernetes/blob/de37b9d293613aac194cf522561d19ee1829e87b/pkg/kubelet/images/image_manager.go#L89

[15]

criService: https://github.com/containerd/containerd/blob/1764ea9a2815ddbd0cde777b557f97171b84cd02/pkg/cri/server/service.go#L77

[16]

RuntimeService: https://github.com/kubernetes/cri-api/blob/master/pkg/apis/runtime/v1/api.proto#L34

[17]

ImageService: https://github.com/kubernetes/cri-api/blob/master/pkg/apis/runtime/v1/api.proto#L128

[18]

instrumentedService: https://github.com/containerd/containerd/blob/d3c7e31c8a8f7dc3f0ef0d189fda5a7caca42ce2/pkg/cri/server/instrumented_service.go#L32

[19]

RuntimeServiceServer: https://github.com/kubernetes/cri-api/blob/v0.25.0/pkg/apis/runtime/v1/api.pb.go#L9301

[20]

ImageServiceServer: https://github.com/kubernetes/cri-api/blob/v0.25.0/pkg/apis/runtime/v1/api.pb.go#L10131C9-L10131C9

[21]

UDS(Unix domain socket): https://en.wikipedia.org/wiki/Unix_domain_socket

[22]

Source Code Analysis: How kubelet and the Container Runtime Use CNI: https://atbug.com/how-kubelete-container-runtime-work-with-cni/#创建-sandbox-容器

[23]

pkg/cri/server/sandbox_run.go#L61: https://github.com/containerd/containerd/blob/f2376e659ffa55e4ff2578baf4e4c7aab54042e4/pkg/cri/server/sandbox_run.go#L61

[24]

services/tasks/local.go#L156: https://github.com/containerd/containerd/blob/bbe46b8c43fc2febe316775bc2d4b9d697bbf05c/services/tasks/local.go#L156

Originally published on the WeChat official account Docker中文社区 (Docker Chinese Community): Kubernetes Container Runtime Interface (CRI)
