Kubernetes 1.22 was recently released, and the steady march towards removing the deprecated dockershim continues. On-premises cluster admins may be a bit daunted by the task of migrating to a new container runtime, but fear not! We recently helped a client through the process and wanted to share our findings to make the migration easier for everyone.
Before you start re-adding nodes to the cluster, it’d be wise to set the kubelet-config configmap to use the systemd cgroup driver. You can find instructions on that in the Kubernetes documentation on configuring the kubelet cgroup driver.
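As a rough sketch, assuming a kubeadm-managed cluster where the ConfigMap is named kubelet-config (older versions name it kubelet-config-1.xx), the change looks something like this:

```shell
# Open the cluster-wide kubelet configuration for editing
kubectl edit configmap kubelet-config -n kube-system

# In the embedded KubeletConfiguration, set (or add) the cgroup driver:
#   cgroupDriver: systemd
```

Nodes pick up this configuration when they (re)join the cluster, which is why it pays to change it before re-adding nodes.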
Start by picking your least favourite worker node, cordoning it and waiting for the workloads to drain from it.
kubectl drain <node-name> --ignore-daemonsets
Then remove it from the cluster.
kubectl delete node <node-name>
Now reset the removed node and clean it up:
Stop kubelet, then stop and disable Docker:
systemctl stop kubelet
systemctl disable --now docker
Remove docker and reinstall containerd. Unless you mark the containerd package as manually installed, some package managers remove it as a dependency of docker-ce.
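On a Debian/Ubuntu node, for example, the swap might look something like this (package names are assumptions and may differ on your distribution):

```shell
# Remove Docker; containerd.io may be flagged for removal as a dependency
sudo apt-get remove docker-ce docker-ce-cli

# Reinstall containerd and mark it as manually installed so
# a later autoremove doesn't clean it up
sudo apt-get install containerd.io
sudo apt-mark manual containerd.io
```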
Make sure you clean up the data that isn’t cleared when running kubeadm reset.
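As an illustration, the reset plus the extra cleanup that kubeadm warns about might look like this (adjust the CNI path for your setup):

```shell
# Reset the kubeadm-managed state on this node
sudo kubeadm reset

# kubeadm reset does not remove CNI configuration or iptables rules for you
sudo rm -rf /etc/cni/net.d
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
```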
Prepare the system for containerd as described here: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
Now we generate the default config for containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
Edit the containerd config file (/etc/containerd/config.toml) to enable the systemd cgroup driver for the CRI. Below the line:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
set:
SystemdCgroup = true
Start and enable containerd:
systemctl enable --now containerd
Once containerd is up and running, you’ll need to reinstall kubelet and kubeadm. I found that kubelet only reliably detected the new container runtime when reinstalled. If anyone knows of a way to get this to work without reinstalling, send your answers on a postcard please :)
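On Debian/Ubuntu, for instance, the reinstall might look like this (pin to whatever version your cluster is running):

```shell
# Reinstall kubelet and kubeadm so the kubelet picks up
# containerd as its container runtime
sudo apt-get install --reinstall kubelet kubeadm
sudo systemctl restart kubelet
```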
Finally, rejoin the node to the cluster. You should be able to verify the node’s container runtime with
kubectl get nodes -o wide
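One way to rejoin, assuming kubeadm: generate a fresh join command on an existing control plane node and run it on the reset node.

```shell
# On an existing control plane node: print a join command with a fresh token
kubeadm token create --print-join-command

# On the freshly reset node: run the printed command, e.g.
# sudo kubeadm join <api-server>:6443 --token <token> \
#     --discovery-token-ca-cert-hash sha256:<hash>
```

The CONTAINER-RUNTIME column of `kubectl get nodes -o wide` should then show a containerd:// URI for the node.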
Once this is all done for your worker nodes, your control plane nodes are next. The steps are mostly the same, but if you run etcd in-cluster, it’s a good idea to reset your nodes before you delete them, to ensure they safely leave the etcd cluster.
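In practice that ordering looks like this for a control plane node (a sketch, assuming kubeadm-managed stacked etcd):

```shell
kubectl drain <node-name> --ignore-daemonsets

# Reset first: on a control plane node, kubeadm reset removes
# this node's member from the etcd cluster
sudo kubeadm reset

# Only then remove the node object from the cluster
kubectl delete node <node-name>
```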
Now, with Docker removed and your cluster fully migrated to containerd, you have much less to worry about in upcoming Kubernetes updates! You still have a little bit of time, of course, with the earliest possible dockershim removal being later this year in Kubernetes 1.23.