Two weeks ago I got an email from Microsoft Azure explaining that Azure Kubernetes Service (AKS) had been patched, but that I had to restart my nodes (reboot the cluster) to complete the operation.
The first thing you need to know is that, when things like this happen, a file called /var/run/reboot-required is created on each of the nodes of your cluster.
The second thing is that a Kubernetes Reboot Daemon named Kured exists and, if installed in your cluster, runs a pod on each node that watches for the existence of the /var/run/reboot-required file. Kured then takes care of the reboots for you so that only one node is restarted at a time.
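To make the idea concrete, here is a minimal sketch of the kind of check involved: it only tests whether the sentinel file exists. This is not Kured's actual code; the function names reboot_required and reboot_status are hypothetical, and the default path is the one mentioned above.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not Kured's implementation.

# Default sentinel path taken from the article; override it for testing.
SENTINEL="${SENTINEL:-/var/run/reboot-required}"

# Returns 0 (success) when the sentinel file exists, non-zero otherwise.
# Falls back to $SENTINEL when no path is given.
reboot_required() {
  [ -f "${1:-$SENTINEL}" ]
}

# Prints a short human-readable status for the given sentinel path.
reboot_status() {
  if reboot_required "$1"; then
    echo "reboot required"
  else
    echo "no reboot required"
  fi
}
```

Running reboot_status with no arguments on a node checks the default path; passing another path makes it easy to try out on your own machine.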
So how do you install it and check what’s going on with your nodes? Let’s see:
Check the nodes
Damn, I was running the affected kernel version: 4.15.0-1049-azure.
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-agentpool-14502547-0 Ready agent 3d v1.12.8 <none> Ubuntu 16.04.6 LTS 4.15.0-1049-azure docker://3.0.4
aks-agentpool-14502547-1 Ready agent 3d v1.12.8 <none> Ubuntu 16.04.6 LTS 4.15.0-1049-azure docker://3.0.4
aks-agentpool-14502547-2 Ready agent 3d v1.12.8 <none> Ubuntu 16.04.6 LTS 4.15.0-1049-azure docker://3.0.4
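If you have more than a handful of nodes, reading the wide output by eye gets tedious. As a rough sketch, the helper below lists only the nodes that are on a given kernel. The function name nodes_on_kernel is hypothetical, and it assumes two-column input (NAME and KERNEL) such as kubectl produces with -o custom-columns.

```shell
#!/usr/bin/env bash
# Prints the names of nodes whose kernel version equals $1.
# Expects a header line followed by two whitespace-separated
# columns on stdin: NAME and KERNEL.
nodes_on_kernel() {
  awk -v kernel="$1" 'NR > 1 && $2 == kernel { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes \
#     -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion \
#     | nodes_on_kernel 4.15.0-1049-azure
```

Using custom columns avoids having to parse the OS-IMAGE column, which contains spaces.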
Install Kured
kubectl apply -f https://github.com/weaveworks/kured/releases/download/1.2.0/kured-1.2.0-dockerhub.yaml
Check the nodes again
kubectl get nodes -o wide
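You can see that two of the nodes are already on the new kernel (4.15.0-1050-azure), and that the last one has scheduling disabled because it was still in the process. The first node is still on the old kernel, waiting for its turn.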
aks-agentpool-14502547-0 Ready agent 3d v1.12.8 <none> Ubuntu 16.04.6 LTS 4.15.0-1049-azure docker://3.0.4
aks-agentpool-14502547-1 Ready agent 3d v1.12.8 <none> Ubuntu 16.04.6 LTS 4.15.0-1050-azure docker://3.0.4
aks-agentpool-14502547-2 Ready,SchedulingDisabled agent 3d v1.12.8 <none> Ubuntu 16.04.6 LTS 4.15.0-1050-azure docker://3.0.4
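A node that is being handled shows up with SchedulingDisabled in its STATUS column. As a small sketch, the hypothetical helper cordoned_nodes below picks those nodes out of the output of kubectl get nodes, assuming NAME and STATUS are the first two columns.

```shell
#!/usr/bin/env bash
# Prints the names of nodes whose STATUS contains "SchedulingDisabled".
# Expects `kubectl get nodes` style output on stdin: a header line,
# then rows with NAME in column 1 and STATUS in column 2.
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes | cordoned_nodes
```

Since only one node is restarted at a time, you would expect this to print at most one node while the reboots are in progress.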
To learn more about Kured and the features it offers, please check the project's documentation.
Hope it helps!