When you deploy an Azure Kubernetes Service (AKS) cluster with a node pool made up of Spot virtual machines, you accept the risk of losing nodes, depending on the configuration you set.

Azure may evict Spot nodes when it needs the capacity back or when the current Spot price exceeds the max price you set.

In this post I’ll show you how to deploy an AKS cluster with such a configuration and simulate a node eviction. The exercise will help you understand the resiliency of your solution and how to query the related events with Log Analytics.

Use Terraform to create an AKS cluster with an extra node pool with Spot Virtual Machines.

Create variables.tf with the following contents:

variable "resource_group_name" {
  default = "aks-spot"
}

variable "location" {
  default = "West Europe"
}

variable "cluster_name" {
  default = "aks-spot"
}

variable "dns_prefix" {
  default = "aks-spot"
}
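If you would rather not edit the defaults, Terraform automatically loads a terraform.tfvars file placed next to the other files. The sketch below overrides all four variables; every value in it is a hypothetical example, not something the rest of the post depends on.

```hcl
# terraform.tfvars
# Hypothetical values, shown only to illustrate overriding the defaults
# declared in variables.tf.
resource_group_name = "aks-spot-test"
location            = "North Europe"
cluster_name        = "aks-spot-test"
dns_prefix          = "aks-spot-test"
```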

Create providers.tf with the following contents:

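terraform {
  # The required_providers "source" syntax needs Terraform 0.13 or later
  required_version = ">= 0.13"
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Stay on 2.x: main.tf uses blocks (addon_profile,
      # role_based_access_control) that were removed in 3.0
      version = "~> 2.97"
    }
  }
}

provider "azurerm" {
  features {}
}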

Create main.tf with the following contents:

# Create Resource Group
resource "azurerm_resource_group" "rg" {
  name     = var.resource_group_name
  location = var.location
}

# Create the VNET
resource "azurerm_virtual_network" "vnet" {
  name                = "aks-vnet"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  address_space       = ["10.0.0.0/16"]
}

# Create the subnet
resource "azurerm_subnet" "aks-subnet" {
  name                 = "aks-subnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}

# Deploy Kubernetes
resource "azurerm_kubernetes_cluster" "k8s" {
  name                = var.cluster_name
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = var.dns_prefix

  default_node_pool {
    name                = "default"
    node_count          = 2
    vm_size             = "Standard_D2s_v3"
    os_disk_size_gb     = 30
    os_disk_type        = "Ephemeral"
    vnet_subnet_id      = azurerm_subnet.aks-subnet.id
    max_pods            = 15
    enable_auto_scaling = false
  }

  # Using Managed Identity
  identity {
    type = "SystemAssigned"
  }

  network_profile {
    # The --service-cidr is used to assign internal services in the AKS
    # cluster an IP address. This IP address range should be an address space
    # that isn't in use elsewhere in your network environment, including any
    # on-premises network ranges if you connect, or plan to connect, your
    # Azure virtual networks using Express Route or a Site-to-Site VPN
    # connection.
    service_cidr = "172.0.0.0/16"
    # The --dns-service-ip address should be the .10 address of your service
    # IP address range.
    dns_service_ip = "172.0.0.10"
    # The --docker-bridge-address lets the AKS nodes communicate with the
    # underlying management platform. This IP address must not be within the
    # virtual network IP address range of your cluster, and shouldn't overlap
    # with other address ranges in use on your network.
    docker_bridge_cidr = "172.17.0.1/16"
    network_plugin     = "azure"
    network_policy     = "calico"
  }

  role_based_access_control {
    enabled = true
  }

  addon_profile {
    kube_dashboard {
      enabled = false
    }
  }
}

# Extra node pool backed by Spot virtual machines
resource "azurerm_kubernetes_cluster_node_pool" "spot" {
  kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
  name                  = "spot"
  priority              = "Spot"
  eviction_policy       = "Delete"
  # -1 means no price cap: nodes are not evicted for price reasons
  spot_max_price      = -1
  os_type             = "Linux"
  vm_size             = "Standard_DS3_v2"
  os_disk_type        = "Ephemeral"
  node_count          = 1
  enable_auto_scaling = true
  max_count           = 3
  min_count           = 1
}

data "azurerm_resource_group" "node_resource_group" {
  name = azurerm_kubernetes_cluster.k8s.node_resource_group
}

# Assign the Contributor role on the node resource group to the AKS kubelet identity
resource "azurerm_role_assignment" "kubelet_contributor" {
  scope                = data.azurerm_resource_group.node_resource_group.id
  role_definition_name = "Contributor" # "Virtual Machine Contributor"?
  principal_id         = azurerm_kubernetes_cluster.k8s.kubelet_identity[0].object_id
}

# Assign the Network Contributor role on the VNET to the cluster's system-assigned identity
resource "azurerm_role_assignment" "kubelet_network_contributor" {
  scope                = azurerm_virtual_network.vnet.id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_kubernetes_cluster.k8s.identity[0].principal_id
}

Note:

  • The cluster will have two node pools.
  • The node pool with the spot virtual machines is named: spot
  • No max price is set, so nodes are not evicted for price reasons: spot_max_price = -1
  • Eviction Policy is set to Delete: eviction_policy = "Delete"
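To also exercise price-based eviction, the spot node pool could be given a price cap instead of -1. The fragment below is a sketch of the attributes to change or add inside the existing node pool resource, not a complete resource. The 0.05 figure is a made-up example value. The label and taint are the ones AKS applies to Spot node pools, and declaring them explicitly in Terraform is assumed here to keep the configuration in line with what AKS sets on the nodes.

```hcl
# Fragment: attributes to change or add inside
# resource "azurerm_kubernetes_cluster_node_pool" "spot" { ... }

# Price cap in US dollars per hour. 0.05 is a hypothetical example value;
# with a cap, nodes can be evicted when the Spot price rises above it.
spot_max_price = 0.05

# Label and taint that AKS applies to Spot node pools.
node_labels = {
  "kubernetes.azure.com/scalesetpriority" = "spot"
}

node_taints = [
  "kubernetes.azure.com/scalesetpriority=spot:NoSchedule",
]
```

Because of the taint, only pods that tolerate kubernetes.azure.com/scalesetpriority=spot:NoSchedule are scheduled on these nodes.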

Create outputs.tf with the following contents:

# Name of the node resource group, as expected by az --resource-group
output "node_resource_group" {
  value = data.azurerm_resource_group.node_resource_group.name
}

Deploy the cluster:

To deploy the cluster, run the following commands:

terraform init
terraform apply -auto-approve

Simulate Spot Node Eviction:

To simulate a spot node eviction, you’ll need the name of the Virtual Machine Scale Set that manages the Spot virtual machines. Then use the az vmss simulate-eviction CLI command (the snippet uses PowerShell syntax):

$nodeResourceGroup = $(terraform output -raw node_resource_group)
$spotScaleSet = $(az vmss list --resource-group $nodeResourceGroup --query "[?contains(name, 'spot')].name | [0]" --output tsv)
az vmss simulate-eviction --resource-group $nodeResourceGroup --name $spotScaleSet --instance-id 0

Note:

  • The previous command will simulate eviction of instance 0.

Bonus: Query Log Analytics for Preempt status:

If you connect the cluster to Log Analytics and simulate eviction you’ll be able to catch related events querying for the PreemptScheduled status:

let endDateTime = now();
let startDateTime = ago(1h);
KubeNodeInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| where Status contains "PreemptScheduled"
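The query above only returns data if the cluster sends its logs to a Log Analytics workspace. A minimal sketch of one way to wire that up in the same Terraform configuration follows, assuming the azurerm 2.x provider, where the monitoring agent is enabled through the oms_agent block inside addon_profile. The resource label logs, the workspace name, the SKU and the retention period are all example choices, not part of the original setup.

```hcl
# Hypothetical workspace; name, sku and retention are example values.
resource "azurerm_log_analytics_workspace" "logs" {
  name                = "aks-spot-logs"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

# Fragment: add inside the existing addon_profile block of
# resource "azurerm_kubernetes_cluster" "k8s" (azurerm 2.x syntax).
oms_agent {
  enabled                    = true
  log_analytics_workspace_id = azurerm_log_analytics_workspace.logs.id
}
```

After the next terraform apply, it can take several minutes before rows start to appear in KubeNodeInventory.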

Hope it helps!!!