Azure Cache for Redis supports zone redundancy in its Premium and Enterprise tiers. A zone-redundant cache runs on VMs spread across multiple Availability Zones, which provides higher resilience and availability.

Today I’ll show how to test the failover of a zone-redundant cache.

Deploy Azure Cache for Redis with availability zones

Create a main.tf file with the following content:

```hcl
terraform {
  required_version = "> 0.14"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "= 2.57.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "= 3.1.0"
    }
  }
}

provider "azurerm" {
  features {}
}

# Location of the services
variable "location" {
  default = "west europe"
}

# Resource Group Name
variable "resource_group" {
  default = "redis-failover"
}

# Name of the Redis cluster
variable "redis_name" {
  default = "redis-failover"
}

# Random suffix so the cache host name is globally unique
resource "random_id" "random" {
  byte_length = 8
}

resource "azurerm_resource_group" "rg" {
  name     = var.resource_group
  location = var.location
}

resource "azurerm_redis_cache" "redis" {
  name                = "${var.redis_name}-${lower(random_id.random.hex)}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  capacity            = 2         # P2
  family              = "P"
  sku_name            = "Premium" # zone redundancy needs Premium (or Enterprise)
  # redis-benchmark 3.2 (used below) has no TLS support, so it needs the non-SSL port 6379
  enable_non_ssl_port = true
  minimum_tls_version = "1.2"

  redis_configuration {
  }

  # Spread the cache nodes across Availability Zones 1 and 2
  zones = ["1", "2"]
}

resource "azurerm_log_analytics_workspace" "logs" {
  name                = "redis-logs"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

# Send the cache metrics to the Log Analytics workspace
resource "azurerm_monitor_diagnostic_setting" "monitor" {
  name                       = lower("extaudit-${var.redis_name}-diag")
  target_resource_id         = azurerm_redis_cache.redis.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.logs.id

  metric {
    category = "AllMetrics"

    retention_policy {
      enabled = false
    }
  }

  log {
    category = "ConnectedClientList"
    enabled  = false

    retention_policy {
      days    = 0
      enabled = false
    }
  }

  # Ignore drift reported on the metric block
  lifecycle {
    ignore_changes = [metric]
  }
}

output "redis_name" {
  value = azurerm_redis_cache.redis.name
}

output "redis_host_name" {
  value = azurerm_redis_cache.redis.hostname
}

output "redis_primary_access_key" {
  value     = azurerm_redis_cache.redis.primary_access_key
  sensitive = true
}
```

Note: the zones argument (zones = ["1", "2"]) places the cache nodes in Availability Zones 1 and 2. This is what makes the cache zone-redundant.

Deploy the Azure Cache for Redis with availability zones:

Run the following commands to deploy the Azure Cache for Redis with availability zones:

```powershell
terraform init
terraform apply -auto-approve
```
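You can check that the deployment really spans two zones by listing the zone of every cache instance. The Python sketch below is an illustrative addition, not part of the original walkthrough. It rests on three assumptions: the Azure CLI is installed and logged in, each entry of the instances array returned by az redis show has a zone field (as in the sample outputs in this post), and the resource group is the redis-failover group used throughout.

```python
from __future__ import annotations

import json
import shutil
import subprocess
import sys


def zones_in_use(instances: list[dict]) -> set[str]:
    """Return the set of Availability Zones that host at least one cache instance."""
    # Entries without a zone (e.g. a non-zonal cache) are skipped.
    return {inst["zone"] for inst in instances if inst.get("zone")}


def fetch_instances(redis_name: str, resource_group: str = "redis-failover") -> list[dict]:
    """Run `az redis show` and return its instances array as Python dicts."""
    # shutil.which resolves az.cmd on Windows, where a bare "az" is not found
    # by subprocess without shell=True.
    az = shutil.which("az") or "az"
    result = subprocess.run(
        [az, "redis", "show", "-n", redis_name, "-g", resource_group,
         "--query", "instances", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_zones.py <redis_name>
    zones = zones_in_use(fetch_instances(sys.argv[1]))
    print("zones:", sorted(zones), "zone-redundant:", len(zones) > 1)
```

A cache deployed with the configuration above should report two zones.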

Test the Azure Cache for Redis failover

Download the Redis command-line tools for Windows (redis-cli and redis-benchmark):

```powershell
Invoke-WebRequest -Uri "https://github.com/microsoftarchive/redis/releases/download/win-3.2.100/Redis-x64-3.2.100.zip" -OutFile redis.zip -UseBasicParsing
Expand-Archive -Path .\redis.zip -DestinationPath .\redis-cli
```

Use redis-benchmark, which ships in the same archive, to populate the cache instance with some data:

```powershell
# -raw prints string outputs without the surrounding quotes
$redis_name=$(terraform output -raw redis_name)
$redis_host_name=$(terraform output -raw redis_host_name)
$redis_primary_access_key=$(terraform output -raw redis_primary_access_key)

# 10 SET requests with 1 KB values
.\redis-cli\redis-benchmark -h $redis_host_name -a $redis_primary_access_key -t SET -n 10 -d 1024
```
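If you would rather script the remaining steps than rely on PowerShell variables, terraform output -json returns every output in a single JSON document. Each output is wrapped in an object with a value field, and in this mode sensitive values such as the access key are printed in plain text. The Python helper below is a hypothetical sketch of reading those outputs. The function names are illustrative.

```python
from __future__ import annotations

import json
import shutil
import subprocess


def parse_outputs(raw_json: str) -> dict[str, object]:
    """Flatten `terraform output -json` into a {output_name: value} dict."""
    # Each output looks like {"sensitive": bool, "type": ..., "value": ...}.
    return {name: meta["value"] for name, meta in json.loads(raw_json).items()}


def read_outputs(workdir: str = ".") -> dict[str, object]:
    """Run `terraform output -json` in workdir and return the flattened outputs."""
    terraform = shutil.which("terraform") or "terraform"
    result = subprocess.run(
        [terraform, "output", "-json"],
        cwd=workdir, check=True, capture_output=True, text=True,
    )
    return parse_outputs(result.stdout)
```

For example, read_outputs()["redis_host_name"] would give the same value as the $redis_host_name variable above.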

Check the availability zone hosting the master node:

```powershell
$redis_name=$(terraform output -raw redis_name)
az redis show -n $redis_name -g redis-failover --query "instances[?isMaster]"
```

You should get output similar to:

```json
[
  {
    "isMaster": true,
    "isPrimary": true,
    "nonSslPort": 13000,
    "shardId": 0,
    "sslPort": 15000,
    "zone": "1"
  }
]
```
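The isMaster filter in the query keeps only primary nodes. A clustered Premium cache has one primary per shard, so a small parser that maps each shardId to its primary's zone extends the same check to that case. The Python sketch below assumes its input is the JSON text printed by the az redis show command above. master_zones is a hypothetical helper name.

```python
from __future__ import annotations

import json


def master_zones(az_json: str) -> dict[int, str | None]:
    """Map each shardId to the Availability Zone of its primary (master) node."""
    instances = json.loads(az_json)
    # Re-filter on isMaster so the helper also works on the unfiltered instances array;
    # a missing zone (non-zonal cache) maps to None.
    return {inst["shardId"]: inst.get("zone") for inst in instances if inst.get("isMaster")}
```

For the sample output above, master_zones returns {0: "1"}.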

Use redis-benchmark to start a long-running workload against the cache:

```powershell
# 1,000,000 GET requests with 1 KB values over 50 parallel connections
.\redis-cli\redis-benchmark -h $redis_host_name -a $redis_primary_access_key -t GET -n 1000000 -d 1024 -c 50
```
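redis-benchmark reports throughput, but it does not tell you how long the cache was unreachable. The Python sketch below is a hypothetical, client-agnostic probe that fills this gap: it calls an operation in a loop and adds up the time spent in failing streaks. In practice op would be something like a PING through your Redis client library. retry_on must list that library's connection-error types: redis-py, for example, raises its own redis.exceptions.ConnectionError, which is not the built-in ConnectionError.

```python
from __future__ import annotations

import time
from typing import Callable


def measure_outage(
    op: Callable[[], object],
    duration_s: float,
    interval_s: float = 0.1,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Call op() every interval_s seconds for duration_s seconds.

    Returns the total seconds between the first failure of each failing streak
    and the next successful call (or the end of the run).
    """
    start = clock()
    outage = 0.0
    failing_since: float | None = None
    while clock() - start < duration_s:
        try:
            op()
        except retry_on:
            if failing_since is None:
                failing_since = clock()  # a new failing streak begins
        else:
            if failing_since is not None:
                outage += clock() - failing_since  # streak ended by a success
                failing_since = None
        sleep(interval_s)
    if failing_since is not None:
        outage += clock() - failing_since  # still failing when the run ended
    return outage
```

The clock and sleep parameters exist only so the function can be exercised with a fake clock. Leave them at their defaults in real use.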

Test the Azure Cache for Redis failover (CLI):

From another terminal, run the following command to test the Azure Cache for Redis failover:

```powershell
$redis_name=$(terraform output -raw redis_name)
az redis force-reboot --reboot-type PrimaryNode -n $redis_name -g redis-failover
```

Note: at the time of writing, the previous command fails with an exception:

```text
(InternalServerError) Something went wrong.
RequestID=dececf94-7f11-4ffa-9a4b-35694dd3f091
Code: InternalServerError
Message: Something went wrong.
RequestID=dececf94-7f11-4ffa-9a4b-35694dd3f091
```

Please track the following issue for more information: az redis force-reboot fails with InternalServerError

Test the Azure Cache for Redis failover (Azure Portal):

To reboot the Primary Node, head to the Azure portal and use the Administration/Reboot section of the Redis cluster:

Primary Node reboot using the Azure Portal

Once the failover starts, you should see the long-running process disconnect. This means your applications must be able to recover from transient connection errors when working with the cache.
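A common way to make a client tolerate this short disconnection is to retry idempotent operations with capped exponential backoff and random jitter. The Python sketch below illustrates that generic pattern. It is not an Azure or Redis API, and with_retries is a hypothetical helper. As with any retry wrapper, retry_on must list the connection-error types your Redis client actually raises. Also check whether your client library already offers retry options before hand-rolling one.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.2,
    max_delay_s: float = 5.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(), retrying transient errors with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Backoff ceiling doubles each time (0.2s, 0.4s, 0.8s, ...) up to max_delay_s;
            # sleeping a random fraction of it keeps many clients from reconnecting in lockstep.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

Only wrap operations that are safe to repeat. A command that was applied right before the connection dropped may otherwise run twice.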

Once the failover is complete, check again which availability zone hosts the primary node:

```powershell
$redis_name=$(terraform output -raw redis_name)
az redis show -n $redis_name -g redis-failover --query "instances[?isMaster]"
```

The output should be similar to the following. Note that the primary node has moved from zone 1 to zone 2:

```json
[
  {
    "isMaster": true,
    "isPrimary": true,
    "nonSslPort": 13001,
    "shardId": 0,
    "sslPort": 15001,
    "zone": "2"
  }
]
```
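Instead of re-running the command by hand, you could poll until the primary reports a different zone. The Python sketch below assumes, as the two outputs above show, that the replica promoted during failover sits in another zone. get_master_zone is a hypothetical callable you would implement on top of az redis show, for example by returning the zone of the first instance where isMaster is true, or None if the call fails.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_zone_change(
    get_master_zone: Callable[[], str | None],
    initial_zone: str,
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the primary node reports a zone different from initial_zone."""
    deadline = clock() + timeout_s
    while True:
        zone = get_master_zone()
        # None means "unknown right now" (e.g. the cache is still rebooting): keep polling.
        if zone is not None and zone != initial_zone:
            return zone
        if clock() >= deadline:
            raise TimeoutError(f"primary still in zone {initial_zone!r} after {timeout_s}s")
        sleep(interval_s)
```

For the run shown above, calling it with initial_zone="1" would return "2" once the failover has completed.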

Hope it helps!!!