Wednesday, August 5, 2020

Guidelines for a Redis Auto Scale Operator on Kubernetes




In this post we will review the design of a Redis auto scale operator on Kubernetes.


At this time, I will not include code in this post, only the guidelines.
I hope that in the future I will have time to create an open source project based on the operator I've implemented for a specific deployment. Still, if you need to create a Redis operator, this is a great starting point.

Before digging into this article, I recommend reading the other posts I've written about Redis.

The Redis auto scale operator monitors the Redis cluster and the application.
It keeps track of the actual number of Redis nodes and the required number of Redis nodes, and adjusts the former toward the latter.

A Redis cluster is based on a group of master nodes that use sharding to handle the data.
Each master usually has one or more slave nodes to share the read load, and to take control in case the master is down. A common practice is to use a master-slave pair. Note that the actual master-slave recovery is handled entirely by Redis itself, while the Redis auto scale operator handles the addition and removal of master-slave pairs.

The Redis auto scale operator is scheduled to run at a constant interval, for example, every 1 minute.
To avoid changing the cluster too frequently, the Redis auto scale operator does not perform any action if it runs soon after the last change. We use a "grace period" of 3 minutes after a change to the Redis cluster before making an additional change.
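
As an illustration only (this is not the operator's code), the grace period check could be sketched in Python as below. The function name and the idea of keeping the time of the last change in a variable are assumptions made for the example; only the 3 minute value comes from the text above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Grace period from the text: 3 minutes after a change to the cluster.
GRACE_PERIOD = timedelta(minutes=3)


def may_change_cluster(now: datetime,
                       last_change: Optional[datetime],
                       grace: timedelta = GRACE_PERIOD) -> bool:
    """Return True when enough time has passed since the last cluster change."""
    if last_change is None:
        # No change has been recorded yet, so nothing to wait for.
        return True
    return now - last_change >= grace
```

A scheduled run would call this check first and return immediately when it yields False.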

Then the Redis auto scale operator performs 2 steps: Cluster Repair, and Cluster Scaling.


Step 1: Cluster Repair


The Redis cluster must be in a stable state before any scale operation is done. The Redis cluster might become unstable, for example, if the Redis auto scale operator had problems in the last scale operation, or simply due to an internal Redis cluster problem.

Listed below are the items handled as part of the cluster repair.


Repair A: Delete redundant replicas


The Redis auto scale operator fetches the number of known Redis cluster nodes using the CLUSTER INFO CLI. Then it checks the number of replicas of the Redis Kubernetes StatefulSet.
In case the number of replicas is higher than the number of nodes, the redundant replicas are deleted by updating the StatefulSet replicas specification. Then the Redis auto scale operator waits for the pods to terminate.
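
A minimal sketch of this comparison, assuming the raw text reply of CLUSTER INFO is already at hand (the helper names are hypothetical). The reply is a list of key:value lines, and the field of interest is cluster_known_nodes.

```python
def known_nodes(cluster_info: str) -> int:
    """Extract cluster_known_nodes from the text reply of CLUSTER INFO."""
    for line in cluster_info.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "cluster_known_nodes":
            return int(value)
    raise ValueError("cluster_known_nodes not found in CLUSTER INFO reply")


def redundant_replicas(cluster_info: str, statefulset_replicas: int) -> int:
    """Number of StatefulSet replicas that are not part of the Redis cluster."""
    return max(0, statefulset_replicas - known_nodes(cluster_info))
```

When the result is positive, the StatefulSet replicas specification would be reduced by that amount.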


Repair B: Force stable for a migrating slot


In some cases, a slot is "stuck" in migrating. This is a Redis internal issue that occurs once in a while during a Redis cluster rebalance (see SETSLOT for details). The Redis cluster sets a slot as migrating from a source node to a destination node, but in case of problems, it might remain configured in the source node as migrating, but not configured as importing in the destination node.

The Redis auto scale operator runs the CLUSTER NODES CLI on each node, and looks for an inconsistency in a slot state. In case it finds such a slot, it uses the CLUSTER SETSLOT slotId STABLE CLI to fix it.
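
The inconsistency check can be sketched as follows. In the CLUSTER NODES reply, a node reports its own open slots on the line flagged "myself": a migrating slot appears as [slot->-destinationId] and an importing slot as [slot-<-sourceId]. The function names and the way the replies are collected (one reply per node, in a list) are assumptions made for the example.

```python
import re
from typing import List, Set, Tuple

# Matches the open slot markers, e.g. [93->-<node id>] or [77-<-<node id>].
SLOT_STATE = re.compile(r"^\[(\d+)-([<>])-([0-9a-f]+)\]$")


def open_slots(cluster_nodes: str) -> Tuple[str, Set[Tuple[int, str]], Set[Tuple[int, str]]]:
    """Return (own id, migrating {(slot, destination)}, importing {(slot, source)})."""
    for line in cluster_nodes.splitlines():
        fields = line.split()
        if len(fields) >= 8 and "myself" in fields[2].split(","):
            migrating, importing = set(), set()
            for token in fields[8:]:  # slot tokens start at the 9th field
                match = SLOT_STATE.match(token)
                if match:
                    slot, direction, peer = int(match.group(1)), match.group(2), match.group(3)
                    (migrating if direction == ">" else importing).add((slot, peer))
            return fields[0], migrating, importing
    raise ValueError("no 'myself' line found in CLUSTER NODES reply")


def stuck_slots(replies: List[str]) -> List[Tuple[str, int]]:
    """Return (node id, slot) pairs whose migrating/importing state has no counterpart."""
    migrating, importing = set(), set()  # (source, slot, destination) triples
    for reply in replies:
        me, mig, imp = open_slots(reply)
        migrating |= {(me, slot, dst) for slot, dst in mig}
        importing |= {(src, slot, me) for slot, src in imp}
    stuck = [(src, slot) for src, slot, _ in migrating - importing]
    stuck += [(dst, slot) for _, slot, dst in importing - migrating]
    return sorted(stuck)
```

Each returned pair names the node on which CLUSTER SETSLOT slot STABLE would be issued.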


Repair C: Delete odd node


The Redis cluster is composed of master-slave pairs. In case the number of known Redis nodes is odd, the Redis auto scale operator removes the last node.

In case the last node is a master, it first moves all of its slots to the other masters using the REBALANCE --CLUSTER-WEIGHT nodeId=0 CLI.

Then it deletes the node using the DEL-NODE nodeId CLI.

Lastly, it updates the replicas specification in the StatefulSet, and waits for the pod to terminate.
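
The order of these actions can be sketched as a list of commands. This is only a sketch: the function and its parameters are hypothetical, and the use of kubectl to update the replicas is an assumption (an operator would more likely call the Kubernetes API directly). The redis-cli --cluster subcommands are the ones named above; "entry" is the host:port of any node that stays in the cluster.

```python
from typing import List


def remove_odd_node_plan(known_nodes: int, entry: str, last_node_id: str,
                         last_is_master: bool, statefulset: str) -> List[List[str]]:
    """Build the commands that remove the last node when the node count is odd."""
    if known_nodes % 2 == 0:
        return []  # the cluster is made of complete pairs, nothing to repair
    plan = []
    if last_is_master:
        # Weight 0 makes the rebalance move every slot away from this master.
        plan.append(["redis-cli", "--cluster", "rebalance", entry,
                     "--cluster-weight", f"{last_node_id}=0"])
    plan.append(["redis-cli", "--cluster", "del-node", entry, last_node_id])
    # Assumption: the replicas are updated with kubectl.
    plan.append(["kubectl", "scale", "statefulset", statefulset,
                 f"--replicas={known_nodes - 1}"])
    return plan
```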


Repair D: Balance Empty Master


The Redis auto scale operator uses the CLUSTER NODES CLI to find any Redis master node that has no slots assigned to it. In case it finds one, it runs the REBALANCE --CLUSTER-USE-EMPTY-MASTERS CLI.
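
Finding the empty masters is a matter of parsing the CLUSTER NODES reply: the third field holds the flags, and the slots owned by a node start at the ninth field. A minimal sketch, with a hypothetical function name:

```python
from typing import List


def empty_masters(cluster_nodes: str) -> List[str]:
    """Return the ids of master nodes that own no slots."""
    empty = []
    for line in cluster_nodes.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a node line
        flags = fields[2].split(",")
        # Tokens in brackets mark migrating/importing slots, not owned slots.
        owned = [token for token in fields[8:] if not token.startswith("[")]
        if "master" in flags and not owned:
            empty.append(fields[0])
    return empty
```

A non-empty result is the trigger for running the rebalance with --cluster-use-empty-masters.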


Repair E: Delete Redundant PVCs


The Redis auto scale operator lists the Kubernetes PVCs (Persistent Volume Claims), and compares this list with the StatefulSet replicas count. In case a PVC is not used by any replica, it is deleted. The goal is to make sure that a node that was deleted due to a scale down will start with an empty configuration upon the next scale up. Otherwise, the node might start with its old configuration, and will not be able to join the cluster.
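
A sketch of the comparison, relying on the StatefulSet naming convention: the PVC of a pod is named after the volume claim template, the StatefulSet and the pod ordinal, for example data-redis-4. The function name and the example names are hypothetical.

```python
from typing import List


def redundant_pvcs(pvc_names: List[str], claim_template: str,
                   statefulset: str, replicas: int) -> List[str]:
    """Return the PVCs of this StatefulSet whose ordinal is not used by any replica."""
    prefix = f"{claim_template}-{statefulset}-"
    redundant = []
    for name in pvc_names:
        if not name.startswith(prefix):
            continue  # belongs to something else, never touch it
        ordinal = name[len(prefix):]
        # Replicas use ordinals 0 .. replicas-1, anything above is unused.
        if ordinal.isdigit() and int(ordinal) >= replicas:
            redundant.append(name)
    return redundant
```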


Repair F: Convert double master pair


The Redis cluster is composed of master-slave pairs, which means that sequential pods in the StatefulSet should be a master-slave pair. In some cases, due to Redis internal problems, a pair might be converted into a master-master pair. In such a case the Redis auto scale operator uses the REBALANCE --CLUSTER-WEIGHT nodeId=0 CLI to move the slots out of one of the masters, and then uses the CLUSTER REPLICATE CLI to convert it to a slave node.
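
Detecting such a pair can be sketched as below, assuming that pods 0 and 1 form the first pair, pods 2 and 3 the second, and so on, and that the role of each pod (taken from the flags in CLUSTER NODES) is already collected into a list indexed by pod ordinal. Both the pairing by ordinal and the function name are assumptions made for the example.

```python
from typing import List, Tuple


def double_master_pairs(roles: List[str]) -> List[Tuple[int, int]]:
    """Return the pod ordinals of pairs in which both nodes are masters."""
    pairs = []
    for first in range(0, len(roles) - 1, 2):
        if roles[first] == "master" and roles[first + 1] == "master":
            pairs.append((first, first + 1))
    return pairs
```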


Step 2: Cluster Scaling


Once the Redis cluster is stable, we can easily scale it.

First, we need to decide how to set the required number of Redis nodes.
We can use the average CPU usage among the Redis nodes. For example, in case the average is over 60% CPU, scale the cluster up. Another useful metric is the application load, for example, requests per second. We can decide that each master-slave pair can handle up to 1000 requests per second, and set the required number of nodes accordingly.

Then we check the actual number of nodes using the CLUSTER INFO CLI, and scale up or down based on the required number of nodes and the actual number of nodes.
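
For the requests per second metric, the calculation could look like the sketch below. The 1000 requests per second per pair is the example figure from the text; the lower bound of 3 pairs is an assumption added for the example (a Redis cluster is normally created with at least 3 masters), and the function names are hypothetical.

```python
import math


def required_nodes(requests_per_second: float,
                   per_pair_capacity: float = 1000,
                   min_pairs: int = 3) -> int:
    """Required node count: one master-slave pair per per_pair_capacity requests."""
    pairs = max(min_pairs, math.ceil(requests_per_second / per_pair_capacity))
    return pairs * 2  # each pair is two nodes


def scale_action(required: int, actual: int) -> str:
    """Decide the scaling direction from the required and actual node counts."""
    if required > actual:
        return "up"
    if required < actual:
        return "down"
    return "none"
```

For example, 4001 requests per second need 5 pairs, which is 10 nodes; with 6 actual nodes the decision is to scale up.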


Scale up


To scale up, the Redis auto scale operator sets the replicas specification of the StatefulSet, and waits for the pods to start.

Then the Redis auto scale operator uses the ADD-NODE CLI and the ADD-NODE --CLUSTER-SLAVE CLI alternately to add a master and a slave node.

Lastly, the Redis auto scale operator uses the REBALANCE --CLUSTER-USE-EMPTY-MASTERS CLI to move slots to the new nodes.
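
The sequence for the new pairs can be sketched as a list of redis-cli commands. The function and its input format are hypothetical: each new pair is given as the master address, the master node id and the slave address, and "entry" is the host:port of a node that is already in the cluster. Passing --cluster-master-id to attach the slave to its own master is an assumption made for the example; the text only mentions --cluster-slave.

```python
from typing import List, Tuple


def scale_up_plan(entry: str,
                  new_pairs: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Build the commands that join new master-slave pairs and rebalance the slots."""
    plan = []
    for master_addr, master_id, slave_addr in new_pairs:
        # Join the new master first, then its slave.
        plan.append(["redis-cli", "--cluster", "add-node", master_addr, entry])
        plan.append(["redis-cli", "--cluster", "add-node", slave_addr, entry,
                     "--cluster-slave", "--cluster-master-id", master_id])
    # Without this flag, a rebalance ignores masters that own no slots.
    plan.append(["redis-cli", "--cluster", "rebalance", entry,
                 "--cluster-use-empty-masters"])
    return plan
```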


Scale Down


To scale down, the Redis auto scale operator uses the REBALANCE --CLUSTER-WEIGHT nodeId=0 CLI to move all the slots out of the master node that is about to be removed.

Then it deletes the master and slave nodes using the DEL-NODE nodeId CLI.

Lastly, it updates the replicas specification in the StatefulSet, waits for the pods to terminate, and deletes the PVCs used by these nodes.
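
Put together, removing the last pair could be sketched as the command list below. The function, its parameters and the use of kubectl are assumptions made for the example, and so is the choice to delete the slave before the master (the text does not prescribe an order). The PVC names follow the StatefulSet convention of claim template, StatefulSet name and pod ordinal; "entry" must be a node that stays in the cluster.

```python
from typing import List


def scale_down_plan(entry: str, master_id: str, slave_id: str, statefulset: str,
                    claim_template: str, replicas: int) -> List[List[str]]:
    """Build the commands that remove the last master-slave pair."""
    new_replicas = replicas - 2
    plan = [
        # Weight 0 moves every slot out of the master that is being removed.
        ["redis-cli", "--cluster", "rebalance", entry,
         "--cluster-weight", f"{master_id}=0"],
        ["redis-cli", "--cluster", "del-node", entry, slave_id],
        ["redis-cli", "--cluster", "del-node", entry, master_id],
        ["kubectl", "scale", "statefulset", statefulset,
         f"--replicas={new_replicas}"],
    ]
    # The PVCs of the two removed pods are deleted after the pods terminate.
    for ordinal in range(new_replicas, replicas):
        plan.append(["kubectl", "delete", "pvc",
                     f"{claim_template}-{statefulset}-{ordinal}"])
    return plan
```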


Final Notes


The major part of the Redis auto scale operator code is the cluster repair, which is not surprising. Things always go wrong, and a good operator should handle anything it can without human intervention.
