Saturday, July 4, 2020

Scale your application using HPA on GKE



In this post we will review the steps required to automatically scale an application running on Google Kubernetes Engine (GKE). We will combine several components: the metrics server, the Horizontal Pod Autoscaler (HPA), and the GKE Cluster Autoscaler.

Metrics-Server


First we need to supply the per-pod CPU/memory metrics. This can be done using the metrics-server.
The Kubernetes documentation describes the metrics server as:


"
...a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.
"


To install the metrics server, we apply its manifest:


kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml


After a few minutes, the metrics server will have collected statistics for our pods, and we can check the pod metrics with:


kubectl top pod


HPA


In the Kubernetes documentation we find that the HPA:


"
...scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization
"

Hence, to use the HPA, we start by configuring the resource requests in our deployment.
Notice that the HPA uses the resource requests, not the resource limits.

For example, in our deployment, we specify:


spec:
  containers:
  - name: c1
    image: my-image
    resources:
      requests:
        cpu: "0.5"


Next, we create the HPA.
The HPA configures the minimum and maximum number of replicas for the deployment.
It also configures the target average CPU utilization across all of the running pods.


apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  maxReplicas: 10
  minReplicas: 2
  targetCPUUtilizationPercentage: 80
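Under the hood, the HPA computes a desired replica count from the ratio of the observed utilization to the target. The following is a minimal sketch of that rule (the function name is mine; the real controller also applies stabilization windows and per-pod readiness handling):

```python
import math

# Default tolerance band around the target; within it the HPA does nothing.
TOLERANCE = 0.1

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= TOLERANCE:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        # Scale proportionally to the load, rounding up.
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(2, 120, 80, 2, 10))  # load above target -> 3
```

With our configuration above (target 80%, 2 to 10 replicas), a sustained average utilization of 120% on 2 pods would scale the deployment out to 3 pods.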


The HPA will create new pods according to the load on the system, but at some point there might be too many pods to fit on the Kubernetes nodes; since the nodes have limited resources, the extra pods will remain in a Pending state. This is where GKE completes the puzzle.


GKE Cluster Autoscaler


We use the GKE Cluster Autoscaler to allocate new Kubernetes cluster nodes when needed:

"
...resizes the number of nodes in a given node pool, based on the demands of your workloads
"

So once pods are stuck in a Pending state due to insufficient CPU/memory resources, the GKE Cluster Autoscaler adds new nodes.

To configure the GKE Cluster Autoscaler, we update the node pool configuration:


gcloud container clusters update my-k8s-cluster \
    --enable-autoscaling \
    --min-nodes 3 \
    --max-nodes 10 \
    --zone my-zone \
    --node-pool my-pool


Final Notes


In this post we have reviewed the steps to handle Kubernetes autoscaling.

Note that you can also configure the HPA to scale your application based not just on CPU and memory, but also on custom metrics.
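For example, using the autoscaling/v2beta2 API, an HPA that scales on a custom per-pod metric might look like the following sketch. The metric name requests_per_second and its target value are hypothetical, and such a metric must be exported to Kubernetes by a custom metrics adapter:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```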

Also, we should use pod anti-affinity rules to prevent all of the pods from being scheduled on the same node.
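For example, a preferred anti-affinity rule in the deployment's pod template spreads the pods across nodes. The app: my-app label here is an assumption and should match the labels on your own pods:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: my-app
        topologyKey: kubernetes.io/hostname
```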
