Saturday, July 4, 2020

Scale your application using HPA on GKE



In this post we will review the steps required to automatically scale an application running on Google Kubernetes Engine (GKE). We will combine several components: the metrics server, the Horizontal Pod Autoscaler (HPA), and the GKE Cluster Autoscaler.

Metrics-Server


First we need to supply the per-pod CPU/memory metrics. This can be done using the metrics-server.
The Kubernetes documentation describes the metrics server as:


"
...a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.
"


To install the metrics server, we apply its manifest:


kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml


After a few minutes, the metrics server will have collected statistics for our pods, and we can check the pod metrics with:


kubectl top pod


HPA


In the Kubernetes documentation we find that the HPA:


"
...scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization
"

Hence, to use the HPA, we start by configuring the resource requests in our deployment.
Notice that the HPA uses the resource requests, not the resource limits.

For example, in our deployment, we specify:


spec:
  containers:
  - name: c1
    image: my-image
    resources:
      requests:
        cpu: "0.5"


Next, we create the HPA.
The HPA configures the minimum and maximum number of replicas for the deployment.
It also configures the target average CPU utilization across all of the running pods.


apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  maxReplicas: 10
  minReplicas: 2
  targetCPUUtilizationPercentage: 80
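Under the hood, the HPA computes a desired replica count from the ratio of the observed utilization to the target. The following is a minimal sketch of that rule (the function name is mine; the real controller also applies stabilization windows and per-pod readiness handling):

```python
import math

# Default tolerance band around the target; within it the HPA does nothing.
TOLERANCE = 0.1

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= TOLERANCE:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        # Scale proportionally to the load, rounding up.
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(2, 120, 80, 2, 10))  # load above target -> 3
```

With our configuration above (target 80%, 2 to 10 replicas), a sustained average utilization of 120% on 2 pods would scale the deployment out to 3 pods.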


The HPA will create new pods according to the load on the system, but at some point there might be too many pods to fit on the Kubernetes nodes; since the nodes have limited resources, the extra pods will remain in a Pending state. This is where GKE completes the puzzle.


GKE Cluster Autoscaler


We use the GKE Cluster Autoscaler to allocate new Kubernetes cluster nodes when needed:

"
...resizes the number of nodes in a given node pool, based on the demands of your workloads
"

So once pods are stuck in a Pending state due to insufficient CPU/memory resources, the GKE Cluster Autoscaler adds new nodes.

To configure the GKE Cluster Autoscaler, we update the node pool configuration:


gcloud container clusters update my-k8s-cluster \
    --enable-autoscaling \
    --min-nodes 3 \
    --max-nodes 10 \
    --zone my-zone \
    --node-pool my-pool


Final Notes


In this post we have reviewed the steps to handle Kubernetes autoscaling.

Note that you can also configure the HPA to scale your application based not just on CPU and memory, but also on custom metrics.
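For example, using the autoscaling/v2beta2 API, an HPA that scales on a custom per-pod metric might look like the following sketch. The metric name requests_per_second and its target value are hypothetical, and such a metric must be exported to Kubernetes by a custom metrics adapter:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```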

Also, we should use pod anti-affinity rules to prevent all of the pods from being scheduled on the same node.
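For example, a preferred anti-affinity rule in the deployment's pod template spreads the pods across nodes. The app: my-app label here is an assumption and should match the labels on your own pods:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: my-app
        topologyKey: kubernetes.io/hostname
```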
