Sunday, September 4, 2022

GKE Autopilot - Bad Experience

 



In this post I review my experience of using GKE Autopilot.


TL;DR

While GKE Autopilot is simpler to use, it has some restrictions that limit its flexibility. The bottom line is that I do not recommend using it.


The Story

I've been using GKE standard mode for a long time, and for a new project I decided to check out GKE Autopilot. Google is clearly trying to promote it, as it is the default mode when creating a new Kubernetes cluster.

It does sound great: you no longer need to maintain node pools, which is a real burden that includes deciding which type of compute instances to use and configuring the pool sizes. Google also states that you pay only ~$70 per month for the cluster service, and everything else is billed only according to the pods' resource requests.

However, right from the start, I noticed that Google had decided on some assumptions and constraints that simplify GKE Autopilot for them (the GKE implementors), and not for us (the GKE users).

When creating the cluster, telemetry (Cloud Operations logging and monitoring) is turned on by default, and there is no way to turn it off. See the question I've asked on Stack Overflow. This means you pay for it, and might pay quite a lot if your logs exceed the quota.

Standard GKE automatically collects CPU and memory metrics, which you can scrape using Prometheus and display in Grafana. In GKE Autopilot, however, the kube-system namespace is managed and protected by Google, so you cannot get these metrics, even though the metrics-server is running as part of the cluster. See another question I've asked about this on Stack Overflow.

The last issue, which made me abandon GKE Autopilot, is the limitation on resource requests. The minimum resource request is 0.25 vCPU and 0.5GiB RAM. These are quite large values, especially for dev and test clusters, and also for small microservices on a production cluster. Not only that, but you also need to maintain a ratio between 1:1 and 1:6.5 between CPU (vCPU) and memory (GiB). So if you need 1 vCPU, you must allocate at least 1GiB RAM, and if you need 6.5GiB RAM, you must allocate at least 1 vCPU. These last two restrictions have a major impact on costs.
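To make the cost impact concrete, here is a rough sketch of how a pod's requests would be bumped under the two rules described above. It uses only the figures quoted in this post (0.25 vCPU and 0.5GiB minimums, 1:1 to 1:6.5 CPU-to-memory ratio). The function name `effective_requests` is made up for illustration, and the real Autopilot behavior may include other adjustments (rounding, for example) that are not modeled here.

```python
# Figures taken from the post; treat them as assumptions, not as the
# authoritative Autopilot rules.
MIN_CPU = 0.25       # minimum vCPU request per pod
MIN_MEM_GIB = 0.5    # minimum memory request per pod, in GiB
MIN_RATIO = 1.0      # at least 1 GiB of memory per vCPU
MAX_RATIO = 6.5      # at most 6.5 GiB of memory per vCPU


def effective_requests(cpu, mem_gib):
    """Return the (cpu, mem_gib) pair after applying the minimums and ratio."""
    # Rule 1: raise both values to the per-pod minimums.
    cpu = max(cpu, MIN_CPU)
    mem = max(mem_gib, MIN_MEM_GIB)
    # Rule 2a: too much memory per vCPU, so CPU has to grow.
    if mem > cpu * MAX_RATIO:
        cpu = mem / MAX_RATIO
    # Rule 2b: too little memory per vCPU, so memory has to grow.
    if mem < cpu * MIN_RATIO:
        mem = cpu * MIN_RATIO
    return cpu, mem


if __name__ == "__main__":
    # A small microservice asking for 0.05 vCPU and 64MiB (0.0625 GiB).
    print(effective_requests(0.05, 0.0625))  # (0.25, 0.5)
    # A CPU-heavy pod asking for 1 vCPU and 0.5GiB.
    print(effective_requests(1, 0.5))        # (1, 1.0)
```

Under these assumptions, a tiny service that really needs 0.05 vCPU and 64MiB ends up with requests of 0.25 vCPU and 0.5GiB, which is 5 times the CPU and 8 times the memory it asked for, and with Autopilot the requests are what you pay for.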


Final Note

As stated above, I do not recommend using GKE Autopilot. It might be nice for a temporary cluster that you want to set up for a limited period of time, but in the end the restrictions will make you abandon it.

In general I think Google's GKE is much better, simpler, and cheaper than AWS EKS, but Autopilot seems to me like a miss. I hope that these issues will be addressed as time passes.

