Monday, August 30, 2021

Reducing GKE Pricing using Multiple Node Pools


 

In this post we'll review a method of reducing GKE compute costs using multiple node pools.

GKE (the managed Kubernetes cluster on Google Cloud Platform) runs its nodes on Compute Engine instances. Each instance is a VM with specific CPU and memory properties. See this table for the cost details of each VM type.

When running pods on GKE, it is a good practice to declare the CPU and memory resource requests in advance, for example:



apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      configid: my-container
  template:
    metadata:
      labels:
        configid: my-container
    spec:
      containers:
        - name: container1
          image: my-image
          resources:
            requests:
              cpu: "2"
              memory: 2Gi



The example deployment requests 2 CPUs and 2Gi of RAM. The deployment's pods will run on the GKE VMs and consume their resources. By nature, some deployments tend to be high CPU consumers, while others tend to be high memory consumers.

As we want to reduce costs, we can use VMs whose CPU is cheaper for the high-CPU deployments, and VMs whose RAM is cheaper for the high-memory deployments.

For example, check out the e2-highcpu-8 and the e2-highmem-8 in the instances table. We can create two node pools, one for CPU consumers and a second for memory consumers:



gcloud container node-pools create pool-mem --cluster=mycluster --num-nodes=2 --machine-type=e2-highmem-8
gcloud container node-pools create pool-cpu --cluster=mycluster --num-nodes=2 --machine-type=e2-highcpu-8



Let's also auto-scale these pools:



gcloud container clusters update mycluster --enable-autoscaling --min-nodes 2 --max-nodes 15 --zone us-central1-c --node-pool pool-mem
gcloud container clusters update mycluster --enable-autoscaling --min-nodes 2 --max-nodes 8 --zone us-central1-c --node-pool pool-cpu



To assign a deployment to a pool, we add the node selector to its spec:



spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: "pool-cpu"
  containers:
    ...



Final Note


We have used the method described here in a specific project, and reduced GKE costs by 30%. We've used the memory pool for the Redis pods. This is especially relevant as Redis is a single-threaded process, and hence requires a lot of RAM but only one CPU. Other pods, which were very heavy on CPU but used a very small amount of memory, were assigned to the CPU pool.



Monday, August 23, 2021

Using Go Echo as a Dynamic Reverse Proxy to Multiple Micro-Services

 

In the post Using NGINX auth_request to proxy to dynamically multiple backend servers, I've reviewed the process of using an NGINX auth request to handle the routing decision making, as displayed in the following diagram.




While this method works fine, there is a cost to it: each request is sent from NGINX to the routing decision maker, so we need to wait for an extra send-and-receive round-trip.

This time, we're going to "merge" the NGINX reverse proxy and the router micro-service into one micro-service.




To implement this, we will use the Go Echo library, which supplies dynamic reverse proxy capabilities. Let's start with a standard Echo-based HTTP server, which responds to health checks, for example from the kubernetes liveness and readiness probes:

package main

import (
	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
	"net/http"
)

func main() {
	server := echo.New()

	server.Use(middleware.CORSWithConfig(middleware.CORSConfig{
		AllowOrigins:     []string{"*"},
		AllowHeaders:     []string{"*"},
		AllowMethods:     []string{"POST", "GET"},
		AllowCredentials: true,
	}))

	server.POST("/health", func(ctx echo.Context) error {
		ctx.JSON(http.StatusOK, "i am alive")
		return nil
	})

	err := server.Start(":8080")
	if err != nil {
		panic(err)
	}
}



A simple test of our service returns HTTP code 200, with the "i am alive" payload.



curl -X POST http://localhost:8080/health
"i am alive"
  



Now, let's add the reverse proxy code. Notice that we add this code as an additional middleware over the existing server.



server := echo.New()

// proxy middleware - start
router := produceRouter()
config := middleware.ProxyConfig{
	Balancer: router,
	Skipper:  router.Skip,
}
proxyMiddleware := middleware.ProxyWithConfig(config)
server.Use(proxyMiddleware)
// proxy middleware - end

server.Use(middleware.CORSWithConfig(middleware.CORSConfig{



In this example, our routing logic dynamically selects the backend service based on the request host name. Another important piece is the Skip function, which allows us to skip the reverse proxy middleware and behave as a standard web server. In this example, we skip the reverse proxy middleware when the request host name is an IP address. This is very relevant if we want our health probes to continue to function, since kubernetes probes use an IP address instead of a host name.



type Router struct {
	serviceA *url.URL
	serviceB *url.URL
}

func produceRouter() *Router {
	serviceA, err := url.Parse("http://service-a.com")
	if err != nil {
		panic(err)
	}

	serviceB, err := url.Parse("http://service-b.com")
	if err != nil {
		panic(err)
	}

	return &Router{
		serviceA: serviceA,
		serviceB: serviceB,
	}
}

// AddTarget and RemoveTarget are required by the proxy balancer interface,
// but are not used here since we select the target dynamically in Next.
func (b Router) AddTarget(*middleware.ProxyTarget) bool {
	return true
}

func (b Router) RemoveTarget(string) bool {
	return true
}

// Next selects the proxy target for each request.
func (b Router) Next(context echo.Context) *middleware.ProxyTarget {
	urlTarget := b.getProxyUrl(context)

	return &middleware.ProxyTarget{
		Name: urlTarget.Host,
		URL:  urlTarget,
		Meta: nil,
	}
}

// getProxyUrl routes requests whose host name starts with "a" to service A,
// and all other requests to service B.
func (b Router) getProxyUrl(context echo.Context) *url.URL {
	host := context.Request().Host
	if strings.HasPrefix(host, "a") {
		return b.serviceA
	}

	return b.serviceB
}

// Skip bypasses the proxy when the host is an IP address,
// so health probes are served by this server itself.
func (b Router) Skip(context echo.Context) bool {
	host := context.Request().Host
	firstChar := host[0]
	if firstChar >= '0' && firstChar <= '9' {
		// fast and naive IP detection
		return true
	}
	return false
}



Final Note


In this post we have implemented a dynamic reverse proxy using the Go Echo library. This kind of implementation could be used for A/B, blue/green, and canary selection of backend functionality. All we need to change is getProxyUrl to hold our decision-making logic, which can be based on the source IP address, the request host name, or a randomized split of requests.
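
For example, a randomized canary split could look like the following minimal sketch. It reuses the Router type from above; the canaryPercent constant is an assumption for illustration, and the math/rand package would need to be imported.


// A possible canary variant of getProxyUrl: a randomized split of requests,
// sending canaryPercent of the traffic to service B (the canary) and the rest
// to service A (the stable backend). Requires the math/rand import.
func (b Router) getProxyUrl(context echo.Context) *url.URL {
	const canaryPercent = 10 // assumption: 10% of the requests go to the canary

	if rand.Intn(100) < canaryPercent {
		return b.serviceB
	}
	return b.serviceA
}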




Wednesday, August 11, 2021

Relative Variability

 

In this post we will review the relative variability (also known as the coefficient of variation), and check how we can use it to compare the spread of two different sets.
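
In formula form, the relative variability is simply the standard deviation divided by the absolute value of the mean:


\text{relative variability} = \frac{\sigma}{|\mu|}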

First let's create a function that gets a set of numbers:

func stats(numbers []float64) {
	// ...
}


Calculate the mean:


sum := float64(0)
for _, element := range numbers {
	sum += element
}
mean := sum / float64(len(numbers))
fmt.Printf("mean: %v\n", mean)



The variance and the standard deviation:


variance := float64(0)
for _, element := range numbers {
	distanceSqr := math.Pow(element-mean, 2)
	variance += distanceSqr
}
variance = variance / float64(len(numbers))
std := math.Sqrt(variance)
fmt.Printf("variance: %v\nstd: %v\n", variance, std)



and lastly the relative variability:


relativeVariability := std / math.Abs(mean)
fmt.Printf("relative variability: %v\n", relativeVariability)


The relative variability can be used to compare different sets; let's examine some examples.


The basic example is a set where all the numbers are the same constant.



fmt.Println("=== constant ===")
numbers := make([]float64, 1000)
for i := 0; i < len(numbers); i++ {
numbers[i] = 100
}
stats(numbers)


And the result is



=== constant ===
mean: 100
variance: 0
std: 0
relative variability: 0



This is quite expected. But let us check two other sets: one using random numbers in the range 0-100, and the other using random numbers in the range 0-1000.


fmt.Println("=== random 0-100 ===")
numbers = make([]float64, 1000)
for i := 0; i < len(numbers); i++ {
numbers[i] = rand.Float64() * 100
}
stats(numbers)

fmt.Println("=== random 0-1000 ===")
numbers = make([]float64, 1000)
for i := 0; i < len(numbers); i++ {
numbers[i] = rand.Float64() * 1000
}
stats(numbers)



And the result is:

=== random 0-100 ===
mean: 50.761294848805164
variance: 855.2429996004388
std: 29.24453794472463
relative variability: 0.5761188328987829
=== random 0-1000 ===
mean: 504.7749855147492
variance: 84752.21156230739
std: 291.122330923458
relative variability: 0.5767368417168754



We can see that the relative variability of the two sets is almost identical. This enables us to deduce that the spread of the two sets behaves in a similar way relative to the mean.


What about samples of different sizes?
In the following, we will use different-sized sections of the same set.



numbers = make([]float64, 100000)
for i := 0; i < len(numbers); i++ {
	numbers[i] = rand.Float64() * 100
}
fmt.Println("=== random using sections - size 10 ===")
stats(numbers[:10])
fmt.Println("=== random using sections - size 100 ===")
stats(numbers[:100])
fmt.Println("=== random using sections - size 1000 ===")
stats(numbers[:1000])
fmt.Println("=== random using sections - size 10000 ===")
stats(numbers[:10000])
fmt.Println("=== random using sections - size 100000 ===")
stats(numbers)



And the result is:

=== random using sections - size 10 ===
mean: 39.58300235324653
variance: 768.6101314621283
std: 27.723818847015437
relative variability: 0.7003970694189037
=== random using sections - size 100 ===
mean: 46.59611065676921
variance: 968.7449099481023
std: 31.124667226303036
relative variability: 0.6679670639373723
=== random using sections - size 1000 ===
mean: 49.66072506547081
variance: 814.3409641880382
std: 28.536660004072626
relative variability: 0.5746323672570423
=== random using sections - size 10000 ===
mean: 49.99659036064225
variance: 833.8674856543009
std: 28.876763766985746
relative variability: 0.5775746617656908
=== random using sections - size 100000 ===
mean: 50.05194561771965
variance: 831.7444664704859
std: 28.839980347955958
relative variability: 0.5762009846375657


So we can see that above a certain set size, we get enough accuracy for the relative variability. It looks like small sets suffer from unstable results: in some runs the relative variability of the set of size 10 was very high, while in others it was very low. So to use relative variability, or actually any statistical method, make sure the set is big enough.
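
To see this instability directly, here is a minimal sketch of my own: it uses a hypothetical relativeVariability helper (a variant of the stats function above that returns the value instead of printing it), computes the relative variability of many independent samples of size 10, and reports the lowest and highest values observed.


package main

import (
	"fmt"
	"math"
	"math/rand"
)

// relativeVariability is a hypothetical helper, a variant of the stats
// function above that returns the relative variability instead of printing it.
func relativeVariability(numbers []float64) float64 {
	sum := float64(0)
	for _, element := range numbers {
		sum += element
	}
	mean := sum / float64(len(numbers))

	variance := float64(0)
	for _, element := range numbers {
		variance += math.Pow(element-mean, 2)
	}
	variance = variance / float64(len(numbers))

	return math.Sqrt(variance) / math.Abs(mean)
}

func main() {
	lowest := math.MaxFloat64
	highest := float64(0)

	// compute the relative variability of 1000 independent samples of size 10,
	// and track the lowest and highest values observed
	for run := 0; run < 1000; run++ {
		numbers := make([]float64, 10)
		for i := 0; i < len(numbers); i++ {
			numbers[i] = rand.Float64() * 100
		}
		value := relativeVariability(numbers)
		lowest = math.Min(lowest, value)
		highest = math.Max(highest, value)
	}

	fmt.Printf("size 10, 1000 runs: lowest %v, highest %v\n", lowest, highest)
}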






Wednesday, August 4, 2021

GO and Race Condition


 


A race condition is defined as:

A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when one or more of the possible behaviors is undesirable.

        quoted from the Wikipedia site.


Go includes a race condition detector that is very simple to use:



$ go test -race mypkg    // test the package
$ go run -race mysrc.go  // compile and run the program
$ go build -race mycmd   // build the command
$ go install -race mypkg // install the package

        quoted from the Go lang blog.


I've used the race condition detector on my code, and found warnings about locations that surprised me. See the following code example:



package main

import (
	"fmt"
	"time"
)

var count int

func main() {
	go update()
	for {
		fmt.Println(count)
		time.Sleep(time.Second)
	}
}

func update() {
	for {
		time.Sleep(time.Second)
		count++
	}
}



I got a warning about the count global variable, and the reason is that:

Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access.

        quoted from the Go lang site.


But then, I thought to myself: "I don't care if I get an outdated value; I do not need 100% accuracy here. I only want to get an update sometime, and I don't want to spend CPU time on a synchronization mutex in such a case".


So I posted a question on StackOverflow, and it seemed to annoy some people, but all I wanted was to understand whether this is indeed a bug, or whether I would just get outdated values. The answer I got from everyone was that this is a bug, but I could not understand why.

They claimed that the code might not just get outdated values; it could also crash, or do anything unexpected.


So I decided to run some tests, and finally got to this version, where I've increased the speed of the update() goroutine, and let main() print the status once a second.



package main

import (
	"fmt"
	"time"
)

var count int32

func main() {
	go update()

	lastPrinted := time.Now()
	for {
		now := time.Now()
		if now.Sub(lastPrinted) > time.Second {
			fmt.Printf("count %v\n", count)
			lastPrinted = now
		}
	}
}

func update() {
	for {
		count++
	}
}


Now, the output is ALWAYS:



count 0
count 0
...  
  


And now I am a true believer that race detector reports should never be ignored...
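
For completeness, here is a minimal sketch of my own (not part of the original experiment) of one possible fix using sync/atomic: the counter is updated and read atomically, so the race detector no longer complains and the printed value actually advances.


package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

var count int32

func main() {
	go update()

	for {
		// atomic read of the shared counter
		fmt.Println(atomic.LoadInt32(&count))
		time.Sleep(time.Second)
	}
}

func update() {
	for {
		// atomic increment of the shared counter
		atomic.AddInt32(&count, 1)
	}
}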