Saturday, January 15, 2022

Using Let's Encrypt in GKE Ingress


In this post we will review the steps to create a signed SSL certificate for our site, running in Google Kubernetes Engine (aka GKE) and using Ingress to handle the incoming traffic.

The SSL certificate is created by the Let's Encrypt service, which automatically, and free of charge, supplies the signing and renewal services, based on ownership of the domain in the DNS. This works only if the traffic to the domain that we are signing is sent to the GKE ingress, hence assuring that the SSL certificate requester is indeed authentic.

In the following example, we present signing of two FQDNs.

After connecting to the GKE cluster, use Helm to install cert-manager, which will manage the SSL certificate creation and renewal.

kubectl apply -f
helm repo add jetstack
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.2.0
kubectl get pods --namespace cert-manager

The following steps include two parts. First we use the test/staging Let's Encrypt service, to ensure that the integration with Let's Encrypt indeed works well. Only then do we move to the production Let's Encrypt service. This is because the production Let's Encrypt service limits the amount of requests, and its certificate signing is slower.

Using Staging Let's Encrypt

First we create a test domain certificate.

cat <<EOF > test-resources.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager-test
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: test-selfsigned
  namespace: cert-manager-test
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-cert
  namespace: cert-manager-test
spec:
  dnsNames:
  - example.com
  secretName: selfsigned-cert-tls
  issuerRef:
    name: test-selfsigned
EOF

kubectl apply -f test-resources.yaml

And check that it is working without errors using the command:

kubectl describe certificate -n cert-manager-test

And once we see all is working fine, remove the test domain certificate:

kubectl delete -f test-resources.yaml

Now, let's move to the actual staging issuer:

cat <<EOF > clusterissuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          class: ingress-gce
EOF

kubectl apply -f clusterissuer.yaml

and use the following command to check that the issuer is ready:

kubectl describe clusterissuer letsencrypt-staging

Next, update the ingress to use cert-manager, by adding the annotations and updating the tls section:

annotations:
  cert-manager.io/cluster-issuer: letsencrypt-staging
  acme.cert-manager.io/http01-edit-in-place: "true"

tls:
- secretName: ingress-secret-letsencrypt
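For context, a complete ingress using these settings might look like the following sketch. The ingress name, host, backend service name, and port here are hypothetical placeholders; only the annotations and the tls secret name correspond to the configuration above:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
    acme.cert-manager.io/http01-edit-in-place: "true"
spec:
  tls:
  - hosts:
    - my-domain.example.com
    secretName: ingress-secret-letsencrypt
  rules:
  - host: my-domain.example.com
    http:
      paths:
      - path: /*
        pathType: ImplementationSpecific
        backend:
          service:
            name: my-service
            port:
              number: 80
```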

Use the following commands to follow the certificate signing process:

kubectl get certificate
kubectl describe certificate ingress-secret-letsencrypt
kubectl describe secret ingress-secret-letsencrypt

Once the process is done, wait for the ingress-related load balancer to be updated; this takes about 15 minutes. When connecting to the domain, we still get an invalid certificate error, since the certificate was signed by the staging service. However, viewing the certificate details shows that it was indeed signed by the staging service, and this indicates that we can move to the next step: using the production service.
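One way to check which CA signed the certificate the ingress serves is to inspect it with openssl. The live check against your domain is shown as a comment (the domain is a placeholder); the sketch below demonstrates the same inspection on a locally generated self-signed certificate:

```shell
# Against the live domain, one would run something like (domain is a placeholder):
#   echo | openssl s_client -connect my-domain.example.com:443 \
#     -servername my-domain.example.com 2>/dev/null \
#     | openssl x509 -noout -issuer -dates
# The same inspection, demonstrated on a locally generated self-signed cert:
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=staging-demo" \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem 2>/dev/null
openssl x509 -in /tmp/demo-cert.pem -noout -issuer -dates
```

A staging certificate will show a Let's Encrypt staging issuer rather than a trusted production CA.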

Move to Production Let's Encrypt

As we've done before, we create an issuer, but this time for the production service.

cat <<EOF > clusterissuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: ingress-gce
EOF

kubectl apply -f clusterissuer.yaml
kubectl delete secret ingress-secret-letsencrypt

And track the progress using the commands:

kubectl describe clusterissuer letsencrypt-prod
kubectl get order

Next, we should wait again for the ingress load balancer to update, for another ~15 minutes, and then check that the certificate is indeed valid.

Final Note

In case of need, see also how to start with a self-signed certificate in GKE ingress in the following post.

Monday, January 10, 2022

Print Git History Changes in Google Cloud Build


In previous posts we have reviewed how to use Google Cloud Build, and how to manage dependencies of multiple build triggers. In this post we will review how to print the list of git changes since the last successful build.

Our build runs a shell file, using the predefined Google builder image. We send the current build git hash as an argument to the shell file. Hence the trigger configuration is the following:

name: build

triggerTemplate:
  branchName: .*
  projectId: my-project
  repoName: my-repo

build:
  timeout: 3600s
  steps:
  - id: main
    entrypoint: bash
    timeout: 3600s
    args:
    - /

The file gets the arguments:


and includes the following:


function printGitLog() {
  # check whether a sha file exists from a previous successful build
  set +e
  gsutil -q stat gs://my-google-storge-bucket/${shaFile}
  rc=$?
  set -e
  if [[ "${rc}" == "0" ]]; then
    echo "=== changes from last successful build ==="
    gsutil cp gs://my-google-storge-bucket/${shaFile} ./${shaFile}
    prevSha=`cat ${shaFile}`
    git fetch --depth=100

    # do not fail the build if we cannot find the range in the last 100 commits
    set +e
    git log ${prevSha}..${commitSha}
    set -e
    echo "=========================================="
  else
    echo "no previous sha file"
  fi
}

function saveGitSha() {
  echo "Saving build sha to GCP storage"
  echo ${commitSha} > ${shaFile}_temp
  # strip whitespace, so the file contains only the bare sha
  tr -d '\t\n ' < ${shaFile}_temp > ${shaFile}
  rm -f ${shaFile}_temp

  gsutil cp ${shaFile} gs://my-google-storge-bucket
}

We should call the saveGitSha function upon a successful build. This saves the build hash to a file in Google Storage (you should manually create this folder in Google Storage).

Also, upon build start we call printGitLog, which retrieves the last successful hash from the folder in Google Storage, fetches the git history, and then prints the changes between the current build hash and the last successful hash.
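The core range logic of printGitLog can be exercised in a throwaway repository; the repository, user names, and commit messages below are made up for the demo:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"
prevSha=$(git rev-parse HEAD)    # stands in for the sha of the last successful build
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "second"
commitSha=$(git rev-parse HEAD)  # the current build sha
# the same range printGitLog prints: commits after prevSha, up to commitSha
git log --format=%s "${prevSha}..${commitSha}"
```

Only the commits made after the last successful build ("second" here) are listed.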

Monday, January 3, 2022

Installing Kubernetes on your Development Machine


I hear about many developers using minikube to run kubernetes on their development machine.

Working on a local kubernetes cluster during the development stage can in many cases shorten the development life-cycle, due to the ability to quickly observe the end-to-end behavior of the system. However, using minikube does not seem like the right way to do it. Minikube uses a VM running on the development machine, hence the performance of the kubernetes cluster is low, and the load on the development machine is high.

Instead of using minikube, a good alternative is to install a bare-metal kubernetes on the development machine. It might take a bit longer to install, but its simplicity and performance make this method way better.

To install the kube* binaries on the development machine use the following script:

echo "Update the apt"
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

echo "Download the GCP signing key"
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg

echo "Add the Kubernetes apt repo"
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

echo "Update kube* binaries"
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

To install kubernetes on the development machine, we can use the following script:

# Install cluster
sudo rm -rf $HOME/.kube
sudo rm -rf /var/lib/etcd
sudo kubeadm init --pod-network-cidr=
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl taint nodes --all

# Install calico
curl -O
kubectl apply -f calico.yaml
rm -f calico.yaml

If we're already using a script, why not install other requirements as well, such as Helm:

# Install Helm
rm -rf ~/.helm
sudo rm -rf /usr/local/bin/helm
curl >
chmod 700
rm ./

And you can go on and add other requirements for your application as part of this cluster, such as metrics server, or anything you might require on the kubernetes cluster.

Thursday, December 30, 2021

Remove Cilium from AWS EKS


Cilium is a great tool, but removing it might be an issue.

To remove cilium from AWS EKS, do the following:

1. Uninstall cilium chart:

helm delete cilium --namespace kube-system

2. Use node-shell to remove cilium CNI on each node:

curl -LO
chmod +x ./kubectl-node_shell
sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell
kubectl get nodes --no-headers | awk '{print "kubectl node-shell " $1 " -- rm -f /etc/cni/net.d/05-cilium.conf&"}' > x
chmod +x x; ./x; rm x
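The awk pipeline simply turns each node name into a node-shell command; we can preview what it generates on canned input (the node names here are hypothetical):

```shell
# stand-in for the node names column of `kubectl get nodes`
printf 'node-1\nnode-2\n' \
  | awk '{print "kubectl node-shell " $1 " -- rm -f /etc/cni/net.d/05-cilium.conf&"}'
```

This prints one `kubectl node-shell ... rm -f ...` command per node, which is then executed as a script.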

3. Reinstall AWS node CNI

kubectl apply -n kube-system -f aws-node.yaml

The aws-node.yaml manifest, and the list of its versions, are available in the AWS VPC CNI plugin repository.

Wednesday, December 22, 2021

Using Random Forest in Python



In this post we will review usage of a random forest classifier in python.

We use a very simple CSV as input. In real life you will have many columns, and complex data.


First we load the CSV to a data frame, and print its head.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

df = pd.read_csv("input.csv")
print(df.head())

The random forest works with floats, both for the features and for the labels. Hence we convert the person column to an int label:

def convert_to_int(row):
    if row['person'] == 'adult':
        return 1
    return 0


df['is_adult'] = df.apply(lambda row: convert_to_int(row), axis=1)
df.drop(labels=['person'], axis=1, inplace=True)

Next we split the data into training and testing segments:

labels = np.array(df['is_adult'])
features = df.drop('is_adult', axis=1)
feature_list = list(features.columns)
features = np.array(features)

# the test_size ratio here is an assumption; adjust as needed
train_features, test_features, train_labels, test_labels = \
    train_test_split(features, labels, test_size=0.25, random_state=42)

print('features shape {} labels shape {}'.format(
    train_features.shape, train_labels.shape))
print('features shape {} labels shape {}'.format(
    test_features.shape, test_labels.shape))

with np.printoptions(threshold=np.inf):
    print(train_features)

Let's examine a dummy model as a baseline. This model always guesses that we have a child, and not an adult.

baseline_predictions = np.full(test_labels.shape, 0)
baseline_errors = abs(baseline_predictions - test_labels)

with np.printoptions(threshold=np.inf):
    print("baseline predictions", baseline_predictions)
    print("baseline errors", baseline_errors)

print('error baseline {}'.format(
    round(np.mean(baseline_errors), 3)))

Now let's create the random forest classifier, and check its error rate.

forest = RandomForestRegressor(n_estimators=1000, random_state=42)
forest.fit(train_features, train_labels)

predictions = forest.predict(test_features)

# convert the regressor's float output to a 0/1 classification
prediction_threshold = 0.5
predictions[predictions < prediction_threshold] = 0
predictions[predictions >= prediction_threshold] = 1
with np.printoptions(threshold=np.inf):
    print(predictions)

prediction_errors = predictions - test_labels
print('error for test {}'.format(
    round(np.mean(abs(prediction_errors)), 3)))

We can check the importance of each feature in the model:

importances = list(forest.feature_importances_)
feature_importances = [(feature, round(importance, 2))
                       for feature, importance in zip(feature_list, importances)]
feature_importances = sorted(feature_importances, key=lambda x: x[1], reverse=True)
for pair in feature_importances:
    print('variable: {} Importance: {}'.format(*pair))

Lastly, we can examine true/false positive/negative rate:

joined = np.stack((predictions, test_labels), axis=1)
tp = joined[np.where(
    (joined[:, 0] == 1) *
    (joined[:, 1] == 1)
)]
tn = joined[np.where(
    (joined[:, 0] == 0) *
    (joined[:, 1] == 0)
)]
fp = joined[np.where(
    (joined[:, 0] == 1) *
    (joined[:, 1] == 0)
)]
fn = joined[np.where(
    (joined[:, 0] == 0) *
    (joined[:, 1] == 1)
)]
print('true positive {}'.format(np.shape(tp)[0]))
print('true negative {}'.format(np.shape(tn)[0]))
print('false positive {}'.format(np.shape(fp)[0]))
print('false negative {}'.format(np.shape(fn)[0]))

Monday, December 13, 2021

Create your Bash Completion


In this post we will review how to create a bash completion for your project scripts.

As our project grows, we add more and more scripts that a developer can manually execute for various development operations. Some of these scripts require arguments, and a bash completion can highly assist the developer, especially when the scripts are often used and the arguments are long.

My recommendation is to add a single bash completion script as part of the project Git repository. This script should be sourced in the bash RC file ~/.bashrc, for example:

source /home/foo/git/my-project/

To provide a static auto-complete, we use a list of values. The following example is a completion for a script that gets the environment name as an argument. 

complete -W "dev staging prod" ./
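Under the hood, bash filters the word list against the prefix already typed. The compgen builtin lets us preview exactly what the shell would suggest, without wiring up the completion:

```shell
# what bash suggests after typing "st" and pressing tab
compgen -W "dev staging prod" -- "st"
# with an empty prefix, all the words are suggested, one per line
compgen -W "dev staging prod" -- ""
```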

In other cases the argument values are dynamic. For example, we have a parent folder under which we have a folder for each microservice, and many scripts that receive the name of a microservice. So we want a dynamic completion with the service name. This is done using the following:

function service_completion() {
  # complete only the first argument
  if [ "${#COMP_WORDS[@]}" != "2" ]; then
    return
  fi

  local suggestions=($(compgen -W "$(ls ./services-parent-folder | sed 's/\t/ /')" -- "${COMP_WORDS[1]}"))

  if [ "${#suggestions[@]}" == "1" ]; then
    # only one match left; complete it fully
    local value=$(echo ${suggestions[0]/%\ */})
    COMPREPLY=("$value")
  else
    COMPREPLY=("${suggestions[@]}")
  fi
}

complete -F service_completion ./

Notice that the auto completion is bound to the exact path the command was registered for. If you run the script from another folder, using ./foo/ instead of ./ , the auto completion would not be invoked.
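The word-list expansion that service_completion performs can be exercised against a throwaway services folder; the folder and service names below are made up for the demo:

```shell
services=$(mktemp -d)
mkdir "$services/auth-service" "$services/billing-service" "$services/web-ui"
# the same compgen expansion the function runs, with prefix "b" already typed
compgen -W "$(ls "$services")" -- "b"
```

Only the service names matching the typed prefix ("billing-service" here) are suggested.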

Wednesday, December 8, 2021

Locate Origin of Kubernetes Pods using Go


In this post we will find, for each running pod, the source deployment or statefulset that caused it to run. This is required if we want to show this information in a nice table, or if we want to get additional information about the pod from the source deployment or statefulset.

We start by initiating the kubernetes client. See this post for information about methods to create the kubernetes client.

package main

import (
    "context"
    "fmt"
    "os"
    "path/filepath"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    configPath := filepath.Join(os.Getenv("HOME"), ".kube", "config")
    restConfig, err := clientcmd.BuildConfigFromFlags("", configPath)
    if err != nil {
        panic(err)
    }

    k8sClient, err := kubernetes.NewForConfig(restConfig)
    if err != nil {
        panic(err)
    }
Next we want to fill up a map of owners. This is a map of the statefulsets and deployments that caused a pod to start. Notice that a deployment actually starts a replicaset, so we add the replicaset name to point to the original deployment.

    owners := make(map[string]string)

    listOptions := metav1.ListOptions{}
    namespace := "default"

    statefulsets, err := k8sClient.AppsV1().StatefulSets(namespace).List(context.Background(), listOptions)
    if err != nil {
        panic(err)
    }
    for _, statefulSet := range statefulsets.Items {
        owners[statefulSet.Name] = fmt.Sprintf("statefulset %v", statefulSet.Name)
    }

    deployments, err := k8sClient.AppsV1().Deployments(namespace).List(context.Background(), listOptions)
    if err != nil {
        panic(err)
    }
    for _, deployment := range deployments.Items {
        owners[deployment.Name] = fmt.Sprintf("deployment %v", deployment.Name)
    }

    replicasets, err := k8sClient.AppsV1().ReplicaSets(namespace).List(context.Background(), listOptions)
    if err != nil {
        panic(err)
    }
    for _, replica := range replicasets.Items {
        for _, owner := range replica.OwnerReferences {
            // a replicaset is owned by a deployment; map the replicaset name to it
            deployment := owners[owner.Name]
            owners[replica.Name] = deployment
        }
    }

Having the owners map populated, we can now scan the pods, and print the owner for each pod.

    pods, err := k8sClient.CoreV1().Pods(namespace).List(context.Background(), listOptions)
    if err != nil {
        panic(err)
    }

    for _, pod := range pods.Items {
        for _, owner := range pod.OwnerReferences {
            parent := owners[owner.Name]
            fmt.Printf("pod %v owner %v\n", pod.Name, parent)
        }
    }
}

And an example output is:

pod sample-hackazon-deployment-546f47b8cb-j4j7x owner deployment sample-hackazon-deployment
pod sample-hackazon-mysql-deployment-bd6465f75-m4sgc owner deployment sample-hackazon-mysql-deployment
pod sample-onepro-deployment-7669d59cc4-k8g8v owner deployment sample-onepro-deployment
pod sample-onepro-nginx-deployment-7669dd8d46-8fxlw owner deployment sample-onepro-nginx-deployment
pod udger-statefulset-0 owner statefulset udger-statefulset