Thursday, December 30, 2021

Remove Cilium from AWS EKS

 

Cilium is a great tool, but removing it from a cluster can be tricky.

To remove Cilium from an AWS EKS cluster, follow the steps below; a short verification sketch appears after them.


1. Uninstall the Cilium Helm chart:

helm delete cilium --namespace kube-system


2. Use kubectl-node-shell to remove the Cilium CNI configuration from each node:

# install the kubectl node-shell plugin
curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
chmod +x ./kubectl-node_shell
sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell

# generate and run a small script that removes the cilium CNI config from every node
kubectl get nodes --no-headers | awk '{print "kubectl node-shell " $1 " -- rm -f /etc/cni/net.d/05-cilium.conf&" }' > x
chmod +x x; ./x; rm x


3. Reinstall the AWS VPC CNI (aws-node):

kubectl apply -n kube-system -f aws-node.yaml


The aws-node.yaml manifest is located here:
The list of aws-node manifest versions is available here:
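
Once the AWS CNI is reapplied, it is worth verifying that the aws-node daemonset is running again, that no Cilium pods remain, and that the nodes are Ready. A minimal verification sketch (aws-node is the default name of the AWS VPC CNI daemonset):

# the AWS VPC CNI daemonset should be back with all its pods ready
kubectl -n kube-system get daemonset aws-node

# no cilium pods should remain in the cluster
kubectl -n kube-system get pods | grep cilium

# all nodes should be in Ready state
kubectl get nodes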



Wednesday, December 22, 2021

Using Random Forest in Python

 

image from https://en.wikipedia.org/wiki/Random_forest



In this post we will review the usage of a random forest classifier in Python.


We use a very simple CSV as input. In real life you will have many more columns and more complex data.



height,weight,person
80,40,child
70,30,child
50,10,child
180,80,adult
170,80,adult
185,80,adult



First, we load the CSV into a data frame and print its head.



import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

df = pd.read_csv("input.csv")
print(df.head(5))
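
For the sample CSV above, the printed head should look roughly like this (exact spacing may differ between pandas versions):

   height  weight person
0      80      40  child
1      70      30  child
2      50      10  child
3     180      80  adult
4     170      80  adult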



The random forest works with numbers, both for the features and for the labels. Hence we convert the person column to an integer label:



def convert_to_int(row):
    if row['person'] == 'adult':
        return 1
    return 0


df['is_adult'] = df.apply(lambda row: convert_to_int(row), axis=1)
df.drop(labels=['person'], axis=1, inplace=True)
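
As a side note, the same conversion can be written without apply, using a vectorized comparison. This is just an equivalent shorthand for the snippet above:

# the True/False comparison is cast to 1/0
df['is_adult'] = (df['person'] == 'adult').astype(int)
df.drop(labels=['person'], axis=1, inplace=True)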



Next we split the data into training and testing sets:



labels = np.array(df['is_adult'])
features = df.drop('is_adult', axis=1)
feature_list = list(features.columns)
features = np.array(features)
train_features, test_features, train_labels, test_labels = \
    train_test_split(features,
                     labels,
                     test_size=0.25,
                     random_state=42,
                     )
print('train features shape {} labels shape {}'.format(
    train_features.shape, train_labels.shape))
print('test features shape {} labels shape {}'.format(
    test_features.shape, test_labels.shape))

with np.printoptions(threshold=np.inf):
    print(train_features)
    print(train_labels)



Let's examine a dummy model as a baseline. This model always guesses that we have a child, never an adult.



baseline_predictions = np.full(test_labels.shape, 0)
baseline_errors = abs(baseline_predictions - test_labels)

with np.printoptions(threshold=np.inf):
    print("baseline predictions", baseline_predictions)
    print("baseline errors", baseline_errors)

print('error baseline {}'.format(
    round(np.mean(baseline_errors), 3)))



Now let's create the random forest model and check its error rate. We use a RandomForestRegressor and threshold its output at 0.5 to turn the predictions into 0/1 classes.



forest = RandomForestRegressor(n_estimators=1000, random_state=42)
forest.fit(train_features, train_labels)

predictions = forest.predict(test_features)

prediction_threshold = 0.5
predictions[predictions < prediction_threshold] = 0
predictions[predictions >= prediction_threshold] = 1
with np.printoptions(threshold=np.inf):
    print(predictions)

prediction_errors = predictions - test_labels
print('error for test {}'.format(
    round(np.mean(abs(prediction_errors)), 3)))
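
As a side note, since this is really a classification problem, sklearn's RandomForestClassifier can be used instead of thresholding a regressor. A minimal sketch producing the same kind of 0/1 predictions:

from sklearn.ensemble import RandomForestClassifier

# the classifier predicts the 0/1 label directly, no threshold is needed
classifier = RandomForestClassifier(n_estimators=1000, random_state=42)
classifier.fit(train_features, train_labels)
class_predictions = classifier.predict(test_features)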



We can check the importance of each feature in the model:



importances = list(forest.feature_importances_)
feature_importances = [(feature, round(importance, 2))
                       for feature, importance in zip(feature_list, importances)]
feature_importances = sorted(feature_importances, key=lambda x: x[1], reverse=True)
for pair in feature_importances:
    print('variable: {} Importance: {}'.format(*pair))



Lastly, we can examine the true/false positive/negative counts:



joined = np.stack((predictions, test_labels), axis=1)
tp = joined[np.where(
    (joined[:, 0] == 1) *
    (joined[:, 1] == 1)
)]
tn = joined[np.where(
    (joined[:, 0] == 0) *
    (joined[:, 1] == 0)
)]
fp = joined[np.where(
    (joined[:, 0] == 1) *
    (joined[:, 1] == 0)
)]
fn = joined[np.where(
    (joined[:, 0] == 0) *
    (joined[:, 1] == 1)
)]
print('true positive {}'.format(np.shape(tp)[0]))
print('true negative {}'.format(np.shape(tn)[0]))
print('false positive {}'.format(np.shape(fp)[0]))
print('false negative {}'.format(np.shape(fn)[0]))
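
The same counts can be cross-checked with sklearn's confusion_matrix, assuming the thresholded predictions computed above:

from sklearn.metrics import confusion_matrix

# rows are the true labels (0, 1), columns are the predicted labels (0, 1),
# so the matrix is [[tn, fp], [fn, tp]]
print(confusion_matrix(test_labels, predictions.astype(int), labels=[0, 1]))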





Monday, December 13, 2021

Create your Bash Completion

 


In this post we will review how to create a bash completion for your project scripts.

As our project grows, we add more and more scripts that a developer can execute manually for various development operations. Some of these scripts require arguments, and bash completion can greatly assist the developer, especially when the scripts are used often and the arguments are long.

My recommendation is to add a single bash completion script to the project Git repository, and to source it from the bash RC file ~/.bashrc, for example:

source /home/foo/git/my-project/bash_completion.sh


To provide static auto-completion, we use a fixed list of values. The following example is a completion for a script that receives the environment name as an argument.


complete -W "dev staging prod" ./examine-env.sh
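
With this in place, typing the beginning of an environment name and pressing TAB completes it, assuming the shell is in the folder containing the script, for example:

$ ./examine-env.sh st<TAB>
$ ./examine-env.sh staging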


In other cases the argument values are dynamic. For example, we have a parent folder containing a folder for each microservice, and many scripts that receive a microservice name as an argument, so we want a dynamic completion with the service names. This is done using the following:


function service_completion() {
    if [ "${#COMP_WORDS[@]}" != "2" ]; then
        return
    fi

    local suggestions=($(compgen -W "$(ls ./services-parent-folder | sed 's/\t/ /')" -- "${COMP_WORDS[1]}"))

    if [ "${#suggestions[@]}" == "1" ]; then
        local value=$(echo ${suggestions[0]/%\ */})
        COMPREPLY=("$value")
    else
        COMPREPLY=("${suggestions[@]}")
    fi
}


complete -F service_completion ./build-micro-service.sh
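
For example, given a hypothetical services layout like the one below (the service names are made up for illustration), pressing TAB after the first letters of a service name completes it:

$ ls ./services-parent-folder
auth  billing  gateway

$ ./build-micro-service.sh bi<TAB>
$ ./build-micro-service.sh billing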



Notice that the completion is registered for the exact command string, including the relative path. If you run the script from another folder, for example as ./foo/examine-env.sh instead of ./examine-env.sh, the completion will not be invoked.




Wednesday, December 8, 2021

Locate Origin of Kubernetes Pods using Go

 


In this post we will find, for each running pod, the source deployment or statefulset that caused it to run. This is useful if we want to show this information in a nice table, or if we want to get additional information about the pod from its source deployment or statefulset.


We start by initializing the Kubernetes client. See this post for information about the different methods of creating the Kubernetes client.



package main

import (
    "context"
    "fmt"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
    "os"
    "path/filepath"
)

func main() {
    configPath := filepath.Join(os.Getenv("HOME"), ".kube", "config")
    restConfig, err := clientcmd.BuildConfigFromFlags("", configPath)
    if err != nil {
        panic(err)
    }

    k8sClient, err := kubernetes.NewForConfig(restConfig)
    if err != nil {
        panic(err)
    }



Next we want to populate a map of owners: the statefulsets and deployments that caused each pod to start. Notice that a deployment actually starts a replicaset, so we also map the replicaset name to the original deployment.



    owners := make(map[string]string)

    listOptions := metav1.ListOptions{}
    namespace := "default"

    statefulsets, err := k8sClient.AppsV1().StatefulSets(namespace).List(context.Background(), listOptions)
    if err != nil {
        panic(err)
    }
    for _, statefulSet := range statefulsets.Items {
        owners[statefulSet.Name] = fmt.Sprintf("statefulset %v", statefulSet.Name)
    }

    deployments, err := k8sClient.AppsV1().Deployments(namespace).List(context.Background(), listOptions)
    if err != nil {
        panic(err)
    }

    for _, deployment := range deployments.Items {
        owners[deployment.Name] = fmt.Sprintf("deployment %v", deployment.Name)
    }

    replicasets, err := k8sClient.AppsV1().ReplicaSets(namespace).List(context.Background(), listOptions)
    if err != nil {
        panic(err)
    }

    for _, replica := range replicasets.Items {
        for _, owner := range replica.OwnerReferences {
            deployment := owners[owner.Name]
            owners[replica.Name] = deployment
        }
    }



With the owners map populated, we can now scan the pods and print the owner of each pod.



    pods, err := k8sClient.CoreV1().Pods(namespace).List(context.Background(), listOptions)
    if err != nil {
        panic(err)
    }

    for _, pod := range pods.Items {
        for _, owner := range pod.OwnerReferences {
            parent := owners[owner.Name]
            fmt.Printf("pod %v owner %v\n", pod.Name, parent)
        }
    }
}



And an example output is:



pod sample-hackazon-deployment-546f47b8cb-j4j7x owner deployment sample-hackazon-deployment

pod sample-hackazon-mysql-deployment-bd6465f75-m4sgc owner deployment sample-hackazon-mysql-deployment

pod sample-onepro-deployment-7669d59cc4-k8g8v owner deployment sample-onepro-deployment

pod sample-onepro-nginx-deployment-7669dd8d46-8fxlw owner deployment sample-onepro-nginx-deployment

pod udger-statefulset-0 owner statefulset udger-statefulset