
Monday, February 23, 2026

Add Grafana Alert for Restarting Pod


 


In this post we will configure an alert in Grafana for restarting pods.

This relates to Grafana version 11.5.1. Other versions might have a different syntax.

Follow the next steps:

  • Open the Grafana GUI

  • Click on Alerting, Alert rules

  • Click on New alert rule

  • Enter the rule name: restarting pods

  • Select Prometheus as the data source

  • Make sure you have kube-state-metrics installed on the Kubernetes cluster. If it is not, install it using:

    helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
    helm repo update
    helm install kube-state-metrics kube-state-metrics/kube-state-metrics \
      --namespace kube-system \
      --create-namespace


  • Use the following PromQL query: sum by (namespace, pod) ( increase(kube_pod_container_status_restarts_total[5m]) ) > 1
  • Leave the threshold as: A > 0
  • Under Configure no data and error handling, set "Alert state if no data or all values are null" to Normal
  • In the email notification message, use the following summary:

    Pod {{ $labels.pod }} restarted

  • In the email description use the following text:

    Pod {{ $labels.pod }} in namespace {{ $labels.namespace }}
    restarted more than once in the last 5 minutes.
    Current value: {{ $values.A.Value }}

  • Make sure to configure an email provider, such as SendGrid, to enable sending emails from Grafana
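
The same rule can also be provisioned from a file instead of the GUI. Below is a hedged sketch of a Grafana alert provisioning file: the folder name and the datasource UID are placeholders, and field names may differ slightly between Grafana versions.

```yaml
apiVersion: 1
groups:
  - orgId: 1
    name: pod-alerts
    folder: kubernetes                    # hypothetical folder name
    interval: 1m
    rules:
      - uid: restarting-pods
        title: restarting pods
        condition: A
        data:
          - refId: A
            relativeTimeRange:
              from: 300
              to: 0
            datasourceUid: PROMETHEUS_UID # placeholder: your Prometheus datasource UID
            model:
              expr: sum by (namespace, pod) (increase(kube_pod_container_status_restarts_total[5m])) > 1
              instant: true
        noDataState: OK                   # "Normal" in the GUI
        execErrState: OK
        annotations:
          summary: Pod {{ $labels.pod }} restarted
          description: >-
            Pod {{ $labels.pod }} in namespace {{ $labels.namespace }}
            restarted more than once in the last 5 minutes.
            Current value: {{ $values.A.Value }}
```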


Monday, February 9, 2026

Avoiding Huge Docker Image For LLM Model


 


In this post we will review a method to reduce the Docker image size for an LLM inference image.

In general, using an LLM in a Docker container tends to create images whose size might exceed 20 GB.

Downloading such Docker images from ECR or from Nexus might take a long time. In addition, storing these images in those systems might present a challenge.

To avoid this, we use the following approach:

1. We extract the folders with high disk usage from the Docker image and save them to S3. Then we create a new slim image without these folders.

2. Upon deployment, we run an init container that downloads the file from S3 and extracts it to an emptyDir volume.


Step 1: Extract folders and create slim image

The following script runs the original huge image build.

Then it extracts the huge folders ExportFolder1, ExportFolder2 into the OutputFile. This file should be uploaded to AWS S3.

Next the script creates a new slim image without the folders ExportFolder1, ExportFolder2.


#!/usr/bin/env bash
set -e

cd "$(dirname "$0")"

DockerRegistry=my-registry.my-company
ProjectVersion=latest
BuildNumber=1234

OutputFile="${OutputFile:-llm-image-${BuildNumber}.tar.gz}"
SlimImageName="${DockerRegistry}/llm-image-slim:${ProjectVersion}"

ExportFolder1="root/.cache"
ExportFolder2="usr/local/lib/python3.12"

TempFolder="$(mktemp -d)"
ImageFileSystem="${TempFolder}/rootfs"
TempContainer=llm-image-tmp

date
echo "build full image"
./build.sh


date
echo "extracting container to ${ImageFileSystem}"
mkdir -p "${ImageFileSystem}"
docker rm -f "${TempContainer}" 2>/dev/null || true
docker create --name ${TempContainer} llm-image/dev:latest
docker export ${TempContainer} | tar -x -C "${ImageFileSystem}"
docker rm -f ${TempContainer}

date
echo "compressing folders to ${OutputFile}"
tar -czf "${OutputFile}" -C "${ImageFileSystem}" "${ExportFolder1}" "${ExportFolder2}"

date
echo "creating slim image ${SlimImageName}"
rm -rf "${ImageFileSystem}/${ExportFolder1}" "${ImageFileSystem}/${ExportFolder2}"
tar -C "${ImageFileSystem}" -c . | docker import - "${SlimImageName}"
docker push ${SlimImageName}

date
echo "cleaning up"
rm -rf "${TempFolder}"

date
echo "Done"
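
The slim-image trick relies on tar archiving the folders with paths relative to the filesystem root, so they line up with the subPath mounts used later in the deployment. A minimal sketch of that round trip, with hypothetical file names and no Docker required:

```shell
# Create a fake extracted root filesystem (hypothetical content).
workdir=$(mktemp -d)
mkdir -p "$workdir/rootfs/root/.cache"
echo "model-weights" > "$workdir/rootfs/root/.cache/weights.bin"

# Archive the folder with a relative path, as the build script does.
tar -czf "$workdir/out.tar.gz" -C "$workdir/rootfs" "root/.cache"

# Extract into another directory; the relative layout is preserved,
# so root/.cache can be mounted back at /root/.cache via subPath.
mkdir -p "$workdir/restore"
tar -xzf "$workdir/out.tar.gz" -C "$workdir/restore"
cat "$workdir/restore/root/.cache/weights.bin"   # prints: model-weights
```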



Step 2: Deployment

To handle the S3 download we create an extractor image.


Dockerfile:


FROM amazon/aws-cli


COPY files /

ENTRYPOINT ["/entrypoint.sh"]


Downloader script: entrypoint.sh


#!/usr/bin/env bash
set -e

localFileName="/models-local.tar.gz"

cd /local-storage

echo "Downloading ${MODELS_S3_PATH}"
aws s3 cp "${MODELS_S3_PATH}" "${localFileName}"

echo "Extracting ${localFileName}"
tar -xzf "${localFileName}"

echo "Cleaning up..."
rm "${localFileName}"



In the kubernetes deployment file, we add an init container.


initContainers:
- name: extract
  image: modelsextract
  env:
  - name: MODELS_S3_PATH
    value: {{ .Values.modelsS3Path | quote }}
  volumeMounts:
  - mountPath: /local-storage
    name: local-storage


An emptyDir volume and mount for the exported folders:


volumes:
- name: local-storage
  emptyDir:
    sizeLimit: 100Gi


volumeMounts:
- mountPath: /root/.cache
  subPath: root/.cache
  name: local-storage
- mountPath: /usr/local/lib/python3.12
  subPath: usr/local/lib/python3.12
  name: local-storage


Also, make sure to use the slim image instead of the huge image in the deployment container.
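
Putting it together, the main container spec might look like this sketch (the image name and tag are illustrative):

```yaml
containers:
- name: llm
  image: my-registry.my-company/llm-image-slim:latest   # the slim image, not the full one
  volumeMounts:
  - mountPath: /root/.cache
    subPath: root/.cache
    name: local-storage
  - mountPath: /usr/local/lib/python3.12
    subPath: usr/local/lib/python3.12
    name: local-storage
```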



Final Note

We have reviewed the creation of a slim Docker image whose extra data is later downloaded from AWS S3. To save cost and gain speed, the download should happen in the same region as the deployment, so make sure to copy the file to a regional AWS S3 bucket near your cluster.


Wednesday, February 4, 2026

Using DuckDB in GO




In this post we show a simple example of using DuckDB in Go.

DuckDB features:

  • Can run in-memory
  • Supports multiple file types, such as JSON, NDJSON, CSV, and Excel
  • Supports multiple storage backends, such as AWS S3, GCP Storage, and PostgreSQL

In the following example we store our data in Parquet files in AWS S3.
We organize the files using a folder structure that will later enable us to fetch only the files we need.


package duckdb

import (
	"database/sql"
	"fmt"
	"testing"
	"time"

	_ "github.com/marcboeker/go-duckdb"
)

// Note: kiterr and kittime are helper packages used across this blog's
// examples; their imports are omitted here.

func TestValidation(_ *testing.T) {
	db, err := sql.Open("duckdb", "")
	kiterr.RaiseIfError(err)
	defer db.Close()

	_, err = db.Exec(`
		INSTALL httpfs;
		LOAD httpfs;
	`)
	kiterr.RaiseIfError(err)

	_, err = db.Exec(`
		SET s3_region='us-east-1';
		SET s3_access_key_id='XXX';
		SET s3_secret_access_key='XXX';
	`)
	kiterr.RaiseIfError(err)

	createTable(db)
	createData(db)
	exportData(db)
}

func createTable(
	db *sql.DB,
) {
	_, err := db.Exec(`
		CREATE TABLE IF NOT EXISTS events (
			my_text TEXT,
			my_value INTEGER,
			event_time TIMESTAMP
		);
	`)
	kiterr.RaiseIfError(err)
}

func createData(
	db *sql.DB,
) {
	statement, err := db.Prepare(`
		INSERT INTO events (my_text, my_value, event_time)
		VALUES (?, ?, ?)
	`)
	kiterr.RaiseIfError(err)
	defer statement.Close()

	startTime := kittime.ParseDay("2000-01-01")
	for day := range 2 {
		for hour := range 24 {
			for dataIndex := range 2 {
				dayTime := startTime.Add(time.Duration(day) * 24 * time.Hour)
				hourTime := dayTime.Add(time.Duration(hour) * time.Hour)
				fmt.Printf("saving data %v for %v\n", dataIndex, kittime.NiceTime(&hourTime))
				_, err = statement.Exec("aaa", 11, &hourTime)
				kiterr.RaiseIfError(err)
			}
		}
	}
}

func exportData(
	db *sql.DB,
) {
	_, err := db.Exec(`
		COPY (
			SELECT
				my_text,
				my_value,
				event_time,
				CAST(event_time AS DATE) AS event_date,
				strftime(event_time, '%H') AS event_hour
			FROM events
		)
		TO 's3://my-bucket/duck/'
		(
			FORMAT PARQUET,
			COMPRESSION GZIP,
			PARTITION_BY (event_date, event_hour)
		);
	`)
	kiterr.RaiseIfError(err)
}


The result in AWS S3 is:


$ aws s3 ls --recursive s3://my-bucket
2026-02-04 11:27:40 542 duck/event_date=2000-01-02/event_hour=10/data_0.parquet
2026-02-04 11:27:45 541 duck/event_date=2000-01-02/event_hour=13/data_0.parquet
2026-02-04 11:27:47 541 duck/event_date=2000-01-02/event_hour=14/data_0.parquet
2026-02-04 11:27:49 542 duck/event_date=2000-01-02/event_hour=15/data_0.parquet
2026-02-04 11:27:50 541 duck/event_date=2000-01-02/event_hour=16/data_0.parquet
2026-02-04 11:27:52 542 duck/event_date=2000-01-02/event_hour=17/data_0.parquet
2026-02-04 11:27:54 541 duck/event_date=2000-01-02/event_hour=18/data_0.parquet
2026-02-04 11:27:58 541 duck/event_date=2000-01-02/event_hour=20/data_0.parquet
2026-02-04 11:28:00 542 duck/event_date=2000-01-02/event_hour=21/data_0.parquet
2026-02-04 11:28:03 542 duck/event_date=2000-01-02/event_hour=23/data_0.parquet
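
The hive-style folder names pay off at read time: DuckDB can prune partitions based on the WHERE clause and download only the matching files. A sketch of such a query, assuming the same bucket and the httpfs setup shown above:

```sql
SELECT my_text, my_value, event_time
FROM read_parquet('s3://my-bucket/duck/**/*.parquet', hive_partitioning = true)
WHERE event_date = DATE '2000-01-02'
  AND event_hour = '10';
```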


So for a quick-start project that needs to store data over time and read it when needed, this is a nice solution. A heavier project that requires more data, with fine control over the storage type, parallelism, and timing, might require writing proprietary code.