
Monday, March 16, 2026

Deploy ClickHouse Cluster on Kubernetes without Operator


 


In a recent post we've seen how to deploy a standalone ClickHouse on Kubernetes without an operator. In this post we go one step further and deploy a ClickHouse cluster, again without an operator.

To install a cluster, we need to deploy two components: ClickHouse Keeper, which maintains the quorum, and the ClickHouse server, which includes the actual cluster nodes. We will use 3 replicas for the keeper, and 2 replicas for the server.


ClickHouse Keeper

The keeper includes the entities below.


The service

apiVersion: v1
kind: Service
metadata:
  name: clickhousekeeper-service
spec:
  selector:
    configid: clickhousekeeper-container
  clusterIP: None
  ports:
    - name: client
      port: 9181
    - name: raft
      port: 9444


The config

apiVersion: v1
kind: ConfigMap
metadata:
  name: clickhousekeeper-config
data:
  keeper_config.xml: |-
    <clickhouse>

      <logger>
        <level>information</level>
        <console>true</console>
      </logger>

      <listen_host>0.0.0.0</listen_host>

      <keeper_server>
        <tcp_port>9181</tcp_port>

        <server_id>REPLACE_ME_SERVER_ID</server_id>

        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

        <raft_configuration>
          <server>
            <id>1</id>
            <hostname>clickhousekeeper-statefulset-0.clickhousekeeper-service.default.svc.cluster.local</hostname>
            <port>9444</port>
          </server>
          <server>
            <id>2</id>
            <hostname>clickhousekeeper-statefulset-1.clickhousekeeper-service.default.svc.cluster.local</hostname>
            <port>9444</port>
          </server>
          <server>
            <id>3</id>
            <hostname>clickhousekeeper-statefulset-2.clickhousekeeper-service.default.svc.cluster.local</hostname>
            <port>9444</port>
          </server>
        </raft_configuration>
      </keeper_server>
    </clickhouse>


The statefulset

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clickhousekeeper-statefulset
spec:
  serviceName: clickhousekeeper-service
  podManagementPolicy: Parallel
  replicas: 3
  selector:
    matchLabels:
      configid: clickhousekeeper-container
  template:
    metadata:
      labels:
        configid: clickhousekeeper-container
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: configid
                      operator: In
                      values:
                        - clickhousekeeper-container

      initContainers:
        - name: config
          image: clickhouse/clickhouse-keeper:latest
          imagePullPolicy: Always
          command:
            - bash
            - -c
            - |
              set -e
              POD_ORDINAL=$(echo $HOSTNAME | awk -F'-' '{print $NF}')
              cp /config-origin/keeper_config.xml /etc/clickhouse-keeper/keeper_config.xml
              sed -i "s/REPLACE_ME_SERVER_ID/$((POD_ORDINAL+1))/" /etc/clickhouse-keeper/keeper_config.xml
              cat /etc/clickhouse-keeper/keeper_config.xml
          volumeMounts:
            - name: clickhousekeeper-config
              mountPath: /config-origin
            - name: clickhousekeeper-runtime-config
              mountPath: /etc/clickhouse-keeper
      containers:
        - name: clickhousekeeper
          image: clickhouse/clickhouse-keeper:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 9181
              name: client
            - containerPort: 9444
              name: raft
          volumeMounts:
            - name: pvc
              mountPath: /var/lib/clickhouse
            - name: clickhousekeeper-runtime-config
              mountPath: /etc/clickhouse-keeper

      volumes:
        - name: clickhousekeeper-config
          configMap:
            name: clickhousekeeper-config
        - name: clickhousekeeper-runtime-config
          emptyDir: {}

  volumeClaimTemplates:
    - metadata:
        name: pvc
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "gp2"
        resources:
          requests:
            storage: 10Gi


Notice that the ConfigMap is mounted into a side folder, and the init container copies it into place while setting the server ID according to the pod ordinal.
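Once the keeper pods are running, we can check the quorum with the keeper's four-letter-word commands on the client port. A quick sketch, assuming the default namespace used above and that nc is available in the image:

```shell
# "ruok" should answer "imok"; "mntr" reports the raft role (leader/follower)
kubectl exec clickhousekeeper-statefulset-0 -- bash -c 'echo ruok | nc localhost 9181'
kubectl exec clickhousekeeper-statefulset-0 -- bash -c 'echo mntr | nc localhost 9181'
```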


ClickHouse Server

The server includes the entities below.


The service

apiVersion: v1
kind: Service
metadata:
  name: clickhouseserver-service
spec:
  selector:
    configid: clickhouseserver-container
  clusterIP: None
  ports:
    - name: tcp
      port: 9000
    - name: http
      port: 8123


The config

apiVersion: v1
kind: ConfigMap
metadata:
  name: clickhouseserver-config
data:
  custom.xml: |-
    <clickhouse>
      <logger>
        <level>information</level>
        <console>true</console>
      </logger>

      <listen_host>0.0.0.0</listen_host>

      <zookeeper>
        <node>
          <host>clickhousekeeper-statefulset-0.clickhousekeeper-service.default.svc.cluster.local</host>
          <port>9181</port>
        </node>
        <node>
          <host>clickhousekeeper-statefulset-1.clickhousekeeper-service.default.svc.cluster.local</host>
          <port>9181</port>
        </node>
        <node>
          <host>clickhousekeeper-statefulset-2.clickhousekeeper-service.default.svc.cluster.local</host>
          <port>9181</port>
        </node>
      </zookeeper>

      <remote_servers>
        <llmp_clickhouse_cluster>
          <shard>
            <replica>
              <host>clickhouseserver-statefulset-0.clickhouseserver-service.default.svc.cluster.local</host>
              <port>9000</port>
            </replica>
            <replica>
              <host>clickhouseserver-statefulset-1.clickhouseserver-service.default.svc.cluster.local</host>
              <port>9000</port>
            </replica>
          </shard>
        </llmp_clickhouse_cluster>
      </remote_servers>
    </clickhouse>
  users.xml: |-
    <clickhouse>
      <users>
        <default>
          <password></password>
          <networks>
            <ip>::/0</ip>
          </networks>
        </default>
      </users>
    </clickhouse>
  macros.xml: |-
    <clickhouse>
      <macros>
        <shard>01</shard>
        <replica>REPLACE_ME_HOST_NAME</replica>
      </macros>
    </clickhouse>


The statefulset

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clickhouseserver-statefulset
spec:
  serviceName: clickhouseserver-service
  podManagementPolicy: Parallel
  replicas: 2
  selector:
    matchLabels:
      configid: clickhouseserver-container
  template:
    metadata:
      labels:
        configid: clickhouseserver-container
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: configid
                      operator: In
                      values:
                        - clickhouseserver-container

      initContainers:
        - name: config
          image: clickhouse/clickhouse-server:latest
          imagePullPolicy: Always
          command:
            - bash
            - -c
            - |
              set -e
              cp /config-origin/custom.xml /etc/clickhouse-server/config.d/custom.xml
              cp /config-origin/macros.xml /etc/clickhouse-server/config.d/macros.xml
              sed -i "s/REPLACE_ME_HOST_NAME/${HOSTNAME}/" /etc/clickhouse-server/config.d/macros.xml
          volumeMounts:
            - name: clickhouseserver-config
              mountPath: /config-origin
            - name: clickhouseserver-runtime-config
              mountPath: /etc/clickhouse-server/config.d
      containers:
        - name: clickhouseserver
          image: clickhouse/clickhouse-server:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 9000
              name: tcp
            - containerPort: 8123
              name: http
          volumeMounts:
            - name: pvc
              mountPath: /var/lib/clickhouse
            - name: clickhouseserver-runtime-config
              mountPath: /etc/clickhouse-server/config.d
            - name: clickhouseserver-config
              mountPath: /etc/clickhouse-server/users.d/users.xml
              subPath: users.xml

      volumes:
        - name: clickhouseserver-config
          configMap:
            name: clickhouseserver-config
        - name: clickhouseserver-runtime-config
          emptyDir: {}

  volumeClaimTemplates:
    - metadata:
        name: pvc
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "gp2"
        resources:
          requests:
            storage: 10Gi



Here again we update the configuration using an init container.
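Once both statefulsets are up, we can smoke-test the cluster from one of the server pods. A sketch, assuming the default namespace and the cluster name defined in the config above:

```shell
# list the cluster topology as the server sees it
kubectl exec clickhouseserver-statefulset-0 -- clickhouse-client -q \
  "SELECT cluster, shard_num, replica_num, host_name
   FROM system.clusters
   WHERE cluster = 'llmp_clickhouse_cluster'"
```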


Final Note

Using the ClickHouse operator is much simpler than this manual mode, but in some cases specific adaptations are required that prevent us from using the operator. In this post we've shown the full requirements for a minimal manual deployment of a ClickHouse cluster.



Monday, March 2, 2026

Analyzing JSON and Text using Bash




In this post we will review simple methods to quickly analyze JSON and text data using bash CLIs and pipelines. While these methods might lack some of the abilities we can find in more complex Go/JavaScript/Python code, they have the great advantage of quickly grepping the piece of information we're looking for.

For this post we will use an input file containing a list of HTTP requests, for example:
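The original screenshots are not reproduced here, but a minimal input with the two fields used throughout this post might look like the following (hypothetical sample values):

```shell
cat <<'EOF' > data.json
[
  {"SourceIp": "10.0.0.1", "HttpRequest": "GET /index.html HTTP/1.1"},
  {"SourceIp": "10.0.0.2", "HttpRequest": "POST /api/login HTTP/1.1"},
  {"SourceIp": "10.0.0.1", "HttpRequest": "GET /style.css HTTP/1.1"}
]
EOF
```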


JSON Parsing

The first thing we want to do is to use the `jq` command to pretty-print the parsed JSON.

cat data.json | jq


Next, let's get only the source IPs from the file:

cat data.json | jq  '.[].SourceIp'


Count and Sort

Let's get a unique count of requests per IP, sorted by count.


cat data.json | jq '.[].SourceIp' | sort | uniq -c | sort -n


Text Parsing

Let's get the method usage in the HTTP requests.

cat data.json | jq '.[].HttpRequest' 


Now we split the text by the first space and keep only the first part, then remove the first character (the leading quote). Then we use the previous method to count the requests per method.

cat data.json | jq '.[].HttpRequest' | awk -F' ' '{print $1}' | cut -c2- | sort | uniq -c | sort -n
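As a side note, the splitting can also be done inside jq itself; with -r, jq emits raw strings, so no quote stripping is needed. A self-contained sketch with hypothetical sample data:

```shell
cat <<'EOF' > /tmp/requests.json
[
  {"HttpRequest": "GET /a HTTP/1.1"},
  {"HttpRequest": "POST /b HTTP/1.1"},
  {"HttpRequest": "GET /c HTTP/1.1"}
]
EOF
# split each request on spaces and count the first token (the method)
jq -r '.[].HttpRequest | split(" ")[0]' /tmp/requests.json | sort | uniq -c | sort -n
```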


We can do the same to get only the HTTP path.

cat data.json | jq '.[].HttpRequest' | awk -F' ' '{print $2}' | sort | uniq -c | sort -n



Grep Area

What if we want to get transactions only for a specific IP?

We use the grep -A, -B, and -C flags:

-A n = print also n lines after the matched text
-B n = print also n lines before the matched text
-C n = print also n lines before and after the matched text


cat data.json | jq  | grep -B1 46.117.105.66


Now we can analyze the transactions of this IP like we've done before:

cat data.json | jq  | grep -B1 46.117.105.66 | grep HttpRequest | awk -F ' ' '{print $3}' | sort | uniq -c | sort -n


Final Note


In this post we've had a taste of the power of bash piping.
I highly recommend experimenting with these CLIs, since you can do in a few seconds things that would otherwise take much longer.


Monday, February 23, 2026

Add Grafana Alert for Restarting Pod


 


In this post we will configure a Grafana alert for restarting pods.

This relates to Grafana version 11.5.1. Other versions might have a different syntax.

Follow the next steps:

  • Open the Grafana GUI

  • Click on Alerting, Alert rules

  • Click on New alert rule

  • Enter the rule name: restarting pods

  • Select Prometheus as the data source

  • Make sure you have kube-state-metrics installed on the Kubernetes cluster. In case it is not, install it using:

    helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
    helm repo update
    helm install kube-state-metrics kube-state-metrics/kube-state-metrics \
      --namespace kube-system \
      --create-namespace

  • Use the following PromQL query: sum by (namespace, pod) ( increase(kube_pod_container_status_restarts_total[5m]) ) > 1
  • Leave the threshold as: A > 0
  • Under Configure no data and error handling, set the Alert state if no data or all values are null to Normal
  • In the email notification message, use the following summary:

    Pod {{ $labels.pod }} restarted

  • In the email description use the following text:

    Pod {{ $labels.pod }} in namespace {{ $labels.namespace }}
    restarted more than once in the last 5 minutes.
    Current value: {{ $values.A.Value }}

  • Make sure to configure an email provider such as SendGrid to enable sending emails from Grafana
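To verify the rule end to end, we can deliberately create a crashing pod (a throwaway pod with an assumed name; delete it once the alert fires):

```shell
# the container exits immediately, so kubernetes keeps restarting it
kubectl run crashing-pod --image=busybox --restart=Always -- sh -c "exit 1"
# ...wait a few minutes for restarts to accumulate and the alert to fire...
kubectl delete pod crashing-pod
```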


Monday, February 9, 2026

Avoiding Huge Docker Image For LLM Model


 


In this post we will review a method to reduce the docker image size of an LLM inference image.

In general, packaging an LLM in a docker container tends to create images whose size might be over 20GB.

Downloading such docker images from ECR or from Nexus might take very long. In addition, storing images of this size in these systems might present a challenge.

To avoid this, we use the following approach:

1. We extract the folders with high disk usage out of the docker image, and save them into S3. Then we create a new slim image without these folders.

2. Upon deployment, we run an init container that downloads the file from S3 and extracts it into an emptyDir volume.


Step 1: Extract folders and create slim image

The following script builds the original huge image.

Then it extracts the huge folders ExportFolder1 and ExportFolder2 into the OutputFile. This file should be uploaded to AWS S3.

Next, the script creates a new slim image without the folders ExportFolder1 and ExportFolder2.


#!/usr/bin/env bash
set -e

cd "$(dirname "$0")"

DockerRegistry=my-registry.my-company
ProjectVersion=latest
BuildNumber=1234

OutputFile="${OutputFile:-llm-image-${BuildNumber}.tar.gz}"
SlimImageName="${DockerRegistry}/llm-image-slim:${ProjectVersion}"

ExportFolder1="root/.cache"
ExportFolder2="usr/local/lib/python3.12"

TempFolder="$(mktemp -d)"
ImageFileSystem="${TempFolder}/rootfs"
TempContainer=llm-image-tmp

date
echo "build full image"
./build.sh


date
echo "extracting container to ${ImageFileSystem}"
mkdir -p "${ImageFileSystem}"
docker rm -f ${TempContainer} 2>/dev/null || true
docker create --name ${TempContainer} llm-image/dev:latest
docker export ${TempContainer} | tar -x -C "${ImageFileSystem}"
docker rm -f ${TempContainer}

date
echo "compressing folders to ${OutputFile}"
tar -czf "${OutputFile}" -C "${ImageFileSystem}" "${ExportFolder1}" "${ExportFolder2}"

date
echo "creating slim image ${SlimImageName}"
rm -rf "${ImageFileSystem}/${ExportFolder1}" "${ImageFileSystem}/${ExportFolder2}"
# note: docker import drops image metadata (ENTRYPOINT, CMD, ENV);
# re-add any needed settings with "docker import -c" if required
tar -C "${ImageFileSystem}" -c . | docker import - "${SlimImageName}"
docker push "${SlimImageName}"

date
echo "cleaning up"
rm -rf "${TempFolder}"

date
echo "Done"



Step 2: Deployment

To handle the S3 download we create an extractor image.


Dockerfile:


FROM amazon/aws-cli


COPY files /

ENTRYPOINT ["/entrypoint.sh"]


Downloader script: entrypoint.sh


#!/usr/bin/env bash
set -e

localFileName="/models-local.tar.gz"

cd /local-storage

echo "Downloading ${MODELS_S3_PATH}"
aws s3 cp "${MODELS_S3_PATH}" "${localFileName}"

echo "Extracting ${localFileName}"
tar -xzf "${localFileName}"

echo "Cleaning up..."
rm "${localFileName}"



In the kubernetes deployment file, we add an init container.


initContainers:
  - name: extract
    image: modelsextract
    env:
      - name: MODELS_S3_PATH
        value: {{ .Values.modelsS3Path | quote }}
    volumeMounts:
      - mountPath: /local-storage
        name: local-storage


An emptyDir volume and mount for the exported folders:


volumes:
  - name: local-storage
    emptyDir:
      sizeLimit: 100Gi


volumeMounts:
  - mountPath: /root/.cache
    subPath: root/.cache
    name: local-storage
  - mountPath: /usr/local/lib/python3.12
    subPath: usr/local/lib/python3.12
    name: local-storage


Also, make sure to use the slim image instead of the huge image in the deployment container.



Final Note

We have reviewed the creation of a slim docker image that later downloads the extra data from AWS S3. To save cost and gain speed, the actual download should be done from the same region as the deployment, so make sure to copy the file to a bucket in the relevant AWS region.
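For example, a server-side copy of the exported archive to a bucket in the deployment's region (bucket names and regions here are placeholders):

```shell
# copy the archive between regions without downloading it locally
aws s3 cp s3://my-central-bucket/llm-image-1234.tar.gz \
  s3://my-regional-bucket/llm-image-1234.tar.gz \
  --source-region us-east-1 --region eu-west-1
```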


Wednesday, February 4, 2026

Using DuckDB in GO




In this post we show a simple example of using DuckDB in Go.

DuckDB features:

  • Can run in-memory
  • Supports multiple file types, such as: JSON, NDJSON, CSV, Excel
  • Supports multiple storage types, such as: AWS S3, GCP Storage, PostgreSQL

In the following example we store our data in parquet files in AWS S3.
We organize the files in a folder structure that will later enable us to fetch only the files we need.


package duckdb

import (
	"database/sql"
	"fmt"
	"testing"
	"time"

	_ "github.com/marcboeker/go-duckdb"
)

// kiterr and kittime are the author's helper packages (error raising and
// time utilities); import them from your own module, or replace them with
// standard error handling and time parsing.

func TestValidation(_ *testing.T) {
	db, err := sql.Open("duckdb", "")
	kiterr.RaiseIfError(err)
	defer db.Close()

	_, err = db.Exec(`
		INSTALL httpfs;
		LOAD httpfs;
	`)
	kiterr.RaiseIfError(err)

	_, err = db.Exec(`
		SET s3_region='us-east-1';
		SET s3_access_key_id='XXX';
		SET s3_secret_access_key='XXX';
	`)
	kiterr.RaiseIfError(err)

	createTable(db)
	createData(db)
	exportData(db)
}

func createTable(
	db *sql.DB,
) {
	_, err := db.Exec(`
		CREATE TABLE IF NOT EXISTS events (
			my_text TEXT,
			my_value INTEGER,
			event_time TIMESTAMP
		);
	`)
	kiterr.RaiseIfError(err)
}

func createData(
	db *sql.DB,
) {
	statement, err := db.Prepare(`
		INSERT INTO events (my_text, my_value, event_time)
		VALUES (?, ?, ?)
	`)
	kiterr.RaiseIfError(err)

	startTime := kittime.ParseDay("2000-01-01")
	for day := range 2 {
		for hour := range 24 {
			for dataIndex := range 2 {
				dayTime := startTime.Add(time.Duration(day) * 24 * time.Hour)
				hourTime := dayTime.Add(time.Duration(hour) * time.Hour)
				fmt.Printf("saving data %v for %v\n", dataIndex, kittime.NiceTime(&hourTime))
				_, err = statement.Exec("aaa", 11, &hourTime)
				kiterr.RaiseIfError(err)
			}
		}
	}
}

func exportData(
	db *sql.DB,
) {
	_, err := db.Exec(`
		COPY (
			SELECT
				my_text,
				my_value,
				event_time,
				CAST(event_time AS DATE) AS event_date,
				strftime(event_time, '%H') AS event_hour
			FROM events
		)
		TO 's3://my-bucket/duck/'
		(
			FORMAT PARQUET,
			COMPRESSION GZIP,
			PARTITION_BY (event_date, event_hour)
		);
	`)
	kiterr.RaiseIfError(err)
}


The result in AWS S3 is:


$ aws s3 ls --recursive s3://my-bucket
2026-02-04 11:27:40 542 duck/event_date=2000-01-02/event_hour=10/data_0.parquet
2026-02-04 11:27:45 541 duck/event_date=2000-01-02/event_hour=13/data_0.parquet
2026-02-04 11:27:47 541 duck/event_date=2000-01-02/event_hour=14/data_0.parquet
2026-02-04 11:27:49 542 duck/event_date=2000-01-02/event_hour=15/data_0.parquet
2026-02-04 11:27:50 541 duck/event_date=2000-01-02/event_hour=16/data_0.parquet
2026-02-04 11:27:52 542 duck/event_date=2000-01-02/event_hour=17/data_0.parquet
2026-02-04 11:27:54 541 duck/event_date=2000-01-02/event_hour=18/data_0.parquet
2026-02-04 11:27:58 541 duck/event_date=2000-01-02/event_hour=20/data_0.parquet
2026-02-04 11:28:00 542 duck/event_date=2000-01-02/event_hour=21/data_0.parquet
2026-02-04 11:28:03 542 duck/event_date=2000-01-02/event_hour=23/data_0.parquet
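The partitioned layout pays off at read time: with hive partitioning enabled, DuckDB scans only the folders matching the filter. A sketch using the DuckDB CLI, assuming the same bucket and credentials as above:

```shell
duckdb -c "
INSTALL httpfs; LOAD httpfs;
SET s3_region='us-east-1';
-- only the event_date=2000-01-01/event_hour=10 folder is actually read
SELECT count(*), sum(my_value)
FROM read_parquet('s3://my-bucket/duck/*/*/*.parquet', hive_partitioning = true)
WHERE event_date = DATE '2000-01-01' AND event_hour = '10';
"
```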


So for a quick-start project that needs to store data over time and read it when needed, this is a nice solution. For a heavier project that requires more data with high control over the storage type, parallelism, and timing, writing proprietary code might be required.



Monday, January 26, 2026

NATS Performance Tuning


 


In this post we will see how to install a NATS cluster with configuration tuned for high throughput.


The following is an example script to install a tuned NATS cluster:


#!/usr/bin/env bash
set -e
cd "$(dirname "$0")"

helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update

rm -f nats.yaml

cat <<EOF > nats.yaml
# nats-values.yaml

config:
  jetstream:
    enabled: false

  cluster:
    enabled: true
    replicas: 4

  merge:
    # Bigger messages & buffers
    max_payload: 8388608 # 8 * 1024 * 1024 bytes
    write_deadline: "2s"
    max_pending: 536870912 # 512 * 1024 * 1024 bytes
    # Connection & subscription scaling
    max_connections: 100000
    max_subscriptions: 1000000

    # Disable anything not needed
    debug: false
    trace: false
    logtime: false


container:
  merge:
    resources:
      limits:
        cpu: "4"
        memory: "2Gi"
EOF

helm delete nats --ignore-not-found

helm upgrade --install nats \
nats/nats \
--namespace default \
--create-namespace \
-f nats.yaml

rm -f nats.yaml


We increase the NATS cluster memory usage using max_pending. This configures NATS to buffer messages, allowing producers to publish faster, and eventually a higher consumption rate by the consumers.


Producers and consumers connect to a random available NATS pod, so with a small number of consumers, producers, and NATS pods, we will probably get an unbalanced load across the NATS pods.

To avoid this, use a connection pool on both the consumers and the producers, hence spreading the load among the NATS pods.


Using a message size of 1K, a single NATS pod can handle ~1M messages/second. Make sure to check that the NATS pods are not saturated on CPU or memory.
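These throughput numbers can be checked with the nats CLI bench tool (the server URL is an assumption based on the default chart service name, and flags may differ between CLI versions):

```shell
# 2 publishers and 2 subscribers, 1M messages of 1KB each
nats bench test.subject --msgs 1000000 --size 1024 --pub 2 --sub 2 \
  --server nats://nats.default.svc:4222
```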


Tuesday, January 20, 2026

Kind Load vs. Local Repository


 


In this post we reviewed using kind for local development. One of the requirements for such a deployment is to load the image into kind using:

kind load docker-image ${DockerTag}

The kind load CLI sends the entire image to the kind cluster, which means it does not benefit from the docker image layers concept. Hence any image update takes a much longer time, especially for large images.


To improve this we can run a local docker registry, and configure kind to pull the images from it. To deploy a local registry on our development laptop we use:


docker run -d \
--restart=always \
-p 5000:5000 \
--name kind-registry \
-v $HOME/.kind/registry:/var/lib/registry \
registry:2


Then we re-create the kind cluster with configuration to use the local registry:


cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
  - |
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5000"]
      endpoint = ["http://kind-registry:5000"]
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
EOF


To enable kind to communicate with the docker repository container, we connect the networks:

docker network connect kind kind-registry


And it is ready!


We can now load the build images into the local repository as part of our build script:

LocalRegistryTag=localhost:5000/my-image:latest
docker tag ${DockerTag} ${LocalRegistryTag}
docker push ${LocalRegistryTag}
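We can confirm the push by querying the registry's standard v2 HTTP API:

```shell
# list the repositories held by the local registry, then the tags of our image
curl -s http://localhost:5000/v2/_catalog
curl -s http://localhost:5000/v2/my-image/tags/list
```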


We should also use an updated helm configuration for the image location, and instruct Kubernetes to always pull the images from the registry. This is required since we usually do not change the image version upon a local build, and simply use `latest`:


image:
  registry: localhost:5000/
  version: /dev:latest
  pullPolicy: Always


The result is a faster build and deployment process on the local laptop.