Monday, March 16, 2026

Deploy ClickHouse Cluster on Kubernetes without Operator

In a recent post we saw how to deploy a standalone ClickHouse on Kubernetes without an operator. In this post we again deploy on Kubernetes without an operator, but this time as a ClickHouse cluster.

To install a cluster, we need two StatefulSets: ClickHouse Keeper, which maintains the coordination quorum, and ClickHouse Server, which runs the actual cluster nodes. We will use 3 replicas for the keeper and 2 replicas for the server.


ClickHouse Keeper

The keeper includes the entities below.


The service

apiVersion: v1
kind: Service
metadata:
  name: clickhousekeeper-service
spec:
  selector:
    configid: clickhousekeeper-container
  clusterIP: None
  ports:
    - name: client
      port: 9181
    - name: raft
      port: 9444


The config

apiVersion: v1
kind: ConfigMap
metadata:
  name: clickhousekeeper-config
data:
  keeper_config.xml: |-
    <clickhouse>

      <logger>
        <level>information</level>
        <console>true</console>
      </logger>

      <listen_host>0.0.0.0</listen_host>

      <keeper_server>
        <tcp_port>9181</tcp_port>

        <server_id>REPLACE_ME_SERVER_ID</server_id>

        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

        <!-- one entry per keeper replica; id is the pod ordinal plus one -->
        <raft_configuration>
          <server>
            <id>1</id>
            <hostname>clickhousekeeper-statefulset-0.clickhousekeeper-service.default.svc.cluster.local</hostname>
            <port>9444</port>
          </server>
          <server>
            <id>2</id>
            <hostname>clickhousekeeper-statefulset-1.clickhousekeeper-service.default.svc.cluster.local</hostname>
            <port>9444</port>
          </server>
          <server>
            <id>3</id>
            <hostname>clickhousekeeper-statefulset-2.clickhousekeeper-service.default.svc.cluster.local</hostname>
            <port>9444</port>
          </server>
        </raft_configuration>
      </keeper_server>
    </clickhouse>
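
The raft_configuration entries follow the stable DNS names that a StatefulSet behind a headless service gives its pods, and each id is the pod ordinal plus one. A minimal shell sketch of that mapping, assuming the resource names used in this post and the default namespace; the function name keeper_raft_servers is hypothetical:

```shell
# Print "<id> <hostname>" for every keeper replica.
# Assumption: StatefulSet, service and namespace names match this post.
keeper_raft_servers() {
  replicas="$1"
  ordinal=0
  while [ "$ordinal" -lt "$replicas" ]; do
    # Raft server IDs start at 1, pod ordinals start at 0.
    echo "$((ordinal + 1)) clickhousekeeper-statefulset-${ordinal}.clickhousekeeper-service.default.svc.cluster.local"
    ordinal=$((ordinal + 1))
  done
}

keeper_raft_servers 3
```

The number of lines printed should match the replicas value of the keeper StatefulSet, since every keeper pod needs its own entry.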


The statefulset

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clickhousekeeper-statefulset
spec:
  serviceName: clickhousekeeper-service
  podManagementPolicy: Parallel
  replicas: 3
  selector:
    matchLabels:
      configid: clickhousekeeper-container
  template:
    metadata:
      labels:
        configid: clickhousekeeper-container
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: configid
                      operator: In
                      values:
                        - clickhousekeeper-container

      initContainers:
        - name: config
          image: clickhouse/clickhouse-keeper:latest
          imagePullPolicy: Always
          command:
            - bash
            - -c
            - |
              set -e
              POD_ORDINAL=$(echo $HOSTNAME | awk -F'-' '{print $NF}')
              cp /config-origin/keeper_config.xml /etc/clickhouse-keeper/keeper_config.xml
              sed -i "s/REPLACE_ME_SERVER_ID/$((POD_ORDINAL+1))/" /etc/clickhouse-keeper/keeper_config.xml
              cat /etc/clickhouse-keeper/keeper_config.xml
          volumeMounts:
            - name: clickhousekeeper-config
              mountPath: /config-origin
            - name: clickhousekeeper-runtime-config
              mountPath: /etc/clickhouse-keeper
      containers:
        - name: clickhousekeeper
          image: clickhouse/clickhouse-keeper:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 9181
              name: client
            - containerPort: 9444
              name: raft
          volumeMounts:
            - name: pvc
              mountPath: /var/lib/clickhouse
            - name: clickhousekeeper-runtime-config
              mountPath: /etc/clickhouse-keeper

      volumes:
        - name: clickhousekeeper-config
          configMap:
            name: clickhousekeeper-config
        - name: clickhousekeeper-runtime-config
          emptyDir: {}

  volumeClaimTemplates:
    - metadata:
        name: pvc
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "gp2"
        resources:
          requests:
            storage: 10Gi


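Notice that the config map is mounted to a side folder. The init container copies it to the runtime config folder and sets the server ID, which is derived from the pod name (the pod ordinal plus one).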
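
The ID substitution can be tried outside the cluster. The following is a minimal sketch that repeats the init container steps on a scratch file; the POD_NAME value and the temporary paths are assumptions standing in for the HOSTNAME and the mounted folders of a real pod:

```shell
# Assumption: the name Kubernetes would give the third keeper pod.
POD_NAME="clickhousekeeper-statefulset-2"

# Scratch folder standing in for /config-origin and /etc/clickhouse-keeper.
workdir="$(mktemp -d)"
printf '<server_id>REPLACE_ME_SERVER_ID</server_id>\n' > "$workdir/origin.xml"

# The ordinal is the text after the last dash in the pod name.
POD_ORDINAL="$(echo "$POD_NAME" | awk -F'-' '{print $NF}')"

# Keeper server IDs in this post start at 1, so add one to the ordinal.
sed "s/REPLACE_ME_SERVER_ID/$((POD_ORDINAL + 1))/" "$workdir/origin.xml" > "$workdir/keeper_config.xml"

cat "$workdir/keeper_config.xml"
```

For the pod name above, the resulting file holds server ID 3, which is the ID the raft configuration must list for that pod's hostname.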


ClickHouse Server

The server includes the entities below.


The service

apiVersion: v1
kind: Service
metadata:
  name: clickhouseserver-service
spec:
  selector:
    configid: clickhouseserver-container
  clusterIP: None
  ports:
    - name: tcp
      port: 9000
    - name: http
      port: 8123


The config

apiVersion: v1
kind: ConfigMap
metadata:
  name: clickhouseserver-config
data:
  custom.xml: |-
    <clickhouse>
      <logger>
        <level>information</level>
        <console>true</console>
      </logger>

      <listen_host>0.0.0.0</listen_host>

      <keeper_server>
        <tcp_port>9181</tcp_port>
      </keeper_server>

      <zookeeper>
        <node>
          <host>clickhousekeeper-statefulset-0.clickhousekeeper-service.default.svc.cluster.local</host>
          <port>9181</port>
        </node>
        <node>
          <host>clickhousekeeper-statefulset-1.clickhousekeeper-service.default.svc.cluster.local</host>
          <port>9181</port>
        </node>
        <node>
          <host>clickhousekeeper-statefulset-2.clickhousekeeper-service.default.svc.cluster.local</host>
          <port>9181</port>
        </node>
      </zookeeper>

      <remote_servers>
        <llmp_clickhouse_cluster>
          <shard>
            <replica>
              <host>clickhouseserver-statefulset-0.clickhouseserver-service.default.svc.cluster.local</host>
              <port>9000</port>
            </replica>
            <replica>
              <host>clickhouseserver-statefulset-1.clickhouseserver-service.default.svc.cluster.local</host>
              <port>9000</port>
            </replica>
          </shard>
        </llmp_clickhouse_cluster>
      </remote_servers>
    </clickhouse>
  users.xml: |-
    <clickhouse>
      <users>
        <default>
          <password></password>
          <networks>
            <ip>::/0</ip>
          </networks>
        </default>
      </users>
    </clickhouse>
  macros.xml: |-
    <clickhouse>
      <macros>
        <!-- placeholder replaced with the pod host name by the init container -->
        <replica>REPLACE_ME_HOST_NAME</replica>
      </macros>
    </clickhouse>


The statefulset

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clickhouseserver-statefulset
spec:
  serviceName: clickhouseserver-service
  podManagementPolicy: Parallel
  replicas: 2
  selector:
    matchLabels:
      configid: clickhouseserver-container
  template:
    metadata:
      labels:
        configid: clickhouseserver-container
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: configid
                      operator: In
                      values:
                        - clickhouseserver-container

      initContainers:
        - name: config
          image: clickhouse/clickhouse-server:latest
          imagePullPolicy: Always
          command:
            - bash
            - -c
            - |
              set -e
              cp /config-origin/custom.xml /etc/clickhouse-server/config.d/custom.xml
              cp /config-origin/macros.xml /etc/clickhouse-server/config.d/macros.xml
              sed -i "s/REPLACE_ME_HOST_NAME/${HOSTNAME}/" /etc/clickhouse-server/config.d/macros.xml
          volumeMounts:
            - name: clickhouseserver-config
              mountPath: /config-origin
            - name: clickhouseserver-runtime-config
              mountPath: /etc/clickhouse-server/config.d
      containers:
        - name: clickhouseserver
          image: clickhouse/clickhouse-server:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 9000
              name: tcp
            - containerPort: 8123
              name: http
          volumeMounts:
            - name: pvc
              mountPath: /var/lib/clickhouse
            - name: clickhouseserver-runtime-config
              mountPath: /etc/clickhouse-server/config.d
            - name: clickhouseserver-config
              mountPath: /etc/clickhouse-server/users.d/users.xml
              subPath: users.xml

      volumes:
        - name: clickhouseserver-config
          configMap:
            name: clickhouseserver-config
        - name: clickhouseserver-runtime-config
          emptyDir: {}

  volumeClaimTemplates:
    - metadata:
        name: pvc
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "gp2"
        resources:
          requests:
            storage: 10Gi



Here again we update the configuration using an init container.
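
Once both StatefulSets are running, one way to check that a server sees both replicas is to query the system.clusters table from inside a pod. A minimal sketch, assuming kubectl already points at the cluster and namespace used above; the function name check_cluster is hypothetical:

```shell
# List the replicas that the first server pod knows for the cluster
# defined in remote_servers. Assumes the default user with an empty password.
check_cluster() {
  kubectl exec clickhouseserver-statefulset-0 -- \
    clickhouse-client --query \
    "SELECT cluster, shard_num, replica_num, host_name FROM system.clusters WHERE cluster = 'llmp_clickhouse_cluster'"
}
```

Calling check_cluster after the pods are ready should print one row per replica listed in the remote_servers section.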


Final Note

Using the ClickHouse operator is much simpler than the manual mode, but in some cases specific adaptations are required that prevent us from using the operator. In this post we've shown everything required for a minimal manual deployment of a ClickHouse cluster.



Monday, March 2, 2026

Analyzing JSON and Text using Bash




In this post we will review simple methods to quickly analyze JSON and text data using bash CLIs and pipelines. While these methods lack some of the abilities we can find in more complex Go/JavaScript/Python code, they have the great advantage of quickly extracting the piece of information we're looking for.

For this post we will use an input file, data.json, containing a list of HTTP requests. Each record includes a SourceIp field and an HttpRequest field.
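
A minimal sketch of such a file, for trying the commands in this post. Only the file name and the two field names come from the commands below; the records themselves are made up for illustration:

```shell
# Hypothetical sample input: the values are invented, only the field names
# (HttpRequest, SourceIp) are taken from the commands used in this post.
cat > data.json <<'EOF'
[
  {"HttpRequest": "GET /index.html HTTP/1.1", "SourceIp": "46.117.105.66"},
  {"HttpRequest": "POST /api/login HTTP/1.1", "SourceIp": "10.0.0.7"},
  {"HttpRequest": "GET /api/items HTTP/1.1", "SourceIp": "46.117.105.66"}
]
EOF

# Count the records by counting the lines that hold a SourceIp key.
grep -c '"SourceIp"' data.json
```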


JSON Parsing

The first thing we want to do is use the `jq` command to get the file as parsed JSON.

cat data.json | jq .

Next, let's get only the source IPs from the file:

cat data.json | jq '.[].SourceIp'


Count and Sort

Let's get the number of requests per unique IP, sorted by count.


cat data.json | jq '.[].SourceIp' | sort | uniq -c | sort -n
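
To see what each stage of this pipeline contributes, here is a minimal sketch on a hard-coded list of made-up IPs instead of data.json. `uniq -c` only merges adjacent duplicates, which is why the first `sort` is needed; the final `sort -n` orders the lines by the count. The function name count_per_value is hypothetical:

```shell
# Count how many times each input line occurs, least frequent first.
count_per_value() {
  sort | uniq -c | sort -n
}

# Made-up input: one IP per line, in arbitrary order.
printf '%s\n' 10.0.0.7 46.117.105.66 10.0.0.9 46.117.105.66 46.117.105.66 10.0.0.7 \
  | count_per_value
```

The most frequent IP ends up on the last line. Replacing the last stage with `sort -rn | head -n 5` would show only the five busiest IPs.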


Text Parsing

Let's get the usage of each method in the HTTP requests.

cat data.json | jq '.[].HttpRequest'

Now split the text on spaces and keep only the first part, then remove the first character (the opening quote). Then use the previous method to count per method.

cat data.json | jq '.[].HttpRequest' | awk -F' ' '{print $1}' | cut -c2- | sort | uniq -c | sort -n
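
A minimal sketch of the splitting step on a single quoted line, as `jq` prints it when the `-r` flag is not used. The request line itself is made up:

```shell
# One HttpRequest value as jq prints it, including the surrounding quotes.
line='"GET /index.html HTTP/1.1"'

# awk splits on whitespace, so the first field is "GET with its opening quote.
first_field="$(echo "$line" | awk '{print $1}')"

# cut -c2- keeps everything from the second character on, dropping the quote.
method="$(echo "$first_field" | cut -c2-)"

echo "$method"
```

An alternative is `jq -r`, which prints raw strings without the surrounding quotes, so the `cut` stage is no longer needed.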

We can do the same to get only the HTTP path.

cat data.json | jq '.[].HttpRequest' | awk -F' ' '{print $2}' | sort | uniq -c | sort -n


Grep Area

What if we want to get transactions only for a specific IP?

We use the grep -A, -B, -C flags, each followed by the number of context lines N.

-A N = also print N lines after each matching line
-B N = also print N lines before each matching line
-C N = also print N lines before and after each matching line


cat data.json | jq . | grep -B1 46.117.105.66
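
A minimal sketch of what `-B1` returns, run on hard-coded pretty-printed JSON rather than on real `jq` output. It assumes that HttpRequest is printed on the line directly above SourceIp; the records and the function name pretty_json are made up:

```shell
# Stand-in for the pretty-printed output of jq on a two-record file.
pretty_json() {
  cat <<'EOF'
[
  {
    "HttpRequest": "GET /index.html HTTP/1.1",
    "SourceIp": "46.117.105.66"
  },
  {
    "HttpRequest": "POST /api/login HTTP/1.1",
    "SourceIp": "10.0.0.7"
  }
]
EOF
}

# Print the matching SourceIp line plus the one line before it.
pretty_json | grep -B1 '46.117.105.66'
```

Only the HttpRequest and SourceIp lines of the first record are printed. Note that the dots in the pattern match any character; `grep -F` treats the pattern as a fixed string.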

Now we count the HTTP paths for this IP as we did before. Here the path is the third field, since each pretty-printed line starts with the "HttpRequest": key:

cat data.json | jq . | grep -B1 46.117.105.66 | grep HttpRequest | awk -F ' ' '{print $3}' | sort | uniq -c | sort -n
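
The grep approach depends on HttpRequest being printed on the line directly above SourceIp. A sketch of an alternative that filters inside `jq` with `select`, assuming records shaped like the made-up ones below; the function name paths_for_ip and the temporary file are hypothetical:

```shell
# Made-up sample records written to a temporary file.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
[
  {"HttpRequest": "GET /index.html HTTP/1.1", "SourceIp": "46.117.105.66"},
  {"HttpRequest": "POST /api/login HTTP/1.1", "SourceIp": "10.0.0.7"},
  {"HttpRequest": "GET /index.html HTTP/1.1", "SourceIp": "46.117.105.66"}
]
EOF

# Count the HTTP paths requested by one source IP.
# -r prints raw strings, so the path is the second whitespace-separated field.
paths_for_ip() {
  jq -r --arg ip "$1" '.[] | select(.SourceIp == $ip) | .HttpRequest' "$sample" \
    | awk '{print $2}' | sort | uniq -c | sort -n
}

paths_for_ip 46.117.105.66
```

Because the filter works on whole records, it keeps working if the field order changes or other fields sit between the two.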


Final Note


In this post we've had a taste of the power of bash piping.
I highly recommend experimenting with these CLIs, since they let you do in a few seconds things that would otherwise take much longer.