Once a system is up and running, we usually want to benchmark it: we run stress tests and measure the system's performance using metrics such as transactions per second, CPU usage, and memory usage.
However, in a complex, microservices-based Kubernetes system there are many moving parts, and it is hard to pinpoint which of them is the current bottleneck. This is where Prometheus and Grafana can assist us.
In this post we will review the deployment of Prometheus and Grafana on a Kubernetes cluster.
This includes:
- Prometheus kubernetes permissions
- Prometheus configuration
- Prometheus deployment and service
- Grafana deployment and service
- Grafana predefined dashboards
Prometheus
Prometheus is an open-source, community-driven monitoring system and time series database.
To deploy it as part of a Kubernetes cluster, we need to create a service account with permissions to access Kubernetes resources. This requires three manifests: a service account, a cluster role, and a cluster role binding.
Prometheus Permissions
Service account:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-service-account
  namespace: default
Cluster role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-role
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
Cluster role binding:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-role-binding
subjects:
  - kind: ServiceAccount
    name: prometheus-service-account
    namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus-role
  apiGroup: rbac.authorization.k8s.io
Prometheus Configuration
The prometheus configuration is saved in a kubernetes ConfigMap. It includes various options, such as the scraping interval (sample the targets each X seconds), and the
cAdvisor job which runs as part of the kubelet binary.
The ConfigMap is the following:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yaml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
      - job_name: 'kube-state-metrics'
        static_configs:
          - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
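With this configuration, the 'kubernetes-pods' job scrapes only pods that opt in via annotations. As a sketch, a pod template fragment for a hypothetical application (the `my-app` name, port, and path below are placeholders, not part of the manifests in this post) would advertise its metrics endpoint like this:

```yaml
# Hypothetical pod template fragment; app name, port, and path are placeholders.
metadata:
  labels:
    app: my-app
  annotations:
    prometheus.io/scrape: "true"   # matched by the 'keep' relabel rule
    prometheus.io/path: "/metrics" # rewrites __metrics_path__
    prometheus.io/port: "8080"     # rewrites the port in __address__
```

The same annotation convention, placed on Service objects, drives the 'kubernetes-service-endpoints' job.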
Prometheus Deployment & Service
Next, we create the Prometheus deployment and a service that allows access to it. Notice that we are not using persistent storage, since the data is usually needed only for the duration of the benchmark; if you do need to keep it, change the deployment to use a persistent volume. Also notice that the Prometheus service is exposed as a NodePort, allowing easy access to the service on port 30030.
Prometheus deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      configid: prometheus-container
  template:
    metadata:
      labels:
        configid: prometheus-container
    spec:
      serviceAccountName: prometheus-service-account
      containers:
        - name: prometheus
          image: prom/prometheus:v2.15.2
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/etc/prometheus/prometheus.yaml"
            - "--storage.tsdb.path=/prometheus/"
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus
      volumes:
        - name: prometheus-config
          configMap:
            defaultMode: 420
            name: prometheus-config
Prometheus service:
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
spec:
  selector:
    configid: prometheus-container
  type: NodePort
  ports:
    - port: 80
      targetPort: 9090
      name: http
      protocol: TCP
      nodePort: 30030
Once Prometheus is deployed, we can access the service to view the collected statistics.
For example, to view the CPU usage of all MongoDB containers, we can use the following PromQL query:
rate(container_cpu_usage_seconds_total{pod=~".*mongo.*"}[1m])
and get a graph of the per-container CPU usage over time.
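CPU is not the only resource worth graphing during a benchmark. A couple more cAdvisor-based queries in the same spirit (the `.*mongo.*` pod filter is kept from the example above; adjust it to your own workload):

```promql
# Working-set memory per MongoDB container
container_memory_working_set_bytes{pod=~".*mongo.*"}

# Network receive throughput, bytes per second
rate(container_network_receive_bytes_total{pod=~".*mongo.*"}[1m])
```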
Grafana
Now that the statistics are in Prometheus, we can use Grafana as the GUI for viewing and analyzing the data.
First, we create the ConfigMap that configures Prometheus as Grafana's data source:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources-config
data:
  datasource-prometheus.yaml: |-
    apiVersion: 1
    datasources:
      - name: "prometheus"
        access: "proxy"
        editable: true
        orgId: 1
        type: "prometheus"
        url: "http://prometheus-service"
        version: 1
Then we create a deployment that uses the ConfigMap:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      configid: grafana-container
  template:
    metadata:
      labels:
        configid: grafana-container
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.5.2-ubuntu
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: grafana-datasources-config
              mountPath: /etc/grafana/provisioning/datasources
      volumes:
        - name: grafana-datasources-config
          configMap:
            name: grafana-datasources-config
and a service to enable access to the Grafana GUI.
The service is exposed using NodePort 30040.
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
spec:
  selector:
    configid: grafana-container
  type: NodePort
  ports:
    - port: 80
      targetPort: 3000
      name: http
      protocol: TCP
      nodePort: 30040
The Grafana GUI can now be used to access the Prometheus data.
Grafana Predefined Dashboards
But wait...
What if we have worked hard to create a custom dashboard in Grafana, and then we need to reinstall the solution on the Kubernetes cluster? Is our dashboard lost?
To overcome this, we can use Grafana's provisioning capability.
Once our custom dashboard is ready, open the dashboard in the Grafana GUI, click the cog icon at the top of the page, and select the "JSON Model" tab.
This JSON should be saved to a ConfigMap that is used on Grafana startup to load a "provisioned" dashboard.
We will actually use two ConfigMaps.
The first, the "provisioning dashboard location" ConfigMap, instructs Grafana to look for custom dashboards in a specific folder; the second contains our JSON Model.
The "provisioning dashboard location" ConfigMap is:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards-config
data:
  dashboards-provider.yaml: |-
    apiVersion: 1
    providers:
      # a unique provider name
      - name: 'predefined-dashboards'
        # org id. will default to orgId 1 if not specified
        orgId: 1
        # name of the dashboard folder. Required
        folder: ''
        # folder UID. will be automatically generated if not specified
        folderUid: ''
        # provider type. Required
        type: file
        # disable dashboard deletion
        disableDeletion: false
        # enable dashboard editing
        editable: true
        # how often Grafana will scan for changed dashboards
        updateIntervalSeconds: 10
        # allow updating provisioned dashboards from the UI
        allowUiUpdates: false
        options:
          # path to dashboard files on disk. Required
          path: /predefined-dashboards
And the JSON Model ConfigMap is:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-predefined-dashboards-config
  labels:
    app.kubernetes.io/instance: bouncer
    app.kubernetes.io/name: bouncer
data:
  dashboard-bouncer.json: |-
    {
      ... The JSON Model text
      ... NOTICE THE INDENTATION OF THE TEXT
    }
Finally, we update the Grafana deployment to use these ConfigMaps:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      configid: grafana-container
  template:
    metadata:
      labels:
        configid: grafana-container
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.5.2-ubuntu
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: grafana-dashboards-config
              mountPath: /etc/grafana/provisioning/dashboards
            - name: grafana-datasources-config
              mountPath: /etc/grafana/provisioning/datasources
            - name: grafana-predefined-dashboards-config
              mountPath: /predefined-dashboards
      volumes:
        - name: grafana-predefined-dashboards-config
          configMap:
            name: grafana-predefined-dashboards-config
        - name: grafana-dashboards-config
          configMap:
            name: grafana-dashboards-config
        - name: grafana-datasources-config
          configMap:
            name: grafana-datasources-config
Summary
We have used Prometheus and Grafana to help us analyze the performance of Kubernetes pods.
In this post CPU usage was measured using the "container_cpu_usage_seconds_total" counter, but many other counters are available, such as memory, disk, and network metrics. See this article for an example.