Once a system is up and running, we usually want to benchmark it. That means we run some stress tests, and check the performance of the system. The performance is measured by various means, for example: transactions per second, CPU usage, and memory usage.
However, a complex microservices system running on Kubernetes has many moving parts, and it is hard to pinpoint which part is the current bottleneck. This is where Prometheus and Grafana can assist us.
In this post we will review the deployment of Prometheus and Grafana on a Kubernetes cluster.
This includes:
- Prometheus Kubernetes permissions
- Prometheus configuration
- Prometheus deployment and service
- Grafana deployment and service
- Grafana predefined dashboards
See also the post "Report prometheus metrics from a GO application".
Prometheus is an open-source, community-driven monitoring system and time series database.
To deploy it as part of a Kubernetes cluster, we need to give it permissions to access Kubernetes resources. This requires three objects: a service account, a cluster role, and a cluster role binding.
Prometheus Permissions
Service account:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-service-account
  namespace: default
Cluster role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-role
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
Cluster role binding:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-role-binding
subjects:
  - kind: ServiceAccount
    name: prometheus-service-account
    namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus-role
  apiGroup: rbac.authorization.k8s.io
Prometheus Configuration
The ConfigMap is the following:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yaml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
      - job_name: 'kube-state-metrics'
        static_configs:
          - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
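The relabel rules in this configuration can be hard to read, so here is a minimal Python sketch of what a "replace" rule does, under a few assumptions: Prometheus joins the source label values with ";" (its default separator), requires the regex to match the whole joined string, and leaves the target label untouched when there is no match. The function name relabel_replace and the sample addresses are hypothetical, and Python's re module stands in for the RE2 engine that Prometheus uses, which behaves the same for these particular patterns.

```python
import re
from typing import List, Optional


def relabel_replace(values: List[str], regex: str, replacement: str,
                    separator: str = ";") -> Optional[str]:
    """Approximate a Prometheus 'replace' relabel step.

    Returns the new target label value, or None when the regex does not
    match (in which case Prometheus keeps the target label as it was).
    """
    joined = separator.join(values)
    # Prometheus anchors the regex, so the whole joined value must match.
    match = re.fullmatch(regex, joined)
    if match is None:
        return None
    # Translate Prometheus-style $1 / ${1} references into Python's \g<1>.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    return match.expand(template)


if __name__ == "__main__":
    address_regex = r"([^:]+)(?::\d+)?;(\d+)"
    # Pod address plus the value of the prometheus.io/port annotation.
    print(relabel_replace(["10.0.0.5:8080", "9102"], address_regex, "$1:$2"))
    # Node name rewritten into the API server proxy path.
    print(relabel_replace(["node-1"], r"(.+)",
                          "/api/v1/nodes/${1}/proxy/metrics"))
```

In other words, a pod that sets the prometheus.io/port annotation is scraped on that port instead of the port discovered from its container spec, while a pod without the annotation keeps its discovered address.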
Prometheus Deployment & Service
Prometheus deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      configid: prometheus-container
  template:
    metadata:
      labels:
        configid: prometheus-container
    spec:
      serviceAccountName: prometheus-service-account
      containers:
        - name: prometheus
          image: prom/prometheus:v2.15.2
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/etc/prometheus/prometheus.yaml"
            - "--storage.tsdb.path=/prometheus/"
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus
      volumes:
        - name: prometheus-config
          configMap:
            defaultMode: 420
            name: prometheus-config
Prometheus service:
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
spec:
  selector:
    configid: prometheus-container
  type: NodePort
  ports:
    - port: 80
      targetPort: 9090
      name: http
      protocol: TCP
      nodePort: 30030
Once Prometheus is deployed, we can access the service (exposed on NodePort 30030) to view the collected statistics.
For example, we can use a PromQL query to view the CPU usage of all MongoDB containers and display the result as a graph.
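A query of this kind can be built around the container_cpu_usage_seconds_total counter. The Python sketch below composes one possible query and the matching URL for the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). Several details are assumptions rather than the exact query behind the graph: the container name pattern "mongo.*", the 1m rate window, the "container" and "pod" label names (older Kubernetes versions expose "container_name" and "pod_name" instead), and the localhost base URL, which stands in for any cluster node address with NodePort 30030.

```python
from urllib.parse import urlencode


def cpu_usage_query(container_regex: str, window: str = "1m") -> str:
    """PromQL for CPU cores used per pod by containers matching a name regex."""
    selector = f'container_cpu_usage_seconds_total{{container=~"{container_regex}"}}'
    # rate() turns the ever-growing counter into CPU seconds used per second.
    return f"sum(rate({selector}[{window}])) by (pod)"


def instant_query_url(base_url: str, promql: str) -> str:
    """URL of the Prometheus HTTP API instant-query endpoint for a query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    query = cpu_usage_query("mongo.*")  # assumed container name pattern
    print(query)
    # Placeholder address: replace localhost with a cluster node IP.
    print(instant_query_url("http://localhost:30030", query))
```

The printed query can also be pasted directly into the expression box of the Prometheus GUI, using the Graph tab to plot it over time.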
Grafana Deployment & Service
First, we create the ConfigMap that configures Prometheus as the data source of Grafana:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources-config
data:
  datasource-prometheus.yaml: |-
    apiVersion: 1
    datasources:
      - name: "prometheus"
        access: "proxy"
        editable: true
        orgId: 1
        type: "prometheus"
        url: "http://prometheus-service"
        version: 1
And then we create a deployment, which uses the ConfigMap:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      configid: grafana-container
  template:
    metadata:
      labels:
        configid: grafana-container
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.5.2-ubuntu
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: grafana-datasources-config
              mountPath: /etc/grafana/provisioning/datasources
      volumes:
        - name: grafana-datasources-config
          configMap:
            name: grafana-datasources-config
Finally, we create a service to enable access to the Grafana GUI.
The service is exposed using NodePort 30040.
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
spec:
  selector:
    configid: grafana-container
  type: NodePort
  ports:
    - port: 80
      targetPort: 3000
      name: http
      protocol: TCP
      nodePort: 30040
The Grafana GUI can now be used to query Prometheus.
Grafana Predefined Dashboards
But wait...
What if we have worked hard to create a custom dashboard in Grafana, and then we need to reinstall the solution on the Kubernetes cluster? Is our dashboard lost?
To overcome this, we can use Grafana's provisioning capability.
Once our custom dashboard is ready, open the dashboard in the Grafana GUI, click the cog icon at the top of the page, and select the "JSON Model" tab.
This JSON should be saved in a ConfigMap, which Grafana reads at startup to load the provisioned dashboard.
Actually, we will use two ConfigMaps.
The first ConfigMap, the "provisioning dashboard location", instructs Grafana to look for custom dashboards in a specific folder, and the second ConfigMap contains our JSON Model.
The "provisioning dashboard location" ConfigMap is:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards-config
data:
  dashboards-provider.yaml: |-
    apiVersion: 1
    providers:
      # a unique provider name
      - name: 'predefined-dashboards'
        # org id. will default to orgId 1 if not specified
        orgId: 1
        # name of the dashboard folder. Required
        folder: ''
        # folder UID. will be automatically generated if not specified
        folderUid: ''
        # provider type. Required
        type: file
        # disable dashboard deletion
        disableDeletion: false
        # enable dashboard editing
        editable: true
        # how often Grafana will scan for changed dashboards
        updateIntervalSeconds: 10
        # allow updating provisioned dashboards from the UI
        allowUiUpdates: false
        options:
          # path to dashboard files on disk. Required
          path: /predefined-dashboards
And the JSON Model ConfigMap is:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-predefined-dashboards-config
  labels:
    app.kubernetes.io/instance: bouncer
    app.kubernetes.io/name: bouncer
data:
  dashboard-bouncer.json: |-
    {
      ... The JSON Model text ...
      NOTICE THE INDENTATION OF THE TEXT
    }
Last, we need to update the Grafana deployment to use these ConfigMaps:
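Since the indentation of the embedded JSON is easy to get wrong by hand, one option is to generate this ConfigMap with a small script. The Python sketch below is only an illustration: the function name build_dashboard_configmap and the trimmed sample model are hypothetical, the labels from the manifest above are omitted for brevity, and a real dashboard would be loaded from the text copied out of the "JSON Model" tab.

```python
import json
import textwrap


def build_dashboard_configmap(name: str, file_name: str, dashboard: dict) -> str:
    """Render a ConfigMap manifest that embeds a dashboard JSON model."""
    body = json.dumps(dashboard, indent=2)
    # Lines of a YAML block scalar must be indented deeper than their key.
    # The key sits at 2 spaces under "data:", so the JSON goes at 4 spaces.
    indented = textwrap.indent(body, " " * 4)
    return (
        "apiVersion: v1\n"
        "kind: ConfigMap\n"
        "metadata:\n"
        f"  name: {name}\n"
        "data:\n"
        f"  {file_name}: |-\n"
        f"{indented}\n"
    )


if __name__ == "__main__":
    # Hypothetical, heavily trimmed dashboard model for illustration only.
    model = {"title": "bouncer", "panels": []}
    print(build_dashboard_configmap(
        "grafana-predefined-dashboards-config",
        "dashboard-bouncer.json",
        model,
    ))
```

The output can be redirected to a file and applied with kubectl like any other manifest in this post.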
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      configid: grafana-container
  template:
    metadata:
      labels:
        configid: grafana-container
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.5.2-ubuntu
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: grafana-dashboards-config
              mountPath: /etc/grafana/provisioning/dashboards
            - name: grafana-datasources-config
              mountPath: /etc/grafana/provisioning/datasources
            - name: grafana-predefined-dashboards-config
              mountPath: /predefined-dashboards
      volumes:
        - name: grafana-predefined-dashboards-config
          configMap:
            name: grafana-predefined-dashboards-config
        - name: grafana-dashboards-config
          configMap:
            name: grafana-dashboards-config
        - name: grafana-datasources-config
          configMap:
            name: grafana-datasources-config
We have used Prometheus and Grafana to help us analyze the performance of Kubernetes pods.
In this post CPU was measured using the "container_cpu_usage_seconds_total" counter, but many other counters are available, such as memory, disk and network. See this article for example.