
Gabor Kozma

Wed, Feb 7, 2018

Monitoring multiple federated clusters with Prometheus - the secure way

At Banzai Cloud we run multiple Kubernetes clusters deployed with our next generation PaaS, Pipeline, and we deploy these clusters across different cloud providers like AWS, Azure and Google, or on-prem. These clusters are usually launched from the same control plane, deployed either to AWS as a CloudFormation template or to Azure as an ARM template, and that control plane runs inside a Kubernetes cluster as well (we eat our own dog food).

One of the added values of deployments using Pipeline is the out of the box monitoring and dashboards, provided by default spotguides for the applications we support. For enterprise grade monitoring we chose Prometheus and Grafana: both are open source, widely popular and backed by a large community.

We love Prometheus so much that we have built and open sourced a cloud cost management system on top of it, called Hollowtrees.

We also contributed an Apache Spark sink to monitor Spark applications the cloud native way.

Since we run large, multi-cloud clusters and deployments, we use federated Prometheus clusters.

Prometheus federation

Prometheus is a very flexible monitoring solution: each Prometheus server can act as a scrape target for another Prometheus server in a HA and secure way. Federation allows a Prometheus server to scrape selected time series from another Prometheus server. There are two types of federation scenarios supported by Prometheus - at Banzai Cloud we use both hierarchical and cross-service federation, but the examples below (from the Pipeline control plane) showcase the hierarchical one.

Federated Prometheus

A typical Prometheus federation example configuration looks like this:

- job_name: 'federate'
  scrape_interval: 15s

  honor_labels: true
  metrics_path: '/federate'

  params:
    'match[]':
      - '{job="prometheus"}'
      - '{__name__=~"job:.*"}'

  static_configs:
    - targets:
      - 'source-prometheus-1:9090'
      - 'source-prometheus-2:9090'
      - 'source-prometheus-3:9090'

As you might know, all targets within a single Prometheus job share the same authentication settings. The problem is that we monitor multiple federated clusters across multiple cloud providers, and sharing the same authentication per cluster or job is not feasible. Thus, in our case, Pipeline dynamically generates a separate job for each cluster to be monitored, and the end result looks like this:

- job_name: sfpdcluster14
  honor_labels: true
  params:
    'match[]':
      - '{job="kubernetes-nodes"}'
      - '{job="kubernetes-apiservers"}'
      - '{job="kubernetes-service-endpoints"}'
      - '{job="kubernetes-cadvisor"}'
      - '{job="node_exporter"}'
  scrape_interval: 15s
  scrape_timeout: 7s
  metrics_path: /api/v1/namespaces/default/services/monitor-prometheus-server:80/proxy/prometheus/federate
  scheme: https
  static_configs:
    - targets:
      labels:
        cluster_name: sfpdcluster14
  tls_config:
    ca_file: /opt/pipeline/statestore/sfpdcluster14/certificate-authority-data.pem
    cert_file: /opt/pipeline/statestore/sfpdcluster14/client-certificate-data.pem
    key_file: /opt/pipeline/statestore/sfpdcluster14/client-key-data.pem
    insecure_skip_verify: true

Prometheus and Kubernetes (the secure way)

As you can see above, the remote Kubernetes cluster is accessed through the standard Kubernetes API server, instead of adding an ingress controller to each and every remote cluster to be monitored. We chose this approach because it lets us use the standard Kubernetes authentication and authorization mechanisms, as Prometheus supports TLS based authentication. As seen in the metrics_path: /api/v1/namespaces/default/services/monitor-prometheus-server:80/proxy/prometheus/federate snippet, this is a standard Kubernetes API endpoint, suffixed with a service name and URI: monitor-prometheus-server:80/proxy/prometheus/federate. The main Prometheus server at the top of the topology uses this endpoint to scrape the federated clusters, and the built-in Kubernetes API proxy dispatches the scrapes towards the service.
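The general shape of that service proxy path is worth spelling out; the fragment below is an illustrative template (the placeholders are ours, not part of the generated config):

```yaml
# Generic form of the Kubernetes API server service-proxy path used as metrics_path;
# Pipeline fills in the placeholders per monitored cluster:
metrics_path: /api/v1/namespaces/<namespace>/services/<service-name>:<port>/proxy/<path-on-service>
```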

The config below is the authentication part of the generated setup; the TLS configuration is explained in the Prometheus documentation.

  tls_config:
    ca_file: /opt/pipeline/statestore/sfpdcluster14/certificate-authority-data.pem
    cert_file: /opt/pipeline/statestore/sfpdcluster14/client-certificate-data.pem
    key_file: /opt/pipeline/statestore/sfpdcluster14/client-key-data.pem
    insecure_skip_verify: true

Again, all these are dynamically generated by Pipeline.

Monitoring a Kubernetes service

Monitoring systems need some form of service discovery to work. Prometheus supports different service discovery scenarios: a top-down approach with Kubernetes as the source, or a bottom-up one with a source like Consul. In our case, since all our deployments are Kubernetes based, we use the first approach.

Let's take an example - a Pushgateway Kubernetes service definition. Through annotations, Prometheus scrapes this service when its scrape annotation is "true", and it matches the probe annotation against the pushgateway name.

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/probe: pushgateway
    prometheus.io/scrape: "true"
  labels:
    app: {{ template "" . }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}
  name: prometheus-pushgateway
spec:
  ports:
    - name: http
  selector:
    app: prometheus
    component: "pushgateway"
    release: {{ .Release.Name }}
  type: "ClusterIP"

The Prometheus config block below uses the internal Kubernetes service discovery mechanism, kubernetes_sd_configs. Because Prometheus is running in-cluster and we have granted an appropriate cluster role to the deployment, there is no need to explicitly specify authentication, though we could. After service discovery, we keep only the services whose probe annotation is pushgateway and whose scrape annotation is true.
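For reference, the cluster role granted to the Prometheus deployment could look roughly like the minimal sketch below; the role name and the exact resource list are assumptions for illustration, not taken from the generated setup:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus  # hypothetical name
rules:
  # Read-only access to the API objects Prometheus service discovery walks
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
```

A ClusterRoleBinding then ties this role to the service account the Prometheus pod runs under.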

Prometheus can use service discovery out of the box when running inside Kubernetes.

- job_name: 'banzaicloud-pushgateway'
  honor_labels: true

  kubernetes_sd_configs:
    - role: service

  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
      action: keep
      regex: "pushgateway"
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      regex: (.+):(?:\d+);(\d+)
      replacement: ${1}:${2}
      target_label: __address__

As you can see, the annotations are not hardcoded: they are configured inside the Prometheus relabel configuration section. For example, the following configuration grabs the path annotation from the Kubernetes service metadata and replaces the __metrics_path__ label with its value.

    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)

We will expand more on relabeling in a following post, through a practical example of monitoring Spark and Zeppelin and unifying metric names for a centralized dashboard.
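As a small taste of that, metric names can be rewritten at scrape time with metric_relabel_configs; the sketch below strips a prefix so that differently named metrics line up on one dashboard (the spark_ prefix here is a hypothetical example, not our actual naming):

```yaml
  metric_relabel_configs:
    # Rewrite e.g. spark_executor_count to executor_count at scrape time
    # (the spark_ prefix is an illustrative assumption)
    - source_labels: [__name__]
      regex: spark_(.+)
      replacement: ${1}
      target_label: __name__
```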


There are lots of dashboarding solutions available, but we chose Grafana. Grafana has great integration with Prometheus and other time series databases, and provides access to useful tools like the PromQL editor to create amazing dashboards. Just to recall: "Prometheus provides a functional expression language that lets the user select and aggregate time series data in real time". PromQL also adds some basic statistical functions which we use - like the linear prediction functions that help us alert before unexpected things happen.
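For instance, an alerting rule built on PromQL's predict_linear could look like the sketch below; the metric name, lookback window and thresholds are illustrative assumptions, not our production rules:

```yaml
groups:
  - name: capacity
    rules:
      - alert: DiskFullIn4Hours
        # Extrapolate the last hour of free-space samples 4 hours ahead;
        # fire if the projected value drops below zero
        expr: predict_linear(node_filesystem_free_bytes{job="node_exporter"}[1h], 4 * 3600) < 0
        for: 5m
        labels:
          severity: warning
```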

If you are interested in our technology and open source projects, follow us on GitHub, LinkedIn or Twitter:


