
Sandor Guba

Mon, Feb 26, 2018


Application monitoring with Prometheus and Pipeline

Monitoring series:
  • Monitoring Apache Spark with Prometheus
  • Monitoring multiple federated clusters with Prometheus - the secure way
  • Application monitoring with Prometheus and Pipeline
  • Building a cloud cost management system on top of Prometheus
  • Monitoring Spark with Prometheus, reloaded

At Banzai Cloud we provision and monitor large Kubernetes clusters deployed to multiple cloud and hybrid environments. The clusters and the applications or frameworks running on them are all managed by our next generation PaaS, Pipeline. In the previous post we discussed how we use Prometheus in a federated architecture to collect data from multiple clusters in a secure way. As we add more and more providers and applications to Pipeline, we face a number of challenges in handling monitoring data in a standardized way.

Layers of monitoring

Very often, monitoring only the application itself is not enough to find the root cause of a problem. Errors from lower layers can escalate into the application and cause undesired behavior, which is why it is important to monitor every layer of your deployment.


Node layer

Going from the bottom up, this is the first layer we monitor. A few trivial problems can be caught here, such as:

  • No more space left on the device
  • Out Of Memory
  • Can’t reach the destination

You should have at least CPU, memory, disk and network metrics, with associated alerts. These metrics can save many hours of unnecessary debugging. Fortunately, they are also the easiest to collect, via the Prometheus node exporter or Kubernetes cAdvisor.
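
For example, a minimal Prometheus alerting rule for the "no more space left on the device" case might look like the sketch below. The exact metric names depend on the node exporter version (newer releases expose node_filesystem_avail_bytes, older ones node_filesystem_avail), so treat it as a starting point rather than a drop-in rule.

groups:
- name: node-alerts
  rules:
  - alert: NodeDiskAlmostFull
    # Fire when less than 10% of a filesystem has been free for 5 minutes
    expr: node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Less than 10% disk space left on {{ $labels.instance }} ({{ $labels.mountpoint }})"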


Platform / Kubernetes layer

As Kubernetes manages the whole cluster, we should have a clear view of what is happening inside it. As with the previous layer, there are established best practices here as well. The kube-state-metrics exporter provides a good overview of the cluster's state. Many useful metrics, such as the number of running pods, the state of the pods, the scheduling queue, etc., come from here.
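
As a sketch, an alert built on the kube_pod_status_phase metric from kube-state-metrics can catch pods that are stuck outside the Running phase; label names may differ slightly between kube-state-metrics versions, so consider this an illustration rather than a ready-made rule. It would go into the same rule group as the node alert above.

  - alert: PodNotRunning
    # kube_pod_status_phase is 1 for the phase a pod is currently in
    expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Failed|Unknown"}) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has not been running for 10 minutes"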

Application layer - welcome relabels

The application layer is where the anarchy begins. To start with, metric naming depends on the individual developer. Following your own naming conventions is fine if you only want to monitor your own applications, but it becomes troublesome when you try to monitor applications from different sources and correlate their metrics - which is exactly what we are doing with Pipeline.

A very good example is the JVM-based Apache Spark. Applications like this usually lack advanced monitoring capabilities, were not designed with monitoring in mind, and provide no unified interface for metric names. Unfortunately, getting this changed in huge projects with many contributors is more than challenging.

To make things even more complicated, Prometheus monitoring is based on a pull model, which is not suitable for monitoring batch jobs. To fill this gap (and reverse the direction), Prometheus can be extended with a Pushgateway. It accepts metrics over HTTP and provides a scrape endpoint for the Prometheus server.
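
A sketch of the corresponding scrape configuration is shown below; the target address is an assumption and depends on how the Pushgateway is deployed in your cluster. The important part is honor_labels: true, which keeps the job and instance labels pushed by the batch jobs instead of overwriting them with the gateway's own.

scrape_configs:
- job_name: pushgateway
  # Keep the labels pushed by the batch jobs instead of the gateway's own
  honor_labels: true
  static_configs:
  - targets: ['prometheus-pushgateway:9091']  # assumed in-cluster service address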

Deep dive example - monitoring Apache Spark and Apache Zeppelin

Okay, we have a plan, so let's see what happens. We set up a Spark job and push its metrics to the Pushgateway using Banzai Cloud's open source contribution.

The metrics as they appear on the Pushgateway:

# HELP spark_79ef988408c14743af0bccd2d404e791_1_executor_filesystem_file_read_bytes Generated from Dropwizard metric import (metric=spark-79ef988408c14743af0bccd2d404e791.1.executor.filesystem.file.read_bytes, type=org.apache.spark.executor.ExecutorSource$$anon$1)
# TYPE spark_79ef988408c14743af0bccd2d404e791_1_executor_filesystem_file_read_bytes gauge
spark_79ef988408c14743af0bccd2d404e791_1_executor_filesystem_file_read_bytes{instance="10.40.0.3",job="spark-79ef988408c14743af0bccd2d404e791",number="1",role="executor"} 0 1519224695487

Great, we now have metrics, but each metric name contains a unique part that does not fit the Prometheus model, which expects identical metric names distinguished by labels. Moreover, a Spark job launched from Apache Zeppelin looks a little bit different.

Raw Zeppelin metrics coming from the Spark workers:

zri_2cxx5tw2h__2d6zutx3z_1_CodeGenerator_generatedMethodSize
The structure of Spark/Zeppelin metrics
<spark_app_name>_<interpreter_group_id>__<notebook_id>_<executor_number>_<metric_name>
  • spark_app_name is a configurable variable used to identify your application
  • interpreter_group_id identifies the Zeppelin interpreter group (one interpreter shared across notebooks, or one per notebook)
  • notebook_id is our extended information identifying the Zeppelin notebook
  • executor_number is the unique number of the executor
  • metric_name is the metric identifier

The goal is to have generic metric names with customizable labels (the Prometheus way!):

CodeGenerator_generatedMethodSize{spark_app_name="zri",interpreter_group_id="2cxx5tw2h",notebook_id="2d6zutx3z",number="1"}
A small tutorial on Prometheus relabeling


Relabeling is a very powerful feature built into Prometheus. You can completely rewrite labels (yes, the metric name is a label as well) after scraping. For detailed information, visit the Prometheus documentation at prometheus.io.

How does Prometheus label replacement work?

This is an example metric_relabel_configs entry from a Prometheus configuration:

source_labels: [role,__name__]
regex: '[driver|executor]+;(.*)'
replacement: '$1'
target_label: __oldname__

Source_labels

Here you specify the labels whose values the regexp should be applied to. The selected label values are joined with the ";" separator.

 source_labels: [role,__name__]

The value handled by the regexp will look like this

driver;zri_2cxx5tw2h__2d6zutx3z_1_CodeGenerator_generatedMethodSize_count
Regexp

The regular expression to apply. This example matches the driver and executor roles and captures the whole metric name in a group.

'[driver|executor]+;(.*)'
Replacement

The template for the replacement value; it can reference capture groups. For example, the first captured group:

'$1'
Target_label

The label to create or overwrite with the replacement value.

__oldname__

Use relabel to transform metrics

First we need to catch only the application metrics. Luckily, the Spark metrics always come with a role label. Based on its value, we can copy the metric name into a temporary label called __oldname__.

- source_labels: [role,__name__]
  regex: '[driver|executor]+;(.*)'
  replacement: '$1'
  target_label: __oldname__

After that step we can distinguish the different types of metrics with some more relabel configuration. In the following example we add the spark_app_name label based on our temporary __oldname__ label; the other labels are extracted the same way, as sketched after this snippet.

- source_labels: [__oldname__]
  regex: '(?P<spark_app_name>\w[^_]+)_(?P<interpreter_group_id>\w[^_]+)__(?P<notebook_id>\w[^_]+)_(?P<role>[driver|0-9]+)((\w[^_]+)_(\w[^_]+)__(\w[^_]+))?_(?P<metric_name>.*)'
  replacement: '${spark_app_name}'
  target_label: spark_app_name
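
The remaining labels can be extracted the same way, reusing the same regexp with a different replacement and target label. Below is a sketch for notebook_id; interpreter_group_id follows the same pattern, and the full set of rules is in the repository linked below.

- source_labels: [__oldname__]
  regex: '(?P<spark_app_name>\w[^_]+)_(?P<interpreter_group_id>\w[^_]+)__(?P<notebook_id>\w[^_]+)_(?P<role>[driver|0-9]+)((\w[^_]+)_(\w[^_]+)__(\w[^_]+))?_(?P<metric_name>.*)'
  replacement: '${notebook_id}'
  target_label: notebook_id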

Clean up and finish: we change the original metric name (__name__) to the parsed name we get from the regexp, then drop the temporary __oldname__ label.

- source_labels: [__oldname__]
  regex: '(?P<app_name>\w+)_(?P<role>[driver|0-9]+)((\w[^_]+)_(\w[^_]+)__(\w[^_]+))?_(?P<metric_name>.*)'
  replacement: '${metric_name}'
  target_label: __name__

- source_labels: [__oldname__]
  replacement: ''
  target_label: __oldname__

Inspired by https://www.robustperception.io/extracting-labels-from-legacy-metric-names/

Results on a Grafana dashboard

To visualize the metrics, we configured our favorite dashboarding tool, Grafana.


More and more metrics

After some fine-tuning we managed to unify plenty of different Apache Spark metrics. You can check out our final open source configuration in the Banzai Cloud GitHub repository.

Zeppelin notebook example

# HELP zri_2cxx5tw2h__2d6zutx3z_1_CodeGenerator_generatedMethodSize Generated from Dropwizard metric import (metric=zri-2cxx5tw2h--2d6zutx3z.1.CodeGenerator.generatedMethodSize, type=com.codahale.metrics.Histogram)
# TYPE zri_2cxx5tw2h__2d6zutx3z_1_CodeGenerator_generatedMethodSize summary
zri_2cxx5tw2h__2d6zutx3z_1_CodeGenerator_generatedMethodSize{instance="10.40.0.2",job="zri-2cxx5tw2h--2d6zutx3z",number="1",role="executor",quantile="0.5"} 0 1519301799656
zri_2cxx5tw2h__2d6zutx3z_1_CodeGenerator_generatedMethodSize{instance="10.40.0.2",job="zri-2cxx5tw2h--2d6zutx3z",number="1",role="executor",quantile="0.75"} 0 1519301799656

Spark submit example

# HELP spark_79ef988408c14743af0bccd2d404e791_1_CodeGenerator_generatedMethodSize Generated from Dropwizard metric import (metric=spark-79ef988408c14743af0bccd2d404e791.1.CodeGenerator.generatedMethodSize, type=com.codahale.metrics.Histogram)
# TYPE spark_79ef988408c14743af0bccd2d404e791_1_CodeGenerator_generatedMethodSize summary
spark_79ef988408c14743af0bccd2d404e791_1_CodeGenerator_generatedMethodSize{instance="10.40.0.3",job="spark-79ef988408c14743af0bccd2d404e791",number="1",role="executor",quantile="0.5"} 10 1519224695487
spark_79ef988408c14743af0bccd2d404e791_1_CodeGenerator_generatedMethodSize{instance="10.40.0.3",job="spark-79ef988408c14743af0bccd2d404e791",number="1",role="executor",quantile="0.75"} 79 1519224695487

Zeppelin streaming example

# HELP zri_2cxx5tw2h__2d6zutx3z_driver_zri_2cxx5tw2h__2d6zutx3z_StreamingMetrics_streaming_lastCompletedBatch_processingDelay Generated from Dropwizard metric import (metric=zri-2cxx5tw2h--2d6zutx3z.driver.zri-2cxx5tw2h--2d6zutx3z.StreamingMetrics.streaming.lastCompletedBatch_processingDelay, type=org.apache.spark.streaming.StreamingSource$$anon$1)
# TYPE zri_2cxx5tw2h__2d6zutx3z_driver_zri_2cxx5tw2h__2d6zutx3z_StreamingMetrics_streaming_lastCompletedBatch_processingDelay gauge
zri_2cxx5tw2h__2d6zutx3z_driver_zri_2cxx5tw2h__2d6zutx3z_StreamingMetrics_streaming_lastCompletedBatch_processingDelay{instance="10.46.0.5",job="zri-2cxx5tw2h--2d6zutx3z",role="driver"} 6 1519301807816

Takeaway tips and tricks

If you are interested in our technology and open source projects, follow us on GitHub, LinkedIn or Twitter.
