
Márk Sági-Kazár

Thu, Oct 11, 2018


Instrumenting Gin and Gorm with OpenCensus

Observability is a key feature of the Banzai Cloud Pipeline platform: we put significant effort into monitoring, centralized log collection and, recently, tracing.

Instrumentation

Instrumentation is the process of monitoring and measuring software performance and writing logs and trace information: essentially, every effort to collect information that helps diagnose errors, run the software with maximum availability, and understand how the software operates and, more importantly, how it impacts users and the business.

The lack of proper instrumentation easily results in frustrated customers, frustrated managers, frustrated developers and lost engineering hours, all of which cost money, so choosing the right tools for the job is vital.

OpenCensus

OpenCensus is a relatively new player on the software instrumentation market. It consists of a set of vendor-neutral libraries that allow developers to decouple their code from the actual vendor being used. This is probably why it became so popular, and a member of the Cloud Native Computing Foundation, so quickly.

The core features of OpenCensus are collecting metrics and traces from an application and exposing them to the storage/analysis tool of your choice. All major players (Prometheus, Jaeger, Zipkin) are supported, so chances are you can easily integrate it into an existing workflow as well. It also provides a vendor-neutral collector agent, allowing you to decouple the application from those tools entirely.

It’s worth mentioning that OpenCensus itself is language-agnostic, so it’s a great choice in polyglot environments as well.

Metrics

Collecting metrics is an important way of measuring software performance. Metrics can serve as a base for automatic actions (like scaling up an application or generating alerts), but they are also often the first step in diagnosing errors (for example high error rates are first visible on metric dashboards).

OpenCensus metrics (called stats) consist of two parts:

  • Measuring data points
  • Aggregation of those data points in the form of views

Measures represent the type of metric to be recorded. Every measure has a name that uniquely identifies it, and a unit that defines the type of a single data point. A measure for HTTP request latency would look something like this:

import "go.opencensus.io/stats"

var RequestLatency = stats.Float64(
	"http/latency",
	"End-to-end latency",
	stats.UnitMilliseconds,
)
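
Recording a data point against a measure is then a single call. Here is a minimal sketch, assuming a ctx context and a start timestamp in the surrounding handler:

// Measure the elapsed time and record it against the measure above.
// Tags attached to ctx (if any) are recorded with the data point.
latency := float64(time.Since(start)) / float64(time.Millisecond)
stats.Record(ctx, RequestLatency.M(latency))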

Measures don’t really represent anything on their own. Sticking with the previous example: knowing a bunch of latency values does not tell us whether there is a problem; we need to connect the data points by aggregating them. This is where views come into play.

The following aggregation methods are supported:

  • Count: the number of recorded data points.
  • Distribution: a histogram distribution of the data points.
  • Sum: the sum of the data points.
  • LastValue: keeps the last recorded value, drops everything else.
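
In the Go library, these aggregations map to constructors in the go.opencensus.io/stats/view package:

view.Count()                         // count of the data points
view.Distribution(25, 50, 100, 500)  // histogram with the given bucket bounds
view.Sum()                           // sum of the data points
view.LastValue()                     // most recent value only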

Additionally, views can be broken down by user-defined key-value pairs (recorded with measures), called tags. Tags can provide further insight into application events and show correlation with performance drops. For example, tagging latency measures with HTTP method or even URL path can show which part of the application is slow.

Below is an example of a latency view using distribution aggregation:

import (
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

var ServerLatencyView = &view.View{
	Name:        "http/latency",
	Description: "Latency distribution of HTTP requests",
	TagKeys:     []tag.Key{Method}, // Method is a tag.Key, e.g. tag.MustNewKey("method")
	Measure:     RequestLatency,
	Aggregation: view.Distribution(25, 50, 100, 250, 500, 1000), // or e.g. ochttp.DefaultLatencyDistribution
}
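
A view only aggregates data once it has been registered, and tag values have to be attached to the context before recording. A minimal sketch putting the pieces together:

// Register the view so recorded data points are aggregated.
if err := view.Register(ServerLatencyView); err != nil {
	log.Fatal(err)
}

// Attach a tag value to the context, then record a tagged data point.
ctx, err := tag.New(context.Background(), tag.Insert(Method, "GET"))
if err != nil {
	log.Fatal(err)
}
stats.Record(ctx, RequestLatency.M(42.0))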

This concept might seem complicated compared to Prometheus (the current ruler of metrics collection), where metric types include the aggregation method as well, but it provides greater flexibility by separating concerns and frees you from vendor lock-in.

Tracing

Tracing tracks the lifecycle of a single request as it travels through the components of a system. Components in this context can be separate services, database and caching systems: basically everything that plays a part in responding to the user request.

The unit of work is called a span, which contains information (latency, status, user-defined events and attributes, etc.) about the work a component was doing. Spans are collected into a single trace, giving us a full picture of what happened during the request.

This is an extremely powerful tool for quickly diagnosing errors and narrowing down the area of investigation as much as possible. In complex systems this is incredibly helpful, because errors usually appear in user-facing services first, and it’s crucial to find the root cause as soon as possible. Going through the logs of every component takes time, but if we can point at the misbehaving one, we can cut that time from hours down to minutes.

For example, in the following case the user only sees a 500 error returned from the service. Going through the logs would take time, but by simply looking at the trace we can immediately point at the database:

[Image: trace of a failing request, pointing at the database]

Context propagation

An important concept in Go, which OpenCensus heavily relies on, is context propagation. As of Go 1.7 the standard library ships with a new package, called context. It provides tools for transferring parameters across API boundaries. These parameters include cancellation signals and deadlines, but also arbitrary user-defined values (like trace information).

OpenCensus uses contexts to propagate trace and metrics information between components within a single process, so it is important to maintain a coherent context during the request lifecycle. Most tools in the standard library (net/http, database/sql) and frameworks like gRPC and Twirp already support (or even heavily build on) context propagation, so you are covered from that point of view.

The problem with this concept is that you essentially have to make contexts part of your own APIs as well, to propagate them from one boundary to another.

Here is a simple example that saves data received from a REST API call in a database:

package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"net/http"
)

var db *sql.DB // initialize DB

type Person struct {
	Name string `json:"name"`
}

func CreateHandler(w http.ResponseWriter, r *http.Request) {
	var person Person

	if err := json.NewDecoder(r.Body).Decode(&person); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)

		return
	}

	// Pass the request context down through the application API.
	CreatePerson(r.Context(), person)
}

func CreatePerson(ctx context.Context, person Person) {
	db.ExecContext(ctx, "INSERT INTO people (name) VALUES (?)", person.Name)
}

As you can see, the context is passed all the way down to the SQL library through the custom application API.

It may not be ideal or comfortable, it’s easy to forget, and it doesn’t make for the nicest APIs, but it’s an acceptable compromise given the advantages that come with it.

Example application

All right, let’s get to the code. For brevity, only the key parts are presented in this post, but the whole code can be found in this repository. The application can easily be started locally (details are in the readme) and it contains everything you need for the demonstration:

  • MySQL as the database backend
  • Prometheus for metrics collection
  • Jaeger for trace collection

To see how the example code evolved, take a look at the closed PRs.

Chances are some of the code in the repository will be extracted to a maintained library once it becomes stable.

Instrumenting Gin

Gin, at its core, is nothing more than an HTTP handler; internally it uses the HTTP server from the standard library. Fortunately, the OpenCensus library already contains integration for HTTP handlers, which works perfectly with Gin.

All you need to do is wrap the gin.Engine instance with the tracing HTTP handler and pass it to the HTTP server. (This unfortunately means that you won’t be able to use Gin’s Run function anymore, but that’s a small price to pay.)

import (
	"net/http"

	"github.com/gin-gonic/gin"
	"go.opencensus.io/plugin/ochttp"
)

// ...

r := gin.Default()

// Add routes to gin

// Wrap the Gin engine with the OpenCensus HTTP handler.
http.ListenAndServe(
	"127.0.0.1:8080",
	&ochttp.Handler{
		Handler: r,
	},
)
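
Note that traces and metrics only show up somewhere if exporters are registered first. Below is a minimal sketch using the Jaeger exporter; the endpoint, service name and function name are placeholders, and import paths may differ between library versions:

import (
	"log"

	"contrib.go.opencensus.io/exporter/jaeger"
	"go.opencensus.io/plugin/ochttp"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/trace"
)

func initInstrumentation() {
	// Export traces to Jaeger (the collector endpoint is a placeholder).
	exporter, err := jaeger.NewExporter(jaeger.Options{
		CollectorEndpoint: "http://127.0.0.1:14268/api/traces",
		Process:           jaeger.Process{ServiceName: "example"},
	})
	if err != nil {
		log.Fatal(err)
	}
	trace.RegisterExporter(exporter)

	// Sample every request for the demo; use a probability sampler in production.
	trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})

	// Register the default server views shipped with the ochttp plugin.
	if err := view.Register(ochttp.DefaultServerViews...); err != nil {
		log.Fatal(err)
	}
}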

After adding something like this to your Gin application, you should see something similar in the tracing tool of your choice:

[Image: a Gin request trace in Jaeger]

There is one catch, though: the current implementation of the HTTP plugin does not allow inserting custom tags into the recorded metrics. Normally this isn’t a problem, but when using dynamic routes with parameters it would be nice to record the route patterns instead of the actual URL paths. Currently this is not possible, but there is a pending issue describing the problem.

But even without route names in metrics and traces, instrumenting Gin with OpenCensus is pretty easy and works well.

Instrumenting Gorm

Instrumenting Gorm is a bit harder, because it does not support any kind of context propagation, which is necessary for OpenCensus to work properly. There are several issues requesting the feature, but nothing has happened so far, so we have to find a workaround.

Luckily, Gorm allows us to register callbacks for certain events (namely: query, create, update and delete) and to set arbitrary values in its scope. (Gorm manages an internal state for every action; to avoid collisions of this state, actions are executed in separate scopes.)

These two features allow us to implement a set of callbacks that instrument the database operations, as sketched below. Needless to say, this is less than ideal, because the context is not propagated to the actual SQL implementation, but it seems to work really well.
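
To illustrate the mechanism, here is a rough, hypothetical sketch of what such a pair of callbacks could look like (the names are made up; the real implementation lives in the example repository):

import (
	"context"

	"github.com/jinzhu/gorm"
	"go.opencensus.io/trace"
)

// spanScopeKey is a hypothetical key used to stash the span in the scope.
const spanScopeKey = "instrumentation:span"

func registerQueryCallbacks(db *gorm.DB) {
	// Start a span before the query runs and store it in the scope.
	db.Callback().Query().Before("gorm:query").Register("instrumentation:before_query", func(scope *gorm.Scope) {
		// The parent context has to be supplied by the caller somehow
		// (e.g. stored on the scope); context.Background() is a stand-in here.
		_, span := trace.StartSpan(context.Background(), "gorm:query")
		scope.Set(spanScopeKey, span)
	})

	// End the span once the query has finished.
	db.Callback().Query().After("gorm:query").Register("instrumentation:after_query", func(scope *gorm.Scope) {
		if s, ok := scope.Get(spanScopeKey); ok {
			s.(*trace.Span).End()
		}
	})
}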

Integration in this case is also quite easy:

import (
	"github.com/jinzhu/gorm"

	"github.com/sagikazarmark/go-gin-gorm-opencensus/pkg/ocgorm"
)

// ...

db, err := gorm.Open("dialect", "dsn")
if err != nil {
	panic(err)
}

// Register instrumentation callbacks
ocgorm.RegisterCallbacks(db)

After adding this piece of code, you should see something like this:

[Image: a Gorm query trace in Jaeger]

Conclusion

Adding metrics and trace collection to applications using Gin and/or Gorm is fairly simple with OpenCensus, and compared to the value it provides, it’s well worth it. Existing code bases may require a bit more work to make sure a context is properly propagated everywhere, but the invested time pays off.

Further reading

https://opencensus.io/

https://github.com/census-instrumentation/opencensus-specs

https://blog.golang.org/context

https://medium.com/@bas.vanbeek/opencensus-and-go-database-sql-322a26be5cc5

If you are interested in our technology and open source projects, follow us on GitHub, LinkedIn or Twitter:
