Author: Miklos Csendes

Introduction to spotguides

TRY PIPELINE FOR FREE

Last week we released the first version of Pipeline - a PaaS with end-to-end support for cloud native apps that takes you from a GitHub commit hook to an application deployed to the cloud in minutes, using a fully customizable CI/CD workflow.

At the core of the Pipeline PaaS are its spotguides - a collection of workflow/pipeline steps defined in a .pipeline.yml file and a few Drone plugins. In this post we’d like to demystify spotguides and describe, step by step, how they work; the next post will be a tutorial on how to write a custom spotguide and its associated plugin.

From a distance, each spotguide is just a customizable CI/CD pipeline defined in a YAML file, a plugin written in Go and a Docker container that can be deployed and executed.

Building blocks

Pipeline

Pipeline is an API and execution engine that provisions Kubernetes clusters and container engines in the cloud and deploys applications. It is independent of the application/spotguide it deploys - the same way Kubernetes is. Any application that can be packaged as a Docker container and has a manifest file or Helm chart can be deployed to a supported cloud provider or an on-prem Kubernetes cluster and managed by Pipeline. This is true of applications like Apache Spark, Kafka and Zeppelin but, at the same time, Pipeline is not tied to big data workloads - it’s a generic microservice platform - and supports applications like Java (with cgroups) and distributed, resilient databases (exposing the MySQL and PostgreSQL wire protocols as a service).

At the risk of over-simplifying things, the Pipeline API is just one execution step in a CI/CD workflow - one that is governed by the Pipeline CI/CD plugin.

Default plugins

There are a few well defined out-of-the-box plugins that are already part of the Drone CI/CD component. A complete list of those plugins would be quite long but, to highlight a few, some that we frequently use are:

  • Docker - a plugin to build and publish Docker images to a container registry
  • Git - a plugin that clones git repositories
  • S3 cache/sync - a plugin that caches build artifacts to S3 compatible storage backends like Minio or Rook, and syncs files with a bucket
  • Azure/Google storage - a plugin for publishing files to Azure and Google blob storage
  • Docker Hub - a plugin to trigger a remote Docker Hub build
  • Slack - a plugin for Slack notifications

Most of these plugins require a credential or keypair to access and manage remote resources. The CI/CD system supports a convenient way to pass secrets (like passwords and SSH keys) without the need to actually place them alongside a workflow definition and store them in GitHub. You can do this either by using the API or the CLI, by passing them into the plugin at runtime as environment variables, or, if you’re running on Kubernetes (as we do), through secrets or config maps.
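As a hedged sketch of what this looks like in practice - the step fields and secret names below are illustrative, not taken from this post - a Drone-style workflow step references previously registered secrets by name, so credential values never appear in the repository:

```yaml
# Illustrative only: docker_username and docker_password were registered
# beforehand via the CI/CD API or CLI; the workflow only names them.
pipeline:
  publish:
    image: plugins/docker
    repo: myorg/myapp
    secrets: [docker_username, docker_password]
```

At runtime the CI/CD engine injects each named secret into the plugin container as an environment variable.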

Custom plugins

Spotguides are application specific. The pipeline/workflow steps described in the .pipeline.yml file reflect the typical lifecycle of the application, and are usually unique. Needless to say, the CI/CD workflow/pipeline is fully customizable and supports parallel and conditional execution. Custom plugins sit at the core of any spotguide. We’ve written custom plugins for our default supported apps; these plugins are extremely simple to build (they usually take 1-2 days) and have well defined interfaces. Variable injection, execution as a container, security, etc. are all outside the realm of concern of a plugin’s author - these are default services you already get from the CI/CD engine. By way of an example, take a look at our Apache Spark spotguide. This is how you get from a GitHub commit hook to a running Spark application on Kubernetes in minutes. The overall flow looks like this:

This flow translates to the following plugin flow:

The building blocks for the Spark spotguide are as follows:

The components and their source code:

  • Spark RSS Helm charts: https://github.com/banzaicloud/banzai-charts/tree/master/stable/spark-rss
  • Spark Shuffle Helm charts: https://github.com/banzaicloud/banzai-charts/tree/master/stable/spark-shuffle
  • Spark Helm charts: https://github.com/banzaicloud/banzai-charts/tree/master/stable/spark
  • K8S Proxy plugin: https://github.com/banzaicloud/drone-plugin-k8s-proxy
  • Spark K8S submit plugin: https://github.com/banzaicloud/drone-plugin-spark-submit-k8s
  • Pipeline client plugin: https://github.com/banzaicloud/drone-plugin-pipeline-client

This combination of plugins written in Go, the .pipeline.yml file and Kubernetes deployment definitions (Helm charts, in our case) composes a spotguide. As you can see, spotguides are application specific; however, the platform that deploys and governs them - Pipeline - is agnostic. This is an easy and powerful way to integrate any distributed application that can be containerized, so that it runs on our microservice PaaS. Pipeline creates and defines the runtime - Kubernetes - and deploys the application - described by Helm charts - through a REST API.

Helm charts

We use Helm charts to deploy and orchestrate the applications we support. In order to write a spotguide, you’ll need a Helm chart (or a lower level Kubernetes deployment unit, like a manifest) and, possibly, some orchestration logic. Take, for instance, one of the examples we deploy and use - a distributed database. Kubernetes does not differentiate between resources and priorities when deploying applications, and, while Helm charts do have dependencies, there is no ordering between them. Because Helm 3.0 has not yet been released, we provide default init containers for a predefined number of protocols, to allow ordering and higher level readiness probes. A basic example of such ordering is database startup: if you’re deploying a simple web app with Pipeline that requires a database, the two are deployed in parallel, and the web app will fail until the database has started, been initialized and is ready to serve requests. These request failures show up in the logs and traces, and can potentially trigger the default Prometheus alerts we deploy for the application. This is not ideal, but Kubernetes does not currently have an out-of-the-box solution (at least not until Helm 3.0 is released). Thus, we provide protocol specific init containers that are able to enforce startup order, initialize applications and serve readiness probes.
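The init container idea can be sketched as follows - this is a generic, illustrative manifest fragment (a plain busybox TCP wait loop), not one of our actual protocol specific images:

```yaml
# Illustrative only: block the web app pod until the MySQL service
# accepts TCP connections, approximating a startup ordering.
spec:
  initContainers:
    - name: wait-for-mysql
      image: busybox:1.28
      command: ['sh', '-c', 'until nc -z mysql 3306; do sleep 2; done']
  containers:
    - name: webapp
      image: myorg/webapp:latest
```

A protocol aware init container can go further than a TCP check - for example, speaking the MySQL wire protocol to confirm the database is initialized, not merely listening.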

.pipeline.yml

The final piece of this equation is the YAML file. The .pipeline.yml file connects these components (except the upcoming UI and CLI) into a single unit: it describes the workflow steps and defines the underlying plugins and their associated Helm charts. The YAML is pretty simple to read, maintain and execute. One added benefit is that, since all the steps above are containerized (the plugins, for example), they can be used with other commercial CI/CD systems like CircleCI or Travis.
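To make this concrete, here is a minimal, hypothetical .pipeline.yml sketch; the image names, step names and settings are illustrative assumptions, not taken from the actual Spark spotguide:

```yaml
# Illustrative sketch of a Drone-style .pipeline.yml.
pipeline:
  build:
    image: golang:1.9
    commands:
      - go build ./...
  publish:
    image: plugins/docker
    repo: myorg/myapp
    secrets: [docker_username, docker_password]
  deploy:
    # Hypothetical step: calls the Pipeline REST API to provision a
    # cluster and install the application's Helm chart.
    image: banzaicloud/drone-plugin-pipeline-client
    deployment_name: myapp
```

Because each step is just a container, swapping a step or reusing one in another CI/CD system is a matter of referencing a different image.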

If you’re interested in our technology and open source projects, follow us on GitHub, LinkedIn or Twitter:


