Author Zsolt Varga

Kubernetes on the spot

One of the main features of the Banzai Cloud Pipeline platform is that it allows enterprises to run cost-effective workloads by mixing spot and preemptible instances with regular ones, without sacrificing overall reliability.

First, let’s dig into some of the components that make spot instances so reliable, then we’ll provide an example of a Pipeline control plane installation, submit some workloads and simulate a spot instance termination.

Availability in spot-instance clusters

EKS or PKE clusters launched with Pipeline can mix spot and on-demand instances (similarly, GKE clusters can mix in preemptible instances). Such a cluster can be very volatile: instances, and therefore pods and deployments, may come and go, so it’s generally considered risky to run workloads or services on these types of clusters. Nevertheless, clusters started with Pipeline have some special fail-safes and custom features that help maintain high availability while still allowing users to benefit from the low cost of spot instances.

  1. Telescopes is used to recommend a diverse set of node pools. It helps decrease the chance of a large number of instances being interrupted at once by mixing instances across different spot markets.
  2. Cloudinfo supplies up-to-date service and price details.
  3. Deployments are scheduled in the cluster such that a configurable, fixed percentage of replicas always runs on on-demand instances, so even if there’s a serious spot instance outage, deployments remain available with a reduced number of replicas. This is achieved through a custom scheduler, which takes node labels and specific pod annotations into account when running its predicates against nodes.
  4. Metrics on spot-related events, like termination notices, fulfillment times and current market prices, are collected in Prometheus via Pipeline and through different exporters.
  5. Spot instance terminations are handled properly through Hollowtrees, which drains interrupted nodes and replaces them with nodes in safe spot markets.
  6. Spot instance scheduler (and webhooks)
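The on-demand percentage in item 3 comes down to simple arithmetic. The following is only a minimal sketch of that calculation: the function name is hypothetical, and rounding up is an assumption, not necessarily what the Pipeline scheduler does.

```shell
# Hypothetical helper: minimum number of replicas to keep on on-demand nodes.
# Assumption: the result is rounded up, so at least one replica stays on an
# on-demand node whenever the percentage is greater than zero.
min_on_demand_replicas() {
  replicas=$1
  percentage=$2
  # Integer ceiling of replicas * percentage / 100.
  echo $(( (replicas * percentage + 99) / 100 ))
}

min_on_demand_replicas 10 30   # prints 3
min_on_demand_replicas 3 50    # prints 2
```

With 10 replicas and 30% on-demand, three replicas would survive even if every spot node disappeared at once.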

Termination Flow

Showtime

Let’s take a look at how you can deploy your applications to a mixture of on-demand and spot instances, and how the automatic failover is triggered when spot instances are removed from the cluster. Before you start, you’ll need a Pipeline platform — and for that you have two options:

For the sake of the demo, let’s try it by spinning up our Pipeline platform control plane on EC2.

Create the Pipeline control plane on EC2

Go grab the Banzai CLI with your preferred package (Debian, RPM, binary tarballs for Linux and macOS) and follow the installation guide.

❯ banzai pipeline init --provider ec2 --workspace demo

❯ echo "
hollowtrees:
  enabled: true
" >> ~/.banzai/pipeline/demo/values.yaml

Add your AWS secrets to the control plane

❯ banzai secret create --magic

Create an EKS cluster with Hollowtrees enabled

Now let’s create a cluster with our spot instance watchdog, Hollowtrees.

❯ banzai cluster create

Now let’s add a deployment, and configure how its replicas are spread across spot and on-demand instances using the Pipeline UI.
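Once the deployment is running, it can be useful to check where its pods actually landed. The sketch below counts pods per node from a list of node names; the function name and the `app=demo` label in the usage comment are hypothetical, so substitute the labels of your own deployment.

```shell
# Count pods per node. Reads one node name per line on stdin and prints
# "<node> <pod count>" lines, sorted by node name.
count_pods_per_node() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage against a live cluster (not executed here; the label is an assumption):
#   kubectl get pods -l app=demo \
#     -o custom-columns=NODE:.spec.nodeName --no-headers | count_pods_per_node
```

Comparing the output with the node list shows how many replicas sit on spot nodes and how many on on-demand nodes.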

Diversifying spot instances

We’ve been running thousands of K8s clusters on spot instances, and the generally accepted wisdom is that AWS removes spot instances due to (lack of) capacity, not due to price fluctuations. As highlighted at the beginning of this post, Telescopes recommends instance types based on resource needs, and can recommend similar instance types — whether on a totally different spot market or of a totally different flavour — to make sure that we meet and maintain the correct level of resources (set at cluster creation).

Test node termination

To test node termination, choose one node, find the instance termination handler pod running on that node, port-forward its port 8081 and send a PUT /terminate request to it:

❯ kubectl -n pipeline-system port-forward ith-instance-termination-handler-jwtqn 8081:8081

In another terminal:

❯ curl -X PUT http://localhost:8081/terminate
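The pod name in the port-forward command is specific to the demo cluster. A minimal sketch for looking up the handler pod on a chosen node follows; it assumes the handler runs in the pipeline-system namespace, as in the command above, and that its pod name contains "termination-handler". The function name is hypothetical.

```shell
# Print the termination handler pod (as "pod/<name>") running on a given node.
# Assumptions: namespace pipeline-system, pod name contains "termination-handler".
handler_pod_on_node() {
  node=$1
  kubectl -n pipeline-system get pods \
    --field-selector "spec.nodeName=${node}" \
    -o name | grep termination-handler
}

# Usage against a live cluster (not executed here):
#   kubectl -n pipeline-system port-forward "$(handler_pod_on_node <node-name>)" 8081:8081
```

kubectl accepts the `pod/<name>` form printed by `-o name` directly as the port-forward target.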

The test termination cordons the node, drains it and removes it from the ASG, but the instance itself must be terminated manually! Obviously, in the event of a real spot termination, AWS will remove it for you.
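For the manual termination step, the EC2 instance ID can be derived from the node’s providerID, which on AWS looks like aws:///eu-west-1a/i-0123456789abcdef0. The sketch below is one way to do it; the function name is hypothetical, and the usage comment assumes the AWS CLI is configured for the cluster’s account and region.

```shell
# Extract the EC2 instance ID from a Kubernetes node's AWS providerID.
instance_id_from_provider_id() {
  # Keep everything after the last "/".
  echo "${1##*/}"
}

# Usage against a live cluster (not executed here):
#   provider_id=$(kubectl get node <node-name> -o jsonpath='{.spec.providerID}')
#   aws ec2 terminate-instances \
#     --instance-ids "$(instance_id_from_provider_id "$provider_id")"
```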

Now let’s see what happens when we remove (send a termination notice to) a node. This visualization details how the node is cordoned and drained, while a new node is simultaneously being automatically provisioned. Once the new node has joined the cluster, the scheduler reschedules the pods.

To visualize nodes and deployments, we used k8s-visualizer.

Conclusion

Spot instances provide spare EC2 compute capacity at a discount of up to 80% compared to on-demand prices, so it definitely makes sense to use them. The fail-safes and tools we have built into Pipeline mean you can begin taking advantage of these instances in production environments, not just in Dev and QA.

Happy savings!

About Pipeline

Banzai Cloud’s Pipeline provides a platform which allows enterprises to develop, deploy and scale container-based applications. It leverages best-of-breed cloud components, such as Kubernetes, to create a highly productive, yet flexible environment for developers and operations teams alike. Strong security measures — multiple authentication backends, fine-grained authorization, dynamic secret management, automated secure communications between components using TLS, vulnerability scans, static code analysis, CI/CD, etc. — are a tier zero feature of the Pipeline platform, which we strive to automate and enable for all enterprises.

If you are interested in our technology and open source projects, follow us on GitHub, LinkedIn or Twitter.

