
Marton Sereg

Mon, Feb 12, 2018


Diversifying AWS auto-scaling groups, or how to write a Hollowtrees action plugin

Cloud cost management series:
Overspending in the cloud
Managing spot instance clusters on Kubernetes with Hollowtrees
Monitor AWS spot instance terminations
Diversifying AWS auto-scaling groups
Draining Kubernetes nodes
Cluster recommender
Cloud instance type and price information as a service

You may remember Hollowtrees, the framework for managing AWS spot instance clusters that we open sourced a few weeks ago. It comes with a few batteries included:

  • Hollowtrees is an alert-react based framework, part of the Pipeline PaaS, that coordinates monitoring, applies rules and dispatches action chains to plugins using standard CNCF interfaces
  • AWS spot instance termination Prometheus exporter
  • AWS autoscaling group Prometheus exporter
  • AWS Spot Instance recommender
  • Kubernetes action plugin to execute k8s operations (e.g. graceful drain, rescheduling)
  • AWS autoscaling group plugin to replace instances with better cost or stability characteristics


Last week we introduced the spot instance termination exporter for Prometheus. In this post we’re going to take a deep dive into another core component: the AWS Autoscaling action plugin. The action plugin (like the other components) can be used independently of Pipeline or Hollowtrees.

tl;dr

We wanted to use spot instances in our long-running AWS Kubernetes clusters, and to do that we needed a way to use auto scaling groups with multiple instance types and different bid prices. So we created a project that listens on a gRPC interface and can be instructed to swap a running instance in an auto scaling group for another one with different cost or stability characteristics. This diversifies the instance types in the group and reduces the chance of critical failures in the cluster.

Problems with spot instance clusters

Spot instances are useful for fault-tolerant workloads where it doesn’t matter if (some of) the instances are taken away. When using spot instances in auto scaling groups, the naive approach is to set a single instance type and spot price in the launch configuration. But this means that even if your auto scaling group is spread across availability zones, a large part of the instances run in the same spot market, so there is a high chance that they will all be taken away at the same time when the spot price surges above the bid price, which can have a critical impact on the cluster. AWS came up with Spot Fleets for the same reason, but Spot Fleet is a completely different service with a different API, and it doesn’t have the same level of integration with other AWS services (like ELB).
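To make the concentration risk concrete, here is a small illustrative sketch (not part of the plugin) that computes the largest share of a group sitting in a single spot market, i.e. the fraction of instances that could be reclaimed at once by one price surge. It assumes a spot market is identified by instance type plus availability zone; the `Instance` type and the instance types used are hypothetical examples.

```go
package main

import "fmt"

// Instance is a hypothetical, minimal view of an auto scaling group member:
// only the fields that identify its spot market are kept.
type Instance struct {
	ID               string
	InstanceType     string
	AvailabilityZone string
}

// marketKey identifies a spot market; we assume a market is defined by the
// instance type and the availability zone.
func marketKey(i Instance) string {
	return i.InstanceType + "/" + i.AvailabilityZone
}

// worstCaseLoss returns the largest fraction of the group sharing one spot
// market, i.e. the share of instances that a single price surge above the
// bid price could take away at the same time.
func worstCaseLoss(group []Instance) float64 {
	if len(group) == 0 {
		return 0
	}
	counts := make(map[string]int)
	maxCount := 0
	for _, inst := range group {
		k := marketKey(inst)
		counts[k]++
		if counts[k] > maxCount {
			maxCount = counts[k]
		}
	}
	return float64(maxCount) / float64(len(group))
}

func main() {
	// Naive setup: one launch configuration, so every instance has the same type,
	// even though the group is spread across two availability zones.
	naive := []Instance{
		{"i-1", "m4.xlarge", "eu-west-1a"},
		{"i-2", "m4.xlarge", "eu-west-1a"},
		{"i-3", "m4.xlarge", "eu-west-1b"},
		{"i-4", "m4.xlarge", "eu-west-1b"},
	}
	// Diversified setup: instance types are mixed within each zone.
	diversified := []Instance{
		{"i-1", "m4.xlarge", "eu-west-1a"},
		{"i-2", "c4.2xlarge", "eu-west-1a"},
		{"i-3", "m5.xlarge", "eu-west-1b"},
		{"i-4", "r4.xlarge", "eu-west-1b"},
	}
	fmt.Printf("naive: %.2f, diversified: %.2f\n", worstCaseLoss(naive), worstCaseLoss(diversified))
}
```

Running it prints `naive: 0.50, diversified: 0.25`: mixing instance types halves the worst-case loss for this four-instance group.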

If you’re not familiar with spot instances, make sure to first read our previous blog post, which recaps the lifecycle of spot instances and describes the different ways to request them, or, for a more in-depth guide, read the corresponding part of the AWS documentation.

Swapping instances

An AWS auto scaling group can use only one launch configuration, which describes the instance type and the spot price of all the instances in the group. So if you want to use multiple instance types you’ll need a little trick: the AWS API allows detaching instances from and attaching instances to a group regardless of their instance type and other configuration. The business logic in this plugin uses this trick to swap existing instances in the cluster for different ones. Let’s see the steps in the code:

  1. Based on the incoming event info the plugin fetches the information of the auto scaling group and the corresponding launch configuration.
  2. An instance type recommendation is requested from the Banzai Cloud spot instance recommender through an HTTP API. More on the recommender in another blog post soon.
  3. The original instance is detached from the auto scaling group.
  4. Based on the recommendation, a new spot instance is requested from AWS and the code waits until the instance is started. The launch specification of the new instance (image ID, EBS volumes, SSH keys, etc.) is copied from the auto scaling group and its launch configuration.
  5. The instance is attached to the auto scaling group.
  6. If requested, the detached instance is terminated. Termination is optional: for example, if a spot instance is swapped because its termination notice arrived, AWS will terminate it after 2 minutes anyway, and draining scripts may be running during that time, so it doesn’t make sense to shut it down beforehand.
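The six steps above can be sketched as a single function. This is a minimal, stdlib-only sketch under stated assumptions, not the plugin’s actual code: the `Cloud` and `Recommender` interfaces, their method names and the `LaunchSpec`/`Recommendation` fields are hypothetical stand-ins for the AWS SDK calls and the recommender’s HTTP API. A recording fake replaces AWS so the order of operations can be checked.

```go
package main

import "fmt"

// LaunchSpec is a hypothetical subset of the settings copied from the
// group's launch configuration (step 4).
type LaunchSpec struct {
	ImageID string
	KeyName string
}

// Recommendation is a hypothetical recommender response (step 2).
type Recommendation struct {
	InstanceType string
	BidPrice     string
}

// Cloud abstracts the AWS calls; the method names are illustrative only.
type Cloud interface {
	LaunchSpecFor(group string) (LaunchSpec, error)
	Detach(group, instanceID string) error
	RequestSpot(spec LaunchSpec, rec Recommendation) (string, error) // waits until running
	Attach(group, instanceID string) error
	Terminate(instanceID string) error
}

// Recommender abstracts the HTTP call to the spot instance recommender.
type Recommender interface {
	Recommend(group string) (Recommendation, error)
}

// SwapInstance replaces instanceID in group with a recommended spot instance.
func SwapInstance(c Cloud, r Recommender, group, instanceID string, terminate bool) (string, error) {
	spec, err := c.LaunchSpecFor(group) // 1. group + launch configuration
	if err != nil {
		return "", fmt.Errorf("describe group: %w", err)
	}
	rec, err := r.Recommend(group) // 2. ask the recommender
	if err != nil {
		return "", fmt.Errorf("recommend: %w", err)
	}
	if err := c.Detach(group, instanceID); err != nil { // 3. detach the original
		return "", fmt.Errorf("detach: %w", err)
	}
	newID, err := c.RequestSpot(spec, rec) // 4. new spot instance, copied launch spec
	if err != nil {
		return "", fmt.Errorf("request spot instance: %w", err)
	}
	if err := c.Attach(group, newID); err != nil { // 5. attach the new instance
		return "", fmt.Errorf("attach: %w", err)
	}
	if terminate { // 6. optional: skipped when AWS will reclaim it anyway
		if err := c.Terminate(instanceID); err != nil {
			return newID, fmt.Errorf("terminate: %w", err)
		}
	}
	return newID, nil
}

// fakeCloud records every call instead of talking to AWS; it also acts as
// the Recommender so all calls end up in one ordered slice.
type fakeCloud struct{ calls []string }

func (f *fakeCloud) record(s string) { f.calls = append(f.calls, s) }

func (f *fakeCloud) LaunchSpecFor(string) (LaunchSpec, error) {
	f.record("describe")
	return LaunchSpec{ImageID: "ami-example"}, nil
}
func (f *fakeCloud) Recommend(string) (Recommendation, error) {
	f.record("recommend")
	return Recommendation{InstanceType: "m5.xlarge", BidPrice: "0.10"}, nil
}
func (f *fakeCloud) Detach(string, string) error { f.record("detach"); return nil }
func (f *fakeCloud) RequestSpot(LaunchSpec, Recommendation) (string, error) {
	f.record("request")
	return "i-new", nil
}
func (f *fakeCloud) Attach(string, string) error { f.record("attach"); return nil }
func (f *fakeCloud) Terminate(string) error { f.record("terminate"); return nil }

func main() {
	c := &fakeCloud{}
	id, err := SwapInstance(c, c, "my-asg", "i-old", true)
	fmt.Println(id, err, c.calls)
}
```

This prints `i-new <nil> [describe recommend detach request attach terminate]`. If you wire such a sketch to the real Auto Scaling API, keep in mind that `DetachInstances` takes a `ShouldDecrementDesiredCapacity` flag (if it is false, the group launches its own replacement from the launch configuration), and that `AttachInstances` increases the desired capacity by the number of instances attached.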

Hollowtrees action plugin interface

This project is a Hollowtrees action plugin, so its gRPC interface follows the standard Hollowtrees plugin interface. The interface uses an event-based schema and is described in a simple .proto file in the Hollowtrees repository. It is very simple: it has only one rpc function, whose request payload is an AlertEvent message that follows the structure of the current CloudEvents specification. Routing inside the plugin is done via the EventType field of the message.
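Routing on EventType boils down to a lookup from event type to handler function. The sketch below illustrates that idea only; the `AlertEvent` struct is a simplified, hypothetical Go mirror of the protobuf message (its fields loosely follow CloudEvents and are not copied from the actual .proto file), and the `Router` type and the event type names are made up for the example.

```go
package main

import "fmt"

// AlertEvent is a hypothetical, simplified mirror of the AlertEvent message.
type AlertEvent struct {
	EventType string
	EventID   string
	Source    string
	Data      map[string]string
}

// Router dispatches an event to the function registered for its EventType.
type Router struct {
	routes map[string]func(AlertEvent) error
}

// NewRouter returns a router with no routes registered.
func NewRouter() *Router {
	return &Router{routes: make(map[string]func(AlertEvent) error)}
}

// On registers fn for the given event type, replacing any previous handler.
func (r *Router) On(eventType string, fn func(AlertEvent) error) {
	r.routes[eventType] = fn
}

// Dispatch looks up the handler by EventType; unknown types are rejected.
func (r *Router) Dispatch(e AlertEvent) error {
	fn, ok := r.routes[e.EventType]
	if !ok {
		return fmt.Errorf("no handler for event type %q", e.EventType)
	}
	return fn(e)
}

func main() {
	r := NewRouter()
	// "example.spot-termination" is a made-up event type.
	r.On("example.spot-termination", func(e AlertEvent) error {
		fmt.Println("swapping", e.Data["instance_id"])
		return nil
	})
	_ = r.Dispatch(AlertEvent{
		EventType: "example.spot-termination",
		Data:      map[string]string{"instance_id": "i-123"},
	})
	fmt.Println(r.Dispatch(AlertEvent{EventType: "unknown"}))
}
```

Running it prints `swapping i-123` followed by `no handler for event type "unknown"`. Rejecting unknown event types explicitly makes a misconfigured rule in the engine visible instead of silently ignoring it.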

We’ve implemented a very simple “SDK” for this interface. To use it, a plugin needs to do only two things:

  1. Implement the AlertHandler interface’s Handle method with any custom behaviour. In this case it calls the instance swap mechanism with the payload from the event.
  2. Call the Serve() function from the SDK with the new AlertHandler. It starts the gRPC server that listens on the configured port.
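A handler in the spirit of step 1 could look like the sketch below. It is an illustration under assumptions, not the SDK’s real code: the `AlertHandler` interface here only mirrors the shape described above (the actual SDK signature may differ), `SwapFunc` is a hypothetical entry point to the swap mechanism, and the payload keys `asg_name`, `instance_id` and `terminate` are made up. The `Serve()` call from step 2 is left out so the example needs only the standard library.

```go
package main

import (
	"errors"
	"fmt"
)

// AlertEvent is a hypothetical, simplified stand-in for the protobuf message.
type AlertEvent struct {
	EventType string
	Data      map[string]string
}

// AlertHandler mirrors the shape of the SDK interface; the real signature may differ.
type AlertHandler interface {
	Handle(event *AlertEvent) error
}

// SwapFunc is a hypothetical entry point of the instance swap mechanism.
type SwapFunc func(group, instanceID string, terminate bool) error

// asgHandler implements AlertHandler by extracting the swap parameters from
// the event payload and calling the swap mechanism.
type asgHandler struct {
	swap SwapFunc
}

func (h asgHandler) Handle(event *AlertEvent) error {
	// The payload keys are illustrative; the real plugin defines its own.
	group, instance := event.Data["asg_name"], event.Data["instance_id"]
	if group == "" || instance == "" {
		return errors.New("event payload must contain asg_name and instance_id")
	}
	return h.swap(group, instance, event.Data["terminate"] == "true")
}

func main() {
	var h AlertHandler = asgHandler{swap: func(g, i string, t bool) error {
		fmt.Printf("swap %s in %s (terminate=%v)\n", i, g, t)
		return nil
	}}
	// In the real plugin, h would now be passed to the SDK's Serve() function,
	// which starts the gRPC server; that call is omitted here.
	_ = h.Handle(&AlertEvent{Data: map[string]string{"asg_name": "my-asg", "instance_id": "i-123"}})
}
```

This prints `swap i-123 in my-asg (terminate=false)`. Keeping the swap behind a function value makes the handler easy to test without AWS credentials.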

Try it out

Build and run

Building the project is as simple as running a go build command. The result is a statically linked executable binary.

go build .

Configuration is done through a plugin-config.toml file that can be placed in the ./conf/ directory or next to the binary. A few notes:

  • Currently the application can only interact with one AWS region, so if you’d like to use it in multiple regions you’ll need to run a separate instance of the application in every region you use.
  • The project uses the default AWS Go client, and access credentials are configured through that client. This means that an instance profile, configuration files in the home directory (~/.aws/credentials) and environment variables can all be used.

The following options can be configured:

[log]
    format = "text"
    level = "debug"

[plugin]
    port = "8888"
    region = "eu-west-1"

To run the project, simply run the executable binary from the build.

Deploy it to a Kubernetes cluster

At Banzai Cloud all our deployments run inside Kubernetes. We use the standard Helm package manager, but all our deployments go through Pipeline, which makes Helm deployments available over a RESTful API as well. The charts are available in our GitHub charts repository.

To install the plugin’s chart with the release name aws-asg-action-plugin:

$ helm install --name aws-asg-action-plugin banzaicloud-incubator/ht-aws-asg-action-plugin

The Helm chart deploys a ConfigMap for the configuration file, a pod for the action plugin itself and a service through which the gRPC interface can be reached.

Test it

To test the plugin, you’ll need to write your own client that calls the gRPC interface, or use the Hollowtrees engine and configure some rules for when to swap an instance. If you just want to play around, use a tool like grpcc to connect to the plugin and send some basic events.

Future plans

This plugin should be considered an early alpha version. Important things are still missing, like security on the gRPC interface; some parts need improvement, like the recommendation engine that is called to determine the new instance type and bid price; and new features are planned to always keep the auto scaling group in a stable state, like cooldown periods or keeping a percentage of the cluster on on-demand instances.

If you are interested in our technology and open source projects, follow us on GitHub, LinkedIn or Twitter.
