Banzai Cloud Blog

Balint Molnar

Wed, Feb 21, 2018

Spark Streaming Checkpointing on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes.

Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...


Toader Sebastian

Mon, Feb 19, 2018

Function as a service with OpenFaaS on Banzai Cloud Pipeline

At Banzai Cloud we provision different frameworks and tools like Spark, Zeppelin, Kafka, Tensorflow, etc. to our Pipeline PaaS (built on Kubernetes). Last week we added serverless capabilities to Pipeline, using OpenFaaS. This blog post explains how to deploy OpenFaaS to Kubernetes using Pipeline and invoke an example function running on it. We separate two concerns: the provisioning of the serverless frameworks we support (this post is about OpenFaaS, but Pipeline also supports Kubeless), and the invocation of functions through the Pipeline API or CI/CD workflow, dispatched to any of the supported serverless frameworks we deploy to Kubernetes.
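As a rough illustration of what invoking a deployed function can look like, the sketch below builds an HTTP request against an OpenFaaS gateway. It assumes the gateway exposes functions under the `/function/<name>` route; the gateway address, the `echo` function name and the payload are hypothetical, and this goes to the gateway directly rather than through the Pipeline API.

```python
import json
import urllib.request


def build_invoke_request(gateway_url: str, function_name: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that invokes a function via the gateway."""
    # Assumption: the gateway routes invocations to /function/<name>.
    url = f"{gateway_url.rstrip('/')}/function/{function_name}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical gateway address and function name, for illustration only.
req = build_invoke_request("http://gateway.example.local:8080", "echo", {"message": "hello"})
print(req.get_method(), req.full_url)

# Sending the request needs a reachable gateway:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable without a cluster.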

Read more...


Laszlo Puskas

Wed, Feb 14, 2018

CI/CD flow for Zeppelin notebooks

Read more...


Marton Sereg

Mon, Feb 12, 2018

Diversifying AWS auto-scaling groups, or how to write a Hollowtrees action plugin

Cloud cost management series: Overspending in the cloud; Managing spot instance clusters on Kubernetes with Hollowtrees; Monitor AWS spot instance terminations; Diversifying AWS auto-scaling groups.

You may remember the Hollowtrees project we open sourced a few weeks ago - a framework to manage AWS spot instance clusters with a few batteries included: Hollowtrees, an alert-react based framework that is part of the Pipeline PaaS and coordinates monitoring, applies rules and dispatches action chains towards plugins using standard CNCF interfaces; an AWS spot instance termination Prometheus exporter; an AWS autoscaling group Prometheus exporter; an AWS Spot Instance recommender; a Kubernetes action plugin to execute k8s operations (e.

Read more...


Balint Molnar

Thu, Feb 8, 2018

Kafka on Kubernetes - using etcd

At Banzai Cloud we are building Pipeline, a cloud agnostic, open source, next generation CloudFoundry/Heroku-like PaaS, and we run several big data workloads natively on Kubernetes. Apache Kafka is one of the cloud native workloads we support out of the box, alongside Apache Spark and Apache Zeppelin. If you are interested in running big data workloads on Kubernetes, please read the following blog series as well.

Read more...


Gabor Kozma

Wed, Feb 7, 2018

Monitoring multiple federated clusters with Prometheus - the secure way

At Banzai Cloud we run multiple Kubernetes clusters deployed with our next generation PaaS, Pipeline, and we deploy these clusters across different cloud providers like AWS, Azure and Google, or on-prem. These clusters are usually launched using the same control plane, deployed either to AWS as a CloudFormation template or to Azure as an ARM template, and they are running inside a Kubernetes cluster as well (we eat our own dog food).

Read more...


Marton Sereg

Mon, Feb 5, 2018

Monitor AWS spot instance terminations

Last week we open sourced the Hollowtrees project - a framework to manage AWS spot instance clusters with a few batteries included: Hollowtrees, an alert-react based framework that is part of the Pipeline PaaS and coordinates monitoring, applies rules and dispatches action chains towards plugins using standard CNCF interfaces; an AWS spot instance termination Prometheus exporter; an AWS autoscaling group Prometheus exporter; an AWS Spot Instance recommender; a Kubernetes action plugin to execute k8s operations (e.

Read more...


Toader Sebastian

Thu, Feb 1, 2018

Spark scheduling on Kubernetes demystified

Read more...


Janos Matyas

Tue, Jan 30, 2018

Pipeline PaaS 0.2.0 - new release

Banzai Pipeline, or simply Pipeline, is a tabletop reef break located on Oahu’s North Shore in Hawaii. The most famous and infamous reef on the planet sets the benchmark by which all other waves are measured. Pipeline is also a PaaS with a built-in CI/CD engine to deploy cloud native microservices to the public cloud and on-premise. It simplifies and abstracts all the details of provisioning the cloud infrastructure, installing or reusing the Kubernetes cluster and deploying the application.

Read more...


Marton Sereg

Mon, Jan 29, 2018

Managing spot instance clusters on Kubernetes with Hollowtrees

Hollowtrees is a wave for the highest level, the pin-up centerfold for the Mentawai islands, bringing a new machine-like level to the word perfection. Watch out for the vigilant guardian aptly named The Surgeons Table, whose sole purpose is to take parts of you as a trophy. Hollowtrees, a ruleset based watch-guard, keeps spot instance based clusters safe and allows them to be used in production. It handles spot price surges within one region or availability zone and reschedules applications before instances are taken down.

Read more...


Janos Matyas

Fri, Jan 26, 2018

Authentication and authorization of Pipeline users with OAuth2 and Vault

Pipeline is quickly moving towards an "as a Service" milestone, where the Pipeline PaaS will be available for the masses and early adopters as a hosted service as well (current deployments are all self-hosted). The hosted version, like many PaaS offerings, will be a multitenant service. The resource and performance isolation of tenants will be handled by the underlying platform/core building block - Kubernetes (this topic deserves a post of its own).

Read more...


Sandor Magyari

Wed, Jan 24, 2018

Spark application logs - History Server setup on Kubernetes

Read more...


Flora Piszker

Mon, Jan 22, 2018

The challenges (and resolutions) of working with Azure AKS

We are moving rather fast with new Pipeline features and releases, with the second major one scheduled for this week. Among many new features we have added a new managed Kubernetes provider, Microsoft’s Azure AKS. Azure Container Service (AKS) is a preview feature of the Azure Cloud - and we are proud of being very early adopters of it. We can provision and deploy apps to Kubernetes on Azure VMs the same way we do on EC2. However, at Banzai Cloud we strongly believe that the future is in managed Kubernetes services, and most of our investment regarding cloud neutrality and provisioning is built on managed Kubernetes services both in the cloud (GKE, OCI and ACS in beta or under development) and on-prem.

Read more...


Sandor Magyari

Thu, Jan 18, 2018

Introduction to distributed TensorFlow on Kubernetes

Last time we discussed how our Pipeline PaaS deploys and provisions an AWS EFS filesystem on Kubernetes, and what the performance benefits are for Spark or TensorFlow. This post covers an introduction to TensorFlow on Kubernetes and the benefits of EFS for TensorFlow (storing image data for TensorFlow jobs). Pipeline uses the kubeflow framework to deploy: a JupyterHub to create & manage interactive Jupyter notebooks; a TensorFlow Training Controller that can be configured to use CPUs or GPUs; and a TensorFlow Serving container. Note that besides the ones above, Pipeline also has default Spotguides for Spark and Zeppelin to support your data science experience.

Read more...


Laszlo Puskas

Wed, Jan 17, 2018

Helm via REST; side effects of building the Pipeline PaaS

Two months ago we set sail to build a next generation Heroku/CloudFoundry-like PaaS on top of Kubernetes, called Pipeline. The PaaS itself contains an end-to-end CI/CD pipeline triggered by commit hooks; several spotguides for Spark, Zeppelin, Kafka, databases and Java apps running on Kubernetes the cloud native way; and a bunch of out of the box features such as Prometheus based monitoring, pre-configured Grafana dashboards, service mesh and many others.

Read more...


Sandor Magyari

Mon, Jan 15, 2018

Amazon Elastic File System on Kubernetes

At Banzai Cloud we provision different frameworks and tools like Spark, Zeppelin and most recently Tensorflow, all running on our Pipeline PaaS (built on Kubernetes). One of Pipeline’s early adopters is running a Tensorflow Training Controller using GPUs on AWS EC2, wired into our CI/CD pipeline, and needed significant parallelization for reading training data. We have introduced support for Amazon Elastic File System and will make it publicly available in the forthcoming release of Pipeline.

Read more...


Janos Matyas

Thu, Jan 11, 2018

Running TiDB on Kubernetes

At Banzai Cloud we provision different applications or frameworks to our PaaS - Pipeline, built on Kubernetes. At the same time we eat our own dogfood: the PaaS’ control plane itself runs on Kubernetes and needs a data storage layer. So we needed to cover two use cases - deploy and run a distributed, scalable and fully SQL compliant DB to cover our clients’ needs and our own internal needs.

Read more...


Toader Sebastian

Tue, Jan 9, 2018

Why my Java application is OOMKilled

At Banzai Cloud we run and deploy containerized applications to our PaaS, Pipeline. Like us, those who have already run Java applications inside Docker have probably come across the problem of the JVM incorrectly detecting the available memory when running inside a container. The JVM sees the available memory of the machine instead of the memory available only to the Docker container. This can lead to cases where an application running inside the container is killed when it tries to use more memory than the limit of the Docker container allows.
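The arithmetic behind this failure mode can be sketched in a few lines. The example below assumes the JVM's common default of sizing the maximum heap at one quarter of the memory it detects; the helper names and the 64 GiB host / 4 GiB container figures are hypothetical and only illustrate the mismatch.

```python
GIB = 1024 ** 3


def default_max_heap(detected_memory_bytes: int) -> int:
    """Default max heap, assuming the JVM picks 1/4 of the memory it detects."""
    return detected_memory_bytes // 4


def heap_can_exceed_limit(host_memory_bytes: int, container_limit_bytes: int) -> bool:
    """True if a heap sized from host memory is allowed to grow past the container limit."""
    return default_max_heap(host_memory_bytes) > container_limit_bytes


# Hypothetical figures: a 64 GiB node running a container limited to 4 GiB.
host_memory = 64 * GIB
container_limit = 4 * GIB

print(default_max_heap(host_memory) // GIB)                 # heap ceiling in GiB, sized from the host
print(heap_can_exceed_limit(host_memory, container_limit))  # whether that ceiling is above the limit
```

When the heap ceiling derived from host memory is above the container limit, the JVM may keep allocating until the container runtime kills the process. Setting the maximum heap explicitly (for example with `-Xmx`) to a value below the container limit avoids relying on the detected memory.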

Read more...


Sandor Magyari

Mon, Jan 8, 2018

Running Zeppelin Spark notebooks on Kubernetes - deep dive

Read more...


Laszlo Puskas

Thu, Jan 4, 2018

Take a rest - enjoy the REST

Modern applications and services usually expose their functionalities via REST; moreover modules and components also can make use of external services that again are exposed as REST. Thus developers often need to design RESTful services and write REST service clients. This kind of work implies calling these services thousands of times during the development process (developers need to understand the API, the messages and the resources involved) and even after it to make sure everything works as desired.

Read more...


Toader Sebastian

Tue, Jan 2, 2018

The anatomy of Spark applications on Kubernetes

Read more...


Janos Matyas

Wed, Dec 27, 2017

Top 3 blogs of 2017 and what’s next

As 2017 comes to an end, we are looking back at the three blog posts that were most popular with our readers. We can’t really look too far back (though we have had 13 posts and one release already), as we started our startup just a little over one month ago (November 20, 2017 to be more precise). During this short time period we achieved quite a lot and laid the foundation for some exciting new projects we plan to ship early next year.

Read more...


Balint Molnar

Thu, Dec 21, 2017

Debugging a jetcd Txn Bug

This post is part of the Debug 101 series - if you missed the previous one, check it out here: Nodes successfully joined, not! We are in the middle of deploying Apache Kafka to Kubernetes the cloud native way - by totally removing the Zookeeper dependency and using etcd instead. All service registry/discovery and other internal Kafka to Zookeeper operations are dispatched to the already existing etcd cluster. Sweet, isn’t it - no need for yet another third party system, as we already have etcd as part of Kubernetes out of the box.

Read more...


Miklos Csendes

Wed, Dec 20, 2017

Introduction to spotguides

Last week we released the first version of Pipeline - with end to end support for cloud native apps, starting from a GitHub commit hook and deployed into the cloud in minutes using a fully customizable CI/CD workflow. The core part of the Pipeline PaaS is spotguides - a collection of workflow/pipeline steps defined in a .pipeline.yml file and a few Drone plugins. In this post we would like to demystify spotguides and describe step by step how they work; the next post will be a tutorial on how to write a custom spotguide and an associated plugin.

Read more...


Balint Molnar

Mon, Dec 18, 2017

Monitoring Apache Spark with Prometheus on Kubernetes

Read more...


Laszlo Puskas

Thu, Dec 14, 2017

Apache Spark CI/CD workflow howto

Read more...


Janos Matyas

Tue, Dec 12, 2017

Pipeline PaaS - the first release

Banzai Pipeline, or simply Pipeline, is a tabletop reef break located on Oahu’s North Shore in Hawaii. The most famous and infamous reef on the planet sets the benchmark by which all other waves are measured. Pipeline is also a PaaS with a built-in CI/CD engine to deploy cloud native microservices to the public cloud and on-premise. It simplifies and abstracts all the details of provisioning the cloud infrastructure, installing or reusing the Kubernetes cluster and deploying the application.

Read more...


Marton Sereg

Thu, Dec 7, 2017

Overspending in the cloud

One of the main advantages that is always brought up when debating whether it’d be good to move a deployment to the cloud is cost. There are no upfront costs in the cloud because you don’t have to buy the hardware, and you’ll only pay for what you really use because you can scale your infrastructure based on your workloads.

Read more...


Sandor Magyari

Tue, Dec 5, 2017

Running Zeppelin Spark notebooks on Kubernetes

Read more...


Janos Matyas

Mon, Dec 4, 2017

Banzai Cloud @ KubeCon + CloudNativeCon, North America

This week KubeCon + CloudNativeCon, North America is bringing together over 2,500 developers, architects and people from the cloud native open source communities. We are part of this community, contributing to and using these technologies running under the umbrella of the Cloud Native Computing Foundation, so we could not miss the event. Join us to learn about Kubernetes and the related technologies directly from the experts of the industry.

Read more...


Toader Sebastian

Fri, Dec 1, 2017

Scaling Spark made simple on Kubernetes

Read more...


Sandor Magyari

Wed, Nov 29, 2017

Nodes successfully joined, not!

Today we are starting a new series called Debug 101 - dealing with issues which gave us significant headaches and which took us a lot of time to debug, understand and fix. We strongly believe in open source software and open issue resolution, so we try to describe the problems and suggest fixes; this way you don’t have to shave that yak. We already did, and it looks awesome.

Read more...


Janos Matyas

Mon, Nov 27, 2017

Introduction to Spark on Kubernetes

Read more...


Flora Piszker

Fri, Nov 24, 2017

Azure Managed Kubernetes (AKS) Go SDK

At Banzai Cloud we use different cloud providers and managed Kubernetes offerings, and one of these is Microsoft Azure Managed Kubernetes. It is a pretty neat service that gives you a managed K8S cluster without the need to deal with low level Kubernetes building blocks or tooling, or to start with cloud infrastructure provisioning. However, there is one temporary issue which is critical for our PaaS, Pipeline - the Azure Go-SDK does not contain the bindings for this new service.

Read more...


Janos Matyas

Thu, Nov 23, 2017

The company I'd like to work for

While I had no intention of founding or joining a new startup (after a successful exit which was a good financial decision but turned out to be the worst professional one), a few former co-founders from SequenceIQ and friends I had worked with at Fathom Technology/Epam Systems approached me after I got back home from my pretty long surfing trip. A few of them had moved on to a project for a banking giant, building microservice based Java applications scheduled with Nomad.

Read more...