Banzai Cloud Blog

PVC Operator; Creating Persistent Volume on Kubernetes made simple

At Banzai Cloud we work hard on our platform, Pipeline, built on Kubernetes. Recently we teamed up with Red Hat and CoreOS to work on Kubernetes Operators using the newly released Operator SDK, moving human operational knowledge into code, and we have already open sourced quite a few operators. This post dives deep into the PVC Operator. If you are looking for a complete guide on how to use the Operator SDK, or are just interested in Kubernetes Operators, please check our comprehensive guide.

Read more...



Nandor Kracser

Wed, May 16, 2018

The Banzai Cloud Vault Operator

At Banzai Cloud we are building a feature-rich platform, Pipeline, on top of Kubernetes. With Pipeline we provision large, multi-tenant Kubernetes clusters on all major cloud providers such as AWS, GCP, Azure and BYOC, and deploy all kinds of predefined or ad-hoc workloads to these clusters. We had to find an industry-standard way for our users to log in and interact with secured endpoints, and at the same time provide dynamic secret management for each application we support.

Read more...



Toader Sebastian

Mon, May 14, 2018

How to size correctly containers for Java 10 applications

At Banzai Cloud we run and deploy containerized applications to our PaaS, Pipeline. Java and other JVM-based applications are among the most significant workloads deployed to Pipeline, so getting it right is pretty important for us and our users. Java/JVM based workloads on Kubernetes with Pipeline: Why my Java application is OOMKilled; Deploying Java Enterprise Edition applications to Kubernetes; A complete guide to Kubernetes Operator SDK; Spark, Zeppelin, Kafka on Kubernetes.

Read more...



Sandor Magyari

Thu, May 10, 2018

Deploying Java Enterprise Edition applications to Kubernetes

Some good years ago, back at the beginning of this century, most of us here at Banzai Cloud were in the Java Enterprise business, building application servers (BEA Weblogic and JBoss) and lots of JEE applications. Those days are gone: the technology stack and landscape have changed dramatically, and monolithic applications are out of fashion, but we still have a lot of them running in production. Because of our background we have a kind of personal commitment to helping move Java Enterprise Edition business applications towards microservices, managed deployments, Kubernetes and the cloud using Pipeline.

Read more...



Janos Matyas

Tue, May 8, 2018

Banzai Cloud announces collaboration with Red Hat and becomes technology partner

Banzai Cloud announced today that it is collaborating with Red Hat to help standardize the management of complex stateful applications on Kubernetes. Today, Red Hat announced the Operator Framework, an open source toolkit designed to manage application instances on Kubernetes in a more effective, automated, and scalable way. Through the collaboration, Banzai Cloud will work with Red Hat on this open source project, which will focus on creating a new Software Development Kit (SDK) for the “operators” pattern.

Read more...



Laszlo Puskas

Mon, May 7, 2018

Deploy Node.js applications to Kubernetes

The Pipeline PaaS contains a complete CI/CD component to support developers building, deploying and operating applications in an automated way on Kubernetes. Most of our documentation, blog posts and how-tos have focused on Spark, Zeppelin and Tensorflow examples; however, we can actually build and deploy any application with Pipeline’s CI/CD component. Our last post related to the Banzai Cloud CI/CD flow described how to build and deploy a Spring Boot application on Kubernetes; this post does the same for a Node.

Read more...



Marton Sereg

Thu, May 3, 2018

Cloud instance type recommendation

Cloud cost management series: Overspending in the cloud; Managing spot instance clusters on Kubernetes with Hollowtrees; Monitor AWS spot instance terminations; Diversifying AWS auto-scaling groups; Draining Kubernetes nodes. A few months ago we had a post about overspending in the cloud, where we discussed how difficult it is to keep track of the vast number of instance types and pricing options offered by cloud providers, especially on AWS with spot pricing.

Read more...



Toader Sebastian

Tue, May 1, 2018

A complete guide to Kubernetes Operator SDK

At Banzai Cloud we are always looking for new and innovative technologies to support our users in their transition towards microservices deployed to Kubernetes, using Pipeline. In recent months we have partnered with CoreOS and Red Hat to work on operators, and the project has just been open sourced today and is available on GitHub. If you read through this post you’ll learn what an operator is and how to use the Operator SDK to develop one, through a concrete example that we developed and use here at Banzai Cloud.

Read more...



Janos Matyas

Sun, Apr 29, 2018

Banzai Cloud @ KubeCon, Copenhagen

This week KubeCon + CloudNativeCon in Copenhagen is bringing together over 4000 developers, architects and people from the cloud native open source communities. We are part of this community and the CNCF landscape, as certified experts in the technologies running under the umbrella of the Cloud Native Computing Foundation, so we could not miss the event. Join us to learn about Kubernetes and the related technologies directly from the experts of the industry.

Read more...



Janos Matyas

Fri, Apr 27, 2018

Banzai Cloud is now a Kubernetes Certified Service Provider

We are excited to announce that Banzai Cloud has become a Kubernetes Certified Service Provider (KCSP). The KCSP program was started by the Cloud Native Computing Foundation in collaboration with the Linux Foundation, and it is a major milestone in helping enterprises move to a cloud native platform. It provides a strict set of rules and certified experts to guarantee that only experienced partners are part of the program. This creates a trust relationship: enterprises can rely on Banzai Cloud and our flagship PaaS, Pipeline, to bring in the necessary experience and guide them on their Kubernetes and microservices journey to cloud native application platforms and production usage.

Read more...



Nandor Kracser

Thu, Apr 26, 2018

The Vault swiss-army knife

Bank Vaults is a thick, tricky, shifty right with a fast and intense tube for experienced surfers only, located on Mentawai. Think heavy steel doors, secret unlocking combinations and burly guards with smack-down attitude. Watch out for clean-up sets. Bank Vaults is a wrapper for the official Vault client with automatic token renewal, built in Kubernetes support, dynamic database credential management, multiple unseal options, automatic re/configuration and more.

Read more...



Balint Molnar

Wed, Apr 25, 2018

Kubernetes persistent volume options

At Banzai Cloud we push different types of workloads to Kubernetes with our open source PaaS, Pipeline. We support many deployments for which we have defined Helm charts; however, Pipeline is able to deploy applications from any repository. These deployments are pushed on-prem or to the cloud, but most of them share one common need: persistent volumes. The options provided by Kubernetes are abundant, and every cloud provider has a custom/additional offering as well.

Read more...



Sandor Guba

Mon, Apr 23, 2018

Advanced logging on Kubernetes

We continue our series about Kubernetes logging, and this post covers some advanced techniques and visualizations of the collected logs. Just to recap, with our open source PaaS, Pipeline, we monitor, collect and move large amounts of logs from the distributed applications we push to Kubernetes. We are putting a huge effort into monitoring large and federated clusters and automating all of this with Pipeline, so all our users get out-of-the-box monitoring and log collection for free.

Read more...



Lajos Papp

Fri, Apr 20, 2018

Control your AWS spendings with ChatOps

While building our open source, cloud agnostic, Heroku / Cloud Foundry like PaaS, Pipeline, on top of Kubernetes, we launch lots of clusters on different cloud providers. Most of these clusters are launched on spot or preemptible instances and managed by Hollowtrees; however, there are many smaller development clusters, control planes, instances and proofs of concept that we regularly run, which are only marginally related to or launched with Pipeline.

Read more...



Janos Matyas

Wed, Apr 18, 2018

Policy enforcements on K8s with Pipeline

In the past few weeks we have been blogging about the advanced, enterprise-grade security features we are building into our open source PaaS, Pipeline. To recap these features, please read the series. Security series: Authentication and authorization of Pipeline users with OAuth2 and Vault; Dynamic credentials with Vault using Kubernetes Service Accounts; Dynamic SSH with Vault and Pipeline. As you can see from the posts above, security is extremely important for us and our enterprise users; however, at the same time we would like to simplify and automate it.

Read more...



Sandor Magyari

Mon, Apr 16, 2018

Collecting Spark History Server event logs in the cloud

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes.

Read more...



Toader Sebastian

Fri, Apr 13, 2018

Apache Spark application resilience on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes; Collecting Spark History Server event logs in the cloud.

Read more...


Secure logging on Kubernetes with Fluentd and Fluent Bit

As we concluded in the previous blog post, we continue this series about centralized and secure Kubernetes logging/log collection. Log messages can contain sensitive information, so it is important to secure the transport between the distributed parts of the log flow. This post describes how we have secured moving log messages on our Kubernetes clusters provisioned by Pipeline. Logging series: Centralized logging under Kubernetes; Secure logging on Kubernetes with Fluentd and Fluent Bit.

Read more...



Laszlo Puskas

Tue, Apr 10, 2018

Manage Helm repositories and deploy charts via REST

During the development of our open source Pipeline PaaS, we introduced some handy features to deal with deployments. Note that most of our applications are deployed as Helm releases, and we needed a way to interact with Helm both programmatically (using gRPC) and through a UI (RESTful API). To do that, we have introduced a feature in Pipeline to manage Helm repositories and deploy applications with Helm to Kubernetes using RESTful API calls.

Read more...



Marton Sereg

Mon, Apr 9, 2018

Draining Kubernetes nodes

Kubernetes was designed to be fault tolerant to worker node failures. If a node goes missing because of a hardware problem or a cloud infrastructure problem, or Kubernetes simply no longer receives heartbeat messages from that node for any reason, the Kubernetes control plane is clever enough to handle these failures. But that doesn’t mean it will be able to solve every problem that can happen.

Read more...



Nandor Kracser

Thu, Apr 5, 2018

Dynamic SSH with Vault and Pipeline

At Banzai Cloud, we are building a feature-rich platform as a service, Pipeline, on Kubernetes. With Pipeline we provision large, multi-tenant Kubernetes clusters on all major cloud providers such as AWS, GCP, Azure and BYOC, and deploy all kinds of predefined or ad-hoc workloads to these clusters. We needed to find an industry standards-based way for our users to log in and interact with protected endpoints and at the same time provide dynamic secrets management for all the different applications we support, all with native Kubernetes support, and we chose to standardize on Vault.

Read more...



Laszlo Puskas

Tue, Apr 3, 2018

CI/CD for Kubernetes, through a Spring Boot example

The Pipeline PaaS contains a complete CI/CD component to support developers building, deploying and operating applications in an automated way, deployed to Kubernetes. Most of our documentation, blog posts and howtos were focusing on Spark, Zeppelin and Tensorflow examples, however we can actually build and deploy any application with Pipeline’s CI/CD component. This post showcases how to enable a simple Spring Boot application for the Banzai Cloud CI/CD flow, build and save the artifacts and deploy it to a Kubernetes cluster.

Read more...



Ferenc Hernadi

Fri, Mar 30, 2018

Centralized logging under Kubernetes

For our Pipeline PaaS, monitoring is an essential part of operating distributed applications in production. We are putting a large effort into monitoring large and federated clusters and automating all of this with Pipeline, so all our users get out-of-the-box monitoring for free. You can read our monitoring series here. Monitoring series: Monitoring Apache Spark with Prometheus; Monitoring multiple federated clusters with Prometheus - the secure way; Application monitoring with Prometheus and Pipeline; Building a cloud cost management system on top of Prometheus; Monitoring Spark with Prometheus, reloaded.

Read more...



Janos Matyas

Tue, Mar 27, 2018

The future of big data is Kubernetes

For some time we’ve been evangelizing the idea that the runtime fabric for big data workloads should be Kubernetes. In this post I’d like to walk through the reasoning behind the change and discuss its benefits. Obviously this is a pretty large topic and this post does not intend to cover it all - it’s also an opinionated view that we at Banzai Cloud believe in and push for.

Read more...



Marton Sereg

Mon, Mar 26, 2018

Fn and Hollowtrees

Adoption of serverless technologies is growing quickly. According to this survey it is on par with containers. And even though serverless is a very vague term, and it can be argued that it is still rarely used in production, especially in complex applications, it seems certain that it will be one of the most dominant trends in the cloud computing space in the near future. While a few years ago serverless only meant AWS Lambda in its early stages, nowadays the category is maturing rapidly.

Read more...



Nandor Kracser

Sun, Mar 25, 2018

Secure Kubernetes Deployments with Vault and Pipeline

This is a copy of our guest post we published on the Hashicorp blog about how we use Vault with Kubernetes. At Banzai Cloud, we are building an open source next generation platform as a service, Pipeline - built on Kubernetes. With Pipeline we provision large, multi-tenant Kubernetes clusters on all major cloud providers and deploy different workloads to these clusters. We needed to find an industry standards-based way for our users to publish and interact with protected endpoints and at the same time provide dynamic secrets management for all the different applications we support, all these with native Kubernetes support.

Read more...



Flora Piszker

Wed, Mar 21, 2018

Pipeline PaaS 0.3.0 - new release

Banzai Pipeline, or simply Pipeline is a tabletop reef break located in Hawaii, Oahu’s North Shore. The most famous and infamous reef on the planet is forming the benchmark by which all other waves are measured. Pipeline is a PaaS with a built in CI/CD engine to deploy cloud native microservices in public cloud and on-premise. It simplifies and abstracts all the details of provisioning the cloud infrastructure, installing or reusing the Kubernetes cluster and deploying the application.

Read more...



Sandor Guba

Tue, Mar 20, 2018

Kubernetes port hunting

As part of the Debug 101 series, we are back with a small but annoying bug hunt. This kind of bug is not really a bug but the side effect of several tools working together. Here comes the trouble: I was deploying a development version of Pipeline on a Kubernetes cluster running on top of AWS infrastructure. To do this deployment I used the following Helm chart command. $: helm install --name pipeline banzaicloud-stable/pipeline-cp \ --set=drone.

Read more...



Balint Molnar

Mon, Mar 19, 2018

Kubeless using Kafka on etcd

Kubeless has been designed as a Kubernetes native serverless framework, and for PubSub functions it uses Apache Kafka behind the scenes. At Banzai Cloud we like cloud-native technologies; however, we were not happy about operating a Zookeeper cluster on Kubernetes, so we have modified and open sourced a version of Kafka in which we replaced Zookeeper with etcd, which is a better fit. This post is part of the serverless series and describes how to deploy Kubeless using Kafka on etcd with Pipeline, and how to deploy a so-called PubSub function.

Read more...



Janos Matyas

Thu, Mar 15, 2018

Monitoring Spark with Prometheus, reloaded

Monitoring series: Monitoring Apache Spark with Prometheus; Monitoring multiple federated clusters with Prometheus - the secure way; Application monitoring with Prometheus and Pipeline; Building a cloud cost management system on top of Prometheus; Monitoring Spark with Prometheus, reloaded. At Banzai Cloud we deploy large distributed applications to Kubernetes and operate these clusters as well. We don’t like to get a PagerDuty notification during the night, so we try to get ahead of these issues by operating these clusters as efficiently as we can.

Read more...



Nandor Kracser

Wed, Mar 14, 2018

Dynamic credentials with Vault using Kubernetes Service Accounts

Security series: Authentication and authorization of Pipeline users with OAuth2 and Vault; Dynamic credentials with Vault using Kubernetes Service Accounts; Dynamic SSH with Vault and Pipeline. Pipeline is quickly moving towards an as-a-Service milestone, where the Pipeline PaaS will be available for the masses and early adopters as a hosted service as well (current deployments are all self-hosted). In the previous blog post we showcased how Pipeline uses OAuth2 to authenticate and authorize our Pipeline PaaS users with JWT tokens stored and leased by Vault.

Read more...



Gabor Kozma

Mon, Mar 12, 2018

Monitoring Apache Kafka with Prometheus

Monitoring series: Monitoring Apache Spark with Prometheus; Monitoring multiple federated clusters with Prometheus - the secure way; Application monitoring with Prometheus and Pipeline; Building a cloud cost management system on top of Prometheus; Monitoring Spark with Prometheus, reloaded. At Banzai Cloud we provision and monitor large Kubernetes clusters deployed to multiple cloud/hybrid environments using Prometheus. The clusters and the applications or frameworks are all managed by our next generation PaaS, Pipeline.

Read more...



Toader Sebastian

Wed, Mar 7, 2018

Fn - a container native serverless platform

At Banzai Cloud we are constantly searching for products/frameworks that add value to businesses to enable in our open source PaaS, Pipeline. Serverless frameworks are among those, so today we are adding Fn as a supported spotguide to make it easy for users to deploy it with Pipeline on their preferred cloud provider. Before we dive into how to deploy and use Fn with Pipeline, here are a few of the reasons why we thought that Fn should be supported by Pipeline:

Read more...



Sandor Magyari

Mon, Mar 5, 2018

Distributed Tensorflow deployed to Azure AKS Kubernetes using GPU instances

In our last post about distributed TensorFlow we used a research example for distributed training of an Inception model. In this episode we will showcase how to run the same example on GPU instances, this time on Azure managed Kubernetes, AKS, deployed with Pipeline. As you might already know from the previous post, one of the first things to consider when running distributed Tensorflow models is having some shared storage available.

Read more...



Ferenc Hernadi

Tue, Feb 27, 2018

Play With Ingress Authentication

At Banzai Cloud we secure our Kubernetes services using Vault and OAuth2 tokens. This has not always been the case; however, we had authentication in the project (even though it was basic) from a very early PoC stage, and we suggest everyone do the same. Usually, inbound connections to Kubernetes cluster services are made via ingress. Just to recap, all public services are typically accessed through a loadbalancer service; however, this can get quite expensive.

Read more...



Sandor Guba

Mon, Feb 26, 2018

Application monitoring with Prometheus and Pipeline

Monitoring series: Monitoring Apache Spark with Prometheus; Monitoring multiple federated clusters with Prometheus - the secure way; Application monitoring with Prometheus and Pipeline; Building a cloud cost management system on top of Prometheus; Monitoring Spark with Prometheus, reloaded. At Banzai Cloud we provision and monitor large Kubernetes clusters deployed to multiple cloud/hybrid environments. The clusters and the applications or frameworks are all managed by our next generation PaaS, Pipeline.

Read more...



Balint Molnar

Wed, Feb 21, 2018

Spark Streaming Checkpointing on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...



Toader Sebastian

Mon, Feb 19, 2018

Function as a service with OpenFaaS on Banzai Cloud Pipeline

At Banzai Cloud we provision different frameworks and tools like Spark, Zeppelin, Kafka, Tensorflow, etc. to our Pipeline PaaS (built on Kubernetes). Last week we added serverless capabilities to Pipeline, using OpenFaaS. This blog post explains how to deploy OpenFaaS to Kubernetes using Pipeline and invoke an example function running on it. We shall separate the provisioning of the serverless frameworks we support (this post is about OpenFaaS, but Pipeline supports Kubeless equally) from the invocation of functions through the Pipeline API or CI/CD workflow, dispatched to any of the supported serverless frameworks we deploy to Kubernetes.

Read more...



Laszlo Puskas

Wed, Feb 14, 2018

CI/CD flow for Zeppelin notebooks

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...



Marton Sereg

Mon, Feb 12, 2018

Diversifying AWS auto-scaling groups, or how to write a Hollowtrees action plugin

Cloud cost management series: Overspending in the cloud; Managing spot instance clusters on Kubernetes with Hollowtrees; Monitor AWS spot instance terminations; Diversifying AWS auto-scaling groups. You may remember the Hollowtrees project we open sourced a few weeks ago - a framework to manage AWS spot instance clusters with a few batteries included: Hollowtrees, an alert-react based framework that is part of the Pipeline PaaS and coordinates monitoring, applies rules and dispatches action chains towards plugins using standard CNCF interfaces; AWS spot instance termination Prometheus exporter; AWS autoscaling group Prometheus exporter; AWS Spot Instance recommender; Kubernetes action plugin to execute k8s operations (e.

Read more...



Balint Molnar

Thu, Feb 8, 2018

Kafka on Kubernetes - using etcd

At Banzai Cloud we are building a cloud agnostic, open source, next generation CloudFoundry/Heroku like PaaS - Pipeline - and running several big data workloads natively on Kubernetes. Apache Kafka is one of the cloud native workloads we support out of the box, besides Apache Spark and Apache Zeppelin. In case you are interested in running big data workloads on Kubernetes, please read the following blog series as well.

Read more...



Gabor Kozma

Wed, Feb 7, 2018

Monitoring multiple federated clusters with Prometheus - the secure way

At Banzai Cloud we run multiple Kubernetes clusters deployed with our next generation PaaS, Pipeline and we deploy these clusters across different cloud providers like AWS, Azure, Google or on-prem. These clusters are usually launched using the same control plane deployed either to AWS as a CloudFormation template or Azure as an ARM template and they are running inside a Kubernetes cluster as well (we eat our own dog food).

Read more...



Marton Sereg

Mon, Feb 5, 2018

Monitor AWS spot instance terminations

Cloud cost management series: Overspending in the cloud; Managing spot instance clusters on Kubernetes with Hollowtrees; Monitor AWS spot instance terminations; Diversifying AWS auto-scaling groups. Last week we open sourced the Hollowtrees project - a framework to manage AWS spot instance clusters with a few batteries included: Hollowtrees, an alert-react based framework that is part of the Pipeline PaaS and coordinates monitoring, applies rules and dispatches action chains towards plugins using standard CNCF interfaces; AWS spot instance termination Prometheus exporter; AWS autoscaling group Prometheus exporter; AWS Spot Instance recommender; Kubernetes action plugin to execute k8s operations (e.

Read more...



Toader Sebastian

Thu, Feb 1, 2018

Spark scheduling on Kubernetes demystified

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...



Janos Matyas

Tue, Jan 30, 2018

Pipeline PaaS 0.2.0 - new release

Banzai Pipeline, or simply Pipeline is a tabletop reef break located in Hawaii, Oahu’s North Shore. The most famous and infamous reef on the planet is forming the benchmark by which all other waves are measured. Pipeline is a PaaS with a built in CI/CD engine to deploy cloud native microservices in public cloud and on-premise. It simplifies and abstracts all the details of provisioning the cloud infrastructure, installing or reusing the Kubernetes cluster and deploying the application.

Read more...



Marton Sereg

Mon, Jan 29, 2018

Managing spot instance clusters on Kubernetes with Hollowtrees

Hollowtrees is a wave for the highest level, the pin-up centerfold for the Mentawai islands bringing a new machine-like level to the word perfection. Watch out for the vigilant guardian aptly named The Surgeons Table, whose sole purpose is to take parts of you as a trophy. Hollowtrees, a ruleset based watch-guard, keeps spot instance based clusters safe and allows them to be used in production. It handles spot price surges within one region or availability zone and reschedules applications before instances are taken down.

Read more...



Janos Matyas

Fri, Jan 26, 2018

Authentication and authorization of Pipeline users with OAuth2 and Vault

Security series: Authentication and authorization of Pipeline users with OAuth2 and Vault; Dynamic credentials with Vault using Kubernetes Service Accounts; Dynamic SSH with Vault and Pipeline. Pipeline is quickly moving towards an as-a-Service milestone, where the Pipeline PaaS will be available for the masses and early adopters as a hosted service as well (current deployments are all self-hosted). The hosted version, like many PaaS offerings, will be a multitenant service.

Read more...



Sandor Magyari

Wed, Jan 24, 2018

Spark application logs - History Server setup on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...



Flora Piszker

Mon, Jan 22, 2018

The challenges (and resolutions) of working with Azure AKS

We are moving rather fast with new Pipeline features and releases, with the second major one scheduled for this week. Among many new features we have added a new managed Kubernetes provider, Microsoft’s Azure AKS. Azure Container Service (AKS) is a preview feature of the Azure Cloud - and we are proud to be very early adopters of it. We can provision and deploy apps to Kubernetes on Azure VMs the same way we do on EC2; however, at Banzai Cloud we strongly believe that the future is in managed Kubernetes services, and most of our investment in cloud neutrality and provisioning is built on managed Kubernetes services both in the cloud (GKE, OCI and ACS in beta or under development) and on-prem.

Read more...



Sandor Magyari

Thu, Jan 18, 2018

Introduction to distributed TensorFlow on Kubernetes

Last time we discussed how our Pipeline PaaS deploys and provisions an AWS EFS filesystem on Kubernetes and what the performance benefits are for Spark or TensorFlow. This post is about: an introduction to TensorFlow on Kubernetes; the benefits of EFS for TensorFlow (storing image data for TensorFlow jobs). Pipeline uses the kubeflow framework to deploy: a JupyterHub to create & manage interactive Jupyter notebooks; a TensorFlow Training Controller that can be configured to use CPUs or GPUs; a TensorFlow Serving container. Note that besides the ones above, Pipeline also has default Spotguides for Spark and Zeppelin to support your data science experience.

Read more...


Sandor Magyari

Mon, Jan 15, 2018

Amazon Elastic File System on Kubernetes

At Banzai Cloud we provision different frameworks and tools like Spark, Zeppelin and most recently TensorFlow, all running on our Pipeline PaaS (built on Kubernetes). One of Pipeline’s early adopters is running a TensorFlow Training Controller using GPUs on AWS EC2, wired into our CI/CD pipeline, and needed significant parallelization for reading training data. We have introduced support for Amazon Elastic File System and will make it publicly available in the forthcoming release of Pipeline.

Read more...


Janos Matyas

Thu, Jan 11, 2018

Running TiDB on Kubernetes

At Banzai Cloud we provision different applications or frameworks to our PaaS - Pipeline, built on Kubernetes. At the same time we eat our own dog food: the PaaS’ control plane itself runs on Kubernetes and needs a data storage layer. So we needed to deploy and run a distributed, scalable and fully SQL compliant DB that covers two use cases: our clients’ needs and our own internal ones.

Read more...


Toader Sebastian

Tue, Jan 9, 2018

Why my Java application is OOMKilled

At Banzai Cloud we run and deploy containerized applications to our PaaS, Pipeline. Like us, those who have already run Java applications inside Docker have probably come across the problem of the JVM incorrectly detecting the available memory when running inside a container. The JVM sees the available memory of the whole machine rather than the memory available to the Docker container. This can lead to cases where an application running inside the container is killed when it tries to use more memory than the limits of the Docker container allow.

Read more...


Sandor Magyari

Mon, Jan 8, 2018

Running Zeppelin Spark notebooks on Kubernetes - deep dive

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...


Laszlo Puskas

Thu, Jan 4, 2018

Take a rest - enjoy the REST

Modern applications and services usually expose their functionality via REST; moreover, modules and components can also make use of external services that are themselves exposed via REST. Thus developers often need to design RESTful services and write REST service clients. This kind of work implies calling these services thousands of times during development (developers need to understand the API, the messages and the resources involved), and even afterwards, to make sure everything works as desired.

Read more...


Toader Sebastian

Tue, Jan 2, 2018

The anatomy of Spark applications on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...


Janos Matyas

Wed, Dec 27, 2017

Top 3 blogs of 2017 and what’s next

As 2017 comes to an end, we are looking back at the three blog posts that were most popular with our readers. We can’t really look too far back (though we have had 13 posts and one release already), as we started our startup just a little over one month ago (on November 20, 2017, to be more precise). During this short period, however, we achieved quite a lot and laid the foundation for some exciting new projects we plan to ship early next year.

Read more...


Balint Molnar

Thu, Dec 21, 2017

Debugging a jetcd Txn Bug

This post is part of the Debug 101 series - if you missed the previous one, check it out here: Nodes successfully joined, not! We are in the middle of deploying Apache Kafka to Kubernetes the cloud native way - by totally removing the Zookeeper dependency and using etcd instead. All service registry/discovery and other internal Kafka to Zookeeper operations are dispatched to the already existing etcd cluster. Sweet, isn’t it - no need for yet another third party system, as we already have etcd as part of Kubernetes out of the box.

Read more...


Miklos Csendes

Wed, Dec 20, 2017

Introduction to spotguides

Last week we released the first version of Pipeline - with end to end support for cloud native apps, from a GitHub commit hook to a deployment in the cloud within minutes, using a fully customizable CI/CD workflow. The core part of the Pipeline PaaS is spotguides - a collection of workflow/pipeline steps defined in a .pipeline.yml file and a few Drone plugins. In this post we would like to demystify spotguides and describe step by step how they work; the next post will be a tutorial on how to write a custom spotguide and an associated plugin.

Read more...


Balint Molnar

Mon, Dec 18, 2017

Monitoring Apache Spark with Prometheus on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...


Laszlo Puskas

Thu, Dec 14, 2017

Apache Spark CI/CD workflow howto

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...


Janos Matyas

Tue, Dec 12, 2017

Pipeline PaaS - the first release

Banzai Pipeline, or simply Pipeline, is a tabletop reef break located on Oahu’s North Shore in Hawaii. The most famous and infamous reef on the planet, it forms the benchmark by which all other waves are measured. Pipeline is a PaaS with a built-in CI/CD engine to deploy cloud native microservices in the public cloud and on-premise. It simplifies and abstracts all the details of provisioning the cloud infrastructure, installing or reusing the Kubernetes cluster and deploying the application.

Read more...


Marton Sereg

Thu, Dec 7, 2017

Overspending in the cloud

Cloud cost management series: Overspending in the cloud; Managing spot instance clusters on Kubernetes with Hollowtrees; Monitor AWS spot instance terminations; Diversifying AWS auto-scaling groups; Draining Kubernetes nodes. One of the main advantages that is always brought up when debating whether it’d be good to move a deployment to the cloud is cost. There are no upfront costs in the cloud because you don’t have to buy the hardware, and you’ll only pay for what you really use because you can scale your infrastructure based on your workloads.

Read more...


Sandor Magyari

Tue, Dec 5, 2017

Running Zeppelin Spark notebooks on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...


Janos Matyas

Mon, Dec 4, 2017

Banzai Cloud @ KubeCon + CloudNativeCon, North America

This week KubeCon + CloudNativeCon, North America is bringing together more than 2,500 developers, architects and people from the cloud native open source communities. We are part of this community, contributing to and using the technologies that run under the umbrella of the Cloud Native Computing Foundation, so we could not miss the event. Join us to learn about Kubernetes and the related technologies directly from the experts of the industry.

Read more...


Toader Sebastian

Fri, Dec 1, 2017

Scaling Spark made simple on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...


Sandor Magyari

Wed, Nov 29, 2017

Nodes successfully joined, not!

Debug 101: Today we are starting a new series called Debug 101, dealing with issues that gave us significant headaches and took lots of time to debug, understand and fix. We strongly believe in open source software and open issue resolution, so we try to describe the problems and suggest fixes; this way you don’t have to shave that yak. We already did, and it looks awesome.

Read more...


Janos Matyas

Mon, Nov 27, 2017

Introduction to Spark on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes. Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Read more...


Flora Piszker

Fri, Nov 24, 2017

Azure Managed Kubernetes (AKS) Go SDK

At Banzai Cloud we use different cloud providers and managed Kubernetes offerings, and one of the services we use is Microsoft Azure Managed Kubernetes. It is a pretty neat service that gives you a managed K8S cluster without the need to deal with low level Kubernetes building blocks or tooling, or to start with cloud infrastructure provisioning. However, there is one temporary issue that is critical for our PaaS, Pipeline - the Azure Go-SDK does not contain the bindings for this new service.

Read more...


Janos Matyas

Thu, Nov 23, 2017

The company I'd like to work for

While I had no intention of founding or joining a new startup (after a successful exit, which was a good financial decision but turned out to be the worst professional one), a few former co-founders from SequenceIQ and friends I had worked with at Fathom Technology/Epam Systems approached me after I got back home from a pretty long surfing trip. A few of them had moved on to work on a project for a banking giant, building microservice based Java applications scheduled with Nomad.

Read more...