PVC Operator: Creating Persistent Volumes on Kubernetes made simple

At Banzai Cloud we work hard on our platform, Pipeline, built on Kubernetes. Recently we teamed up with Red Hat and CoreOS to work on Kubernetes Operators using the recently released Operator SDK, moving human operational knowledge into code, and we have already open sourced quite a few operators. This blog post dives deep into the PVC Operator.

If you are looking for a complete guide on how to use the Operator SDK, or are just interested in Kubernetes Operators, please check out our comprehensive guide.

If you are interested in our other Operators, read our earlier blog posts:

Prometheus JMX Exporter Operator
Wildfly Operator
Vault Operator

Introducing the PVC Operator

Persistent Volume handling in Kubernetes can become messy, especially when the Kubernetes cluster is created in one of the managed cloud environments.

Don’t know what Kubernetes Persistent Volumes and StorageClasses are? No worries, we’ve already described them in another blog post.

Managed Kubernetes providers like Azure or Google create a default StorageClass, but what happens if it does not meet your requirements? There are two options:

  • Create cloud provider-specific Helm charts.
  • Use the Banzai Cloud PVC Operator, which handles StorageClass creation based on your requirements.

How does the PVC Operator do its magic?

Determine the cloud provider

To be cloud agnostic, the operator needs to determine the cloud provider. To do that, it uses the metadata server available inside every provider. This server provides not just the origin of the cluster but other important information required, for example, to create a Storage Account in Azure. Metadata server access differs slightly on every cloud.
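
For illustration, here is how those well-known metadata endpoints can be queried by hand on Google Cloud and Azure (the operator performs equivalent lookups internally; the exact fields it reads are an implementation detail):

# Google Cloud: project ID from the GCE metadata server
curl -s -H "Metadata-Flavor: Google" \
  "http://169.254.169.254/computeMetadata/v1/project/project-id"

# Azure: instance metadata, including subscriptionId and location
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/instance?api-version=2017-08-01"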

Create a StorageClass for your needs

The operator parses the submitted Persistent Volume Claim. If it does not contain a spec.storageClassName, the operator simply ignores the request and the default StorageClass is used. If that field is set, the operator determines the right volume provisioner and creates the appropriate StorageClass.
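
A minimal claim that triggers the operator might look like this (the claim and class names below are just placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: my-storage-class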

To fully understand how it works, let’s walk through an example:

Imagine that we want to create an application (TensorFlow) which requires a ReadWriteMany volume, and the selected provider is Azure. We install the PVC Operator from Banzai Cloud and submit the Persistent Volume Claim. The operator determines the cloud provider and figures out that the perfect storage provider is AzureFile. Creating an AzureFile-backed StorageClass requires a Storage Account inside Azure within the same resource group, plus some meta information (e.g. subscriptionId, location). All of this is taken care of by the operator on the fly.
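
The StorageClass the operator creates in this scenario would be roughly equivalent to the hand-written one below, which uses the in-tree AzureFile provisioner (the parameter values are illustrative placeholders, not the operator’s actual output):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Standard_LRS          # storage redundancy/performance tier
  location: westeurope           # discovered via the metadata server
  storageAccount: mysparkstorage # created by the operator if it does not exist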

For supported storage providers, please check the project’s GitHub page.

A few features worth mentioning

NFS as a storage provisioner

NFS stands for Network File System; it allows access to files over a computer network. This project makes it usable inside Kubernetes, and the PVC Operator relies on it to create an NFS-backed StorageClass.

For the NFS provisioner, the operator needs to create an NFS server deployment and a service which handles the traffic. The deployment has one cloud provider-backed ReadWriteOnce volume which the server exports to other consumers, making it usable as a ReadWriteMany volume. This comes in handy when the cloud-provisioned ReadWriteMany volumes are slow. For example, the Pipeline platform has a pretty advanced CI/CD system embedded, and a system like that deals with lots of small files generated during a git clone or Maven build.
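
As a rough sketch, such an NFS server deployment, modelled on the kubernetes-incubator external-storage NFS provisioner, could look like the following (the image, provisioner name, and backing claim name are illustrative; the operator’s actual manifests may differ):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-provisioner
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      containers:
        - name: nfs-provisioner
          image: quay.io/kubernetes_incubator/nfs-provisioner
          args:
            - "-provisioner=banzaicloud.com/nfs"  # illustrative provisioner name
          ports:
            - containerPort: 2049  # NFS
            - containerPort: 111   # rpcbind
          volumeMounts:
            - name: export-volume
              mountPath: /export
      volumes:
        - name: export-volume
          persistentVolumeClaim:
            claimName: nfs-backing-pvc  # the single ReadWriteOnce cloud volume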

To request the NFS-backed StorageClass, use a StorageClass name which contains nfs.
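
For example, a claim like the one below (names are placeholders) is recognized as an NFS request because the class name contains nfs:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: my-nfs-class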

Create Object Store Bucket

You may wonder whether this operator registers a Custom Resource. It actually does: a CRD is used to create Object Store buckets on different cloud providers. Only Google is supported for now, but we are working on adding support for all the other major providers.

To create a bucket, submit the following Custom Resource:

apiVersion: "banzaicloud.com/v1alpha1"
kind: "ObjectStore"
metadata:
  name: "test"
spec:
  name: "googlebucket"

PVC Operator flow

Try it out

To try it out, we are going to use a Spark Streaming application from this blog post. The application requires a persistent volume, which will be created by the PVC Operator. We will also install the Spark History Server, which requires a bucket; the operator creates that as well.

We will not cover every detail of how to run this Spark application, since that is covered thoroughly in the blog post mentioned above; instead, we focus on how the operator eases application submission.

If you don’t have a Kubernetes cluster, please create one; if you are looking for a painless solution, use Pipeline, the next generation platform focused on applications.

Use kubectl to create the PVC Operator:

kubectl create -f deploy/crd.yaml
customresourcedefinition "objectstores.banzaicloud.com" created
kubectl create -f deploy/operator.yaml
deployment "pvc-operator" created

Now create a bucket for the Spark History Server:

kubectl create -f deploy/cr.yaml
objectstore "sparkhistory" created

If you follow the logs of the pvc-operator pod, you should see something like this:

kubectl logs pvc-operator-cff45bbdd-cqzhx
level=info msg="Go Version: go1.10"
level=info msg="Go OS/Arch: linux/amd64"
level=info msg="operator-sdk Version: 0.0.5+git"
level=info msg="starting persistentvolumeclaims controller"
level=info msg="starting objectstores controller"
level=info msg="Object Store creation event received!"
level=info msg="Check of the bucket already exists!"
level=info msg="Creating new storage client"
level=info msg="Storage client created successfully"
level=info msg="Getting ProjectID from Metadata service"
level=info msg="banzaicloudsparkhistory bucket created"

Create all Spark-related requirements:

  • Resource Staging Server
  • Shuffle Service
  • History Server

Configure the History Server to point to the bucket created above; in our case:

{
  "name": "banzaicloud-stable/spark-hs",
  "values": {
    "app": {
      "logDirectory": "gs://banzaicloudsparkhistory"
    }
  }
}
  • Build the NetworkWordCount example
  • Don’t forget to port-forward the RSS server

Then launch the Spark application:

bin/spark-submit --verbose \
  --deploy-mode cluster \
  --class com.banzaicloud.SparkNetworkWordCount \
  --master k8s://<your kubernetes master ip> \
  --kubernetes-namespace default \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.app.name=NetworkWordCount \
  --conf spark.kubernetes.driver.docker.image=banzaicloud/spark-driver:pvc-operator-blog \
  --conf spark.kubernetes.executor.docker.image=banzaicloud/spark-executor:pvc-operator-blog \
  --conf spark.kubernetes.initcontainer.docker.image=banzaicloud/spark-init:pvc-operator-blog \
  --conf spark.kubernetes.checkpointdir.enable=true \
  --conf spark.kubernetes.checkpointdir.storageclass.name=checkpointdirsc \
  --conf spark.driver.cores="300m" \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.shuffle.namespace=default \
  --conf spark.kubernetes.resourceStagingServer.uri=http://localhost:31000 \
  --conf spark.kubernetes.resourceStagingServer.internal.uri=http://spark-rss:10000 \
  --conf spark.kubernetes.authenticate.submission.caCertFile=<your ca data path> \
  --conf spark.kubernetes.authenticate.submission.clientCertFile=<your client cert path> \
  --conf spark.kubernetes.authenticate.submission.clientKeyFile=<your client key path> \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=gs://banzaicloudsparkhistory \
  --conf spark.local.dir=/tmp/spark-local \
  file:///<your path to word count example>/spark-network-word-count-1.0-SNAPSHOT.jar tcp://0.tcp.ngrok.io <your chosen ngrok port> file:///checkpointdir

If we check the StorageClasses, we can see that the operator has already created one for Spark, and the PVC is bound as well:

kubectl get storageclass
NAME                 PROVISIONER            AGE
sparkcheckpoint      kubernetes.io/gce-pd   8m
kubectl get pvc
NAME                   STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
spark-checkpoint-dir   Bound     pvc-a069a1c6-5a0f-11e8-b71f-42010a840053   1Gi        RWO            sparkcheckpoint   6m

If we check the Spark driver’s logs, it is clear that it writes event logs to the bucket created above by the operator:

INFO  KubernetesClusterSchedulerBackend:54 - Requesting a new executor, total executors is now 1
INFO  KubernetesClusterSchedulerBackend:54 - Requesting a new executor, total executors is now 2
INFO  EventLoggingListener:54 - Logging events to gs://banzaicloudsparkhistory/spark-03dc1b39d1df4d53895c490a16998698

If you are interested in our technology and open source projects, follow us on GitHub, LinkedIn, or Twitter.
