
Sandor Magyari

Mon, Jan 15, 2018


Amazon Elastic File System on Kubernetes

At Banzai Cloud we provision different frameworks and tools like Spark, Zeppelin and most recently Tensorflow, all running on our Pipeline PaaS (built on Kubernetes).

One of Pipeline’s early adopters runs a Tensorflow Training Controller using GPUs on AWS EC2, wired into our CI/CD pipeline, and needed significant parallelization for reading training data. We have introduced support for Amazon Elastic File System and will make it publicly available in the forthcoming release of Pipeline. Besides Tensorflow, they also use EFS for Spark Streaming checkpointing - instead of S3 (note that we don’t use HDFS at all).

This post walks you through the gotchas of EFS on Kubernetes and gives you a clear idea of the benefits, before we dig into the Tensorflow and Spark Streaming examples in the upcoming related posts, hence:

  • By the end of the blog you will understand how EFS works on k8s
  • You will be able to provision and use EFS with or without Pipeline
  • You will appreciate the simplicity and automation of what Pipeline does

Notes on EFS

  • One app, multiple nodes, same file(s) - EFS is your best friend
  • But wait, I’d rather use S3 - no, S3 is neither an alternative to NFS nor a replacement for EFS. S3 is not a file system.
  • This smells like cloud lock-in to me - not really, Pipeline/Kubernetes can use Minio to unlock you. More on that in another post …
  • It can cost a heck of a lot - it’s SSD based and the storage capacity and pricing scale up and down with usage, though it’s still roughly 10x the price of EBS
  • It’s based on NFSv4.x and can be used with AWS Direct Connect (yes, one colleague already had weird ideas about the federation work we are doing with Pipeline for hybrid deployments)
  • You can use GlusterFS instead if you are one of those superheroes

Create and attach EFS storage manually to a k8s cluster - the hard way

tl;dr1: I don’t care, Pipeline automates all this - maybe I’ll give you a GitHub star and be done with the reading

tl;dr2: I know all this already, I’ll just use the EFS provisioner deployment you guys open sourced. Done. Maybe I’ll give you a GitHub star as well

Create a Kubernetes cluster on AWS

OK, the first step is actually not hard at all - you can provision a Kubernetes cluster with Pipeline through one single REST API call. See the Postman collection we have created for that, or follow this post and install & launch Pipeline yourself, or launch a Pipeline control plane on AWS with the following CloudFormation template. Easy, isn’t it - and with plenty of options. Just wait for the upcoming hosted service.
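
Just to give you an idea of what that single call looks like, here is a minimal sketch - the endpoint path and payload fields below are assumptions for illustration only; the exact request is in the Postman collection:

curl -X POST "https://{pipeline-host}/api/v1/clusters" \
     -H "Authorization: Bearer $PIPELINE_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "efs-demo",
           "location": "eu-west-1",
           "cloud": "amazon",
           "nodeInstanceType": "m4.xlarge"
         }'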

Once the cluster is up and running you can use the Cluster Info request from Pipeline’s Postman collection to gather the required information about the cluster. You will need the following:

  • IP addresses of the nodes, except the master
  • VPC id, Subnet id and Security Group id for cluster nodes network (VpcId, SubnetId, SecurityGroupId)

The following steps require the AWS CLI; the easiest way is to just ssh into one of the nodes, as the AWS CLI is already installed there and ready to be used. You will need the same SSH key you provided to Pipeline.

ssh -i yourPrivateKey ubuntu@[Node-Public-Ip]

In case you’re using the Pipeline control plane, you have to ssh into the control plane instance first; there you’ll find the ssh key for the rest of the nodes at: /opt/pipeline/.ssh/id_rsa.
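
A minimal sketch of the two hops (the node’s private IP is a placeholder):

ssh -i yourPrivateKey ubuntu@[Control-Plane-Public-Ip]
ssh -i /opt/pipeline/.ssh/id_rsa ubuntu@[Node-Private-Ip]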

Configure the AWS client with aws configure, specifying the AWS region and your credentials.
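
If you prefer a non-interactive setup, the same can be done with aws configure set (the region below is just an example):

aws configure set aws_access_key_id {AWS_ACCESS_KEY_ID}
aws configure set aws_secret_access_key {AWS_SECRET_ACCESS_KEY}
aws configure set default.region eu-west-1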

Create the EFS FileSystem

You will need a unique ID (creation token) for the file system; install uuid if necessary:

sudo apt install uuid

Create the FileSystem:

aws efs create-file-system --creation-token $(uuid)
{
    "SizeInBytes": {
        "Value": 0
    },
    "CreationToken": "dfa3efaa-e2f7-11e7-b6r3-1b3492c170e5",
    "Encrypted": false,
    "CreationTime": 1515793944.0,
    "PerformanceMode": "generalPurpose",
    "FileSystemId": "fs-c1f34a18",
    "NumberOfMountTargets": 0,
    "LifeCycleState": "creating",
    "OwnerId": "1234567890"
}

Remember the FileSystemId and OwnerId as you will need them later.
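
If you lose them, they can be looked up again with the creation token you used above, for example:

aws efs describe-file-systems --creation-token {CreationToken} \
        --query 'FileSystems[0].[FileSystemId,OwnerId]' --output text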

Create the mount target

aws efs create-mount-target \
        --file-system-id {FileSystemId} \
        --subnet-id {SubnetId} \
        --security-groups {SecurityGroupId}

{
    "MountTargetId": "fsmt-5dfa3054",
    "NetworkInterfaceId": "eni-5cfa2372",
    "FileSystemId": "fs-c1f65a08",
    "LifeCycleState": "creating",
    "SubnetId": "subnet-1d11267a",
    "OwnerId": "1234567890",
    "IpAddress": "10.0.100.195"
}

You have to poll the status of the mount target until LifeCycleState becomes “available”:

aws efs describe-mount-targets --file-system-id {FileSystemId}
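
A small polling loop can save you the repeated typing - a sketch using the CLI’s JMESPath query support:

while [ "$(aws efs describe-mount-targets --file-system-id {FileSystemId} \
        --query 'MountTargets[0].LifeCycleState' --output text)" != "available" ]; do
    echo "waiting for mount target..."; sleep 5
done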

Create an inbound rule for NFS on the security group

aws ec2 authorize-security-group-ingress --group-id {SecurityGroupId} --protocol tcp \
      --port 2049 --source-group {SecurityGroupId} --group-owner {OwnerId}
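
To double-check that the rule was added (filtering on the NFS port 2049 used above):

aws ec2 describe-security-groups --group-ids {SecurityGroupId} \
        --query 'SecurityGroups[0].IpPermissions[?ToPort==`2049`]'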

Deploy the EFS provisioner

In order to mount EFS storage as PersistentVolumes in Kubernetes, you will need to deploy the EFS provisioner, which consists of a container that has access to an AWS EFS resource. To deploy to the Kubernetes cluster directly from your machine you need to download the Kubernetes cluster config (aka kubeconfig) using the Cluster Config request from the Postman collection. The easiest is to save it to a local file and set the KUBECONFIG env variable:

export KUBECONFIG=~/.kube/config
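
A quick sanity check that kubectl talks to the right cluster:

kubectl get nodes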

Make sure your Amazon images contain the nfs-common package; if not, SSH to all nodes and install it with sudo apt-get install nfs-common.

wget https://raw.githubusercontent.com/banzaicloud/banzai-charts/master/efs-provisioner/efs-provisioner.yaml

Edit efs-provisioner.yaml and replace the following values within brackets with your own: {FILE_SYSTEM_ID}, {AWS_REGION}, {AWS_ACCESS_KEY_ID}, {AWS_SECRET_ACCESS_KEY}.
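
If you’d rather script the substitution, here is a sketch with sed (the values are placeholders for your own):

sed -i \
    -e "s|{FILE_SYSTEM_ID}|fs-c1f34a18|" \
    -e "s|{AWS_REGION}|eu-west-1|" \
    -e "s|{AWS_ACCESS_KEY_ID}|yourAccessKeyId|" \
    -e "s|{AWS_SECRET_ACCESS_KEY}|yourSecretAccessKey|" \
    efs-provisioner.yaml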

Alternatively, instead of specifying AWS credentials, you can also set up instance profile roles to allow EFS access. Apply it with kubectl:

kubectl apply -f efs-provisioner.yaml

The output should be something like this:

configmap "efs-provisioner" created
clusterrole "efs-provisioner-runner" created
clusterrolebinding "run-efs-provisioner" created
serviceaccount "efs-provisioner" created
deployment "efs-provisioner" created
storageclass "aws-efs" created
persistentvolumeclaim "efs" created

At this point your EFS PVC should be ready to use:

kubectl get pvc

NAME      STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
efs       Bound     pvc-e7f86c81-f7ea-11e7-9914-0223c9890f2a   1Gi        RWX            aws-efs        29s

Mount the PVC to a container

Finally, let’s see how you can use the PVC you just claimed and mount it into a container.

apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: example-app
    image: example_image:v0.1
    volumeMounts:
        - name: efs-pvc
          mountPath: "/efs-volume"
  volumes:
    - name: efs-pvc
      persistentVolumeClaim:
        claimName: efs
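
To try it out, save the manifest as example-app.yaml (and replace example_image:v0.1 with a real image), then verify the EFS volume is mounted inside the container:

kubectl apply -f example-app.yaml
kubectl exec example-app -- df -h /efs-volume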

At this stage you should be a happy user of EFS - a bit of work was needed to get here, but worry not, the next post is about how you can do all this with Pipeline. That will fit in a Twitter message, though. We will also walk through the benefits of using EFS with Tensorflow and the performance improvements EFS brings to a Spark Streaming application’s checkpointing (and the reasons we switched to EFS instead of S3 or HDFS).

If you are interested in our technology and open source projects, follow us on GitHub, LinkedIn or Twitter.
