The following post describes capabilities for the Deep Learning Toolkit (DLTK), a Splunk Works solution available on Splunkbase. Splunk is expanding its ML portfolio with new, tightly integrated ML capabilities including Streaming ML and Splunk Machine Learning Environment (SMLE). To learn more about the direction of Splunk’s ML portfolio, check out Lila Fridley’s blog, Machine Learning Guide: Choosing the Right Workflow.
The Splunk Deep Learning Toolkit (DLTK) is a powerful tool that lets you offload compute-intensive workloads to external container environments. You can also use GPU or Spark environments. In the previous Splunk blog post, The Power of Deep Learning Analytics and GPU Acceleration, you can learn more about building a GPU-based environment.
Splunk DLTK supports Docker as well as Kubernetes and OpenShift as container environments. In this article, we will go through the setup for using DLTK 3.3 with Amazon EKS as the Kubernetes environment.
To manage EKS and Kubernetes, you first need to install some CLI tools on your laptop. Please refer to this document for additional details on getting started.
Note: To manage EKS, the IAM user must have AmazonEKSClusterPolicy.
Also, please install the Splunk Deep Learning Toolkit beforehand. This blog targets DLTK 3.x.
Let's walk through the setup flow. In Amazon EKS, both Fargate and managed nodes are available as compute nodes; this time we use managed nodes. The storage service must also support ReadWriteMany, so we use EFS this time. Note that the default gp2 storage class can be used in DLTK 4.0.
First, create an EKS cluster. See here for details.
$ eksctl create cluster \
    --name <> \
    --nodegroup-name <> \
    --region <> \
    --node-type <> \
    --nodes <<1>> \
    --ssh-access \
    --ssh-public-key <> \
    --managed
This time, we use the t3.medium instance type and one node for verification purposes. You can customize the other items as needed. It will take a while to create the cluster and node group.
Let's check if it has been created successfully.
$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   14d
$ kubectl get node
NAME                                           STATUS   ROLES    AGE   VERSION
ip-192-168-81-176.us-east-2.compute.internal   Ready    <none>   9d    v1.18.9-eks-d1db3c
Splunk DLTK 3.x uses volumes with the "ReadWriteMany" access mode for storage, so we have to use the Amazon EFS service.
For more information on setup, please refer to this document and proceed.
$ kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/ecr/?ref=release-1.0"
A. Get the Cluster's CIDR information
Locate the VPC ID for your Amazon EKS cluster. You can find this ID in the Amazon EKS console, or you can use the following AWS CLI command.
$ aws eks describe-cluster --name <cluster-name> --query "cluster.resourcesVpcConfig.vpcId" --output text
Locate the CIDR range for your cluster's VPC. You can find this in the Amazon VPC console, or you can use the following AWS CLI command.
You'll use this CIDR information at the next step.
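The CLI command for the CIDR lookup did not survive in this post, so here is a minimal sketch of what it could look like. It wraps `aws ec2 describe-vpcs` in a helper; the function name `get_vpc_cidr` and the example VPC ID are made up for illustration.

```shell
# Hypothetical helper (not part of DLTK or the AWS CLI):
# prints the CIDR block(s) of the VPC whose ID is passed as $1.
get_vpc_cidr() {
  aws ec2 describe-vpcs \
    --vpc-ids "$1" \
    --query "Vpcs[].CidrBlock" \
    --output text
}

# Example call (the VPC ID is a placeholder; use the one from the previous command):
#   get_vpc_cidr vpc-0123456789abcdef0
```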
B. Create a new security group to allow NFS access.
Create a security group that allows inbound NFS traffic for your Amazon EFS mount points.
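As a rough sketch of this step with the AWS CLI, the two helpers below create a security group and open the NFS port (TCP 2049) to the VPC CIDR range. The function names, the description text, and the example IDs are assumptions for illustration; the same can be done in the Amazon VPC console.

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# $1 = VPC ID, $2 = a name of your choice for the new security group.
# Prints the ID of the created security group.
create_nfs_security_group() {
  aws ec2 create-security-group \
    --group-name "$2" \
    --description "Allow NFS traffic for EFS mount targets" \
    --vpc-id "$1" \
    --query "GroupId" \
    --output text
}

# $1 = security group ID, $2 = CIDR range of the cluster's VPC.
# Allows inbound NFS (TCP 2049) from that range.
allow_nfs_inbound() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 2049 \
    --cidr "$2"
}

# Example calls (all values are placeholders):
#   sg_id="$(create_nfs_security_group vpc-0123456789abcdef0 dltk-efs-sg)"
#   allow_nfs_inbound "$sg_id" 192.168.0.0/16
```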
C. Create the Amazon EFS file system for your Amazon EKS cluster.
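If you prefer the CLI over the console for this step, a sketch under a few assumptions could look like the following. The helper names are hypothetical, the creation token is any unique string you choose, and you would need one mount target per subnet that your nodes run in.

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# $1 = creation token (any unique string). Prints the new file system ID.
create_efs_file_system() {
  aws efs create-file-system \
    --creation-token "$1" \
    --query "FileSystemId" \
    --output text
}

# $1 = file system ID, $2 = subnet ID, $3 = NFS security group ID from step B.
create_efs_mount_target() {
  aws efs create-mount-target \
    --file-system-id "$1" \
    --subnet-id "$2" \
    --security-groups "$3"
}

# Example calls (all values are placeholders):
#   fs_id="$(create_efs_file_system dltk-efs)"
#   create_efs_mount_target "$fs_id" subnet-0123456789abcdef0 sg-0123456789abcdef0
```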
D. Create Access Point
By default, only the root user can access this file system, so DLTK will fail to deploy its container. To avoid this, create a new access point for the file system.
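A possible CLI version of this step is sketched below. The helper name is hypothetical, and the UID/GID, path, and permissions are assumptions: this post does not say which user the DLTK container runs as, so check your container image before choosing values.

```shell
# Hypothetical helper; the name and the values passed to it are illustrative only.
# $1 = file system ID, $2 = POSIX UID/GID to enforce (assumed to be the same number),
# $3 = root directory inside the file system. Prints the new access point ID.
create_dltk_access_point() {
  aws efs create-access-point \
    --file-system-id "$1" \
    --posix-user "Uid=$2,Gid=$2" \
    --root-directory "Path=$3,CreationInfo={OwnerUid=$2,OwnerGid=$2,Permissions=755}" \
    --query "AccessPointId" \
    --output text
}

# Example call (1000 and /dltk are assumed values, not taken from DLTK documentation):
#   create_dltk_access_point fs-0123456789abcdef0 1000 /dltk
```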
StorageClass
Create the following YAML file on your local laptop.
storageclass.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: <>
provisioner: efs.csi.aws.com
allowVolumeExpansion: true
Deploy this storageclass to your cluster.
$ kubectl apply -f storageclass.yaml
Verify the deployment.
$ kubectl get sc
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
efs-sc          efs.csi.aws.com         Delete          Immediate              true                   14d
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  14d
Persistent Volume
Create the following YAML file on your local laptop.
pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <>
spec:
  capacity:
    storage: 20Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <>::<>
Change the name and volumeHandle for your environment. The volumeHandle combines the EFS file system ID and the access point ID in the form "fs-xxxxx::fsap-xxxxxxxx". You can find both IDs in the EFS configuration on your AWS console.
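To avoid mixing up the two IDs, you could build the volumeHandle string with a small helper like the sketch below. The function name is hypothetical, and the prefix check only assumes that file system IDs start with "fs-" and access point IDs with "fsap-", as in the example above.

```shell
# Hypothetical helper: joins a file system ID ($1) and an access point ID ($2)
# into the "fs-...::fsap-..." form used for volumeHandle in pv.yaml.
efs_volume_handle() {
  case "$1" in
    fs-*) ;;
    *) echo "expected a file system ID starting with fs-" >&2; return 1 ;;
  esac
  case "$2" in
    fsap-*) ;;
    *) echo "expected an access point ID starting with fsap-" >&2; return 1 ;;
  esac
  printf '%s::%s\n' "$1" "$2"
}

# Example call (both IDs are placeholders):
#   efs_volume_handle fs-12345678 fsap-0123456789abcdef0
```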
Deploy this persistent volume to your cluster.
$ kubectl apply -f pv.yaml
Verify the deployment.
$ kubectl get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM          STORAGECLASS   REASON   AGE
dltk-efs-volume   20Gi       RWX            Delete           Available   default/dltk   efs-sc                  25h
DLTK 3.x supports Load Balancer or Node Port as the Ingress type for Kubernetes. This time, I use Node Port as the Ingress type.
Add the following Node Port range to the inbound rules of your security group.
30000-32767: Node Port
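If you want to add this rule from the CLI, a minimal sketch could look like this. The helper name is hypothetical, and the source CIDR is your choice: restrict it to the address range your Splunk server connects from rather than opening it widely.

```shell
# Hypothetical helper: opens the Node Port range (TCP 30000-32767)
# on the security group $1 for traffic coming from the CIDR range $2.
allow_node_ports() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 30000-32767 \
    --cidr "$2"
}

# Example call (both values are placeholders):
#   allow_node_ports sg-0123456789abcdef0 203.0.113.10/32
```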
This step is optional. If you skip it, use the default namespace for DLTK.
1. Create a new YAML file called my-namespace.yaml with the contents:
my-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: <>
Change the namespace name <> as you like.
Then run:
$ kubectl apply -f ./my-namespace.yaml
2. Verify your namespace. dltk is my new namespace.
$ kubectl get namespaces
NAME              STATUS   AGE
default           Active   15d
dltk              Active   33h
kube-node-lease   Active   15d
kube-public       Active   15d
kube-system       Active   15d
In the DLTK app, go to Configuration --> Setup.
Go to Containers, choose kubernetes as the Cluster target, and start!
If you run into any errors during setup, use the following commands for troubleshooting.
$ kubectl get deployments --namespace=dltk
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
dev    1/1     1            1           30h
$ kubectl describe deployment dev --namespace=dltk
<< More detailed information >>
$ kubectl get pods --namespace=dltk
NAME                   READY   STATUS    RESTARTS   AGE
dev-7f9cdcc6d7-mzcdb   1/1     Running   0          30h
$ kubectl describe pod <> --namespace=dltk
<< More detailed information >>
$ kubectl get pvc --namespace=dltk
NAME   STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dltk   Bound    dltk-efs-volume1   20Gi       RWX            efs-sc         34h
$ kubectl describe pvc <> --namespace=dltk
<< More detailed information >>
$ kubectl get pv
NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM       STORAGECLASS   REASON   AGE
dltk-efs-volume1   20Gi       RWX            Delete           Bound    dltk/dltk   efs-sc                  34h
$ kubectl describe pv <>
$ kubectl logs -f <> --namespace=dltk
Furthermore, you can use Splunk Infrastructure Monitoring (formerly SignalFx) to monitor Amazon EKS and watch the learning load in real time.
We will not cover that setup in this post. Please refer to the setup guide here.
Once you have set up DLTK with an EKS environment, you can easily scale the compute resources up and down. Furthermore, multiple DLTK instances can share the same EKS cluster to optimize resources.
Today, we introduced the setup flow for development and testing purposes. If you need to run this in production, please talk with your local Splunk engineers.
Finally, I would like to thank Philipp Drieger for his advice and support in writing this blog.
To learn more about all of Splunk’s ML offerings, head over to Machine Learning Guide: Choosing the Right Workflow, and look for more blog posts coming soon.
----------------------------------------------------
Thanks!
Junichi Maruyama