Run Spark 3.0.0 on Kubernetes
A step-by-step guide to running your local Spark Scala code on Kubernetes using spark-on-k8s-operator.
In my earlier post, I packaged the application jar inside the Docker image. For local development, it is easier if we can run the code on Kubernetes without rebuilding the image each time.
One way of doing that is to share your local development path as a volume. In this post, we will see how to do that.
Versions:
- Spark: 3.0.0
- Scala: 2.12
- SBT: 1.3.13
- Docker On Mac: 2.2.0.0
- Kubernetes: v1.15.5
- spark-on-k8s-operator: sparkoperator.k8s.io/v1beta2
Step 1: Set up Kubernetes
Please follow my earlier post to set up Kubernetes.
Step 2: Clone the project from Github
The project structure is as below:
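(A rough sketch of a standard sbt layout for this project; check the repo for the exact files.)
spark-scala-k8-app/
├── build.sbt
├── project/
│   ├── build.properties
│   └── plugins.sbt
├── src/
│   └── main/
│       ├── resources/
│       └── scala/
│           └── com/
│               └── AppK8Demo.scala
└── examples/
    └── spark-scala-k8-app.yaml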
Step 3: Set up packaging
1. The plugins.sbt file inside the project folder is required for building a flat jar.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
2. The build.properties file inside the project folder is required for providing the sbt version.
sbt.version=1.3.13
3. In build.sbt, a merge strategy for the assembly task is required to resolve any conflicts during packaging:
assemblyMergeStrategy in assembly := {
  case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case "application.conf" => MergeStrategy.concat
  case x => MergeStrategy.first
}
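With these in place, running the assembly task packages the application into a single flat jar under target/scala-2.12, which is the directory we will mount into the driver and executor pods below:
sbt clean assembly
# produces target/scala-2.12/spark-scala-k8-app-assembly-0.1.jar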
Step 4: Create the yaml file to deploy/run the code on K8s
The file is available in examples/spark-scala-k8-app.yaml
Important elements in the yaml file:
- spec.image: "gcr.io/spark-operator/spark:v3.0.0". This is the prebuilt Spark operator image; because the jar is mounted from a volume, the image never has to be rebuilt.
- spec.imagePullPolicy: Never. We use the image already present locally, so Kubernetes does not try to pull it from a registry.
- spec.mainClass: com.AppK8Demo. Provide the fully qualified name of your job class. AppK8Demo is a sample job class from my Github repo (a rough sketch follows this list).
- spec.volumes: update the hostPath entries to point at your own working directory, i.e. the target/scala-2.12 folder that holds the assembly jar and the src/main/resources folder.
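The sketch below is illustrative only, not the exact code from the repo; it assumes the job just creates a SparkSession and runs a trivial action:

package com

import org.apache.spark.sql.SparkSession

// Minimal job class sketch; the operator launches it via spec.mainClass.
object AppK8Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-scala-k8-app")
      .getOrCreate()

    // Files from the mounted resources volume are visible inside the pods
    // under /opt/spark/work-dir/src/main/resources.
    spark.range(0, 10).show() // trivial action so the job produces some output

    spark.stop()
  }
}

The full yaml manifest: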
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-scala-file-k8-app
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.0.0"
  imagePullPolicy: Never
  mainClass: com.AppK8Demo
  mainApplicationFile: "local:///opt/spark/work-dir/jar/spark-scala-k8-app-assembly-0.1.jar"
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "app-volume"
      hostPath:
        path: "/Users/sugupta/Desktop/codebase/personal/spark-scala-k8-app/target/scala-2.12"
        type: Directory
    - name: "file-volume"
      hostPath:
        path: "/Users/sugupta/Desktop/codebase/personal/spark-scala-k8-app/src/main/resources"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: spark
    volumeMounts:
      - name: "app-volume"
        mountPath: "/opt/spark/work-dir/jar"
      - name: "file-volume"
        mountPath: "/opt/spark/work-dir/src/main/resources"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
    volumeMounts:
      - name: "app-volume"
        mountPath: "/opt/spark/work-dir/jar"
      - name: "file-volume"
        mountPath: "/opt/spark/work-dir/src/main/resources"
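A note on spec.driver.serviceAccount: the driver runs as the spark service account, which is usually created when you install spark-on-k8s-operator (Step 1). If it is missing in your cluster, you can create it manually; the commands below are a sketch and assume the default namespace:
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default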
Step 5: Run the job
kubectl apply -f examples/spark-scala-k8-app.yaml
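Once applied, you can check the application status and follow the driver logs. The driver pod is normally named after the application with a -driver suffix; verify with kubectl get pods if yours differs:
kubectl get sparkapplications
kubectl describe sparkapplication spark-scala-file-k8-app
kubectl logs -f spark-scala-file-k8-app-driver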
Hope this works for you too!