Package Spark Scala Code and Deploy it on Kubernetes using Spark-on-k8s-Operator
A step-by-step guide to packaging your Spark Scala code and deploying it on Kubernetes using spark-on-k8s-operator.
Versions:
- Spark: 3.0.0
- Scala: 2.12
- SBT: 1.3.13
- Docker On Mac: 2.2.0.0
- Kubernetes: v1.15.5
- spark-on-k8s-operator: sparkoperator.k8s.io/v1beta2
Step 1: Set up Kubernetes:
Please follow my earlier post to set up Kubernetes.
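Before moving on, it helps to confirm that the cluster is reachable and that the operator is running. The commands below assume the operator was installed into a namespace named spark-operator; adjust the namespace if your installation differs.
kubectl cluster-info
kubectl get pods -n spark-operator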
Step 2: The project can be cloned from GitHub:
The project structure is as below:
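The tree below is an approximate sketch reconstructed from the paths used later in this guide; the layout in the actual repo may differ slightly.
spark-scala-k8-app/
├── build.sbt
├── Dockerfile
├── Dockerfile-app
├── examples/
│   └── spark-scala-k8-app.yaml
├── project/
│   ├── build.properties
│   └── plugins.sbt
└── src/
    └── main/
        └── scala/
            └── com/
                └── AppK8Demo.scala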
Step 3: Set up for packaging:
1. The plugins.sbt file inside the project folder is required for building a flat jar.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
2. The build.properties file inside the project folder is required for providing the sbt version.
sbt.version=1.3.13
3. In build.sbt, a merge-strategy task is required to resolve any conflicts during packaging:
assemblyMergeStrategy in assembly := {
  case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case "application.conf" => MergeStrategy.concat
  case x => MergeStrategy.first
}
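For context, the merge strategy above lives in build.sbt alongside the project metadata and dependencies. A minimal build.sbt that would produce the jar name referenced in Step 6 (spark-scala-k8-app-assembly-0.1.jar) might look like the sketch below; the exact dependency list is an assumption, and the repo may pull in more.
name := "spark-scala-k8-app"
version := "0.1"
scalaVersion := "2.12.12"

// Spark is provided by the base image at runtime, so mark it "provided"
// to keep it out of the flat jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.0.0" % "provided"
)

// ...plus the assemblyMergeStrategy task shown above.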
Step 4: Build the base Docker image, which has Hadoop, Spark, and SBT
This Docker image provides the environment required to build and execute your application code.
Dockerfile:
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v3.0.0
FROM ${SPARK_IMAGE}

ENV SBT_VERSION 1.3.13

# Switch to user root so we can add additional packages, jars, and configuration files.
USER root
RUN apt-get -y update && apt-get install -y curl

# Install SBT (as root, so it can be extracted into /usr/local).
RUN curl -fsL https://github.com/sbt/sbt/releases/download/v$SBT_VERSION/sbt-$SBT_VERSION.tgz | tar xfz - -C /usr/local
ENV PATH /usr/local/sbt/bin:${PATH}

# Warm up the SBT caches so application builds on top of this image are faster.
WORKDIR /app
RUN sbt update

# Switch back to the non-root Spark user (UID 185 is the default in the base image).
ARG spark_uid=185
USER ${spark_uid}

ENTRYPOINT ["/opt/entrypoint.sh"]
Create the Docker image by executing:
docker build -t test/spark-operator:latest .
Step 5: Build the Docker image with the flat jar
This image is built on top of the base image from Step 4 and produces the flat jar for your code.
Dockerfile:
FROM test/spark-operator:latest

# Build as root so SBT can write to /app and reuse the caches from the base image.
USER root

# Add project files
ADD build.sbt /app/
ADD project/plugins.sbt /app/project/
ADD project/build.properties /app/project/
ADD src/. /app/src/

# Build the flat jar
RUN sbt clean assembly

# Run as the non-root Spark user again
ARG spark_uid=185
USER ${spark_uid}
ENTRYPOINT ["/opt/entrypoint.sh"]
Create the Docker image by executing:
docker build -f Dockerfile-app -t test/spark-scala-k8-app:latest .
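Before deploying, you can sanity-check that the assembly jar ended up where the manifest in Step 6 expects it by listing the target directory inside the image; you should see spark-scala-k8-app-assembly-0.1.jar.
docker run --rm --entrypoint ls test/spark-scala-k8-app:latest /app/target/scala-2.12/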
Step 6: Create a YAML file to deploy/run the code on K8s
The file is available in examples/spark-scala-k8-app.yaml.
Important elements in the YAML file:
- spec.image: "test/spark-scala-k8-app:latest". Provide the image name from Step 5.
- spec.imagePullPolicy: Never, since we are using a locally built image.
- spec.mainClass: com.AppK8Demo. Provide the fully qualified name of your job class. AppK8Demo is a sample job class from my GitHub repo (a minimal sketch of such a class follows the manifest below).
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: spark-scala-k8-app
namespace: default
spec:
type: Scala
mode: cluster
image: "test/spark-scala-k8-app:latest"
imagePullPolicy: Never
mainClass: com.AppK8Demo
mainApplicationFile: "local:///app/target/scala-2.12/spark-scala-k8-app-assembly-0.1.jar"
sparkVersion: "3.0.0"
restartPolicy:
type: Never
volumes:
- name: "test-volume"
hostPath:
path: "/tmp"
type: Directory
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
labels:
version: 3.0.0
serviceAccount: spark
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
executor:
cores: 1
instances: 1
memory: "512m"
labels:
version: 3.0.0
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
Execute the following to deploy/run the job:
kubectl apply -f examples/spark-scala-k8-app.yaml
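Once the application is submitted, the operator creates a driver pod for it (by default named spark-scala-k8-app-driver). You can check the application status and follow the driver logs with:
kubectl get sparkapplications spark-scala-k8-app -o yaml
kubectl describe sparkapplication spark-scala-k8-app
kubectl logs -f spark-scala-k8-app-driver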
Congratulations! I hope this worked for you too.