Spark with Prometheus monitoring

Suchit Gupta
4 min read · Feb 25, 2021


A step-by-step guide to running Spark jobs in Kubernetes (K8s) and monitoring them with Prometheus.

Set up the Prometheus server using the kube-prometheus-stack Helm chart on Docker-On-Mac

Please follow my post to set up the Prometheus server on Docker-On-Mac
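If you just need the short version, the commands below install the chart. They assume the Helm release is named prometheus, which is what the labels later in this post rely on; see the linked post for the full walkthrough.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack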

Inspect the Prometheus resource created by the kube-prometheus-stack Helm chart

A Prometheus resource defines the desired state of a Prometheus deployment.

The Prometheus resource created by kube-prometheus-stack looks for ServiceMonitor objects defined with the label `release: prometheus`.

Let’s check the Prometheus resource. Execute the command below:

kubectl get prometheus -o yaml

The output looks like the YAML below; note the `serviceMonitorSelector` field:

apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    annotations:
      meta.helm.sh/release-name: prometheus
      meta.helm.sh/release-namespace: default
    creationTimestamp: "2021-02-24T20:12:37Z"
    generation: 1
    labels:
      app: kube-prometheus-stack-prometheus
      app.kubernetes.io/managed-by: Helm
      chart: kube-prometheus-stack-13.10.0
      heritage: Helm
      release: prometheus
    managedFields:
    - apiVersion: monitoring.coreos.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/managed-by: {}
            f:chart: {}
            f:heritage: {}
            f:release: {}
        f:spec:
          .: {}
          f:alerting:
            .: {}
            f:alertmanagers: {}
          f:enableAdminAPI: {}
          f:externalUrl: {}
          f:image: {}
          f:listenLocal: {}
          f:logFormat: {}
          f:logLevel: {}
          f:paused: {}
          f:podMonitorNamespaceSelector: {}
          f:podMonitorSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:release: {}
          f:portName: {}
          f:probeNamespaceSelector: {}
          f:probeSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:release: {}
          f:replicas: {}
          f:retention: {}
          f:routePrefix: {}
          f:ruleNamespaceSelector: {}
          f:ruleSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:app: {}
              f:release: {}
          f:securityContext:
            .: {}
            f:fsGroup: {}
            f:runAsGroup: {}
            f:runAsNonRoot: {}
            f:runAsUser: {}
          f:serviceAccountName: {}
          f:serviceMonitorNamespaceSelector: {}
          f:serviceMonitorSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:release: {}
          f:shards: {}
          f:version: {}
      manager: Go-http-client
      operation: Update
      time: "2021-02-24T20:12:37Z"
    name: prometheus-kube-prometheus-prometheus
    namespace: default
    resourceVersion: "1383"
    selfLink: /apis/monitoring.coreos.com/v1/namespaces/default/prometheuses/prometheus-kube-prometheus-prometheus
    uid: 1315fc95-3465-45a0-bcbf-1ce924252925
  spec:
    alerting:
      alertmanagers:
      - apiVersion: v2
        name: prometheus-kube-prometheus-alertmanager
        namespace: default
        pathPrefix: /
        port: web
    enableAdminAPI: false
    externalUrl: http://prometheus-kube-prometheus-prometheus.default:9090
    image: quay.io/prometheus/prometheus:v2.24.0
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    paused: false
    podMonitorNamespaceSelector: {}
    podMonitorSelector:
      matchLabels:
        release: prometheus
    portName: web
    probeNamespaceSelector: {}
    probeSelector:
      matchLabels:
        release: prometheus
    replicas: 1
    retention: 10d
    routePrefix: /
    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: prometheus
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: prometheus-kube-prometheus-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: prometheus
    shards: 1
    version: v2.24.0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
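
If you just want to confirm the selector without scanning the full output, a jsonpath query works too (using the resource name from the output above):

kubectl get prometheus prometheus-kube-prometheus-prometheus -o jsonpath='{.spec.serviceMonitorSelector}'

This should print {"matchLabels":{"release":"prometheus"}}.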

Create a Prometheus ServiceMonitor that listens to a Service with a specific label

A ServiceMonitor declaratively specifies how groups of Kubernetes services should be monitored. The Prometheus Operator automatically generates the Prometheus scrape configuration based on the current state of the objects in the API server.

The Prometheus resource is looking for ServiceMonitor resources with the label `release: prometheus`, so we need to ensure that this label is defined on our ServiceMonitor resource.

Let’s use the YAML below to create a ServiceMonitor resource that listens to Services defined with the label `app: spark`:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: servicemonitor-spark
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: spark
  endpoints:
  - port: metrics

Save the file as servicemonitor-spark.yaml inside a folder named prometheus, then execute the following:

kubectl apply -f prometheus/servicemonitor-spark.yaml

Check the object via:

kubectl get servicemonitor
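
To confirm that Prometheus will actually pick it up, also verify that the `release: prometheus` label made it onto the object:

kubectl get servicemonitor servicemonitor-spark --show-labels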

Create a Service resource listening to pods with a specific label

The ServiceMonitor resource is looking for Service resources with the label `app: spark`, so we need to ensure that this label is defined on our Service resource.

The Service resource selects pods with the label `app: spark`, and those pods should expose their metrics on port 8090.

Save the content below in a file, say spark-service.yaml, inside the prometheus folder.

apiVersion: v1
kind: Service
metadata:
  name: spark-service
  labels:
    app: spark
spec:
  ports:
  - name: metrics
    port: 8090
    targetPort: 8090
    protocol: TCP
  selector:
    app: spark

And execute:

kubectl apply -f prometheus/spark-service.yaml
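
Note that the Service will have no endpoints until pods carrying the `app: spark` label exist. Once the Spark job from the next step is running, you can confirm the wiring with:

kubectl get endpoints spark-service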

Run a sample Spark job in K8s

Please go through my earlier post to set up the spark-k8-operator
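
If you skipped that post, a minimal setup looks like the commands below. These assume the GoogleCloudPlatform spark-on-k8s-operator Helm chart and a `spark` service account in the default namespace, which the job below references; see the earlier post for the full details.

helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-operator spark-operator/spark-operator
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default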

Let’s use the YAML file below to deploy a sample Spark job.

The driver and executor pods define the label `app: spark`, which is required by the Service resource.

Save the file as spark-sample-prometheus.yaml inside the prometheus folder.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
name: spark-pi-scheduled
namespace: default
spec:
schedule: "@every 1m"
concurrencyPolicy: Allow
template:
type: Scala
mode: cluster
image: "gcr.io/spark-operator/spark:v3.0.0-gcs-prometheus"
imagePullPolicy: Always
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar"
sparkVersion: "3.0.0"
restartPolicy:
type: Never
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
labels:
version: 3.0.0
serviceAccount: spark
executor:
cores: 1
instances: 1
memory: "512m"
labels:
version: 3.0.0
monitoring:
exposeDriverMetrics: true
exposeExecutorMetrics: true
prometheus:
jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
port: 8090

Execute the command below to deploy the Spark job:

kubectl apply -f prometheus/spark-sample-prometheus.yaml

The metrics will be accessible on port 8090.
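
To eyeball the raw metrics, find the driver pod via the `app: spark` label, port-forward to it, and curl the JMX exporter endpoint (replace the placeholder with the actual pod name):

kubectl get pods -l app=spark
kubectl port-forward <driver-pod-name> 8090:8090
curl http://localhost:8090/metrics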

View the metrics in the Prometheus server
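
The kube-prometheus-stack chart exposes the server through the prometheus-kube-prometheus-prometheus Service on port 9090 (this matches the externalUrl in the Prometheus resource above), so a port-forward is the quickest way in:

kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090

Open http://localhost:9090/targets and the ServiceMonitor’s target should show up once the Spark driver is running; the JMX exporter metrics can then be queried from the Graph tab.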
