Spark with Prometheus Monitoring
Get Spark jobs running in Kubernetes with Prometheus monitoring.
A step-by-step guide to monitoring Spark jobs running in Kubernetes (K8s) via Prometheus.
Set up the Prometheus server using the kube-prometheus-stack Helm chart on Docker-On-Mac
Please follow my earlier post to set up the Prometheus server on Docker-On-Mac.
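If you don't have the stack running yet, a minimal install sketch looks like this. The release name prometheus matters: it becomes the release: prometheus label that the selectors used throughout this post match on.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack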
Inspect the Prometheus resource created by the kube-prometheus-stack Helm chart
A Prometheus resource defines the desired state of a Prometheus deployment. The Prometheus resource created by kube-prometheus-stack looks for ServiceMonitor objects labeled `release: prometheus`.
Let's check the Prometheus resource. Execute the command below:
kubectl get prometheus -o yaml
The output looks like the YAML below; note the `serviceMonitorSelector` field:
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    annotations:
      meta.helm.sh/release-name: prometheus
      meta.helm.sh/release-namespace: default
    creationTimestamp: "2021-02-24T20:12:37Z"
    generation: 1
    labels:
      app: kube-prometheus-stack-prometheus
      app.kubernetes.io/managed-by: Helm
      chart: kube-prometheus-stack-13.10.0
      heritage: Helm
      release: prometheus
    managedFields:
    - apiVersion: monitoring.coreos.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/managed-by: {}
            f:chart: {}
            f:heritage: {}
            f:release: {}
        f:spec:
          .: {}
          f:alerting:
            .: {}
            f:alertmanagers: {}
          f:enableAdminAPI: {}
          f:externalUrl: {}
          f:image: {}
          f:listenLocal: {}
          f:logFormat: {}
          f:logLevel: {}
          f:paused: {}
          f:podMonitorNamespaceSelector: {}
          f:podMonitorSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:release: {}
          f:portName: {}
          f:probeNamespaceSelector: {}
          f:probeSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:release: {}
          f:replicas: {}
          f:retention: {}
          f:routePrefix: {}
          f:ruleNamespaceSelector: {}
          f:ruleSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:app: {}
              f:release: {}
          f:securityContext:
            .: {}
            f:fsGroup: {}
            f:runAsGroup: {}
            f:runAsNonRoot: {}
            f:runAsUser: {}
          f:serviceAccountName: {}
          f:serviceMonitorNamespaceSelector: {}
          f:serviceMonitorSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:release: {}
          f:shards: {}
          f:version: {}
      manager: Go-http-client
      operation: Update
      time: "2021-02-24T20:12:37Z"
    name: prometheus-kube-prometheus-prometheus
    namespace: default
    resourceVersion: "1383"
    selfLink: /apis/monitoring.coreos.com/v1/namespaces/default/prometheuses/prometheus-kube-prometheus-prometheus
    uid: 1315fc95-3465-45a0-bcbf-1ce924252925
  spec:
    alerting:
      alertmanagers:
      - apiVersion: v2
        name: prometheus-kube-prometheus-alertmanager
        namespace: default
        pathPrefix: /
        port: web
    enableAdminAPI: false
    externalUrl: http://prometheus-kube-prometheus-prometheus.default:9090
    image: quay.io/prometheus/prometheus:v2.24.0
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    paused: false
    podMonitorNamespaceSelector: {}
    podMonitorSelector:
      matchLabels:
        release: prometheus
    portName: web
    probeNamespaceSelector: {}
    probeSelector:
      matchLabels:
        release: prometheus
    replicas: 1
    retention: 10d
    routePrefix: /
    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: prometheus
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: prometheus-kube-prometheus-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: prometheus
    shards: 1
    version: v2.24.0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
Create a Prometheus ServiceMonitor that listens to a Service with a specific label
A ServiceMonitor declaratively specifies how groups of Kubernetes Services should be monitored. The Operator automatically generates the Prometheus scrape configuration based on the current state of the objects in the API server.
The Prometheus resource selects ServiceMonitor resources with the label `release: prometheus`, so we need to ensure that this label is defined on our ServiceMonitor resource.
Let's use the YAML below to create a ServiceMonitor resource that listens to Services labeled `app: spark`:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: servicemonitor-spark
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: spark
  endpoints:
  - port: metrics
Save the file as servicemonitor-spark.yaml inside a folder named prometheus, then execute:
kubectl apply -f prometheus/servicemonitor-spark.yaml
Check the object via:
kubectl get servicemonitor
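You can also verify that it carries the label the Prometheus resource selects on, by filtering with the -l flag:
kubectl get servicemonitor -l release=prometheus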
Create a Service resource listening to Pods with a specific label
The ServiceMonitor resource selects Service resources with the label `app: spark`, so we need to ensure that this label is defined on our Service resource.
The Service resource, in turn, selects Pods with the label `app: spark`; those Pods should expose their metrics on port 8090.
Save the content below in a file, say spark-service.yaml, inside the prometheus folder:
apiVersion: v1
kind: Service
metadata:
  name: spark-service
  labels:
    app: spark
spec:
  ports:
  - name: metrics
    port: 8090
    targetPort: 8090
    protocol: TCP
  selector:
    app: spark
Then execute:
kubectl apply -f prometheus/spark-service.yaml
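At this point the Service exists but selects no Pods yet, so its endpoints list will stay empty until the Spark job below starts Pods labeled app: spark. You can watch it fill in with:
kubectl get endpoints spark-service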
Run a sample Spark job in K8s
Please go through my earlier post to set up the spark-k8-operator.
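If you are starting from scratch, an install sketch via the operator's Helm chart looks like the following; the repo URL is an assumption based on the GoogleCloudPlatform spark-on-k8s-operator project, so check the chart's documentation for the current location. The job below also runs with serviceAccount: spark, so make sure a service account with that name and the required RBAC exists in the default namespace (my earlier post covers this).
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace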
Let's use the YAML file below to deploy a sample Spark job.
The driver and executor define the label `app: spark`, which is required for the Service resource to select their Pods.
Save the file with the name spark-sample-prometheus.yaml inside the prometheus folder.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
name: spark-pi-scheduled
namespace: default
spec:
schedule: "@every 1m"
concurrencyPolicy: Allow
template:
type: Scala
mode: cluster
image: "gcr.io/spark-operator/spark:v3.0.0-gcs-prometheus"
imagePullPolicy: Always
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar"
sparkVersion: "3.0.0"
restartPolicy:
type: Never
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
labels:
version: 3.0.0
serviceAccount: spark
executor:
cores: 1
instances: 1
memory: "512m"
labels:
version: 3.0.0
monitoring:
exposeDriverMetrics: true
exposeExecutorMetrics: true
prometheus:
jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
port: 8090
Execute the command below to deploy the Spark job:
kubectl apply -f prometheus/spark-sample-prometheus.yaml
The metrics will be accessible on port 8090 of the driver and executor Pods.
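To spot-check that a driver is actually exposing metrics, you can port-forward to one of the running Pods and fetch the endpoint; replace the placeholder with an actual Pod name from the first command (the JMX exporter javaagent serves the metrics over plain HTTP):
kubectl get pods -l app=spark
kubectl port-forward <driver-pod-name> 8090:8090
curl http://localhost:8090/metrics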
View the metrics in the Prometheus server
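Port-forward the Prometheus Service created by the chart (its name matches the externalUrl seen in the Prometheus resource earlier) and open the UI:
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090
Then browse to http://localhost:9090/targets to confirm that the spark-service endpoints are being scraped, and try a query in the expression browser. The exact metric names depend on the JMX exporter configuration baked into the image, but JVM metrics such as jvm_memory_bytes_used are a reasonable starting point.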