kubernetes init container for spark-submit


Question: kubernetes init container for spark-submit

I am running spark-submit against a Kubernetes cluster with the Spark 3.2.1 image, and it works. Now my question is: can I run an init container along with spark-submit? What I am trying to achieve is for the init container to check whether another service is up; if it is up, spark-submit should run, otherwise it should fail.

I can see a conf parameter, spark.kubernetes.initContainer.image, for Spark 2.3 but not for 3.2.1 (https://spark.apache.org/docs/2.3.0/running-on-kubernetes.html).

Is there any mechanism I can use to check whether other services are up before I submit a Spark job?

I can see init containers being used with Spark in the links below, but they do not give a precise answer:

https://docs.bitnami.com/kubernetes/infrastructure/spark/configuration/configure-sidecar-init-containers/
https://doc.lucidworks.com/spark-guide/11153/running-spark-on-kubernetes

Any help would be much appreciated, thanks.

Total Answers: 3


Answer 1:

I found that the best way to submit a Spark job is the Spark operator; more details can be found on its GitHub page.

It has options to add init containers and sidecar containers to the driver and executor pods.
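A minimal sketch of what that can look like, assuming the operator's `sparkoperator.k8s.io/v1beta2` SparkApplication CRD and its `driver.initContainers` field. The application name, images, service account and `myservice` are placeholders. In the operator versions I am aware of, init containers and sidecars are only applied when the operator's mutating admission webhook is enabled, so check that first. The snippet only writes the manifest to a file:

```shell
#!/bin/sh
# Write a SparkApplication manifest for the Spark operator (assumed CRD version
# sparkoperator.k8s.io/v1beta2). All names and images below are placeholders.
# The quoted 'EOF' keeps $(cat ...) literal so it is evaluated inside the pod.
cat > spark-pi-init.yaml <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  type: Scala
  mode: cluster
  image: spark:v3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
  sparkVersion: "3.2.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
    # Runs before the driver container; the driver starts only if this exits 0.
    initContainers:
    - name: init-myservice
      image: busybox:1.28
      command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  executor:
    cores: 1
    instances: 1
    memory: "512m"
EOF
```

Submit it with `kubectl apply -f spark-pi-init.yaml`. The operator then runs spark-submit for you, and the driver pod starts its Spark container only after the init container has exited successfully.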


Answer 2:

You don't mention whether the other service is in the same container or not, but the principles are the same. This is covered in the Kubernetes documentation on init containers, which gives this example: a simple Pod with two init containers. The first waits for myservice and the second waits for mydb. Once both init containers complete, the Pod runs the app container from its spec section.

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
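One gap relative to the question: those `until` loops wait forever, while the question wants the check to fail if the service is not up. Below is a sketch of a bounded wait in plain POSIX sh (so it should also run in a busybox image). The function name `wait_for` and its argument order are my own invention, not part of Kubernetes or Spark:

```shell
#!/bin/sh
# wait_for TIMEOUT INTERVAL COMMAND [ARGS...]   (hypothetical helper)
# Re-runs COMMAND every INTERVAL seconds until it exits 0, or gives up once
# TIMEOUT seconds have passed. Returns 0 when COMMAND succeeded and 1 on
# timeout, so an init container that ends with it fails instead of hanging.
wait_for() {
  wf_timeout=$1
  wf_interval=$2
  shift 2
  wf_deadline=$(( $(date +%s) + wf_timeout ))
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$wf_deadline" ]; then
      echo "gave up after ${wf_timeout}s waiting for: $*" >&2
      return 1
    fi
    echo "waiting for: $*"
    sleep "$wf_interval"
  done
  echo "ready: $*"
}
```

In an init container you would put the function plus a call such as `wait_for 120 2 nslookup myservice.default.svc.cluster.local` (namespace `default` assumed) into the `sh -c` string. A non-zero exit fails the init container. For a pod with `restartPolicy: Never` (which, as far as I know, is what Spark uses for its driver pod), Kubernetes then marks the whole pod as failed rather than retrying. Also note that `nslookup` only proves that the Service's DNS record exists, not that any pod behind it is ready. A probe against the service's actual port or health endpoint is a stronger check.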

Answer 3:

You can define a pod template for your driver and executor pods and pass it to spark-submit:

./bin/spark-submit \
  --master k8s://50.1.0.4:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=spark:v3.2.1 \
  --conf spark.kubernetes.driver.podTemplateFile=//path/my_pod_template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=//path/my_pod_template.yaml \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar

Note that a template doesn't have to contain every field the Spark app needs to function. Its main purpose, as the official docs put it:

Spark users can similarly use template files to define the driver or executor pod configurations that Spark configurations do not support.

That means many fields will be overridden by the values Spark derives from --conf. In my case I didn't want to specify the main container spec at all; I only needed the init container to run some checks before startup. Keep in mind, though, that env vars and volume mounts belong to individual containers in Kubernetes, so the init container does not inherit the ones Spark sets on the main container; declare whatever it needs on the init container itself in the template.

my_pod_template.yaml (something like in Alan's answer):

apiVersion: v1
kind: Pod
spec:
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]

source: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template