
[FEATURE] Run Spark Applications with readOnlyRootFileSystem #2313

Open
npgretz opened this issue Nov 7, 2024 · 0 comments

npgretz (Contributor) commented Nov 7, 2024

What feature would you like to be added?

I would like deployed Spark applications to be able to run with readOnlyRootFilesystem: true. Currently they fail once the container can no longer write to the several locations Spark expects to be writable; these expectations come from how Spark works and from the Apache Spark image.

While we could ask Apache Spark to publish a Kubernetes-specific Docker image that supports readOnlyRootFilesystem, I think a quicker approach is for the Spark Operator to add, by default, the workarounds that have allowed me to accomplish this to the Spark Applications it deploys.

Why is this needed?

In organizations with hardened Kubernetes environments, pods may be required to run with a securityContext that sets readOnlyRootFilesystem: true. The JVM and Spark currently expect to write to several locations: temp directories where libraries are unpacked, working directories, and directories for Spark artifacts. The Spark Operator could easily allow the Spark Applications it deploys to run with readOnlyRootFilesystem: true by adding the necessary volumeMounts and JVM options by default.

Describe the solution you would like

The Spark Operator should mutate pods to add these workarounds whenever a SparkApplication's securityContext includes readOnlyRootFilesystem: true:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
spec:
  volumes:
    # Writable scratch space for the JVM temp directory
    - name: jvm-tmp
      emptyDir:
        sizeLimit: 750Mi
    # Writable cache for Ivy-managed Spark artifacts
    - name: spark-ivy
      emptyDir:
        sizeLimit: 750Mi
  sparkConf:
    # Point java.io.tmpdir at the writable emptyDir mount
    spark.driver.extraJavaOptions: -Djava.io.tmpdir=/opt/spark/jvm-tmp
    spark.executor.extraJavaOptions: -Djava.io.tmpdir=/opt/spark/jvm-tmp
  driver:
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: spark-ivy
        mountPath: /home/spark/.ivy2
        subPath: .ivy2
      - name: jvm-tmp
        mountPath: /opt/spark/jvm-tmp
        subPath: jvm-tmp
  executor:
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: spark-ivy
        mountPath: /home/spark/.ivy2
        subPath: .ivy2
      - name: jvm-tmp
        mountPath: /opt/spark/jvm-tmp
        subPath: jvm-tmp

Describe alternatives you have considered

Alternatively, we could ask Apache Spark to address this upstream, for example by publishing a Kubernetes-specific image that supports a read-only root filesystem.

Additional context

I would like to submit a PR to fix this, but I am looking for feedback and direction. Could someone point me to the best place in the Spark Operator to handle default mutations or additions to deployed Spark Applications? A rough sketch of what I have in mind follows.
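
For illustration, here is a minimal sketch, in Go against the Kubernetes core/v1 types, of the kind of default mutation I am proposing. The function name, the trigger condition, and where it would hook into the operator's webhook are assumptions on my part, not the operator's actual API:

package webhook

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// addReadOnlyRootFSWorkarounds is a hypothetical helper: it adds the
// emptyDir volumes and volumeMounts described above to a Spark pod whose
// containers request readOnlyRootFilesystem: true. Where the operator
// would invoke it (presumably from its mutating webhook) is exactly the
// open question I am asking about.
func addReadOnlyRootFSWorkarounds(pod *corev1.Pod) {
	sizeLimit := resource.MustParse("750Mi")
	pod.Spec.Volumes = append(pod.Spec.Volumes,
		corev1.Volume{
			Name: "jvm-tmp",
			VolumeSource: corev1.VolumeSource{
				EmptyDir: &corev1.EmptyDirVolumeSource{SizeLimit: &sizeLimit},
			},
		},
		corev1.Volume{
			Name: "spark-ivy",
			VolumeSource: corev1.VolumeSource{
				EmptyDir: &corev1.EmptyDirVolumeSource{SizeLimit: &sizeLimit},
			},
		},
	)
	for i := range pod.Spec.Containers {
		c := &pod.Spec.Containers[i]
		// Only mutate containers that actually request a read-only root FS.
		if c.SecurityContext == nil ||
			c.SecurityContext.ReadOnlyRootFilesystem == nil ||
			!*c.SecurityContext.ReadOnlyRootFilesystem {
			continue
		}
		c.VolumeMounts = append(c.VolumeMounts,
			corev1.VolumeMount{Name: "jvm-tmp", MountPath: "/opt/spark/jvm-tmp", SubPath: "jvm-tmp"},
			corev1.VolumeMount{Name: "spark-ivy", MountPath: "/home/spark/.ivy2", SubPath: ".ivy2"},
		)
	}
}

If there is an existing spot where the operator already applies defaults to driver and executor pods, I am happy to rework this to fit that pattern.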

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.

@npgretz npgretz changed the title Run Spark Applications with readOnlyRootFileSystem [FEATURE] Run Spark Applications with readOnlyRootFileSystem Nov 7, 2024