
[FEATURE] Run Spark Applications with readOnlyRootFileSystem #2313

Open
npgretz opened this issue Nov 7, 2024 · 0 comments

npgretz (Contributor) commented Nov 7, 2024

What feature would you like to be added?

I would like deployed Spark applications to be able to run with readOnlyRootFilesystem: true. Currently they fail once the container can no longer write to the several locations Spark expects to be writable; these expectations come from how Spark works and from the Apache Spark image.

While we could ask Apache Spark to publish a Kubernetes-specific Docker image that supports readOnlyRootFilesystem, I think a quicker approach is for the Spark Operator to add, by default, the workarounds that have allowed me to accomplish this to the Spark Applications it deploys.

Why is this needed?

In organizations with hardened Kubernetes environments, pods may be required to run with a securityContext that sets readOnlyRootFilesystem: true. The JVM and Spark currently expect to write to several locations: temp directories where libraries are unpacked, working directories, and directories for Spark artifacts. The Spark Operator could easily allow the Spark Applications it deploys to run with readOnlyRootFilesystem: true by adding the necessary volumeMounts and JVM options by default.

Describe the solution you would like

The Spark Operator should mutate pods to add these workarounds whenever a SparkApplication's securityContext includes readOnlyRootFilesystem: true:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
spec:
  volumes:
    # Writable scratch space for the JVM temp directory
    - name: jvm-tmp
      emptyDir:
        sizeLimit: 750Mi
    # Writable cache for Ivy-managed Spark artifacts
    - name: spark-ivy
      emptyDir:
        sizeLimit: 750Mi
  sparkConf:
    # Point java.io.tmpdir at the writable emptyDir mount
    spark.driver.extraJavaOptions: -Djava.io.tmpdir=/opt/spark/jvm-tmp
    spark.executor.extraJavaOptions: -Djava.io.tmpdir=/opt/spark/jvm-tmp
  driver:
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: spark-ivy
        mountPath: /home/spark/.ivy2
        subPath: .ivy2
      - name: jvm-tmp
        mountPath: /opt/spark/jvm-tmp
        subPath: jvm-tmp
  executor:
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: spark-ivy
        mountPath: /home/spark/.ivy2
        subPath: .ivy2
      - name: jvm-tmp
        mountPath: /opt/spark/jvm-tmp
        subPath: jvm-tmp

Describe alternatives you have considered

Alternatively, we could ask Apache Spark to address this upstream, for example by publishing a Kubernetes-specific image that supports a read-only root filesystem.

Additional context

I would like to submit a PR to fix this, but I am looking for feedback and direction. Could someone point me to the best place in the Spark Operator to handle default mutations or additions to deployed Spark Applications? A rough sketch of what I have in mind follows.
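
For illustration, here is a minimal sketch, in Go against the Kubernetes core/v1 types, of the kind of default mutation I am proposing. The function name, the trigger condition, and where it would hook into the operator's webhook are assumptions on my part, not the operator's actual API:

package webhook

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// addReadOnlyRootFSWorkarounds is a hypothetical helper: it adds the
// emptyDir volumes and volumeMounts described above to a Spark pod whose
// containers request readOnlyRootFilesystem: true. Where the operator
// would invoke it (presumably from its mutating webhook) is exactly the
// open question I am asking about.
func addReadOnlyRootFSWorkarounds(pod *corev1.Pod) {
	sizeLimit := resource.MustParse("750Mi")
	pod.Spec.Volumes = append(pod.Spec.Volumes,
		corev1.Volume{
			Name: "jvm-tmp",
			VolumeSource: corev1.VolumeSource{
				EmptyDir: &corev1.EmptyDirVolumeSource{SizeLimit: &sizeLimit},
			},
		},
		corev1.Volume{
			Name: "spark-ivy",
			VolumeSource: corev1.VolumeSource{
				EmptyDir: &corev1.EmptyDirVolumeSource{SizeLimit: &sizeLimit},
			},
		},
	)
	for i := range pod.Spec.Containers {
		c := &pod.Spec.Containers[i]
		// Only mutate containers that actually request a read-only root FS.
		if c.SecurityContext == nil ||
			c.SecurityContext.ReadOnlyRootFilesystem == nil ||
			!*c.SecurityContext.ReadOnlyRootFilesystem {
			continue
		}
		c.VolumeMounts = append(c.VolumeMounts,
			corev1.VolumeMount{Name: "jvm-tmp", MountPath: "/opt/spark/jvm-tmp", SubPath: "jvm-tmp"},
			corev1.VolumeMount{Name: "spark-ivy", MountPath: "/home/spark/.ivy2", SubPath: ".ivy2"},
		)
	}
}

If there is an existing spot where the operator already applies defaults to driver and executor pods, I am happy to rework this to fit that pattern.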

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.

@npgretz npgretz changed the title Run Spark Applications with readOnlyRootFileSystem [FEATURE] Run Spark Applications with readOnlyRootFileSystem Nov 7, 2024