GHA jobs fail instantly if a pod is unschedulable because it is waiting for another node to become available (e.g. when the CPU/memory resource request is high and the pod is waiting on the node autoscaler):

Warning  OutOfcpu  pod/arc-runner-set-productdev-b8rwm-runner-b5m9z-workflow  Node didn't have enough resource: cpu, requested: 3000, used: 6560, capacity: 7820

What should be happening preferably:
There should be a timeout field, either in the runner set or in the container hooks pod template, that allows the workflow pod to wait for x minutes until the pod is scheduled after another node is alive.
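(Here the 6560m already in use plus the requested 3000m would exceed the node's 7820m capacity, so the pod is rejected outright instead of waiting.) A minimal sketch of what the requested knob could look like in the hook pod template — the schedulingTimeoutMinutes field is hypothetical and does not exist in ARC 0.9.2 or the container hooks today:

"apiVersion": "v1"
"kind": "PodTemplate"
"metadata":
  "name": "runner-pod-template"
"spec":
  ## hypothetical field: how long to tolerate an unschedulable
  ## workflow pod before failing the job
  "schedulingTimeoutMinutes": 10
  "containers":
    - "name": "$job"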
To Reproduce
Install the ARC controller + runner scale set 0.9.2.
Define ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE pointing at the pod template, and set containerMode.type: "kubernetes".
Define a pod template like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: runner-pod-template
data:
  default.yml: |
    "apiVersion": "v1"
    "kind": "PodTemplate"
    "metadata":
      "name": "runner-pod-template"
    "spec":
      "containers":
        - "name": "$job"
          "resources":
            "limits":
              "cpu": "3000m"
            "requests":
              "cpu": "3000m"
Additional Context
template:
  spec:
    initContainers:
      - name: kube-init
        image: ghcr.io/actions/actions-runner:latest
        command: ["/bin/sh", "-c"]
        args:
          - |
            sudo chown -R 1001:123 /home/runner/_work
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
    securityContext:
      fsGroup: 123 ## needed to resolve permission issues with the mounted volume, see https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors#error-access-to-the-path-homerunner_work_tool-is-denied
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-templates/default.yml
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "false" ## to allow jobs without a job container to run; this instructs the runner to disable that check
        volumeMounts:
          - name: pod-templates
            mountPath: /home/runner/pod-templates
            readOnly: true
    volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteOnce"]
              storageClassName: "managed-csi"
              resources:
                requests:
                  storage: ${local.volume_claim_size}
      - name: pod-templates
        configMap:
          name: "runner-pod-template"
containerMode:
  type: "kubernetes" ## type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    ## for local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to dynamically provision volumes with storageClassName: openebs-hostpath
    storageClassName: "managed-csi"
    resources:
      requests:
        storage: 50Gi
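A workaround sketch (my own assumption, not an official ARC feature): move the CPU request off the workflow pod template and onto the runner container in the runner set values. The scheduler then only places the runner pod on nodes with 3000m free, triggering the autoscaler through the normal Pending path, and the request-free workflow pod can always be admitted beside it:

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        resources:
          requests:
            cpu: "3000m"   ## reserve the job's CPU here instead of in the hook pod template

The trade-off is that the workflow pod then carries no request or limit of its own, so it has no guaranteed CPU against other pods on the node.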