Stuck at "Job is waiting for a runner from 'runner-name' to come online" in DinD-mode #3485

Closed
4 tasks done
paranerd opened this issue Apr 30, 2024 · 4 comments
Labels
gha-runner-scale-set Related to the gha-runner-scale-set mode question Further information is requested

@paranerd

Checks

Controller Version

0.9.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Installed ARC as per [these instructions](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller#installing-actions-runner-controller)
2. Deployed a runner as per [those instructions](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller#installing-actions-runner-controller)
    - Basically just downloaded the official [values](https://github.com/actions/actions-runner-controller/blob/master/charts/gha-runner-scale-set/values.yaml) to `my-values.yaml`
    - Uncommented lines 78+79 (`containerMode`)
    - Uncommented lines 114-158 (`template.spec`)
    - Set `--values "my-values.yaml"`
3. Installed via Helm
4. Runner shows up in GitHub
5. Running a job gets stuck in the above-mentioned state

Describe the bug

When trying to host a DinD container, the runner shows up in GitHub but when trying to run jobs on it, it just gets stuck waiting.

Deploying a "regular" controller works as expected, though.

Describe the expected behavior

The DinD container should pick up available jobs and run them.

Additional Context

githubConfigUrl: ""

githubConfigSecret:
  github_token: ""

containerMode:
  type: "dind"

template:
  spec:
    initContainers:
    - name: init-dind-externals
      image: ghcr.io/actions/actions-runner:latest
      command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
      volumeMounts:
        - name: dind-externals
          mountPath: /home/runner/tmpDir
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
        - name: DOCKER_HOST
          value: unix:///var/run/docker.sock
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /var/run
    - name: dind
      image: docker:dind
      args:
        - dockerd
        - --host=unix:///var/run/docker.sock
        - --group=$(DOCKER_GROUP_GID)
      env:
        - name: DOCKER_GROUP_GID
          value: "123"
      securityContext:
        privileged: true
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /var/run
        - name: dind-externals
          mountPath: /home/runner/externals
    volumes:
    - name: work
      emptyDir: {}
    - name: dind-sock
      emptyDir: {}
    - name: dind-externals
      emptyDir: {}

Controller Logs

https://gist.github.com/paranerd/d41dd1de26c3c18c67ae179f41afb67b

Runner Pod Logs

I don't have those as the runner never even starts in the first place.
@paranerd paranerd added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Apr 30, 2024
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@nikola-jokic
Member

Hey @paranerd,

If you inspect the log, it says that:

2024-04-30T13:24:54Z ERROR EphemeralRunner Failed to create pod resource for ephemeral runner. {"ephemeralrunner": {"name":"arc-runner-set-docker-1-998lp-runner-9v6sv","namespace":"arc-runners-docker-1"}, "error": "Pod "arc-runner-set-docker-1-998lp-runner-9v6sv" is invalid: [spec.volumes[3].name: Duplicate value: "dind-sock", spec.volumes[4].name: Duplicate value: "dind-externals", spec.initContainers[1].name: Duplicate value: "init-dind-externals"]"}

Since you already expanded the spec, you should leave container mode commented out.
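
To make the clash concrete, here is a minimal Python sketch of the kind of name check behind that validation error. It assumes that `containerMode.type: "dind"` appends its own `dind-sock` and `dind-externals` volumes and an `init-dind-externals` init container after the ones from `template.spec`; those names come from the error message, while the merge order and the helper `find_duplicate_names` are illustrative assumptions, not the controller's actual code.

```python
def find_duplicate_names(items):
    """Return (index, name) for every entry whose name already appeared earlier."""
    seen = set()
    duplicates = []
    for index, item in enumerate(items):
        name = item["name"]
        if name in seen:
            duplicates.append((index, name))
        else:
            seen.add(name)
    return duplicates


# Volumes and init containers from the expanded template.spec in my-values.yaml.
user_volumes = [{"name": "work"}, {"name": "dind-sock"}, {"name": "dind-externals"}]
user_init_containers = [{"name": "init-dind-externals"}]

# Assumption: entries added on top by containerMode "dind" (names taken from the error log).
added_volumes = [{"name": "dind-sock"}, {"name": "dind-externals"}]
added_init_containers = [{"name": "init-dind-externals"}]

merged_volumes = user_volumes + added_volumes
merged_init_containers = user_init_containers + added_init_containers

print(find_duplicate_names(merged_volumes))
print(find_duplicate_names(merged_init_containers))
```

Under these assumptions the reported indices line up with `spec.volumes[3]`, `spec.volumes[4]` and `spec.initContainers[1]` from the log, which is why only one of the two (container mode or the expanded spec) should be active.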

@nikola-jokic nikola-jokic added question Further information is requested and removed bug Something isn't working needs triage Requires review from the maintainers labels May 23, 2024
@paranerd
Author

Thanks for looking into this!

As it turns out, I'm having the same issue as described here.

I fixed it by removing the containerMode lines (as you suggested) and using the following specs:

template:
  spec:
    initContainers:
    - name: init-dind-externals
      image: ghcr.io/actions/actions-runner:latest
      command: ['cp', '-r', '-v', '/home/runner/externals/.', '/home/runner/tmpDir/']
      volumeMounts:
        - name: dind-externals
          mountPath: /home/runner/tmpDir
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ['/home/runner/run.sh']
      env:
        - name: DOCKER_HOST
          value: unix:///run/docker/docker.sock
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /run/docker
          readOnly: true
    - name: dind
      image: docker:dind
      args:
        - dockerd
        - --host=unix:///run/docker/docker.sock
        - --group=$(DOCKER_GROUP_GID)
      env:
        - name: DOCKER_GROUP_GID
          value: '123'
        - name: DOCKER_IPTABLES_LEGACY
          value: '1'
      resources:
        requests:
          memory: "500Mi"
          cpu: "300m"
        limits:
          memory: "500Mi"
          cpu: "300m"
      securityContext:
        privileged: true
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /run/docker
        - name: dind-externals
          mountPath: /home/runner/externals
    volumes:
    - name: work
      emptyDir: {}
    - name: dind-sock
      emptyDir: {}
    - name: dind-externals
      emptyDir: {}

with an emphasis on

- name: DOCKER_IPTABLES_LEGACY
  value: '1'

which seems to be the main fix.
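
For anyone adapting this spec, a small pre-flight check along these lines might help catch the two things that changed compared to the original values: the socket path has to agree between the runner and the dind container, and `DOCKER_IPTABLES_LEGACY` is set on the dind container. This is only a sketch on plain Python dicts; `review_dind_spec` is a hypothetical helper, and whether legacy iptables is needed at all likely depends on the platform.

```python
def review_dind_spec(spec):
    """Return a list of notes about a template.spec-style dict (empty if nothing stands out)."""
    containers = {c["name"]: c for c in spec["containers"]}
    runner_env = {e["name"]: e["value"] for e in containers["runner"].get("env", [])}
    dind = containers["dind"]
    dind_env = {e["name"]: e["value"] for e in dind.get("env", [])}
    # Collect the socket addresses dockerd is told to listen on via --host=...
    hosts = [a.split("=", 1)[1] for a in dind.get("args", []) if a.startswith("--host=")]

    notes = []
    if runner_env.get("DOCKER_HOST") not in hosts:
        notes.append("runner DOCKER_HOST does not match the dockerd --host socket")
    if dind_env.get("DOCKER_IPTABLES_LEGACY") != "1":
        # Platform-dependent: treat this as a hint, not as an error.
        notes.append("DOCKER_IPTABLES_LEGACY is not set to '1' on the dind container")
    return notes


# Only the fields relevant to the check, copied from the spec above.
fixed_spec = {
    "containers": [
        {
            "name": "runner",
            "env": [{"name": "DOCKER_HOST", "value": "unix:///run/docker/docker.sock"}],
        },
        {
            "name": "dind",
            "args": [
                "dockerd",
                "--host=unix:///run/docker/docker.sock",
                "--group=$(DOCKER_GROUP_GID)",
            ],
            "env": [
                {"name": "DOCKER_GROUP_GID", "value": "123"},
                {"name": "DOCKER_IPTABLES_LEGACY", "value": "1"},
            ],
        },
    ]
}

print(review_dind_spec(fixed_spec))
```

The spec above yields no notes. Dropping the `DOCKER_IPTABLES_LEGACY` entry produces the second note, which matches what was missing from the original values.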

@nikola-jokic
Member

Thank you for letting us know! Legacy IP tables seems to be a problem on some platforms, but I'm just not sure if it should be the default spec that we expand to 😕
