Self-host an aarch64 Linux GitHub Actions runner

I needed a cheap aarch64 Linux GitHub Actions runner for building R and R packages. I ended up with a VM in Oracle Cloud that makes use of their “always free resources” offer. It runs each container job in isolation, in a new Docker container. This post documents how I set it up.

Create the VM

I started with a free OCI subscription and ran into “out of capacity” errors when creating a VM with always free resources. The solution was to upgrade to a “pay as you go” subscription.

It is important to read the conditions for always free resources carefully and create the VM accordingly.

In particular, I

  • chose my home region,
  • used the VM.Standard.A1.Flex shape, and
  • selected the Ubuntu 22.04 Minimal aarch64 OS.

The current limits of the always free resources are 4 OCPUs, 24 GB memory and 200 GB disk. One can use all of them for a single machine, or create up to four machines. As long as the totals stay within these limits, the resources are free. No need to worry too much about the floating cost estimate box that says the VM will incur costs; it is simply wrong.

Configure the VM

When the VM is up and running, I set up a couple of things to make everyday maintenance easier. None of these are strictly necessary, except for installing Docker. Your mileage may vary.

It seems like a good idea to restrict access to the VM. By default SSH is available from any IP address on OCI. To restrict this, I went to ‘Networking’ -> ‘Virtual cloud networks’, then chose the network of the new VM, then ‘Security List Details’, and edited the rule for port 22 to allow only a single IP address.

I like to add remote hosts to the .ssh/config file. E.g. for my new aarch64 VM I have:

host arm
  hostname <public-ip-of-vm>
  user ubuntu
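
With this entry in place, ssh arm logs in to the VM.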

By the way, on macOS I need to edit the /etc/ssh/ssh_config file again after every macOS update to keep ssh working.

I like to create a swap file on the VM. OCI is quite generous with the memory, but Linux starts killing processes when it runs out of memory, so a swap file can literally save a congested VM. There are plenty of guides on creating swap files, I like this one. For my VM with 12 GB memory I created a swap file of 12 GB. For a smaller machine, e.g. one with 4 GB memory, I would create a swap file of at least 8 GB, if the available disk space allows this.
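
For reference, the setup is roughly the following; this is a minimal sketch assuming a 12 GB swap file at /swapfile, adjust the size and path as needed:

# create and enable the swap file
sudo fallocate -l 12G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# make it permanent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab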

Then I installed Docker, i.e. the docker.io Ubuntu package. I like to add the user to the docker group, so I don’t need to use sudo to run Docker commands:

sudo apt-get install docker.io
sudo addgroup ubuntu docker

The default user is called ubuntu on OCI Ubuntu machines; other VMs might use a different name. I had to log out and log in again for this change to take effect.
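
A quick check after logging back in; this should list the (currently empty) set of containers without a permission error:

docker ps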

Configure the runner container hooks

By default a self-hosted runner runs each job as the user that runs the runner’s software. This is not great, because jobs are not isolated from each other and the rest of the VM. Ideally, each container job should run in a separate container.

GitHub lets us customize self-hosted runners by adding four hooks that correspond to the events that happen when running a job. The main focus of this interface is extensibility and not ease of use, so writing the hooks is a substantial amount of work. Luckily, I was able to use an example implementation without any modifications.

First I cloned the runner-container-hooks repo into my user’s home directory:

sudo apt-get install git
cd
git clone https://github.com/actions/runner-container-hooks

The example hooks are written in TypeScript, so I needed to install Node.js 20 and compile the hooks from TypeScript to JavaScript. First I installed Node.js 20.x:

curl -LO https://nodejs.org/dist/latest-v20.x/node-v20.17.0-linux-arm64.tar.gz
tar xzf node-*.tar.gz
sudo mv node-v20*-linux-arm64 /opt/
echo 'export PATH=/opt/node-v20.17.0-linux-arm64/bin/:$PATH' >> ~/.bashrc
source .bashrc

Check if it works:

node --version
v20.17.0

Then I compiled the hooks:

cd runner-container-hooks
npm install
npm run bootstrap
npm run build-all

Add the self-hosted runner

I followed the GitHub documentation to add a self-hosted runner, but did not start it yet. (From the ‘Settings’ tab of my organization or user I selected ‘Actions’, then ‘Runners’, then ‘New runner’.) I selected an ARM64 Linux runner. I named it arm, but the name is not important, and used the default labels: self-hosted, Linux and ARM64. If the runner is for specific jobs, then it makes sense to create more specific labels.
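
For reference, the commands on the ‘New runner’ page look roughly like this; the exact version number, download URL and registration token are shown on that page, so the values below are placeholders:

mkdir ~/actions-runner && cd ~/actions-runner
curl -LO https://github.com/actions/runner/releases/download/v<version>/actions-runner-linux-arm64-<version>.tar.gz
tar xzf actions-runner-linux-arm64-<version>.tar.gz
./config.sh --url https://github.com/<my-org> --token <registration-token>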

Before running run.sh, I needed to set up the runner to use the custom hooks from runner-container-hooks. I added a .env file to the directory that contains the runner software, i.e. next to the run.sh file. The .env file contains this:

LANG=C.UTF-8
ACTIONS_RUNNER_CONTAINER_HOOKS=/home/ubuntu/runner-container-hooks/packages/docker/dist/index.js
ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER=true

The path points to my runner-container-hooks directory, so it might be different if the VM has a different username. The ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER=true setting makes the runner reject jobs that do not run in a container. This is all that is needed for the hooks, and the runner is ready to run now!

I find it convenient to run the runner within a screen or tmux session, so I can easily detach its virtual terminal and log out of the VM cleanly:

sudo apt-get install screen
screen bash
./run.sh

Pressing CTRL+a d (i.e. CTRL and a together, then let go and d) detaches the terminal. Running screen -list lists the screen sessions. The output looks like this:

ubuntu@arm:~/actions-runner$ screen -list
There is a screen on:
        12867.pts-0.arm (09/19/24 07:20:16) (Detached)
1 Socket in /run/screen/S-ubuntu.

screen -r re-attaches the session:

screen -r 12867.pts-0.arm

and then CTRL+a d detaches it again.

Instead of starting the runner from an interactive session, I could also configure it as a service. I am not sure that configuring multiple runners as a service works, though.
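
If I wanted to do that, the runner ships with a svc.sh script for this. Roughly, following GitHub’s documentation on running the runner as a service (I have not used this setup myself):

cd ~/actions-runner
sudo ./svc.sh install ubuntu   # install a systemd service that runs as the ubuntu user
sudo ./svc.sh start
sudo ./svc.sh status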

The self-hosted runner is running now. In the list of runners at https://github.com/organizations/<my-org>/settings/actions/runners it shows up as “idle”.

Run jobs

To run a job on the new runner, I can use one or more of its configured labels. Here is an example:

on:
  workflow_dispatch:

jobs:
  arm64:
    runs-on: [self-hosted, linux, ARM64]
    name: arm64 job
    container:
      image: "ubuntu:22.04"
      volumes:
        - ${{ github.workspace }}:/${{ github.workspace }}
    steps:
      - uses: actions/checkout@v4
      - name: Test
        run: |
          uname -a
          cd ${{ github.workspace }}
          pwd
          ls -l

This workflow has a workflow_dispatch trigger, so I can start it from the GitHub web UI.

Tips and limitations

Using container: means that the job will run in a container.

If the job needs to run Node.js actions, the container needs to support Node.js, which is version 20.x nowadays. E.g. actions/checkout@v4, actions/upload-artifact@v4, etc. need Node.js 20.x.

The runner can run Docker in Docker by default! I.e. in the above example I could install the docker.io Ubuntu package inside the container, and then call docker run, etc. Here is a real example. Note, however, that all these containers are using the same Docker daemon, so they’re going to be running as sibling containers.
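
For example, a run: step of the container job above can do something along these lines (a sketch, assuming the job container runs as root, which is the default for ubuntu:22.04):

apt-get update && apt-get install -y docker.io
# this starts a sibling container on the VM's Docker daemon, not a nested one
docker run --rm ubuntu:22.04 uname -m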

Only run trusted code in these containers! Every container has access to the single Docker daemon of the VM, so essentially they have admin access to the VM. Do not run untrusted third party code! Also, do not kill all Docker containers from within the job, that’ll kill the job as well.

I like to mount ${{ github.workspace }} into the container, at exactly the same place as on the runner (the VM), and then run commands inside that directory. This makes it easier to copy files between containers when running Docker in Docker.
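
For example, because the workspace path is the same on the VM and in the job container, a sibling container started from within the job can mount the very same files (a sketch):

docker run --rm -v "$GITHUB_WORKSPACE":"$GITHUB_WORKSPACE" -w "$GITHUB_WORKSPACE" ubuntu:22.04 ls -l

The -v paths are interpreted by the VM’s Docker daemon, i.e. they refer to paths on the VM, which is exactly why mounting the workspace at the same path in the job container matters.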

It seems that the runner software never cleans up its temporary directory in _work/_temp, so it’ll eventually fill up and will cause jobs to fail. I don’t really have a good solution for this currently, apart from a manual cleanup after the failure. I suppose a cron job could work, but it’d have to be careful to only clean up if no jobs are running.
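
One possible sketch, which I have not tested: a crontab entry that only cleans up when no Runner.Worker process (the process the runner starts for each job) is running:

# crontab entry for the ubuntu user: clean up at 3am if no job is running
0 3 * * * pgrep -f Runner.Worker > /dev/null || rm -rf /home/ubuntu/actions-runner/_work/_temp/*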

I can start a second (third, fourth, etc.) runner on the same machine! Use a different directory than the default actions-runner and a new screen session. This especially makes sense if the VM has more than one OCPU. Keep in mind that both runners will use the same Docker daemon, so for Docker in Docker, the jobs have to be careful to avoid name clashes, i.e. the names of containers, networks, etc. must be different for each job (and each sub-job of a matrix job!), in case they are running concurrently on the VM.
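
Registering a second runner could look roughly like this; the directory and runner name are arbitrary, and the version and token placeholders again come from the ‘New runner’ page:

mkdir ~/actions-runner-2 && cd ~/actions-runner-2
curl -LO https://github.com/actions/runner/releases/download/v<version>/actions-runner-linux-arm64-<version>.tar.gz
tar xzf actions-runner-linux-arm64-<version>.tar.gz
./config.sh --url https://github.com/<my-org> --token <registration-token> --name arm-2
cp ~/actions-runner/.env .    # reuse the same container hooks configuration
screen bash
./run.sh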

Creating more runners is also a way (the only way?) to use the same VM in multiple GitHub organizations.

Improvements

Ideally, the VM would dynamically create runners as they are needed for jobs, up to a limit (probably 4 for the always free OCI VM) and then remove them once the jobs are done. To get four aarch64 Linux runners I don’t really want to create four VMs, or run four independent runners manually on the same VM.

This is certainly possible, but it needs a bit more configuration, and some Kubernetes knowledge does not hurt, either. In particular, I would like to try ARC (the Actions Runner Controller) later. It should work pretty well on an OCI VM with all the always free resources, i.e. 4 OCPUs, 24 GB memory and 200 GB disk.

Other options

I also considered other options.

The simplest way to use aarch64 runners on GitHub Actions is to pay for their hosted runners.

Another way to run aarch64 Linux is to use multi-architecture Docker on the GitHub hosted runners. This is how I run Linux on s390x on GHA. It works well, but it is quite slow. Some Linux distros are slower than others, e.g. Fedora is very slow, probably because it is compiled for a more modern processor that is harder to emulate. I do use this method for some tasks, but it is not feasible for others: the six hours time limit is not enough to compile R on Fedora this way. (This is a task that takes less than 10 minutes on a native aarch64 machine!)
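
The general idea, not my exact setup, is to register QEMU emulators with Docker and then run containers for another platform on a GitHub-hosted x86_64 runner:

docker run --privileged --rm tonistiigi/binfmt --install all   # register QEMU binfmt handlers
docker run --rm --platform linux/s390x ubuntu:22.04 uname -m   # prints s390x, emulated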

All major cloud providers support aarch64 Linux servers: Azure, AWS, Google Cloud, and there are many smaller ones like netcup.

These options cost money, so they could not compete with the Oracle Cloud server.

Updates

2024-09-21: added paragraph about multi-architecture Docker as an alternative. Also moved the ‘Other options’ section to the end.

2024-09-23: added advice about restricting ssh to the VM.

2024-09-24: added note about configuring the runner as a service.