@velioglu velioglu commented Oct 8, 2025

To run runners on AWS, we need an AMI. This PR introduces a new pipeline and updates the existing Packer templates to create that AMI on AWS. It uses the same Packer template for AWS and Azure based images to avoid conflicts when updating our fork from GitHub's upstream, so we won't need to check new scripts each time.

enescakir and others added 26 commits August 20, 2025 08:43
The new version of az cli started to throw an exception for pending
downloads.

Azure/azure-cli#10160 (comment)
When we booted a VM from the generated image, we found that apt-get update
failed with the following error:

    E: Encountered a section with no Package: header
    E: Problem with MergeList /var/lib/apt/lists/_etc_apt_apt-mirrors.txt_dists_jammy_multiverse_i18n_Translation-en
    E: The package lists or status file could not be parsed or opened.

In previous cases, generating a new image solved the issue. For the
current version, however, I created a new image at least 10 times and the
issue was still there. So I decided to remove the apt lists and rebuild
them from scratch. This only happens with the Ubuntu 22.04 x64 image.
Since Docker containers can't resolve DNS addresses by default in our
network setup, we added a custom systemd-resolved configuration to
redirect DNS requests from Docker containers to the Quad9 DNS servers.

Recently, we started using dnsmasq as a DNS resolver on our virtual
machines: ubicloud/ubicloud#1960. We no longer
need this configuration, as dnsmasq will handle DNS requests from Docker
containers.
The actions/runner repo has been forked so that clients of that repo
can update the cache API URI. Since that fork will be used with upcoming
images, replace actions/runner with ubicloud/runner.
To support the transparent cache proxy, download the binaries for the
cache proxy server and run it as a service.
The server for ghcup is flaky; it sometimes returns an error while we
are trying to install it. A sample error is given below:

    build_image: curl: (35) error:0A000126:SSL routines::unexpected eof while reading
    build_image: /tmp/script_9139.sh: line 24: ghcup: command not found

So, retry the installation in a loop to make sure ghcup is installed.
The server for Temurin Java is flaky; it sometimes returns an error while
we are trying to install it. A sample error is given below:

    Failed to fetch https://packages.adoptium.net/artifactory/deb/pool/main/t/temurin-8/temurin-8-jdk_8.0.422.0.0+5_arm64.deb
    Connection failed [IP: 104.18.20.66 443]

So, retry the installation in a loop to make sure it is installed.
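
The retry-in-a-loop approach from the two messages above can be sketched as follows. This is only an illustration in Python, not the image build script itself; the helper names `retry` and `install_succeeded`, the attempt count, and the delay are all assumptions.

```python
import shutil
import subprocess
import time
from typing import Callable, Sequence


def retry(action: Callable[[], bool], attempts: int = 5, delay_seconds: float = 0.0) -> bool:
    """Call `action` until it returns True or `attempts` calls have been made."""
    for attempt in range(1, attempts + 1):
        if action():
            return True
        # Wait before the next try, but not after the final failure.
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False


def install_succeeded(command: Sequence[str], expected_binary: str) -> bool:
    """Run an installer command and verify the expected binary is on PATH.

    Checking PATH as well as the exit code catches the case where the
    download failed but a later step would only report "command not found".
    """
    result = subprocess.run(list(command), check=False)
    return result.returncode == 0 and shutil.which(expected_binary) is not None
```

A caller might combine the two as `retry(lambda: install_succeeded(["./install.sh"], "ghcup"), attempts=5, delay_seconds=10)`, where `./install.sh` is a hypothetical installer script.
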
GitHub environment variables are only available inside the workflow, not
to the Unix user. We need some of these variables for features such as
Ubicloud Cache. The runner script can run a hook script before the job
starts, and this hook has access to the environment variables. So we just
persist them to a file and read them in the Ubicloud Cache proxy.
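
A minimal sketch of the persist-then-read idea, assuming a simple `KEY=value` file format. The function names, the file format, and the choice of variables are assumptions for illustration; the actual hook and proxy may store and parse the values differently.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, Mapping


def persist_env(names: Iterable[str], path: Path, environ: Mapping[str, str] = os.environ) -> None:
    """Write the selected variables that are set to `path`, one KEY=value per line.

    Assumes values contain no newlines, which this simple format cannot represent.
    """
    lines = [f"{name}={environ[name]}" for name in names if name in environ]
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")


def load_env(path: Path) -> Dict[str, str]:
    """Read the file back into a dict, splitting on the first '=' only."""
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            values[key] = value
    return values
```

In this sketch the hook side would call `persist_env(["GITHUB_REPOSITORY", "GITHUB_RUN_ID"], path)` and the proxy side would call `load_env(path)`; which variables are actually persisted is not stated in the message above.
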
Moving these steps to image generation to speed up provisioning

ubicloud/ubicloud#2371
Moving it from runtime to image generation to improve provisioning times

ubicloud/ubicloud#3078
`accessSas` to `accessSAS`
We were using the qemu-img convert tool to convert the VHD image
to a RAW image, but the tool doesn't work for the fixed
VHD image we have. Since a fixed VHD file contains only the
data itself and a footer, we copy the data part as the RAW
image file. The content length of the data part is found with
the vhdiinfo tool.
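
The copy step can be sketched as below. Note the swap: the change above obtains the data length from vhdiinfo, while this sketch instead assumes the footer is the final 512 bytes of the file and starts with the `conectix` cookie, as in the VHD format specification. The function name and chunk size are illustrative.

```python
import os

FOOTER_SIZE = 512          # fixed VHD: raw data followed by one 512-byte footer
VHD_COOKIE = b"conectix"   # the footer starts with this cookie


def fixed_vhd_to_raw(vhd_path: str, raw_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Copy everything except the trailing footer; return the number of bytes copied."""
    total = os.path.getsize(vhd_path)
    if total < FOOTER_SIZE:
        raise ValueError("file is too small to be a fixed VHD")
    data_length = total - FOOTER_SIZE

    with open(vhd_path, "rb") as src:
        # Sanity check: refuse to convert if the footer cookie is missing.
        src.seek(data_length)
        if not src.read(FOOTER_SIZE).startswith(VHD_COOKIE):
            raise ValueError("no VHD footer found at end of file")

        src.seek(0)
        remaining = data_length
        with open(raw_path, "wb") as dst:
            while remaining > 0:
                chunk = src.read(min(chunk_size, remaining))
                if not chunk:
                    raise IOError("unexpected end of file while copying data")
                dst.write(chunk)
                remaining -= len(chunk)
    return data_length
```
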
Chrony service is failing because we don't use MS Hyper-V
virtualization.

    × chrony.service - chrony, an NTP client/server
         Loaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
         Active: failed (Result: exit-code) since Fri 2025-04-25 11:25:20 UTC; 3s ago
           Docs: man:chronyd(8)
                 man:chronyc(1)
                 man:chrony.conf(5)
        Process: 2126 ExecStart=/usr/lib/systemd/scripts/chronyd-starter.sh $DAEMON_OPTS (code=exited, status=1/FAILURE)
            CPU: 20ms

    Apr 25 11:25:20 vm7j9k6w systemd[1]: Starting chrony, an NTP client/server...
    Apr 25 11:25:20 vm7j9k6w chronyd[2135]: chronyd version 4.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NT>
    Apr 25 11:25:20 vm7j9k6w chronyd[2135]: Could not open /dev/ptp_hyperv : No such file or directory
    Apr 25 11:25:20 vm7j9k6w chronyd[2135]: Fatal error : Could not open PHC
    Apr 25 11:25:20 vm7j9k6w chronyd-starter.sh[2133]: Could not open PHC
    Apr 25 11:25:20 vm7j9k6w systemd[1]: chrony.service: Control process exited, code=exited, status=1/FAILURE
    Apr 25 11:25:20 vm7j9k6w systemd[1]: chrony.service: Failed with result 'exit-code'.
    Apr 25 11:25:20 vm7j9k6w systemd[1]: Failed to start chrony, an NTP client/server.

We just need to comment out the related line in the chrony config file:

    refclock PHC /dev/ptp_hyperv poll 3 dpoll -2 offset 0
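
One way to comment out that line is sketched below. It works on the config text rather than a specific file, since the config path is not given above; the function name is an assumption.

```python
def comment_out_refclock(config_text: str, device: str = "/dev/ptp_hyperv") -> str:
    """Prefix every active `refclock` line that mentions `device` with '# '.

    Lines that are already commented out do not start with `refclock`,
    so running this twice leaves the text unchanged.
    """
    result = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("refclock") and device in stripped:
            result.append("# " + line)
        else:
            result.append(line)
    trailing_newline = "\n" if config_text.endswith("\n") else ""
    return "\n".join(result) + trailing_newline
```
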
GitHub Actions has workflow commands that let you send messages to the
GitHub UI. This can be helpful for giving feedback or information during
a workflow's execution to our customers. The control plane saves the
message to a file, and the start and complete hooks print them if the
files exist.

Since the file needs to be in workflow command format, GitHub Actions
automatically parses the output and shows the message in the UI.

https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#setting-a-notice-message
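
As a rough illustration of the flow described above, the sketch below formats a notice in workflow command syntax and prints a saved message file only if it exists. The function names are assumptions; the escaping follows the rules used by GitHub's actions/toolkit for command data and properties.

```python
import os


def _escape_data(value: str) -> str:
    """Escape the message part of a workflow command."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def _escape_property(value: str) -> str:
    """Escape a property value such as the title (also escapes ':' and ',')."""
    return _escape_data(value).replace(":", "%3A").replace(",", "%2C")


def format_notice(message: str, title: str = "") -> str:
    """Build a `::notice::` workflow command line."""
    properties = f" title={_escape_property(title)}" if title else ""
    return f"::notice{properties}::{_escape_data(message)}"


def print_saved_command(path: str) -> bool:
    """Print the saved workflow command if the file exists; report whether it did."""
    if not os.path.exists(path):
        return False
    with open(path, encoding="utf-8") as handle:
        print(handle.read().rstrip("\n"))
    return True
```

In this sketch the control plane side would write the output of `format_notice(...)` to a file, and the start or complete hook would call `print_saved_command(path)`.
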
GitHub updated its templates to keep the logic for the template source,
variables, and image-specific variables in a single source. Update the
templates used only by us, and the build scripts, to reflect those changes.
Previously, we passed the JIT config to the runner execution script via
command-line arguments using xargs. This included the JIT token in the
transient systemd unit file generated by systemd-run, which sometimes
failed with "Failed to resolve unit specifiers" errors. These issues are
hard to reproduce locally, but may be caused by template expansion
limits or token size.

We take direct control over the unit file instead of relying on
systemd-run’s transient unit generation, to better understand and debug
any related issues.

We now pass the JIT config via a file. This is more reliable for large
strings and avoids leaking sensitive tokens into the unit description.
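
A minimal sketch of the file-based approach, under stated assumptions: the function names, the unit layout, and the idea of a wrapper script that receives the file path are all hypothetical and are not taken from the actual change.

```python
import os
from pathlib import Path


def write_secret_file(path: Path, secret: str) -> None:
    """Write `secret` to `path`, readable and writable by the owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(secret)
    # Enforce the mode even if the file already existed with wider permissions.
    os.chmod(path, 0o600)


def render_unit(description: str, script_path: str, jit_config_path: str) -> str:
    """Render a unit file that refers to the config by path, never by value."""
    return (
        "[Unit]\n"
        f"Description={description}\n"
        "\n"
        "[Service]\n"
        # Hypothetical wrapper script that reads the JIT config from the file.
        f"ExecStart={script_path} {jit_config_path}\n"
    )
```

Because only the path appears in the unit text, the token never has to pass through systemd's specifier expansion or show up in the unit description.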

This change is currently behind a feature flag. Once verified in
production, we’ll move the file creation to the image generation step.

Production test:
ubicloud/ubicloud@e2627b5
To run runners on AWS, we need an AMI. Introduce a new
pipeline and update the existing Packer templates to create
that AMI on AWS. Use the same Packer template for AWS and
Azure based images to avoid conflicts when updating our
fork from GitHub's upstream, so we won't need to check
new scripts each time.
@velioglu velioglu force-pushed the ubicloud branch 2 times, most recently from 5ad9ecd to d7ee6e6 Compare November 12, 2025 11:07