~sircmpwn/sr.ht-dev

Development discussion for sr.ht. When contributing patches to sr.ht, please edit the [PATCH] line to include the specific sr.ht project you're contributing to, e.g.

[PATCH lists.sr.ht v2] Add thing to stuff

Proposed new design for builds.sr.ht

Message-ID: <20181230161951.GA4488@homura.localdomain>

Greetings! As I've been working on multi-arch support, handling
roadblocks in the current design, and considering the feedback of the
community, I'm thinking about what the next generation of builds.sr.ht
looks like. Here are the goals:

- Better management & distribution of build images
- Each worker advertising what image+arch combos it supports, rather
  than assuming every worker supports everything
- Ability for third parties to run build boxes which are slaved to
  builds.sr.ht upstream
- Diversification of build drivers
- Support for external ownership of secrets
- Improved communication between the build environment and the
  host/master

The first change will be to switch from Celery to RabbitMQ for build
distribution. I intend to manage build distribution over one or more
exchanges, with the primary exchange used for the shared build boxes
that run on my infrastructure, and with users able to set up secondary
exchanges to run their own build boxes against.
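
Roughly, submitting a build then becomes a publish against that
exchange. A minimal sketch with the pika library (the exchange name and
message shape here are illustrative assumptions, not the final design):

    import json
    import pika

    # Hypothetical: submit a build to the primary exchange. The routing
    # key is the image/arch/driver combo the job needs (see below).
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    chan = conn.channel()
    chan.exchange_declare(exchange="builds", exchange_type="direct")
    chan.basic_publish(
        exchange="builds",
        routing_key="alpine/edge:x86_64+kvm",
        body=json.dumps({"job_id": 1234, "manifest": "..."}),
    )
    conn.close()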

To support each worker advertising a different set of
images/arches/drivers/etc, I intend to use AMQP routing keys. A worker's
build capabilities will be expressed as "image/version:arch+driver",
which appears in your build manifest like this:

    base: alpine/edge:x86_64+kvm

Naturally this could be shortened to "base: alpine/edge", the rest being
assumed as defaults. But you could also do:

    base: alpine/edge:aarch64+qemu

That would use qemu software emulation of aarch64, or
alpine/edge:riscv64+native once I set up my RISC-V system.
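
On the worker side, each build box would bind one routing key per
capability it advertises. A rough sketch with pika, again with
illustrative names (this is not the actual builds.sr.ht code):

    import pika

    # Capabilities this particular build box advertises (made up).
    CAPABILITIES = [
        "alpine/edge:x86_64+kvm",
        "alpine/edge:aarch64+qemu",
    ]

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    chan = conn.channel()
    chan.exchange_declare(exchange="builds", exchange_type="direct")
    queue = chan.queue_declare(queue="", exclusive=True).method.queue
    for cap in CAPABILITIES:
        chan.queue_bind(exchange="builds", queue=queue, routing_key=cap)

    def on_job(ch, method, properties, body):
        # Run the build here, then acknowledge the message.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    chan.basic_consume(queue=queue, on_message_callback=on_job)
    chan.start_consuming()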

For better management & distribution of build images, I intend to use
bittorrent, managed entirely in code (rather than by your typical
end-user bittorrent daemon). Then, rather than the current system,
where image refreshes push the new images out to each build slave, we
can just push them to a central repository and let the boxes sync
themselves up. We can also use this system to distribute pre-built
images to third parties.
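
The sync side could be pretty small. A minimal sketch, assuming the
python-libtorrent bindings and a .torrent file published per image
(paths and names are placeholders):

    import time
    import libtorrent as lt

    # Hypothetical: fetch an image over bittorrent, then keep seeding it
    # so other build boxes can sync from this one.
    ses = lt.session()
    info = lt.torrent_info("alpine-edge-x86_64.torrent")
    handle = ses.add_torrent({"ti": info, "save_path": "/var/lib/images"})

    while not handle.status().is_seeding:
        s = handle.status()
        print("%.1f%% of %s" % (s.progress * 100, info.name()))
        time.sleep(5)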

The diversification of build drivers will be necessary to support more
than just KVM. Today's QEMU-based driver can support KVM and software
emulation of targets, but I want to codify that distinction in the build
manifest and build exchange for when there are build slaves in the
future running with a greater variety of KVM-supported architectures.
The full list of build drivers I have planned is:

- kvm, qemu: one qemu-based driver supporting two flavors of builds
- docker: does what it says on the tin. Note that I am unlikely to offer
  docker support on the shared builders for security reasons, though I
  might consider it if anyone figures out this puzzler[0]
- chroot: runs builds in a chroot with overlayfs+tmpfs, so you can
  basically do whatever if you control the build hardware
- riscv64: the RISC-V builder has some special needs which will require
  a custom driver, so this'll have to exist. I can go into detail if
  anyone is curious but it's not important to the overall builds.sr.ht
  design.
- designed in a way that users can write their own build drivers and
  plug them into the exchange, for example with windows+powershell or
  something (a rough sketch of such an interface follows below)

[0] https://github.com/moby/moby/issues/37575
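
To illustrate the pluggable-driver idea (not an actual builds.sr.ht
API, just a sketch of the shape it could take):

    import abc

    class BuildDriver(abc.ABC):
        """Hypothetical interface a third-party driver would implement."""

        name = "kvm"  # e.g. kvm, qemu, docker, chroot, riscv64

        @abc.abstractmethod
        def boot(self, image, arch):
            """Bring up a build environment for the given image/arch."""

        @abc.abstractmethod
        def run(self, task, script):
            """Run one task's script inside the environment."""

        @abc.abstractmethod
        def teardown(self):
            """Destroy the build environment and clean up."""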

If you run your own build box then you might also want to manage your
own secrets, so that you needn't give them over to sr.ht.  This one is
pretty simple, it'd just take the form of:

    secret-provider: https://secrets.example.org

This specifies the place to fetch secrets from. Some combination of APIs
will allow you to confirm that the build slave asking for the secrets is
running a build that uses them, and each build box will have an
asymmetric key for signing these requests.
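
One way the signing could work, sketched with PyNaCl (the header names
and URL layout are assumptions, not a defined API):

    import requests
    from nacl.encoding import Base64Encoder
    from nacl.signing import SigningKey

    # Each build box holds a signing key; the secret provider knows the
    # matching verify key and which job the box is currently running.
    signing_key = SigningKey.generate()  # in practice, loaded from disk
    job_id = b"1234"
    signature = signing_key.sign(job_id, encoder=Base64Encoder).signature

    resp = requests.get(
        "https://secrets.example.org/api/secrets",
        headers={
            "X-Build-Job": job_id.decode(),
            "X-Build-Signature": signature.decode(),
        },
    )
    secrets = resp.json()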

Lastly, I want to improve the way that the build environment
communicates with the host. This will probably take the form of an HTTP
API which communicates with the build host on a TCP port set up when
the driver is initialized. Initially I just intend to use this for
ending builds early, to replace today's fragile exit-code-based hack,
but there are probably more use-cases for this in the future.
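
The host side could be little more than a tiny control endpoint that
the driver brings up. A sketch using Python's standard library (the
path and port are made up):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class ControlHandler(BaseHTTPRequestHandler):
        """Hypothetical control API the driver exposes to the guest."""

        def do_POST(self):
            if self.path == "/end-build":
                self.send_response(204)
                self.end_headers()
                # Tell the driver to stop running further tasks.
                self.server.end_requested = True
            else:
                self.send_response(404)
                self.end_headers()

    server = HTTPServer(("0.0.0.0", 8080), ControlHandler)
    server.end_requested = False
    while not server.end_requested:
        server.handle_request()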

A final note, I also want to make it possible to obtain an interactive
shell in the build environment. Basically this will just take the form
of something like this in your build manifest:

    shell: true

Then after all of your steps run, instead of tearing down the build
environment it'll print an SSH connection string into the build log and
wait for you to log in. This feature could be a target of abuse, so
it'll require some finesse to get right.
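
In driver terms that presumably amounts to something like this at the
end of a build (purely illustrative; every name here is a placeholder):

    def finish_build(manifest, ssh_host, ssh_port, wait_for_logout, teardown):
        """Hypothetical end-of-build hook for the shell feature."""
        if manifest.get("shell"):
            # Print a connection string into the build log and wait for
            # the user instead of destroying the environment.
            print("ssh -p %d build@%s" % (ssh_port, ssh_host))
            wait_for_logout()
        else:
            teardown()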

So that's what's in store. Any feedback or better ideas?

Message-ID: <e601d9f6-996c-5a30-f9aa-10971bd7752f@mnus.de>
In-Reply-To: <20181230161951.GA4488@homura.localdomain>

Hi there,

I'm not using builds.sr.ht, but I can offer some feedback, partly based
on experience with similar projects.


> The first change will be to switch from Celery to RabbitMQ for build
> distribution.

Workers polling an API may be a simpler alternative offering more
control over distribution (since you don't push jobs into a queue but
have them pulled by the workers). That also avoids having to maintain
and know RabbitMQ (or Celery for that matter). At least that's my
experience from a cancelled AMQP project.
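
Roughly what I mean, as a sketch (the endpoint and parameters are made
up for illustration):

    import time
    import requests

    # Hypothetical polling loop: the worker asks an API for the next job
    # matching its capabilities instead of consuming from a queue.
    while True:
        resp = requests.get(
            "https://builds.example.org/api/jobs/next",
            params={"capabilities": "alpine/edge:x86_64+kvm"},
        )
        if resp.status_code == 200:
            job = resp.json()
            print("claimed job", job)  # hand off to the build driver here
        else:
            time.sleep(10)  # nothing to do; back off and poll again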


> To support each worker advertising a different set of
> images/arches/drivers/etc

That sounds like a good idea. Maybe it can be extended by filtering by
hardware capabilities, like lots of cores, lots of RAM, or a 10G
network connection?


> For better management & distribution of build images, I intend to use
> bittorrent, managed entirely in code (rather than by your typical
> end-user bittorrent daemon). Then, rather than the current system of the
> image refreshes going out to each build slave to push the new images, we
> can just push them to a central repository and let the boxes sync
> themselves up. Can also use this system to distribute pre-built images
> to third parties.

While that sounds incredibly cool, it also seems like overkill. I'd
imagine a central repo on a decent server (e.g. with a 1Gbit/s network)
would do the job just fine for dozens of build servers.

When there are a lot of custom images (e.g. with software baked in for
the specific build types a specific user has), a central server
probably falls flat rather quickly. Using the central repo only to
track image metadata could work neatly for custom images: no central
storage is occupied, and distribution happens purely between the build
servers. Maybe you already had it in mind like this.

A somewhat unrelated thought: Do some images have to be private and
should thus be encrypted if transferred over BitTorrent?


> - docker: does what it says on the tin. Note that I am unlikely to offer
>   docker support on the shared builders for security reasons, though I
>   might consider it if anyone figures out this puzzler[0]

For security reasons it makes sense to run Docker in a VM, but why does
the VM have to run in Docker?


> A final note, I also want to make it possible to obtain an interactive
> shell in the build environment. Basically this will just take the form
> of something like this in your build manifest:
> 
>     shell: true
> 
> Then after all of your steps run, instead of tearing down the build
> environment it'll print an SSH connection string into the build log and
> wait for you to log in. This feature could be a target of abuse, so
> it'll require some finesse to get right.

Maybe that should be an option when triggering the build instead, so you
can debug any build configuration at any time?