Similar to bug 968804, beaker-provision reuses distro tree images from the distrotrees/ directory if they were already fetched there by beaker-pxemenu. In that case it will never re-fetch the images. This is a problem for Fedora rawhide at least, where the images are not immutable but are actually changed in place.
Relatedly, beaker-pxemenu itself also never re-fetches any images once they have already been fetched, so it suffers from the same problem. It will have to perform a conditional HTTP request for every image every time it runs (unfortunately). Ideally it should also pipeline its requests, now that it will be issuing so many of them so frequently. Using requests instead of urllib2 should make this easy.
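As a rough illustration of the conditional request mentioned above, here is a minimal sketch, not the actual beaker-pxemenu code. It assumes Python 3, a requests.Session passed in as `session`, and a server that returns ETag or Last-Modified headers. The function name `fetch_if_changed` and the `validators` dict are hypothetical; how the validators would be stored between runs is left open.

```python
import os

def fetch_if_changed(session, url, dest, validators=None, timeout=30):
    """Re-download url to dest only if the server says it has changed.

    session: a requests.Session (anything with a compatible get() works).
    validators: dict with optional 'etag' / 'last_modified' values saved
        from the previous fetch (hypothetical storage, e.g. a sidecar file).
    Returns (changed, validators_to_save).
    """
    validators = validators or {}
    headers = {}
    # Only send a conditional request if we still have the cached copy.
    if os.path.exists(dest):
        if validators.get('etag'):
            headers['If-None-Match'] = validators['etag']
        if validators.get('last_modified'):
            headers['If-Modified-Since'] = validators['last_modified']
    response = session.get(url, headers=headers, stream=True, timeout=timeout)
    if response.status_code == 304:
        # Not modified: keep the existing image.
        response.close()
        return False, validators
    response.raise_for_status()
    # Download to a temporary name, then rename, so a partially written
    # image is never visible under the final name.
    tmp = dest + '.part'
    with open(tmp, 'wb') as f:
        for chunk in response.iter_content(chunk_size=64 * 1024):
            f.write(chunk)
    os.replace(tmp, dest)
    return True, {
        'etag': response.headers.get('ETag'),
        'last_modified': response.headers.get('Last-Modified'),
    }
```

Reusing one Session across all images gives connection keep-alive, which covers part of the "so many requests" concern, although it is not true HTTP pipelining.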
The other option would be to require the use of snapshots for import purposes. So anyone wanting to import Fedora Rawhide reliably would need to use something that can generate regular snapshots rather than using the Rawhide build tree directly.
Tim, Dennis, Alexander, Kamil - interested in your feedback on this one.
Should we continue trying to make Beaker tolerate the "rebuild in place" behaviour of Fedora Rawhide? Or is it acceptable to require the creation of actual snapshot trees that don't change after the initial import (aside from being deleted after a few days) for use by Beaker?
Mutating a distro tree in place really messes with some of Beaker's fundamental assumptions (for example, you lose almost all visibility into what was actually tested by a run), so I actually have a strong preference for the "immutable snapshots required" approach.
Interestingly enough, Alex and I were talking about this a bit last week.
I see advantages and disadvantages to both approaches. While mutation of distro trees in place would be more convenient from an admin perspective, it makes reproduction of results more difficult because you're always testing against a moving target. Then again, QA isn't really set up to take and host rawhide or devel tree snapshots for use in Beaker, and I'm not really sure how we would accomplish that.
So, as a tester, I'd prefer immutable distro trees so that it's clear what was tested and reproduction isn't quite so complicated/convoluted. As a person who would likely be tasked with maintaining the tree snapshots (unless releng is willing to do it), I really don't want to add another system to create, maintain and find resources for.
I haven't worked with Beaker too much, but I believe we should strive for as few error-prone processes as possible. If we mutate the tree in-place using rsync, we can't avoid the race conditions. Even with rsync --delay-updates, there is still a time interval during which the new files are being moved to their final locations, so it's not atomic. Beaker executes a massive number of jobs, and with this approach some of them will unavoidably hit this race condition from time to time - for example they'll receive old yum metadata, but the rpm packages will be updated shortly afterwards (i.e. they won't match the metadata).
If we wanted to modify Rawhide tree in-place using rsync, we would need to have another layer above that to make sure the updates are atomic - either using file system features (if that's possible), or temporarily delay all Beaker requests which occur in the critical interval. I'm not sure how complicated this would be.
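One commonly used "file system feature" for this is to sync into a fresh directory and then switch a symlink with a rename. A minimal sketch, assuming Python 3 on a POSIX file system; `publish_tree` and the paths are hypothetical and nothing here is existing Beaker or releng tooling:

```python
import os

def publish_tree(new_tree_dir, public_link):
    """Atomically repoint the symlink public_link at new_tree_dir."""
    tmp_link = public_link + '.new'
    # Clean up a leftover temporary link from an interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_tree_dir, tmp_link)
    # rename(2) replaces the old symlink in a single step, so readers
    # resolve either the old tree or the new one, never a mixture.
    os.replace(tmp_link, public_link)
```

Note that this only makes the switch itself atomic. A job that fetched yum metadata before the switch and packages after it would still see a mismatch, unless it keeps using the old tree under its own unchanging path, which is effectively the immutable snapshot approach.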
Having immutable trees seems to be a simpler option. There still needs to be an extra script - a fedmsg listener which would listen for new composes being ready, then run rsync, and then make a snapshot of the result and register it in Beaker. But we don't need to deal with it atomically. Also this brings the advantage of knowing exactly what was tested, and the ability to run your test on an older compose (if still available). From my QA viewpoint, this is a large advantage. The disadvantage is of course increased storage requirements and sync duration (creating the snapshot takes a _lot_ of time, unless some clever technique is employed, like LVM snapshots). This needs to be decided by relevant people, I have no idea whether it is feasible hardware-wise to support a number of Rawhide snapshots.
But from my QA perspective, it would be better to go the more reliable way (immutable snapshots), provided it doesn't completely kill performance (if the sync took 10 times longer just to avoid some race conditions, it might be debatable whether it's better or not).
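To make the storage and sync-time concern more concrete: one alternative to LVM snapshots (a different technique from the one named above) is rsync's --link-dest option, which hard-links files unchanged since the previous snapshot instead of copying them again. A minimal sketch that only builds the rsync command line; the function name, the source URL and the directory layout are all assumptions, and the fedmsg listener and Beaker registration steps are left out:

```python
import os

def snapshot_rsync_argv(source, snapshot_root, compose_id, previous_id=None):
    """Build an rsync command that syncs source into a new per-compose dir.

    snapshot_root should be an absolute path: rsync resolves a relative
    --link-dest against the destination directory, not the current one.
    """
    dest = os.path.join(snapshot_root, compose_id)
    argv = ['rsync', '-a']
    if previous_id:
        # Files identical to the previous snapshot become hard links,
        # so each new snapshot only costs the space of what changed.
        argv.append('--link-dest=' + os.path.join(snapshot_root, previous_id))
    # Trailing slashes: copy the contents of source into dest.
    argv += [source.rstrip('/') + '/', dest + '/']
    return argv
```

The resulting list could be passed to subprocess.check_call(). Each compose gets its own directory that is never modified afterwards, so the tree Beaker imports stays immutable until it is deleted.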
I really do not want to work out ways to do rawhide snapshots. However, you could possibly pull rawhide from http://kojipkgs.fedoraproject.org/mash/, though you need to know the location to pull from.
That mash layout actually looks workable for HTTP snapshot imports. Is anything generated on fedmsg when a new tree is available there?
From a QA point of view we need immutable snapshots. They not only make things easier to test and reproduce but also provide some reporting/tracking capability. All existing QE processes (installation testing wise) are designed around immutable snapshots.
On the topic of race conditions, rsync, disk space, etc - I think making LVM snapshots before trees are imported in Beaker will do the trick. This all has to be automated anyway.
(In reply to Nick Coghlan from comment #7)
> That mash layout actually looks workable for HTTP snapshot imports. Is
> anything generated on fedmsg when a new tree is available there?
Yes, the kojipkgs location is where we do the compose; it's then rsynced over the top of the previous day's rawhide. You can use the same fedmsg notifications and just pull from a different location.
Thanks Dennis, in that case, I'll close this as WONTFIX - Beaker expects imported trees to be immutable.
Kamil, Alexander - could you look into doing the distro imports for Fedora Rawhide (for both beaker.fedoraproject.org and the internal Beaker instance) based on the kojipkgs location above instead of the main updated-in-place rawhide tree?