With the new dracut 059 (well, the diff also includes anaconda 39.2, but dracut seems the most likely suspect), installer images no longer boot. They loop on an error "mount: /sysroot: special device LiveOS_rootfs does not exist." This is obviously an automatic blocker as it prevents the installer image booting at all.
Note: the previous build was 057; for some reason we're a couple of months behind in getting 058 or 059.
Checking the journal messages shows: overlayfs: failed to resolve '/run/overlayfs': -2
OK, so I think I see what's going on here, more or less, but fixing it seems a bit tricky, at least right now (maybe I'll figure it out shortly).

I believe the trouble starts with this commit:

https://github.com/dracutdevs/dracut/commit/8caaad4fc2d75982eb87f5ebc72a4c276986f756

It moves the setup of the overlayfs on the `dmsquash-live-root.sh` path out from being inline in that file to a separate script. Before that change, we had this block:

if [ -n "$overlayfs" ]; then
    ...
    mkdir -m 0755 -p /run/overlayfs
    mkdir -m 0755 -p /run/ovlwork
    if [ -n "$reset_overlay" ] && [ -h /run/overlayfs ]; then
        ovlfs=$(readlink /run/overlayfs)
        info "Resetting the OverlayFS overlay directory."
        rm -r -- "${ovlfs:?}"/* "${ovlfs:?}"/.* > /dev/null 2>&1
    fi
    if [ -n "$readonly_overlay" ] && [ -h /run/overlayfs-r ]; then
        ovlfs=lowerdir=/run/overlayfs-r:/run/rootfsbase
    else
        ovlfs=lowerdir=/run/rootfsbase
    fi
    ...

There were/are various ways `$overlayfs` can get set in that script. It can get set to "yes" if a certain arg is on the cmdline:

getargbool 0 rd.live.overlay.overlayfs && overlayfs="yes"

but it can get set to "required" in three other cases. Two are in this block (edited and condensed):

===
if [ -z "$setup" -a -n "$devspec" -a -n "$pathspec" -a -n "$overlay" ]; then
    ...
    if [ -f /run/initramfs/overlayfs$pathspec -a -w /run/initramfs/overlayfs$pathspec ]; then
        ...
        if [ -z "$oltype" ] || [ "$oltype" = DM_snapshot_cow ]; then
            ...
        else
            ...
            if [ -d /run/initramfs/overlayfs/overlayfs ] && [ -d /run/initramfs/overlayfs/ovlwork ]; then
                ...
                overlayfs="required"
    ...
    elif [ -d /run/initramfs/overlayfs$pathspec ] && [ -d /run/initramfs/overlayfs$pathspec/../ovlwork ]; then
        ...
        overlayfs="required"
===

One is in this block (again edited and condensed):

===
# we might have an embedded fs image on squashfs (compressed live)
if [ -e /run/initramfs/live/${live_dir}/${squash_image} ]; then
    SQUASHED="/run/initramfs/live/${live_dir}/${squash_image}"
fi
if [ -e "$SQUASHED" ]; then
    ...
    if [ -d /run/initramfs/squashfs/LiveOS ]; then
        ...
    elif [ -d /run/initramfs/squashfs/proc ]; then
        ...
        overlayfs="required"
===

so if the cmdline arg is set, *or* if we go down any of those three paths, we wound up with $overlayfs as a non-zero length string and did the stuff to set up /run/overlayfs.

*After* that change, we now have the file `mount-overlayfs.sh` set up as a hook (I think), which does more or less the same stuff - but *only* if the cmdline arg is set. That's the only conditional that was 'ported' to that file:

===
getargbool 0 rd.live.overlay.overlayfs && overlayfs="yes"
...
if [ -n "$overlayfs" ]; then
    (do the stuff)
fi
===

i.e. if $overlayfs isn't set it doesn't do the stuff, and that `getargbool` is the *only* way $overlayfs can be set in that file.

Things were complicated a bit further by a later commit:

https://github.com/dracutdevs/dracut/commit/40dd5c90e0efcb9ebaa9abb42a38c7316e9706bd

that basically tweaks the approach by making the script part of a new module called 90overlayfs. But it doesn't ultimately change the logic much: the script only *does* stuff if the cmdline arg is set.

I booted an affected image with `rd.debug` so we get `sh -x` output from all the scripts, and from that we can see that our images are indeed hitting one of the paths where overlayfs gets set to "required", specifically, the third (squashfs-y) one, because we see:

+ reloadsysrootmountunit=':>/xor_overlayfs;'
+ overlayfs=required

That "reloadsysrootmountunit" line is a dead giveaway we're in that third block.

So the problem is that the new 90overlayfs module doesn't actually set up the overlayfs in all the cases where it should: it wasn't set up to handle the paths where $overlayfs is set to "required" in dmsquash-live-root.sh.

The obvious thing to do is just port all those cases over. The first two are, uh, rather complicated, but the third at least *seems* easy, so I was going to do that.
But there's a problem even with that: dmsquash-live-root.sh unmounts the squashfs at the end!

[ -e "$SQUASHED" ] && umount -l /run/initramfs/squashfs

so we can't even port *that* check into the 90overlayfs module because `/run/initramfs/squashfs/proc` won't be there any more.

We probably need to have dmsquash-live-root.sh just 'signal' to the module somehow when it actually needs to set up the overlayfs, because recreating all these checks at the point where the module runs seems impractical. But I'm not sure off the top of my head what's the canonical way to do that in dracut, or if there might be a better choice. I'll look into it in a bit.
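To make the regression described above concrete, here is a toy model of just the gating logic. It is not dracut code: the function names `old_inline_gate` and `new_hook_gate` and their two arguments are hypothetical stand-ins for "the cmdline arg is set" and "one of the three 'required' paths was taken".

```shell
#!/bin/sh
# Toy model only; none of these functions exist in dracut.
# $1: non-empty if rd.live.overlay.overlayfs is on the kernel cmdline
# $2: non-empty if dmsquash-live-root.sh went down a "required" path

old_inline_gate() {
    # Before the split: one script, one variable. Either source
    # leaves $overlayfs non-empty, so the setup block runs.
    overlayfs=""
    [ -n "$1" ] && overlayfs="yes"
    [ -n "$2" ] && overlayfs="required"
    if [ -n "$overlayfs" ]; then echo "setup"; else echo "skip"; fi
}

new_hook_gate() {
    # After the split: the separate script only looks at the cmdline.
    # $2 is deliberately ignored, because that state never reaches it.
    overlayfs=""
    [ -n "$1" ] && overlayfs="yes"
    if [ -n "$overlayfs" ]; then echo "setup"; else echo "skip"; fi
}

# Installer image case: no cmdline arg, but the squashfs path
# marked the overlay as required.
echo "old: $(old_inline_gate "" yes)"
echo "new: $(new_hook_gate "" yes)"
```

With no cmdline arg and a "required" path taken, the old gate reports "setup" and the new one reports "skip", which matches the missing /run/overlayfs seen in the journal.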
For now pvalena has reverted the entire PR downstream, but I'm not sure if that will be the long-term fix.
I don't have a reproducer at hand, but what about extending the kernel cmdline in those other scripts? I mean, replace

overlayfs="required"

with

echo "rd.live.overlay.overlayfs=1" > /etc/cmdline.d/dracut-need-overlay.conf

Adam, can you try that?
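A minimal sketch of how that suggestion would flow from one script to the other, runnable outside an initramfs. Everything here is an assumption for illustration: a temporary directory stands in for /etc/cmdline.d, and `toy_getargbool` is a hypothetical, much-simplified stand-in for dracut's `getargbool` (the real one also reads the kernel cmdline and handles false values). It re-reads the drop-ins on every call, so it says nothing about whether the real helper caches.

```shell
#!/bin/sh
# Sketch only: CMDLINE_D stands in for /etc/cmdline.d.
CMDLINE_D=$(mktemp -d)
trap 'rm -rf "$CMDLINE_D"' EXIT

# Hypothetical stand-in for getargbool: succeeds if any *.conf drop-in
# contains "<name>" or "<name>=1" as a whitespace-separated word.
toy_getargbool() {
    name=$1
    for conf in "$CMDLINE_D"/*.conf; do
        [ -f "$conf" ] || continue
        for word in $(cat "$conf"); do
            case "$word" in
                "$name" | "$name=1") return 0 ;;
            esac
        done
    done
    return 1
}

# Producer side: what dmsquash-live-root.sh would do at the points
# where it currently sets overlayfs="required".
signal_overlay_needed() {
    echo "rd.live.overlay.overlayfs=1" > "$CMDLINE_D/dracut-need-overlay.conf"
}

# Consumer side: the same gate the separate overlayfs script uses.
overlay_wanted() {
    overlayfs=""
    toy_getargbool rd.live.overlay.overlayfs && overlayfs="yes"
    [ -n "$overlayfs" ]
}

if overlay_wanted; then echo "before: setup"; else echo "before: skip"; fi
signal_overlay_needed
if overlay_wanted; then echo "after: setup"; else echo "after: skip"; fi
```

The point of the design is that the consumer needs no new logic: the existing cmdline check picks up the drop-in, provided the producer runs first.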
Wow, uh, yikes. I mean, that could work (if there's no caching involved in how `getargbool` works, at least?) but it seems very hacky. Surely there's a better way? I was assuming there must be existing cases where different parts of dracut need to 'signal' to each other like this and there would be an existing canonical way to do it, I just don't happen to know what that is so I couldn't write a PR. I guess I'll file an upstream issue for this? That way we can get some input from the folks who wrote and reviewed the change...
We are doing something like this in several places:

modules.d/35connman/cm-config.sh: echo rd.neednet >> /etc/cmdline.d/connman.conf
modules.d/35network-legacy/parse-ip-opts.sh: echo "rd.neednet=1" > /etc/cmdline.d/dracut-neednet.conf
modules.d/35network-legacy/parse-ip-opts.sh: >> /etc/cmdline.d/80-enx.conf
modules.d/35network-manager/nm-config.sh: echo rd.neednet >> /etc/cmdline.d/35-neednet.conf
modules.d/40network/net-lib.sh: echo "ifname=$name$num:$mac" >> /etc/cmdline.d/45-ifname.conf
modules.d/40network/net-lib.sh: ) >> /etc/cmdline.d/40-ibft.conf
modules.d/80cms/cmsifup.sh:} > /etc/cmdline.d/80-cms.conf
modules.d/95fcoe-uefi/parse-uefifcoe.sh: print_fcoe_uefi_conf "$i" > /etc/cmdline.d/40-fcoe-uefi.conf && break
modules.d/95nvmf/parse-nvmf-boot-connections.sh: echo "rd.neednet=1" > /etc/cmdline.d/nvmf-neednet.conf
modules.d/98dracut-systemd/dracut-cmdline-ask.sh: [ -n "$line" ] && printf -- "%s\n" "$line" >> /etc/cmdline.d/99-cmdline-ask.conf
modules.d/99base/init.sh: echo "$line" >> /etc/cmdline.d/99-cmdline-ask.conf
Upstream fix: https://github.com/dracutdevs/dracut/pull/2233 . If someone could help confirm the upstream fix, that would be appreciated. In addition, it seems the installer already sets rd.live.overlay.overlayfs in https://github.com/livecd-tools/livecd-tools/blob/main/imgcreate/live.py#L127 under certain conditions. Why only under certain conditions?
I don't know the reason for the conditional there off the top of my head, but that code is used in building live images, which aren't affected by the bug. Installer images (that is, the network installer, Server DVD installer, and Silverblue DVD installer - images that boot to a dedicated installer environment, not to some kind of live desktop) are the ones affected by this bug. Those are built by lorax - https://github.com/weldr/lorax/ - which doesn't use the imgcreate library. (FWIW, our current official lives don't seem to have rd.live.overlay.overlayfs on the cmdline either, they have rd.live.image . I think this may be changing as part of https://fedoraproject.org/wiki/Changes/ModernizeLiveMedia , but we had to revert the persistent overlay part of that for now as the initial attempt broke stuff. From a quick look at the boot logs, it doesn't look like the current official lives use overlayfs at all).
Oh, and I'll test the proposed fix today, thanks.
Tested, it seems to work.
(In reply to Adam Williamson from comment #9)
> I don't know the reason for the conditional there off the top of my head,
> but that code is used in building live images, which aren't affected by the
> bug. Installer images (that is, the network installer, Server DVD installer,
> and Silverblue DVD installer - images that boot to a dedicated installer
> environment, not to some kind of live desktop) are the ones affected by this
> bug. Those are built by lorax - https://github.com/weldr/lorax/ - which
> doesn't use the imgcreate library.
>
> (FWIW, our current official lives don't seem to have
> rd.live.overlay.overlayfs on the cmdline either, they have rd.live.image . I
> think this may be changing as part of
> https://fedoraproject.org/wiki/Changes/ModernizeLiveMedia , but we had to
> revert the persistent overlay part of that for now as the initial attempt
> broke stuff. From a quick look at the boot logs, it doesn't look like the
> current official lives use overlayfs at all).

There's a new module (triggered by dmsquash-live): https://github.com/dracutdevs/dracut/tree/master/modules.d/90dmsquash-live-autooverlay

Also, I've reverted the change in https://src.fedoraproject.org/rpms/dracut/pull-request/30 (F39 only).
@pvalena given the upstream discussion, would you be open to reverting https://src.fedoraproject.org/rpms/dracut/c/05988c6a16621c75d2fe3ed0cfddfb6ce2d18f93?branch=rawhide and pulling in https://github.com/dracutdevs/dracut/commit/0e780720efe6488c4e07af39926575ee12f40339 ? I hope to understand whether there is any remaining issue, and I hope to see Fedora release the new overlayfs dracut module to match other distros. Thanks!
We can possibly do that for F38 *after* Beta is released. Hard to justify doing it during Beta freeze when we know the reversion is working fine.
> We can possibly do that for F38 *after* Beta is released.

That would be great, thanks! The reversion reintroduced a bug where overlay does not work with NFS (which also breaks the test suite that ships with dracut). Other distros shipping this version of dracut do not have this bug. I'm not trying to pressure anybody to make anything happen, but this is a trade-off that is being made.
I can push the fix to rawhide (and drop the revert), and later even to F38, if that works. But I'm still unsure about F37, as I wanted to push the updated 059 there also (possibly with the patch). It depends on which of those is more reliable :).
FYI, I've not done the revert-fix-build for F38, and the first build was simply untagged: https://koji.fedoraproject.org/koji/buildinfo?buildID=2156534
Can anyone test the functionality specifically? I do not know how to create my own boot media. But I might learn to...

https://src.fedoraproject.org/rpms/dracut/pull-request/32

Scratch builds:
(copr) https://copr.fedorainfracloud.org/coprs/build/5618842
(rawhide) https://koji.fedoraproject.org/koji/taskinfo?taskID=98491273
(f38) https://koji.fedoraproject.org/koji/taskinfo?taskID=98494231
(f37) https://koji.fedoraproject.org/koji/taskinfo?taskID=98491269
FYI, discussion about live-media: https://matrix.to/#/!mXoNEgzrLsrhDoJncn:gitter.im/$t3DasIGetl3wC6NDDJ1Fu8JNB7av35wGTDTa_9MoDkU?via=gitter.im&via=matrix.org&via=fedora.im
Anything you submit as an update will automatically be tested by openQA, and for anything besides Rawhide, if the test fails the update will be blocked from being pushed. That's how I found this bug in the first place. There is an openQA test that creates an installer ISO and checks if it works. If you want a test before submitting an official update, just do a scratch build and ask me; I can manually trigger tests on scratch builds.
Pull-request: https://src.fedoraproject.org/rpms/dracut/pull-request/34#
FEDORA-2023-e8ca690ff3 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-e8ca690ff3
I downloaded the latest netinst-x86_64-38-20230325.n.0.iso image, then booted and successfully installed a VM with it. I assume it has the latest dracut build? Is this a valid test for this bug? Or must I test a Server DVD installer?
Oops, this image uses dracut-057-6.fc38. Sorry for the noise, I'll wait for a new ISO with dracut-059-2.fc38.
FEDORA-2023-e8ca690ff3 has been pushed to the Fedora 38 stable repository. If the problem still persists, please make note of it in this bug report.
Tested the new Fedora-Server-netinst-x86_64-38-20230326.n.1. It has dracut-059-2.fc38 and it boots and installs correctly.