Bug 1232411 - Rawhide (23) boot.iso nightlies do not boot with dracut 042+
Summary: Rawhide (23) boot.iso nightlies do not boot with dracut 042+
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 23
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: F23AlphaBlocker
 
Reported: 2015-06-16 17:03 UTC by Adam Williamson
Modified: 2015-07-24 23:51 UTC
CC List: 9 users

Fixed In Version: anaconda-23.12-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-16 15:04:11 UTC
Type: Bug
Embargoed:


Attachments
Proposed fix for anaconda-dracut usage of dmsquash-live-root (1.16 KB, patch)
2015-06-18 21:05 UTC, Brian Lane
anaconda-dracut part of the fix (1.44 KB, patch)
2015-06-18 21:06 UTC, Brian Lane
patch to always mount /dev/mapper/live-rw (2.76 KB, patch)
2015-06-23 17:55 UTC, Brian Lane
Proposed patch for anaconda (585 bytes, patch)
2015-06-24 08:36 UTC, Harald Hoyer

Description Adam Williamson 2015-06-16 17:03:53 UTC
The 0615 and 0616 Rawhide nightly boot.iso images do not boot, due to some issue likely introduced in dracut 042. The symptom is that boot fails thus:

[FAILED] Failed to start Switch Root.
See 'systemctl status initrd-switch-root.service' for details.
Warning: /dev/root does not exist

Generating "/run/initramfs/rdsosreport.txt"

I haven't yet determined the underlying cause of the problem. It is not the same as https://bugzilla.redhat.com/show_bug.cgi?id=1229665 .

The 0616 x86_64 nightly can be found here: https://kojipkgs.fedoraproject.org/mash/rawhide-20150616/rawhide/x86_64/os/images/boot.iso . That link should work for at least 2-3 weeks.

Marking as an automatic Alpha blocker per https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Automatic_blockers :

"Complete failure of any release-blocking TC/RC image to boot at all under any circumstance - "DOA" image (conditional failure is not an automatic blocker)"

Comment 1 Adam Williamson 2015-06-16 18:00:04 UTC
OK, so this is still related to the dmsquash-live stuff.

It seems that the non-live installer images do actually rely on dracut's dmsquash-live. If you boot a boot.iso from before dracut-042 with rd.debug you see this:

/sbin/dmsquash-live-root@271(main): printf 'mount %s /dev/mapper/live-rw %s\n' '' /sysroot

and there's a file /lib/dracut/hooks/mount/01-661-live.sh:

mount /dev/mapper/live-rw /sysroot

and indeed, /dev/mapper/live-rw is mounted as /sysroot.

That's the bit that dracut 042 disabled when systemd is active:

https://git.kernel.org/cgit/boot/dracut/dracut.git/commit/modules.d/90dmsquash-live/dmsquash-live-root.sh?id=8ff624df9f3f300a008711d114a8769464a054db
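
The following minimal shell sketch illustrates the hook mechanism described above. It is not dracut's verbatim code. `write_live_mount_hook` is a hypothetical name, and `hookdir`, `NEWROOT` and `ROOTFLAGS` are assumed to mirror dracut's variables. The sketch also assumes that a non-empty `DRACUT_SYSTEMD` is how "systemd is in use" gets expressed in the 042 check. The 661 in the hook file name is presumably the PID of the script that wrote it.

```shell
#!/bin/sh
# Sketch only: hypothetical function; DRACUT_SYSTEMD as the "systemd in use"
# flag is an assumption, not a quote of the dracut 042 commit.

write_live_mount_hook() {
    hookdir=$1
    mkdir -p "$hookdir/mount"
    if [ -n "$DRACUT_SYSTEMD" ]; then
        # 042+: queue nothing; a generated sysroot.mount is expected instead
        return 0
    fi
    # Pre-042: queue a mount hook like mount/01-661-live.sh ($$ = script PID)
    printf 'mount %s /dev/mapper/live-rw %s\n' "$ROOTFLAGS" "$NEWROOT" \
        > "$hookdir/mount/01-$$-live.sh"
}

# Example with empty rootflags, matching the '' argument in the rd.debug trace
NEWROOT=/sysroot
ROOTFLAGS=
DRACUT_SYSTEMD=
demo=$(mktemp -d)
write_live_mount_hook "$demo/old"
cat "$demo"/old/mount/01-*-live.sh   # mount  /dev/mapper/live-rw /sysroot
```

When rootflags are empty, printf leaves an extra space after `mount`. This is harmless when the hook is run as shell. With `DRACUT_SYSTEMD` set, the same call writes no hook at all, so the mount depends entirely on some generator providing sysroot.mount.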

but the new generator does not cope with the way installer images are set up. As I read the generator:

https://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/90dmsquash-live/dmsquash-generator.sh

it will only actually create the mount if the cmdline has something like 'root=live' or 'root=live:SOMETHINGOROTHER', but for installer images (and probably other scenarios), this is not the case. The cmdline for a boot.iso is:

BOOT_IMAGE=vmlinuz initrd=initrd.img inst.stage2=hd:LABEL=Fedora-rawhide-x86_64 quiet

There's a lot of detail to it, but basically, if you compare dmsquash-generator.sh and dmsquash-live-root.sh, the latter clearly handles a lot more cases than the former. So in dracut-042, where the latter no longer mounts anything if systemd is in use, we break some cases.
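
The generator's gate can be modelled roughly as follows. This is a simplified sketch, not the contents of dmsquash-generator.sh: the helper names are hypothetical, and the command line is passed in as a string rather than read with dracut's `getarg`. It also models only the `root=live` / `root=live:...` test, not every case the real script handles.

```shell
#!/bin/sh
# Simplified, hypothetical model of the dmsquash generator's root= gate.

# Print the value of the last root= word in a kernel command line string.
cmdline_root() (
    set -f                  # keep cmdline words from being glob-expanded
    root=
    for arg in $1; do
        case "$arg" in
            root=*) root=${arg#root=} ;;
        esac
    done
    printf '%s\n' "$root"
)

# Succeed only if the generator would write sysroot.mount for this cmdline.
wants_live_sysroot_mount() {
    case "$(cmdline_root "$1")" in
        live|live:*) return 0 ;;
        *) return 1 ;;
    esac
}

bootiso='BOOT_IMAGE=vmlinuz initrd=initrd.img inst.stage2=hd:LABEL=Fedora-rawhide-x86_64 quiet'
wants_live_sysroot_mount "$bootiso" || echo "boot.iso: no sysroot.mount generated"
```

With the boot.iso command line quoted above, the check fails and no sysroot.mount is produced. A live image booted with something like `root=live:CDLABEL=...` passes the check.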

Comment 2 Adam Williamson 2015-06-16 18:15:02 UTC
Aha. So I got to wondering how dmsquash-live-root got triggered at all for installer images, as dracut itself doesn't look like it would do it, and it turns out there's an interaction with anaconda-dracut. anaconda-dracut does some of the root discovery / prep itself, then calls dmsquash-live-root:

https://github.com/rhinstaller/anaconda/blob/master/dracut/anaconda-diskroot#L51
https://github.com/rhinstaller/anaconda/blob/master/dracut/anaconda-lib.sh#L68
https://github.com/rhinstaller/anaconda/blob/master/dracut/anaconda-lib.sh#L100

but as noted in #c1, this is now broken because dmsquash-live-root doesn't actually *mount* the device any more when systemd is in use.

I don't know what the best solution here is, but at least I know what the problem is. CCing bcl as he seems to touch anaconda-dracut a lot.

Comment 3 Adam Williamson 2015-06-16 18:18:50 UTC
Possibly a dumb idea, but is it really necessary for the systemd generator to re-do all the 'check for "valid" cmdline parameter' stuff from parse-dmsquash-live.sh (see https://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/90dmsquash-live/parse-dmsquash-live.sh )? Couldn't it be simplified to just create the mount unit so long as /dev/mapper/live-rw exists? That avoids duplicating code and also makes it work in this case (where some other module is setting up /dev/mapper/live-rw in a way that wasn't expected...)

Comment 4 Adam Williamson 2015-06-16 18:31:14 UTC
If there's a race problem there (the generator may get run before /dev/mapper/live-rw is actually set up), it could always create the unit, but use a systemd unit condition:

ROOTFLAGS="$(getarg rootflags)"
{
    echo "[Unit]"
    echo "Before=initrd-root-fs.target"
    echo "ConditionPathExists=/dev/mapper/live-rw"
    echo "[Mount]"
    echo "Where=/sysroot"
    echo "What=/dev/mapper/live-rw"
    [ -n "$ROOTFLAGS" ] && echo "Options=${ROOTFLAGS}"
} > "$GENERATOR_DIR"/sysroot.mount

which would cause the mount to only be tried if /dev/mapper/live-rw existed. I may be missing something here, of course, it's just an initial idea.

Comment 5 Adam Williamson 2015-06-16 19:02:30 UTC
Ah, no, I see the problem with that. Now I understand why the logic is duplicated between parse-dmsquash-live.sh and dmsquash-generator.sh: it's because the generator is run by systemd, and we only want it to produce a sysroot.mount when we know the dmsquash-live stuff is actually going to kick in.

The reason is that when we produce sysroot.mount, we are overriding systemd-fstab-generator. If we produce a sysroot.mount which doesn't do anything, we preclude systemd from trying to mount root with its own generator, and thus break all the cases which systemd's generator should handle (i.e. all the 'normal' root=something cases).

So it's basically correct that the dmsquash generator tries to only actually produce a mount file when it knows one is needed, and because the generator is run entirely outside of dracut itself, the logic more or less has to be duplicated. However, an unfortunate consequence is that it breaks this case, where another module is making use of the dmsquash-live module in a way dracut doesn't know about.

anaconda-dracut could of course ship its own generator (or just a simple static sysroot.mount file, I guess). Not sure if there's a better fix.

Comment 6 Brian Lane 2015-06-18 21:05:48 UTC
Created attachment 1040642 [details]
Proposed fix for anaconda-dracut usage of dmsquash-live-root

Comment 7 Brian Lane 2015-06-18 21:06:22 UTC
Created attachment 1040643 [details]
anaconda-dracut part of the fix

Comment 8 Brian Lane 2015-06-18 21:09:50 UTC
I've looked at this a bit today and the simple solution is to add an argument to dmsquash-live-root so that the old mount hook will be created when anaconda-dracut calls it.

I tried to find a more generic way for the dmsquash generator to run, possibly triggering the creation of the sysroot.mount based on /dev/mapper/live-rw, but the timing isn't right (that path doesn't exist yet when the generator runs).

There may be a cleaner solution to all of this, but for now these 2 patches should get things booting again.

I don't want to do anything like carry a sysroot.mount in anaconda-dracut, it really shouldn't have to know anything about that.

Comment 9 Adam Williamson 2015-06-22 16:29:18 UTC
Harald, can you please give your opinion on Brian's proposal and merge the dracut part if you agree with it? We need to get Rawhide boot.iso working again so we can test it properly.

Comment 10 Harald Hoyer 2015-06-23 13:45:32 UTC
Why can't the anaconda-dracut part install the mount hook?

Comment 11 Brian Lane 2015-06-23 16:21:06 UTC
We could -- but that makes this code more of a mess, not less. We'd then be carrying the hook code in 2 places. Ultimately we'd like to see less custom code in anaconda-dracut, not more of it.

Comment 12 Brian Lane 2015-06-23 17:55:25 UTC
Created attachment 1042468 [details]
patch to always mount /dev/mapper/live-rw

How about this? If /dev/mapper/live-rw has been created, no matter the method, it should be mounted on /sysroot.

Comment 13 Adam Williamson 2015-06-23 22:23:29 UTC
I think the potential problem with that is that it might conflict with systemd's own systemd-fstab-generator - but I'll have to take a look at exactly how that works to know for sure.

Comment 14 Adam Williamson 2015-06-23 22:32:48 UTC
So yeah, the problem is we have two scenarios:

1) We want dracut (or anaconda-dracut) to mount /sysroot
2) We want systemd to mount /sysroot

Scenario 2) is handled by systemd's systemd-fstab-generator, the source for which is http://cgit.freedesktop.org/systemd/systemd/tree/src/fstab-generator/fstab-generator.c . So far as /sysroot goes, it basically looks for a 'root=' parameter on the cmdline and, if it finds one, generates a mount unit called 'sysroot.mount'. If a unit called sysroot.mount already exists, it will bail out and not do anything.

The current dracut generator also creates a file called sysroot.mount - so when dracut's generator kicks in, systemd's generator will not. This means that dracut's generator should *only* kick in if it's actually going to successfully mount /sysroot (which is how it's currently set up; dracut's generator won't create a sysroot.mount unless it's actually going to be used). The problem with #c12 (which was my initial idea as well) is that it will break normal boots, where systemd should mount /sysroot based on the 'root=' cmdline parameter - the sysroot.mount which is now *unconditionally created* by the dracut generator will cause systemd's own generator to be effectively disabled, even when dracut's mount doesn't actually *do* anything because the ConditionPathExists is not satisfied.

Obviously we could rename dracut's mount unit to anything other than sysroot.mount, and then we wouldn't have that problem. But then we'd have a different problem: we'd have *two* competing /sysroot mount units. I don't know what systemd's behaviour would be in that case, but it doesn't sound like a good idea. It might work OK in 'normal' boot cases because the dracut mount wouldn't do anything, but what happens when we're actually going to use the dracut mount? Both it and the systemd one try to kick in? What happens then?
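
The shadowing problem described in #c14 can be reproduced with a toy model in which both generators write into one directory. Both function names are hypothetical. The systemd side is reduced to the single rule described above: generate sysroot.mount from `root=`, but bail out if a sysroot.mount already exists. Neither function is the real dracut or systemd code.

```shell
#!/bin/sh
# Toy model only: hypothetical functions sharing one generator output dir.

# dracut side as proposed in #c4/#c12: always write sysroot.mount and rely on
# ConditionPathExists to make it a no-op when live-rw is absent.
dracut_live_generator() {
    {
        echo "[Unit]"
        echo "Before=initrd-root-fs.target"
        echo "ConditionPathExists=/dev/mapper/live-rw"
        echo "[Mount]"
        echo "Where=/sysroot"
        echo "What=/dev/mapper/live-rw"
    } > "$1/sysroot.mount"
}

# systemd side, heavily simplified: turn root=DEV into sysroot.mount, but
# refuse to touch a sysroot.mount that already exists.
fstab_root_generator() {
    out=$1
    root=$2
    [ -n "$root" ] || return 0
    if [ -e "$out/sysroot.mount" ]; then
        echo "sysroot.mount already exists, ignoring root=$root" >&2
        return 1
    fi
    {
        echo "[Mount]"
        echo "Where=/sysroot"
        echo "What=$root"
    } > "$out/sysroot.mount"
}

# A normal disk boot with root=/dev/sda2:
gen=$(mktemp -d)
dracut_live_generator "$gen"
fstab_root_generator "$gen" /dev/sda2 || true
grep '^What=' "$gen/sysroot.mount"   # What=/dev/mapper/live-rw
```

On a normal disk boot, the surviving unit points at /dev/mapper/live-rw. Its ConditionPathExists is not met, so under this model nothing mounts the real root. Without the dracut unit, the same fstab call produces a working sysroot.mount.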

Comment 15 Adam Williamson 2015-06-23 22:35:56 UTC
I suppose one thing we could do is send a patch for *systemd*'s generator that makes it mount /dev/mapper/live-rw as /sysroot if it exists. That seems like it'd solve things somewhat elegantly, but it might be a bit too 'special sauce'.

Comment 16 Adam Williamson 2015-06-24 01:52:36 UTC
Sigh, no, that doesn't work because setting a root= on the cmdline bypasses the anaconda-dracut bits. Grr.

Comment 17 Adam Williamson 2015-06-24 02:03:21 UTC
Sorry, I missed a bit there. I tried just adding 'root=/dev/mapper/live-rw' on the cmdline of a boot.iso to see if that'd be enough to make things work, but it isn't, because it causes the anaconda-dracut bits not to run.

Comment 18 Harald Hoyer 2015-06-24 08:36:49 UTC
Created attachment 1042645 [details]
Proposed patch for anaconda

Here is my proposed patch for anaconda. No dracut patch needed.

Comment 19 Brian Lane 2015-06-24 15:31:22 UTC
(In reply to Harald Hoyer from comment #18)
> Created attachment 1042645 [details]
> Proposed patch for anaconda
> 
> Here is my proposed patch for anaconda. No dracut patch needed.

Right, and if we can't come up with something better I guess I'll go with that. The problem I have is that now we've got 3 layers of stuff deciding how to mount root and handle the cmdline args. We're making this MORE complex instead of less and it's going to be harder to maintain in the future.

Comment 20 Adam Williamson 2015-06-24 15:45:58 UTC
If we go with that, it'd probably make sense to at least stick in a comment briefly explaining what's going on and maybe referencing this bug.

Comment 21 Harald Hoyer 2015-06-24 17:25:25 UTC
(In reply to Brian Lane from comment #19)
> (In reply to Harald Hoyer from comment #18)
> > Created attachment 1042645 [details]
> > Proposed patch for anaconda
> > 
> > Here is my proposed patch for anaconda. No dracut patch needed.
> 
> Right, and if we can't come up with something better I guess I'll go with
> that. The problem I have is that now we've got 3 layers of stuff deciding
> how to mount root and handle the cmdline args. We're making this MORE
> complex instead of less and it's going to be harder to maintain in the
> future.

Well, isn't this a very specialized case of default handling, if "root=" is not present on the kernel command line, which normally leads to kernel panic?
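
Here is a hypothetical sketch of the "default handling" idea above: generate a sysroot.mount for /dev/mapper/live-rw only when `root=` is absent from the command line. This is not the content of attachment 1042645; the function name and unit contents are assumptions. According to #c14, systemd-fstab-generator only generates sysroot.mount when it finds `root=`, so under that description the two would never compete for the same unit.

```shell
#!/bin/sh
# Hypothetical "no root= means live-rw" generator; not the anaconda patch.

live_default_generator() (
    out=$1
    set -f                      # treat cmdline words literally (no globbing)
    for arg in $2; do
        case "$arg" in
            root=*) exit 0 ;;   # root= present: leave /sysroot to systemd
        esac
    done
    {
        echo "[Unit]"
        echo "Before=initrd-root-fs.target"
        echo "[Mount]"
        echo "Where=/sysroot"
        echo "What=/dev/mapper/live-rw"
    } > "$out/sysroot.mount"
)

gen=$(mktemp -d)
live_default_generator "$gen" 'BOOT_IMAGE=vmlinuz initrd=initrd.img inst.stage2=hd:LABEL=Fedora-rawhide-x86_64 quiet'
cat "$gen/sysroot.mount"
```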

Comment 22 Brian Lane 2015-06-26 17:24:42 UTC
I've fixed this in anaconda-dracut for now.

Comment 23 satellitgo 2015-06-28 04:01:21 UTC
slightly off topic:
bfo.iso [1] works for installs of rawhide:
using this source: 
https://kojipkgs.fedoraproject.org/mash/rawhide/x86_64/os/

[1] http://dl.fedoraproject.org/pub/alt/bfo/bfo.iso

I installed f23 cinnamon to VirtualBox with it.
References:
 http://wiki.sugarlabs.org/go/Fedora_23#Alternate_to_netinstall_.28boot.iso.29

Comment 24 Jan Kurik 2015-07-15 13:59:29 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 25 Adam Williamson 2015-07-16 15:04:11 UTC
This is fixed.

Comment 26 Bruce Jerrick 2015-07-24 23:47:30 UTC
Hey, I have an idea:  Instead of dealing with the complexities of
dracut, we could take just those parts needed to boot out of /usr 
and put them into a small root filesystem,  with, say, something like
/bin and /lib .  And instead of systemd, we could just use some shell
scripts, in something like, say, /etc/rc.d/init.d/ .  It might take
a little work initially to decide what has to go into /bin and /lib,
and it might take 5 or 10 seconds longer to boot, but just think
how much simpler it would be!

Comment 27 Adam Williamson 2015-07-24 23:51:02 UTC
Bugzilla is for fixing bugs, not for trolling. And no, this stuff wasn't at all simple before dracut and systemd either.

