Red Hat Bugzilla – Bug 847418
MS_SHARED breaks pivot_root(), causes booting trouble in switch-root and shutdown
Last modified: 2013-05-29 10:53:52 EDT
Description of problem:
After an update to systemd-188-2.fc18 an attempt to boot using initramfs generated with this systemd version results in the following on my screen:
dracut-initqueue: Mounted root filesystem /dev/sdc11
dracut-pre-pivot: Checking ext3: /dev/disk/by-label/\x2fusr1
dracut-pre-pivot: issuing e2fsck -a /dev/disk/by-label/\x2fusr1
dracut-pre-pivot: e2fsck: Cannot continue, aborting.
dracut-pre-pivot: Warning: e2fsck returned with 8
dracut-pre-pivot: Warning: /dev/disk/by-label/x2fusr1 is mounted.
dracut-pre-pivot: Warning: *** An error occurred during the file system check.
dracut-pre-pivot: Warning: *** Dropping you to a shell; the system will try
dracut-pre-pivot: Warning: *** to mount the filesystem(s), when you leave the shell.
At this point I am dropped into "Repair" and I am stuck. It does not matter whether I mount or unmount some file systems, run fsck, or do anything else: after exiting this shell I end up with the next series of messages like the above and we can play that game again. There is no apparent way around that bogosity.
Version-Release number of selected component (if applicable):
systemd not preventing boot
Downgrading to systemd-187-3.fc18 and using that to produce the initramfs gets me back into bug 840242 territory, but at least I can get to a shell prompt.
Created attachment 603655 [details]
dmesg.3.6.0-0.rc1.git3.2.fc18.x86_64 with a systemd debugging information from a failed boot
Created attachment 603656 [details]
dmesg for the same kernel as above but with initramfs using systemd-187-3.fc18
Added for comparison purposes. Using that requires a manual intervention in a boot process to mount "forgotten" disks, as described in bug 840242, but at least does not block me entirely.
*** Bug 847477 has been marked as a duplicate of this bug. ***
Hmm, so my guess is that this is actually a kernel bug triggered by the fact that we now remount everything MS_SHARED very early on. The ref counting of the fs in the kernel is broken, which results in pivot_root() breaking.
We can work around this I guess by remounting things MS_PRIVATE right before switching root. But this probably should be fixed in the kernel instead.
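For illustration only, here is a minimal Python sketch of the ordering that work-around implies: mark everything private first, then move the new root. This is not the actual systemd patch. The helper names `switch_root_plan` and `run_plan` and the `/sysroot` path are assumptions made for the example; the flag values are the ones from the Linux mount headers. A real switch-root also changes directory and chroots, which is omitted here.

```python
import ctypes
import os

# Flag values as defined in the Linux <sys/mount.h> / <linux/fs.h> headers.
MS_MOVE = 8192
MS_REC = 16384
MS_PRIVATE = 1 << 18


def switch_root_plan(new_root="/sysroot"):
    """Return the ordered (source, target, flags) mount(2) calls.

    Hypothetical helper: step 1 recursively marks the whole tree private,
    step 2 moves the new root on top of "/".
    """
    return [
        (None, "/", MS_REC | MS_PRIVATE),
        (new_root, "/", MS_MOVE),
    ]


def run_plan(plan):
    """Execute the plan via mount(2). Needs root; not called in this sketch."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = (
        ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
        ctypes.c_ulong, ctypes.c_void_p,
    )
    for source, target, flags in plan:
        src = source.encode() if source is not None else None
        if libc.mount(src, target.encode(), None, flags, None) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), target)


if __name__ == "__main__":
    # Only print the plan; actually running it would rearrange the mount tree.
    for step in switch_root_plan():
        print(step)
```

Keeping the plan separate from its execution means the ordering can be inspected without root privileges.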
A work-around for this issue is to boot without initrd. In grub you can edit the commands for your boot, drop the initrd line and replace the root=UUID... line with root=/dev/sda6 (or wherever your root fs is located; if you are on LVM you are fucked, use an older initrd image/kernel)
I have now added a work-around to git upstream, and backported it to F18:
Will reassign this to the kernel now, since I am quite sure there's something wrong with the fs ref-counting and mount semantics.
Kernel folks: file systems marked MS_SHARED cannot be moved with mount(); MS_MOVE fails with EINVAL.
To clarify that, neither pivot_root() nor MS_MOVE is compatible with MS_SHARED.
Hmm, judging by do_move_mount() in namespace.c this actually appears to be intended behaviour of the kernel. But I do wonder why.
Hmm, if this is supposed to stay that way then we probably should file bugs against util-linux too, so that the switch-root and pivot-root utils remount things MS_PRIVATE before moving things, too.
(In reply to comment #5)
> I have now added a work-around to git upstream, and backported it to F18:
> Will reassign this to the kernel now, since I am quite sure there's
> something wrong with the fs ref-counting and mount semantics.
> Kernel folks: file systems marked MS_SHARED cannot be moved with mount(),
> MS_MOVE fails with EINVAL.
(In reply to comment #8)
> build failed
Yes, sorry for the confusion, a later build did work:
(In reply to comment #7)
> Hmm, judging by do_move_mount() in namespace.c this actually appears to be
> intended behaviour of the kernel. But I do wonder why.
> Hmm, if this is supposed to stay that way then we probably should file bugs
> against util-linux too, so that the switch-root and pivot-root utils remount
> things MS_PRIVATE before moving things, too.
As far as I can see, the code in question has been in place since 2005 so this isn't new. Documentation/filesystems/sharedsubtree.txt has a brief blurb that says:
"NOTE: moving a mount residing under a shared mount is invalid."
I've added Al to CC to see if he has any insight here, but I don't think this is a kernel bug at the moment. Just intended behavior.
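To see whether a given mount falls under that rule, one can look at the optional `shared:N` tag in /proc/self/mountinfo. The Python sketch below is a simplified model of the check discussed above (is the parent of the mount shared?), not a reimplementation of do_move_mount(), which has further conditions. The function names and the sample table are made up for the example.

```python
def parse_mountinfo(text):
    """Return {mount_id: (parent_id, mount_point, is_shared)}."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        # Fixed fields 0-5, then optional fields, then a "-" separator.
        if len(fields) < 7 or "-" not in fields[6:]:
            continue
        sep = fields.index("-", 6)
        optional = fields[6:sep]
        shared = any(tag.startswith("shared:") for tag in optional)
        mounts[int(fields[0])] = (int(fields[1]), fields[4], shared)
    return mounts


def parent_is_shared(text, mount_point):
    """True if the mount at mount_point sits under a shared parent mount."""
    mounts = parse_mountinfo(text)
    for parent_id, point, _shared in mounts.values():
        if point == mount_point:
            parent = mounts.get(parent_id)
            return parent is not None and parent[2]
    raise KeyError(mount_point)


# Hypothetical mount table, for illustration only.
SAMPLE = """\
15 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
20 15 8:43 / /sysroot rw,relatime shared:2 - ext3 /dev/sdc11 rw
21 15 0:30 / /private rw,relatime - tmpfs tmpfs rw
22 21 0:31 / /private/sub rw,relatime - tmpfs tmpfs rw
"""

if __name__ == "__main__":
    with open("/proc/self/mountinfo") as handle:
        live = handle.read()
    for _parent, point, _shared in parse_mountinfo(live).values():
        print(point, parent_is_shared(live, point))
```

In the sample table, `/sysroot` sits under the shared `/` mount, so a move would be expected to fail, while `/private/sub` sits under a private parent.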
systemd-188-3.fc18 is at least as broken as systemd-187-3.fc18 but at least it does not drop me into "Repair" and can be coaxed to boot in some sense.
With a combination of
we are back to square one, i.e. boot misbehaves in exactly the same way as described in the original report.
After dropping back to systemd-188-3.fc18 (fc18 and NOT fc19) and redoing initramfs I can boot once again - well, modulo an outstanding bug 840242.
Yep. The master branch never got the patch... so 188-3 is different in f18/f19.
Also, 189 builds for f18 here are also broken, so I guess the patch never went upstream for the 189 release?
It would be nice if we converged f18 and master branches and built both moving forward.
*** Bug 854611 has been marked as a duplicate of this bug. ***
Looking at Lennart's patches, both of them are applied in -189, so if the systemd-188-3.fc18 build worked, the -189.fc18 builds _should also_ work, unless something new in 189 causes a problem.
Great -- sounds like we just need for someone to build these packages for rawhide (f19).
Normally, I'd just push 189 into rawhide, but the spec file has a multi-paragraph warning about why the systemd maintainers do not want me to do this, so I opted out. Paging Lennart to do it.
We don't build Rawhide packages separately. We want Rawhide to simply inherit from F18 as long as possible. Unfortunately somebody who updated the packages didn't know that and updated the package in Rawhide, so that inheriting was disabled from that point on. We then updated F18 a couple of times, and those updates never ended up in Rawhide.
I have now manually untagged the broken package from Rawhide so that we inherit from F18 again. I have also added the aforementioned message to the .spec file to ensure that other folks who update the package don't make the same mistakes.
Honestly I believe the entire git logic in Fedora is backwards. Instead of keeping master all the time around it should just fork off the next version from the previous one when necessary and just get rid entirely of master.
Anyway, this is all corrected now, as the package got untagged and people should get the proper version from F18 again. If you run rawhide then please make sure to downgrade to the latest systemd rpm from F18 again. Thanks.
Lennart, regardless of your personal beliefs or preferences, the workflow you should be using is to put all changes in rawhide first, then merge down to f18 and lower as appropriate. Please do so.
Lennart, you're also not allowed to untag builds that have been pushed out; you need to build a higher NVR from master that fixes the issue. I have tagged the build back in. Please do the right thing and do a fixed build in master.
(In reply to comment #18)
> We don't build Rawhide packages separately. We want Rawhide to simply
> inherit from F18 as long as possible.
What I do for several packages is to put all the changes into
master, and then merge those into f18.
With 'fedpkg clone -B' this is particularly easy:
(1) cd master
(2) add changes, commit, push
(3) cd ../f18
(4) git pull ../master # merges the changes into f18
(5) fedpkg push
Of course this only works so long as no specific patch has
to be pushed to f18 only. Once that happens we use git cherry-pick
instead of merging, but I try to delay that happening as long
as possible.
Does this still need to be open? Hasn't it been cleaned up for weeks? Or is there still an underlying kernel bug?
There's underlying kernel behavior that has existed since shared mounts went into the kernel. It's been this way for years. Comment #10 still applies.
I'm going to close this out as ERRATA and reassign it to systemd. If Lennart or others would like to see different behavior, I would suggest taking it to the upstream VFS developers.
*** Bug 789285 has been marked as a duplicate of this bug. ***