Description of problem:
After upgrading a system from F17 to F18 when a kernel upgrade is involved, the target system cannot boot after the upgrade and dies with a kernel panic:

VFS: Cannot open root device "mapper/vg_f17upgrade1-lv_root" or unknown-block(0,0): error -6
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Pid: 1, comm: swapper/0 Not tainted 3.6.5-2.fc18.x86_64 #1
Call Trace:
 [<ffffffff8161753f>] panic+0xc1/0x1d0
 [<ffffffff81cfd0b0>] mount_block_root+0x1d6/0x287
 [<ffffffff81cfd1b8>] mount_root+0x57/0x5b
 [<ffffffff81cfd2f9>] prepare_namespace+0x13d/0x176
 [<ffffffff81cfce16>] kernel_init+0x1cf/0x1d4
 [<ffffffff81cfc614>] ? do_early_param+0x8c/0x8c
 [<ffffffff8162a7c4>] kernel_thread_helper+0x4/0x3d4
 [<ffffffff81cfcc47>] ? start_kernel+0x3d4/0x3d4
 [<ffffffff8162a7c0>] ? gs_change+0x13/0x13

After further triage, it appears that the initramfs on the upgraded machine exists but is incomplete. The kernel may also be incomplete and need reinstalling (I haven't checked that on all of the failed upgrades).

Version-Release number of selected component (if applicable):
Packaged or latest version from git

How reproducible:
I have seen the same symptoms on every upgrade that I've done with fedup when a kernel upgrade is involved.

Steps to Reproduce:
1. Use fedup-cli to set up the upgrade.
2. Reboot and run the upgrade.
3. After the upgrade restarts the machine, the upgraded system is not able to boot.

Workaround using the installer's rescue mode:
1. chroot into the installed system:
   chroot /mnt/sysimage
2. rebuild the initramfs in /boot (-f is needed because the broken image already exists):
   cd /boot
   dracut -f initramfs-<kernel_version>.img <kernel_version>
3. reinstall the kernel:
   yum reinstall kernel
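The rescue-mode workaround can be scripted roughly as below. This is a hedged sketch, not part of fedup: it assumes the installed system is mounted at /mnt/sysimage with /dev, /proc and /sys available inside it (as the rescue environment normally arranges), and that the newest directory under lib/modules is the kernel whose initramfs needs rebuilding. The function names `newest_kernel` and `rebuild_boot` are hypothetical.

```sh
# Print the highest kernel version installed under ROOT/lib/modules.
# ROOT defaults to the rescue-mode mount point (an assumption).
newest_kernel() {
    root=${1:-/mnt/sysimage}
    ls "$root/lib/modules" 2>/dev/null | sort -V | tail -n 1
}

# Rebuild the initramfs for that kernel and reinstall the kernel package,
# running both commands inside the installed system. Must be run as root.
rebuild_boot() {
    root=${1:-/mnt/sysimage}
    kver=$(newest_kernel "$root")
    if [ -z "$kver" ]; then
        echo "no kernel found under $root/lib/modules" >&2
        return 1
    fi
    # -f lets dracut overwrite the existing, incomplete image
    chroot "$root" dracut -f "/boot/initramfs-$kver.img" "$kver" &&
        chroot "$root" yum -y reinstall kernel
}
```

Usage from the rescue shell would be `rebuild_boot /mnt/sysimage`. Picking the version from lib/modules avoids typing `<kernel_version>` by hand, at the cost of assuming the newest installed kernel is the broken one.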
Created attachment 638951
console output from failed upgrade and reboot

I did an upgrade with the following kernel command line and attached the output from the serial console:

BOOT_IMAGE=/upgrade/vmlinuz root=/dev/mapper/vg_f17upgrade1-lv_root ro rd.lvm.lv=vg_f17upgrade1/lv_root rd.md=0 rd.dm=0 SYSFONT=True rd.lvm.lv=vg_f17upgrade1/lv_swap KEYTABLE=us rd.luks=0 LANG=en_US.UTF-8 rhgb systemd.log_level=debug systemd.journald.forward_to_console=1 systemd.log_target=console systemd.unit=system-upgrade.target rd.debug rd.upgrade.debugshell console=tty0 console=ttyS0,115200n8
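Since the panic asks for a correct "root=" option, it can help to pull individual parameters out of a command line like the one above and check that root= names an LV that rd.lvm.lv= activates. A minimal sketch; `cmdline_values` is a hypothetical helper, and it simply splits on whitespace (quoted parameter values are not handled).

```sh
# Print the value of every PARAM=value word in a kernel command line,
# one per line, in order of appearance. Runs in a subshell so that
# 'set -f' (no globbing of the words) does not leak to the caller.
cmdline_values() (
    set -f
    param=$1
    for word in $2; do
        case $word in
            "$param"=*) printf '%s\n' "${word#"$param"=}" ;;
        esac
    done
)
```

For example, `cmdline_values rd.lvm.lv "$(cat /proc/cmdline)"` lists the logical volumes the initramfs was told to activate.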
My guess is that something is forcing the system to reboot before it unmounts and syncs the root and boot partitions. This would also explain why the log files created after the upgrade (/var/log/upgrade.{log,journal}) are present but incomplete.

There's no clear reason yet why this would happen, but one interesting part of the log is at the end:

Unmounted /dev/mqueue.
[ 1865.799488] Unregister pv shared memory for cpu 0

systemd/src/core/shutdown.c says there should be another log message between those two lines, "All filesystems, swaps, loop devices, DM devices detached.", but, strangely, it never appears.
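When comparing several captured console logs, a small check for the missing shutdown message (the text quoted above) and a list of what the log claims was unmounted can speed things up. A sketch with hypothetical function names; it assumes each unmount is logged as "Unmounted <path>." as in the excerpt above.

```sh
# Succeed only if the log contains the final message of the shutdown
# detach loop quoted from systemd/src/core/shutdown.c.
shutdown_completed() {
    grep -q 'All filesystems, swaps, loop devices, DM devices detached' "$1"
}

# Print, in log order, every path the log reports as "Unmounted <path>."
# Trailing characters after the period (e.g. a serial-console \r) are ignored.
unmounted_paths() {
    sed -n 's/.*Unmounted \(\/[^ ]*\)\..*/\1/p' "$1"
}
```

A log where `unmounted_paths` lists /boot but `shutdown_completed` fails is exactly the pattern described here: the unmount was reported, but shutdown never finished detaching everything.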
Proposing as a blocker for F18 beta due to violation of the following F18 beta release criterion [1]: For each one of the release-blocking package sets ('minimal', and the package sets for each one of the release-blocking desktops), it must be possible to successfully complete an upgrade from a fully updated installation of the previous stable Fedora release with that package set installed, using any officially recommended upgrade mechanisms. The upgraded system must meet all release criteria. [1] http://fedoraproject.org/wiki/Fedora_18_Beta_Release_Criteria
CC'ing systemd guys, maybe they will have a clue what's happening there.
+1 blocker, seems pretty clearly blockery.
Discussed 2012-11-07 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-07/f18beta-blocker-review-7.2012-11-07-17.03.log.txt . Accepted as a blocker per criterion cited in comment #3.
My hunch from comment #2 is almost certainly right. Here's the deal:

To set up the environment, we bind-mount everything from the system into /system-upgrade-root/sysroot and then switch-root into /system-upgrade-root.

When the switch-root happens, all the mounts get recursively marked as private by systemd (because you can't move a mount under a shared mount):

if (mount(NULL, "/", NULL, MS_REC|MS_PRIVATE, NULL) < 0) ...

So now /system-upgrade-root/sysroot/boot is no longer shared with /boot, meaning that unmounting /system-upgrade-root/sysroot/boot *does not unmount /boot*. This would explain how the system can shut down and claim it "Unmounted /boot" while the data doesn't get fully written: it hasn't actually been unmounted. The real /boot is still mounted, just somewhere in a lost namespace.

So yes. We're looking into workarounds.
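The propagation state that the MS_REC|MS_PRIVATE call changes is visible in the optional fields of /proc/self/mountinfo (see proc(5)): "shared:N" marks a shared mount, "master:N" a slave, and no such tag means private. A hedged sketch for checking a given mount point, for instance /boot versus its copy under /system-upgrade-root/sysroot; `propagation_of` is a hypothetical helper, and recent util-linux `findmnt` can report the same thing through its PROPAGATION column.

```sh
# Print the propagation type of MOUNTPOINT as recorded in a mountinfo file
# (default /proc/self/mountinfo). Field 5 is the mount point; optional
# fields start at field 7 and end at a lone "-". If several mounts are
# stacked on the same point, the last (topmost) one wins. Mount points
# containing escaped characters such as \040 are not handled.
propagation_of() {
    target=$1
    file=${2:-/proc/self/mountinfo}
    awk -v t="$target" '
        $5 == t {
            shared = 0; slave = 0
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/) shared = 1
                if ($i ~ /^master:/) slave = 1
            }
            if (shared && slave) result = "shared,slave"
            else if (shared)     result = "shared"
            else if (slave)      result = "slave"
            else                 result = "private"
        }
        END { if (result != "") print result }
    ' "$file"
}
```

After `mount --make-rprivate /` (the shell equivalent of the quoted mount() call), every mount under / would be expected to report "private".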
*** Bug 875689 has been marked as a duplicate of this bug. ***
As a short-term workaround, I *think* we can do something like this:

# bind everything into the upgrade chroot
mount --rbind / "$UPGRADEROOT/sysroot" || die "couldn't bind / into upgrade dir"
# make the bind mounts separate from the original mounts
mount --make-rprivate /
# unmount the original mounts, i.e.:
# anything that's a block device, not root, and not under UPGRADEROOT
tac /proc/mounts | while read -r dev mnt type opts x y; do
    if [ -b "$dev" ] && [ "$mnt" != "/" ] && [ "${mnt#"$UPGRADEROOT"}" = "$mnt" ]; then
        umount "$mnt" && echo "moved $mnt" || echo "failed to move $mnt"
    fi
done

which should leave us with exactly *one* copy of each mounted disk device during the upgrade. Haven't tested this yet, but we'll report back once we figure out whether it works.

Longer-term, the smart thing to do might be to make fedup set up the target system inside a chroot/nspawn container. Unfortunately, systemd refuses to run in a chroot, and nspawn isn't allowed to create device nodes, so it can't (normally) mount disks. You can work around the nspawn device node problem by bind-mounting the real /dev inside the container, but the mounts that get set up inside the container are not visible outside it and/or aren't retained after the container exits. We'll figure something out, though.
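Before running the unmount loop for real, the selection logic can be tried as a dry run. The sketch below is roughly equivalent to the filter in the workaround: it reads /proc/mounts-style lines (newest first, as `tac` produces) and prints the mount points that would be unmounted. It is an illustration, not fedup code; `mounts_to_release`, `is_block_device` and the `IS_BLOCK` hook (which lets the filter run without real devices) are all hypothetical.

```sh
# Default block-device test, as in the workaround.
is_block_device() { [ -b "$1" ]; }

# Read "dev mnt ..." lines on stdin and print each mount point that is on a
# block device, is not /, and is not UPGRADEROOT itself or below it.
# IS_BLOCK may name a replacement test function (hypothetical hook).
mounts_to_release() {
    upgraderoot=$1
    is_block=${IS_BLOCK:-is_block_device}
    while read -r dev mnt rest; do
        "$is_block" "$dev" || continue
        [ "$mnt" = / ] && continue
        case $mnt in
            "$upgraderoot" | "$upgraderoot"/*) continue ;;
        esac
        printf '%s\n' "$mnt"
    done
}
```

A dry run on a live system would be `tac /proc/mounts | mounts_to_release /system-upgrade-root`. Matching on "$upgraderoot"/* rather than a bare prefix avoids also skipping an unrelated path that merely starts with the same characters.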
I did quite a bit of testing yesterday with a patch from Will similar to the one described in comment #9. The change seems to work: upgraded systems reboot without a kernel panic when using that patch. I've done one test with the current fedup code in git; it also seems to work once the .treeinfo in the test repository is changed to meet the new requirements. The changes to force upgrade logs to sync to disk were still not in git the last time I checked. Upgrades work without them, but it would be nice to have logs from the whole process.
(In reply to comment #7)
> My hunch from comment #2 is almost certainly right. Here's the deal:
>
> To set up the environment, we bind-mount everything from the system into
> /system-upgrade-root/sysroot and then switch-root into /system-upgrade-root.
>
> When the switch-root happens, all the mounts get recursively marked as
> private by systemd (because you can't move a mount under a shared mount):
>
> if (mount(NULL, "/", NULL, MS_REC|MS_PRIVATE, NULL) < 0) ...
>

This isn't really relevant here. All processes run in the same fs namespace, so if one of them sees a mount point, all of them do.
So there are a couple of filthy workarounds in git:

https://github.com/wgwoods/fedup/commit/f6f1c68
https://github.com/wgwoods/fedup-dracut/commit/3da014f

But after some debugging with Lennart we learned some interesting things:

- The disks that aren't getting unmounted are missing from /proc/1/mountinfo
- The disks *are* present in /proc/1/mounts
- reboot() doesn't seem to call sync() (although we'd been told it did)
- `mount -o remount,ro $MNT` should sync the disk, just as umount() does

So I think we may have a more correct, more permanent, less horrifying fix in these systemd patches:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=0049f05
http://cgit.freedesktop.org/systemd/systemd/commit/?id=93bd157
http://cgit.freedesktop.org/systemd/systemd/commit/?id=891a491

The last patch (which does pivot_root()) doesn't seem to pivot the old root properly, but in my brief testing with these patches, all data gets written to disk before reboot.
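The first two findings can be checked mechanically by listing mount points that appear in /proc/1/mounts but not in /proc/1/mountinfo, and the last two suggest syncing and remounting read-only before reboot. A hedged sketch of both; the function names are hypothetical, `remount_ro` is only an illustration of the idea (it is not what the systemd patches do), and it must be run as root.

```sh
# Print mount points listed in a mounts file (field 2) that are absent from
# a mountinfo file (field 5). Defaults to PID 1's view.
# Note: the NR == FNR idiom assumes the mountinfo file is not empty.
missing_from_mountinfo() {
    mounts=${1:-/proc/1/mounts}
    mountinfo=${2:-/proc/1/mountinfo}
    awk 'NR == FNR { seen[$5] = 1; next } !($2 in seen) { print $2 }' \
        "$mountinfo" "$mounts"
}

# Flush dirty data explicitly (reboot() apparently does not), then remount
# each given mount point read-only, which per the findings above should
# also write the filesystem out.
remount_ro() {
    sync
    for mnt in "$@"; do
        mount -o remount,ro "$mnt" || echo "remount,ro failed: $mnt" >&2
    done
}
```

On an affected upgrade image, `missing_from_mountinfo` run just before shutdown would be expected to print /boot, which is the mount that `remount_ro` would then need to cover.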
If the systemd fix is more permanent and less horrifying, maybe we should use that even for Beta, so long as it's not going to stuff up anything else?
Marking POST to reflect that we have potential fixes available.
Those three systemd patches should be backported to F17 and F18. Might make sense to reassign this to systemd now, unless there's also something in fedup still to fix here.
(In reply to comment #15)
> Those three systemd patches should be backported to F17 and F18. Might make
> sense to reassign this to systemd now, unless there's also something in
> fedup still to fix here.

Do you know when we might see systemd builds with those patches? The last build I'm seeing on F18 is 195-7, which appears to have been built before the patches mentioned in c#12 were pushed to git, and I'm not seeing any activity in pkggit in the last 10 days.
Lennart: just in case you didn't know, we're back to trying to release Beta now, go/no-go is Thursday, and we're basically just stuck on fedup - so the timeframe here would be that we would really like a systemd build with all necessary stuff to make fedup work by tomorrow, if you can. thanks!
fedup-0.7.1-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/fedup-0.7.1-1.fc18
Package fedup-0.7.1-1.fc18: * should fix your issue, * was pushed to the Fedora 18 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing fedup-0.7.1-1.fc18' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-18606/fedup-0.7.1-1.fc18 then log in and leave karma (feedback).
Well, does this mean we need a) the fedup* workarounds, b) a systemd build with the patches, or both a) and b)?
I'll prep updates tonight for systemd in F17 and F18 with the three patches.
(In reply to comment #21) > I'll prep updates tonight for systemd in F17 and F18 with the three patches. Thanks!
systemd-195-8.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/systemd-195-8.fc18
F18 upload done, please give karma. Next: F17.
Hmm, I think F17 doesn't actually need any updating: its systemd doesn't support switch root yet, and the explicit sync() is still in place there anyway.
fedup-0.7.1-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/fedup-0.7.1-1.fc17
(In reply to comment #25)
> hmm, I think F17 doesn't actually need any updating as it doesn't support
> switch root in systemd yet

Right. Also, it's the F18 systemd (inside upgrade.img) that's doing the umount, so these fixes aren't necessary in F17, at least not for *this* bug.
In my testing, at least, the systemd fixes are sufficient. The awful force-sync hacks can be removed and everything still gets written to disk just fine. The force-sync hacks are still in fedup-dracut-0.7.1, however. If we want to remove them now and verify that fact, I'll do another build, otherwise please continue testing stuff as-is.
I have done two upgrades today with fedup-0.7.1-1.fc17 and both rebooted automatically just fine (kernel was updated to .fc18).
I'm not sure my testing verifies all the issues mentioned here, but if it does, please move to VERIFIED.
Moving to VERIFIED as agreed at go/no-go meeting.
systemd-195-8.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
fedup-0.7.1-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
fedup-0.7.1-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.