Bug 1046510

Summary: installing 4th kernel breaks the rescue kernel
Product: [Fedora] Fedora
Reporter: Chris Murphy <bugzilla>
Component: dracut
Assignee: dracut-maint-list
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 20
CC: awilliam, covex, dracut-maint-list, gansalmon, harald, itamar, jonathan, kernel-maint, madhu.chinakonda, mrmazda
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: dracut-037-10.git20140402.fc20
Doc Type: Bug Fix
Last Closed: 2014-04-06 02:37:16 UTC
Type: Bug
Attachments:
yum debug output
rd.debug output from virsh console
virsh console output

Description Chris Murphy 2013-12-25 22:47:40 UTC
Description of problem: Upon installing a 4th kernel to an F20 system, the rescue kernel no longer results in a successful startup. 

Version-Release number of selected component (if applicable):

How reproducible:
Always so far.

Steps to Reproduce:
1. F20 normal install to Mac EFI, thus /boot/efi is hfsplus based. The installed kernel is 3.11.10-301.fc20.x86_64, and the rescue kernel is based on that too. Both options work.
2. Install 3.12.5-301. Everything still works.
3. yum install 3.13.0-0.rc4.git0.1.fc21 from koji, everything works.
4. yum install 3.13.0-0.rc5.git0.1.fc21 from koji, yum reports it's erasing kernel 3.11.10-301.fc20.x86_64.
5. Choose the rescue kernel for boot.

Actual results:

System drops to emergency mode because /boot/efi fails to mount: mount reports "unknown file system type 'hfsplus'".

uname -r reports the kernel is 3.11.10-301.fc20.x86_64

And /lib/modules/3.11.10-301.fc20.x86_64 is missing, presumably deleted when yum removed kernel 3.11.10-301.
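
This state can be confirmed from the emergency shell by checking whether a module tree exists for the running kernel release. The helper below is a minimal sketch: the function name `modules_dir_status` and its optional ROOT argument are hypothetical and are not part of yum, dracut, or the kernel packaging.

```shell
# Report whether the module tree for a given kernel release exists.
# Usage: modules_dir_status RELEASE [ROOT]
# ROOT defaults to "/" and exists only so the check can be pointed at a
# test tree (an assumption made for illustration).
modules_dir_status() {
    release=$1
    root=${2:-/}
    if [ -d "${root%/}/lib/modules/${release}" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# On the affected system this would be called as:
#   modules_dir_status "$(uname -r)"
```

If the result is "missing" for the release that `uname -r` reports, no module that was not already loaded in the initramfs can be loaded after the switch to the real root.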

Expected results:
I actually don't know the expected behavior of the rescue kernel, whether it will be 3.11.10 based for the life of Fedora 20, and therefore /lib/modules/3.11.10-301.fc20.x86_64 needs to be preserved; or if a new rescue kernel is supposed to be generated based on the oldest remaining kernel which in this case would be 3.12.5-301.fc20.

I expect the rescue kernel option to enable a successful startup of the system.

Additional info:
Maybe this is a problem between yum 3.4.3-128.fc20 and f21 kernel rpms?

Comment 1 Chris Murphy 2013-12-26 01:02:30 UTC
Reproducible in qemu-kvm VM as follows:
1. Fedora 20 minimal package set install, guided partitioning, partition scheme BTRFS.
2. Reboot and download from koji:
3. yum update kernel-3.12.5-300.fc20.x86_64.rpm and reboot
4. yum update kernel-3.12.5-301.fc20.x86_64.rpm and reboot
5. yum update kernel-3.12.5-302.fc20.x86_64.rpm

The 302 kernel update causes kernel 3.11.10-301.fc20.x86_64 to be removed, including /lib/modules/3.11.10-301.fc20.x86_64. The rescue kernel has the same name as before this update, vmlinuz-0-rescue-479c1e343b00425c9eb721ef5b7e7890, and when booted it is still the 3.11.10-301.fc20.x86_64 kernel.

Presumably this isn't the intended behavior, something else is supposed to happen.
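
To see which kernel release a rescue initramfs was built for, the module paths in an lsinitrd listing can be reduced to their version component. This is a minimal sketch: `initrd_kernel_versions` is a hypothetical helper name, and it assumes the listing contains paths of the form usr/lib/modules/<release>/...

```shell
# Print the kernel release(s) whose modules appear in an lsinitrd listing.
# Reads the listing on stdin, for example:
#   lsinitrd /boot/initramfs-0-rescue-<id>.img | initrd_kernel_versions
# (<id> is a placeholder for the identifier in the rescue image file name.)
initrd_kernel_versions() {
    # Keep only the path component that follows usr/lib/modules/.
    sed -n 's|.*usr/lib/modules/\([^/ ]*\)/.*|\1|p' | sort -u
}
```

Comparing the printed release against the directories under /lib/modules shows whether the rescue entry still has a matching module tree on the real root.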

Comment 2 Chris Murphy 2013-12-26 01:26:42 UTC
Created attachment 841684
yum debug output

yum -v --debuglevel=10 --rpmverbosity=debug update kernel-3.12.5-302.fc20.x86_64.rpm

Removes 3.11.10, does not replace the rescue kernel.

Comment 3 Josh Boyer 2013-12-26 12:45:49 UTC
Moving to dracut.

The kernel package has no idea what a "rescue" kernel is.  That is something entirely created by dracut or anaconda on install.  If it is creating something using the kernel modules without the kernel package being aware of it, then it needs to copy the modules to an appropriate directory or somehow otherwise ensure it isn't broken when the base kernel is removed.

I would have thought that the rescue initramfs contains all the modules (or a significant portion of them) because if you're rescuing your machine then you cannot rely on anything in / being in a sane state.

Comment 4 Adam Williamson 2013-12-27 05:05:43 UTC
"I would have thought that the rescue initramfs contains all the modules (or a significant portion of them) because if you're rescuing your machine then you cannot rely on anything in / being in a sane state."

Yes, that is basically what the 'rescue kernel' is: it's more a 'rescue initramfs', and it's really just a generic initramfs. The 'rescue kernel' is a bootloader entry that boots a kernel with a generic initramfs, when you get down to it. (I think it also pulls in a 'rescue' dracut module, the purpose of which I'm not sure of.)

Comment 5 Chris Murphy 2013-12-27 06:29:28 UTC
[root@f20s boot]# lsinitrd initramfs-0-rescue-3ac0117108d1432fa20e376f37facca9.img | grep -i hfsplus
drwxr-xr-x   1 root     root            0 Dec 16 20:37 usr/lib/modules/3.11.10-301.fc20.x86_64/kernel/fs/hfsplus
-rw-r--r--   1 root     root       146287 Dec  5 07:17 usr/lib/modules/3.11.10-301.fc20.x86_64/kernel/fs/hfsplus/hfsplus.ko
lrwxrwxrwx   1 root     root           12 Dec 16 20:37 usr/sbin/fsck.hfs -> fsck.hfsplus
-rwxr-xr-x   1 root     root       403056 Dec 16 20:37 usr/sbin/fsck.hfsplus

hfsplus.ko is in the initramfs, and yet mount still thinks it's an unknown file system type?
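
Whether mount can use a filesystem type depends on the driver being registered with the running kernel, not on the .ko file being present in an image. The kernel lists registered types in /proc/filesystems. The check below is a minimal sketch: `fs_registered` and its optional LISTFILE argument are hypothetical names introduced for illustration.

```shell
# Succeed if filesystem type FSTYPE is registered with the running kernel.
# Usage: fs_registered FSTYPE [LISTFILE]
# LISTFILE defaults to /proc/filesystems; the parameter is an assumption
# added so the function can be exercised against a sample file.
fs_registered() {
    fstype=$1
    list=${2:-/proc/filesystems}
    # Each line is "[nodev]<TAB>name"; the type name is always the last field.
    awk -v t="$fstype" '$NF == t { found = 1 } END { exit !found }' "$list"
}

# Example:
#   fs_registered hfsplus || echo "hfsplus driver not loaded"
```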

Comment 6 Chris Murphy 2013-12-27 06:37:54 UTC
Created attachment 842203
rd.debug output from virsh console

Comment 7 Chris Murphy 2013-12-27 07:04:37 UTC
Created attachment 842210
virsh console output

Retry without rd.break, and with rd.debug systemd.log_level=debug. It's not particularly revealing about why it's failing, though.

# systemctl -l status mnt-hfs.mount
Accepted connection on private bus.
Got D-Bus request: org.freedesktop.DBus.Properties.GetAll() on /org/freedesktop/systemd1/unit/mnt_2dhfs_2emount
SELinux access check scon=unconfined_u:unconfined_r:unconfined_t:s0 tcon=system_u:object_r:etc_t:s0 tclass=service perm=status path=/etc/fstab cmdline=(null): 0
Looking for unit files in (higher priority first):
Looking for SysV init scripts in:
Looking for SysV rcN.d links in:
mnt-hfs.mount - /mnt/hfs
   Loaded: loaded (/etc/fstab)
   Active: failed (Result: exit-code) since Fri 2013-12-27 01:56:30 EST; 5min ago
    Where: /mnt/hfs
     What: /dev/disk/by-uuid/19794061-d2f8-362b-92fb-c2543f99b3cd
  Process: 277 ExecMount=/bin/mount /dev/disk/by-uuid/19794061-d2f8-362b-92fb-c2543f99b3cd /mnt/hfs -t hfsplus (code=exited, status=32)

Dec 27 01:56:29 localhost.localdomain mount[277]: Executing: /bin/mount /dev/disk/by-uuid/19794061-d2f8-362b-92fb-c2543f99b3cd /mnt/hfs -t hfsplus
Dec 27 01:56:30 localhost.localdomain mount[277]: mount: unknown filesystem type 'hfsplus'
Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local

Comment 8 Harald Hoyer 2014-01-07 13:19:26 UTC
Well, although the kernel module for the filesystem is in the initramfs, it is not loaded in the initramfs, because it is not needed to mount the root filesystem. And because the kernel was already removed, you cannot load the module in the real root.

There are several solutions for this problem:

1. load all kernel drivers in the rescue initramfs
2. somehow "bind-mount" the kernel drivers from the initramfs in the real root
3. specify "nofail" for this non-critical mount point
4. boot with "rd.break" on the kernel command line, then "modprobe hfsplus", then "exit"
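
Option 3 amounts to adding "nofail" to the fourth (options) field of the relevant /etc/fstab entry. The following is a minimal sketch of that edit, not the fix that was shipped: `add_nofail` is a hypothetical helper, and it only prints a modified copy to stdout.

```shell
# Print FSTAB with "nofail" appended to the options of mount point MNT.
# Usage: add_nofail MNT [FSTAB]
# Writes to stdout only; it never edits the file in place.
add_nofail() {
    mnt=$1
    fstab=${2:-/etc/fstab}
    awk -v m="$mnt" '
        # Skip comment lines, match on the second field (the mount point),
        # and leave entries alone that already carry nofail.
        $1 !~ /^#/ && $2 == m && $4 !~ /(^|,)nofail(,|$)/ {
            $4 = $4 ",nofail"   # rebuilding the record collapses spacing
        }
        { print }
    ' "$fstab"
}

# Example: add_nofail /boot/efi > /tmp/fstab.new
```

The output should be reviewed before it replaces /etc/fstab. With nofail set, a failed mount no longer blocks boot, but the file system is still not mounted until its driver can be loaded.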

Comment 9 Chris Murphy 2014-01-07 17:15:31 UTC
(In reply to Harald Hoyer from comment #8)
Why is /boot/efi (the EFI System partition) considered non-critical? If the system is being rescued it's entirely possible it's needed to effect repairs on the system. If I need to reinstall a kernel in this environment, /boot/efi is critical because presently it (wrongly) contains the grub.cfg which needs to be updated for the newly added kernel.

Comment 10 Fedora Update System 2014-04-02 08:57:04 UTC
dracut-037-10.git20140402.fc20 has been submitted as an update for Fedora 20.

Comment 11 Fedora Update System 2014-04-03 04:03:28 UTC
Package dracut-037-10.git20140402.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dracut-037-10.git20140402.fc20'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).

Comment 12 Fedora Update System 2014-04-06 02:37:16 UTC
dracut-037-10.git20140402.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.