Bug 471729

Summary: System won't boot anymore after upgrade to 2.6.27-5
Product: [Fedora] Fedora
Reporter: Alwin <ral>
Component: mkinitrd
Assignee: Hans de Goede <hdegoede>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: medium
Version: 9
CC: bertrand.benoit, dcantrell, hdegoede, katzj, kernel-maint, pjones, quintela, shess01, wtogami
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-02-03 10:47:00 UTC
Attachments:
Fixed mkinitrd script for F-10 users (attachment 330533)
Fixed mkinitrd script for F-9 users (attachment 330534)
Fixed mkinitrd script for F-10 users (attachment 330620)

Description Alwin 2008-11-15 08:53:17 UTC
Description of problem:
After upgrading to this kernel last night, the system no longer boots because it cannot access /dev/dm-1. It just prints:

unable to access resume device (/dev/dm-1)
mount: error mounting /dev/root on /sysroot as ext3: no such file or directory

and a few more messages along those lines.

Version-Release number of selected component (if applicable):
2.6.27-5

How reproducible:
On every boot


Additional info:
The initrd created by the update process is a lot smaller than the ones from other kernels. I tried running mkinitrd myself to find the required modules, but no luck. And why a resume device? I just rebooted, I did not hibernate. And root is /dev/dm-0.

Second: earlier kernels had root=UUID=xxxxxx as a parameter. This upgrade changed it to root=/dev/dm-0. Why? (I tried it with this, but it didn't work.)

This update is completely broken. It is now the third time in Fedora 9 that I've got such a broken kernel update. The system was upgraded from Fedora 8 with preupgrade.

Any hints for a workaround until this is fixed?

Comment 1 Alwin 2008-11-15 22:22:59 UTC
Additional:

/dev/dm-1 is the swap partition

Comment 2 Alwin 2008-11-19 08:50:12 UTC
Hi,

after a while I understood what's going wrong:

As I said, this happened on upgraded systems (Fedora 8 -> Fedora 9 with preupgrade).

The Fedora 8 installer mounted volumes by UUID=xxx-xxx and passed the root device to the kernel the same way in grub.

Fresh Fedora 9 installs do it the earlier way again: they mount the volume as

/dev/VolGroup00/LogVol00

and insert it in that form into grub.conf.

When a kernel update now arrives, I think mkinitrd isn't able to resolve that UUID=xxx-xxx from /etc/fstab, so it doesn't include the right modules (forcing the lvm modules doesn't help) and it inserts root as /dev/dm-0 or /dev/dm-1 into grub.conf.

Manually changing the entries in /etc/fstab and /boot/grub/grub.conf to the /dev/xxxx form and then calling /sbin/new-kernel-pkg with the right parameters lets the update work. The generated initrd image looks OK now and the new kernel boots.

-> I think a lot more Fedora 8 systems will be upgraded for Fedora 10; you should convert these entries during an upgrade.

Comment 3 Chuck Ebbert 2008-11-20 20:44:15 UTC
Preupgrade bug ???

Comment 4 Michel Lind 2008-12-12 19:40:58 UTC
Same problem on Fedora 10 with mkinitrd-6.0.71-2 (stable) and mkinitrd-6.0.71-3 (testing). The root device was specified by UUID in the originally installed kernel, but the GRUB entry for the new kernel has /dev/system/root, which causes a kernel panic on boot.

Comment 5 Michel Lind 2008-12-13 01:11:01 UTC
(In reply to comment #4)
Ignore my earlier comment -- mkinitrd cannot handle the "relatime" option, this is in the FAQ (though apparently not fixed yet).

Comment 6 Stephen F. Hess 2009-01-31 18:01:55 UTC
This bug does not only affect x86_64 platforms; i686 platforms are affected as well. After upgrading to the 2.6.27.12-170 kernel and rebooting, I experienced the "unable to access resume device" error as explained above. I have posted a lot of information in the forums at http://forums.fedoraforum.org/showthread.php?p=1158302#post1158302 . Apparently this is primarily affecting LVM users.

I unpacked both initrd images with cpio; the differences between the two init scripts are listed here:


This init section is from the 2.6.27.9-159 kernel, which works.

echo Creating character device nodes.
mkchardevs
echo "Loading pata_acpi module"
modprobe -q pata_acpi
echo "Loading ata_generic module"
modprobe -q ata_generic
echo Making device-mapper control node
mkdmnod
modprobe scsi_wait_scan
rmmod scsi_wait_scan
mkblkdevs
echo Scanning logical volumes
lvm vgscan --ignorelockingfailure
echo Activating logical volumes
lvm vgchange -ay --ignorelockingfailure VolGroup00
resume /dev/VolGroup00/LogVol01
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro /dev/VolGroup00/LogVol00
echo Mounting root filesystem.
mount /sysroot

Here is the same init section for the 2.6.27.12-170 kernel that gives me the errors...

echo Creating character device nodes.
mkchardevs
mkblkdevs
resume /dev/dm-1
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro /dev/dm-0
echo Mounting root filesystem.
mount /sysroot

The rest of the init files for the 2 kernels are identical. 
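
The gap between the two init sections above can be made explicit with a small comparison. What follows is only an illustrative sketch in Python, not part of mkinitrd or nash: the helper name missing_steps is made up for this example, and the two strings are abbreviated copies of the listings above (echo lines and the scsi_wait_scan steps are left out).

```python
# Illustrative sketch only: list the init steps that the working initrd
# runs but the broken one never does. `missing_steps` is a hypothetical
# helper name, not something provided by mkinitrd or nash.
from typing import List

# Abbreviated from the 2.6.27.9-159 init section (works).
WORKING = """\
mkchardevs
modprobe -q pata_acpi
modprobe -q ata_generic
mkdmnod
mkblkdevs
lvm vgscan --ignorelockingfailure
lvm vgchange -ay --ignorelockingfailure VolGroup00
resume /dev/VolGroup00/LogVol01
mkrootdev -t ext3 -o defaults,ro /dev/VolGroup00/LogVol00
mount /sysroot
"""

# Abbreviated from the 2.6.27.12-170 init section (fails).
BROKEN = """\
mkchardevs
mkblkdevs
resume /dev/dm-1
mkrootdev -t ext3 -o defaults,ro /dev/dm-0
mount /sysroot
"""


def missing_steps(working: str, broken: str) -> List[str]:
    """Return the lines of `working` that never appear in `broken`, in order."""
    present = set(broken.splitlines())
    return [line for line in working.splitlines() if line not in present]


if __name__ == "__main__":
    for step in missing_steps(WORKING, BROKEN):
        print(step)
```

On these abbreviated listings the sketch reports the two modprobe lines, mkdmnod and both lvm commands as missing (plus the resume and mkrootdev lines, which differ only in the device name). That is consistent with the broken init never activating VolGroup00 before it tries to use /dev/dm-0 and /dev/dm-1.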

After reading the posts above, I looked at grub.conf, and it is a mess with 3 different conventions for specifying root. This was not an upgrade either, but a fresh install of F10.

[root@macbeth ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda         
#                                         
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that                    
#          all kernel and initrd paths are relative to /boot/, eg.         
#          root (hd0,1)                                                    
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00        
#          initrd /initrd-version.img                                      
#boot=/dev/sda                                                             
default=1                                                                  
timeout=5                                                                  
splashimage=(hd0,1)/grub/splash.xpm.gz                                     
hiddenmenu                                                                 
title Fedora (2.6.27.12-170.2.5.fc10.i686)                                 
        root (hd0,1)                                                       
        kernel /vmlinuz-2.6.27.12-170.2.5.fc10.i686 ro root=/dev/dm-0 rhgb quiet
        initrd /initrd-2.6.27.12-170.2.5.fc10.i686.img                          
title Fedora (2.6.27.9-159.fc10.i686)                                           
        root (hd0,1)                                                            
        kernel /vmlinuz-2.6.27.9-159.fc10.i686 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        initrd /initrd-2.6.27.9-159.fc10.i686.img                                         
title Fedora (2.6.27.5-117.fc10.i686)                                                     
        root (hd0,1)
        kernel /vmlinuz-2.6.27.5-117.fc10.i686 ro root=UUID=e9417668-c690-4607-8748-daddf521aa5d rhgb quiet
        initrd /initrd-2.6.27.5-117.fc10.i686.img
title Other
        rootnoverify (hd0,0)
        chainloader +1
[root@macbeth ~]# cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>

tmpfs   /dev/shm        tmpfs   defaults        0       0
devpts  /dev/pts        devpts  gid=5,mode=620  0       0
sysfs   /sys    sysfs   defaults        0       0
proc    /proc   proc    defaults        0       0
/dev/dm-0       /       ext3    defaults        1       1
#Entry for /dev/sda2 :
UUID=64cb2f11-cffc-44d7-acdc-fce026dcef59       /boot   ext3    defaults        1       2
#Entry for /dev/sda1 :
UUID=D03006BD3006AA94   /media/sda1     ntfs-3g defaults,locale=en_US.UTF-8     0       0
/dev/dm-1       swap    swap    defaults        0       0
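
The three conventions used for root= in the grub.conf above can be told apart mechanically. The following Python sketch is only an illustration: root_style and root_params are hypothetical helper names, the style labels are made up, and treating any /dev/<name>/<name> path as a volume-group/logical-volume path is an assumption that happens to fit this system.

```python
# Illustrative sketch only: classify how each grub.conf kernel line names
# the root device. `root_style` and `root_params` are hypothetical helpers.
import re

# The three kernel lines from the grub.conf shown above.
KERNEL_LINES = """\
kernel /vmlinuz-2.6.27.12-170.2.5.fc10.i686 ro root=/dev/dm-0 rhgb quiet
kernel /vmlinuz-2.6.27.9-159.fc10.i686 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
kernel /vmlinuz-2.6.27.5-117.fc10.i686 ro root=UUID=e9417668-c690-4607-8748-daddf521aa5d rhgb quiet
"""


def root_style(spec):
    """Name the convention used by a root= value (labels are made up)."""
    if spec.startswith("UUID="):
        return "uuid"
    if spec.startswith("LABEL="):
        return "label"
    if re.fullmatch(r"/dev/dm-\d+", spec):
        return "dm-node"
    if spec.startswith("/dev/mapper/"):
        return "mapper"
    # Assumption: a two-component path under /dev is /dev/<VG>/<LV>.
    if re.fullmatch(r"/dev/[^/]+/[^/]+", spec):
        return "vg-lv-path"
    return "other"


def root_params(grub_text):
    """Yield (kernel image, root= value) for every non-comment kernel line."""
    for line in grub_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "kernel":
            continue
        for field in fields[2:]:
            if field.startswith("root="):
                yield fields[1], field[len("root="):]


if __name__ == "__main__":
    for image, spec in root_params(KERNEL_LINES):
        print(image, root_style(spec), spec)
```

Only the newest entry uses a bare /dev/dm-N node, and it is the one that fails to boot in this report.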

For the record, here is my LV information as well.

[root@macbeth ~]# lvscan
ACTIVE '/dev/VolGroup00/LogVol00' [243.94 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol01' [3.94 GB] inherit
[root@macbeth ~]# lvdisplay
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol00
VG Name VolGroup00
LV UUID OYnwSh-ld05-5bLz-AOR6-HgRV-bvl4-NJKf3B
LV Write Access read/write
LV Status available
# open 1
LV Size 243.94 GB
Current LE 7806
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0

--- Logical volume ---
LV Name /dev/VolGroup00/LogVol01
VG Name VolGroup00
LV UUID 3d9XYg-3yLo-ye9n-QPzU-9OXR-BcXR-P9sx0t
LV Write Access read/write
LV Status available
# open 1
LV Size 3.94 GB
Current LE 126
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:1 
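
The "Block device" lines above are what tie the /dev/dm-N names back to the logical volumes. The Python sketch below is only an illustration, assuming the usual kernel naming in which /dev/dm-N carries device-mapper minor number N; dm_nodes is a hypothetical helper name and the input is trimmed from the lvdisplay output above.

```python
# Illustrative sketch only: pair each LV name with the /dev/dm-N node
# implied by the minor number on its "Block device" line.
# Assumption: /dev/dm-N is the device-mapper device with minor number N.

# Trimmed from the lvdisplay output shown above.
LVDISPLAY = """\
LV Name /dev/VolGroup00/LogVol00
Block device 253:0
LV Name /dev/VolGroup00/LogVol01
Block device 253:1
"""


def dm_nodes(lvdisplay_text):
    """Return a dict mapping '/dev/dm-N' to the LV name from lvdisplay text."""
    mapping = {}
    current = None
    for line in lvdisplay_text.splitlines():
        line = line.strip()
        if line.startswith("LV Name "):
            current = line[len("LV Name "):].strip()
        elif line.startswith("Block device ") and current is not None:
            _major, minor = line[len("Block device "):].strip().split(":")
            mapping["/dev/dm-" + minor] = current
            current = None
    return mapping


if __name__ == "__main__":
    for node, lv in sorted(dm_nodes(LVDISPLAY).items()):
        print(node, "->", lv)
```

For this system that pairs /dev/dm-0 with LogVol00 (root) and /dev/dm-1 with LogVol01 (swap), which matches the resume and mkrootdev lines of the working init. Minor numbers are handed out when the volumes are activated, so this pairing is a property of the running system rather than a stable name.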

Is there any ETA on a fix for this issue? Or a procedure that can be posted to correctly generate a working initrd image?

I have counted at least 3 threads over in the forum related to this issue and 3 additional users have chimed in on the thread linked above.

Comment 7 Hans de Goede 2009-01-31 23:14:47 UTC
(In reply to comment #6)
> This bug does not only affect x86_64 platforms; i686 platforms are affected as
> well. After upgrading to the 2.6.27.12-170 kernel and rebooting, I experienced
> the "unable to access resume device" error as explained above. I have posted a
> lot of information in the forums at

<snip>

Thanks for all the research and the details. I've managed to reproduce this; the cause is the weird way your root and swap devices are specified in /etc/fstab (as /dev/dm-0 and /dev/dm-1).

I think the best way to fix this is to just handle the way you've specified them in /etc/fstab correctly. I'll try to work on fixing this the coming week.
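
For anyone wanting to check whether their own /etc/fstab has the same kind of entries, a quick scan can be sketched as follows. This is only an illustration in Python and not the change made to mkinitrd; dm_entries is a hypothetical helper name and the sample text is trimmed from the fstab posted in comment 6.

```python
# Illustrative sketch only: flag fstab entries whose device field is a
# bare device-mapper node (/dev/dm-N). `dm_entries` is a hypothetical
# helper; this is not how mkinitrd itself was fixed.
import re

# Trimmed from the /etc/fstab posted in comment 6.
FSTAB = """\
tmpfs   /dev/shm        tmpfs   defaults        0       0
/dev/dm-0       /       ext3    defaults        1       1
#Entry for /dev/sda2 :
UUID=64cb2f11-cffc-44d7-acdc-fce026dcef59       /boot   ext3    defaults        1       2
/dev/dm-1       swap    swap    defaults        0       0
"""


def dm_entries(fstab_text):
    """Return (device, mount point) pairs whose device is /dev/dm-N."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2 and re.fullmatch(r"/dev/dm-\d+", fields[0]):
            found.append((fields[0], fields[1]))
    return found


if __name__ == "__main__":
    for device, mount_point in dm_entries(FSTAB):
        print(device, "is used for", mount_point)
```

On the sample it flags the root and swap lines, the two entries that end up as resume /dev/dm-1 and mkrootdev ... /dev/dm-0 in the broken init.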

Still, it would be interesting to know how you ended up with those lines in /etc/fstab: was this a clean F-10 install, or an upgrade of ... and if so, how did you upgrade?

Comment 8 Hans de Goede 2009-01-31 23:59:10 UTC
Created attachment 330533 [details]
Fixed mkinitrd script for F-10 users

Comment 9 Hans de Goede 2009-02-01 00:00:55 UTC
Created attachment 330534 [details]
Fixed mkinitrd script for F-9 users

Comment 10 Hans de Goede 2009-02-01 00:04:21 UTC
OK, I think I've found and fixed the cause of this. Can you please download the fixed mkinitrd script for *your Fedora version*, copy it to /sbin/mkinitrd and then recreate your initrd?

The command to regenerate the (broken) initrd is (as root):
/sbin/mkinitrd -f /boot/initrd-<brokenversion>.img <brokenversion>

For example:
/sbin/mkinitrd -f /boot/initrd-2.6.27.12-170.2.5.fc10.i686.img 2.6.27.12-170.2.5.fc10.i686
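
The pattern in the command above is easy to get wrong by hand, since the version string appears twice. A minimal Python sketch of building it is shown here; mkinitrd_command is a hypothetical helper name, and the /boot/initrd-<version>.img naming is simply taken from the example above.

```python
# Illustrative sketch only: build the mkinitrd command line shown above
# for a given kernel version. `mkinitrd_command` is a hypothetical helper.


def mkinitrd_command(kernel_version, boot_dir="/boot"):
    """Return the argument list that regenerates the initrd for one kernel."""
    image = "{}/initrd-{}.img".format(boot_dir, kernel_version)
    # -f overwrites the existing (broken) image, as in the example above.
    return ["/sbin/mkinitrd", "-f", image, kernel_version]


if __name__ == "__main__":
    # Version taken from the example above; substitute the broken kernel's.
    print(" ".join(mkinitrd_command("2.6.27.12-170.2.5.fc10.i686")))
```

The version passed in has to be that of the broken kernel, which is usually not the one currently running, because the system was booted from an older, working kernel to do the repair.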

Can you please test this and report back? If this indeed fixes things (which I believe it does) I will issue an update with this fix.

Comment 11 Stephen F. Hess 2009-02-01 14:19:03 UTC
Well, I initially did a "preupgrade" from F8, which worked fantastically -- until I jacked up the user permissions playing around with gkrellm (um, don't ask). At that point I was forced to do a fresh install: after backing up my home directory, I used the "install to hard drive" option from the KDE live CD. It would have been easier to just install from the DVD, but that was not recognized in my boot sequence. Here is my smolt profile, by the way: http://www.smolts.org/show?uuid=pub_6f8d28d9-f46e-4552-aa76-d21f98f26dd3

OK, here is what I did:

[root@macbeth ~]# tail -f .bash_history
cp mkinitrd mkinitrd.bak
gedit mkinitrd
/sbin/mkinitrd -f /boot/initrd-2.6.27.12-170.2.5.fc10.i686.img 2.6.27.12-170.2.5.fc10.i686
init 6

In gedit, I pasted your data into the file mkinitrd, then ran it as in the example you gave in your response. On reboot I got the same "unable to access yada yada yada" error.

From your comments, do you think the root cause lies in the way that /etc/fstab is modified/generated?

Here is the last section of the init script generated by the mkinitrd you modified (posting everything after the tty crap)

/lib/udev/console_init tty0
daemonize --ignore-missing /bin/plymouthd
plymouth --show-splash
echo Setting up hotplug.
hotplug
echo Creating block device nodes.
mkblkdevs
echo Creating character device nodes.
mkchardevs
echo "Loading pata_acpi module"
modprobe -q pata_acpi
echo "Loading ata_generic module"
modprobe -q ata_generic
modprobe scsi_wait_scan
rmmod scsi_wait_scan
mkblkdevs
resume /dev/dm-1
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro /dev/dm-0
echo Mounting root filesystem.
mount /sysroot
cond -ne 0 plymouth --hide-splash
echo Setting up other filesystems.
setuproot
loadpolicy
plymouth --newroot=/sysroot
echo Switching to new root and running init.
switchroot
echo Booting has failed.
sleep -1
[root@macbeth kernel_mod_12-170]#

Comment 12 Hans de Goede 2009-02-02 12:08:07 UTC
Created attachment 330620 [details]
Fixed mkinitrd script for F-10 users

(In reply to comment #11)
> OK, here is what I did:
> 

Thanks for trying. The new initrd is better, but still not good enough. Here is a new F-10 mkinitrd which should definitely work with your /etc/fstab. Can you give this a try please?

Comment 13 Stephen F. Hess 2009-02-02 13:09:24 UTC
Perfection.
For the record, here is the procedure I used.

su
gedit /sbin/mkinitrd
> "save as" mkinitrd.bak2
> "select all"
> "delete"
> "paste" new mkinitrd script
> "save as" mkinitrd
> "exit"

/sbin/mkinitrd -f /boot/initrd-2.6.27.12-170.2.5.fc10.i686.img 2.6.27.12-170.2.5.fc10.i686
init 6



Here is the last section from the current init generated by your latest mkinitrd.

/lib/udev/console_init tty0
daemonize --ignore-missing /bin/plymouthd
plymouth --show-splash
echo Setting up hotplug.
hotplug
echo Creating block device nodes.
mkblkdevs
echo Creating character device nodes.
mkchardevs
echo "Loading pata_acpi module"
modprobe -q pata_acpi
echo "Loading ata_generic module"
modprobe -q ata_generic
modprobe scsi_wait_scan
rmmod scsi_wait_scan
mkblkdevs
resume /dev/dm-1
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro /dev/dm-0
echo Mounting root filesystem.
mount /sysroot
cond -ne 0 plymouth --hide-splash
echo Setting up other filesystems.
setuproot
loadpolicy
plymouth --newroot=/sysroot
echo Switching to new root and running init.
switchroot
echo Booting has failed.
sleep -1
[root@macbeth kernel_mod_12-170]#


Thanks for the quick response. If there is anything I can help with in the future, let me know.

Comment 14 Hans de Goede 2009-02-03 10:47:00 UTC
I've just learned that a bug was already filed for the same problem some time ago, so I'm closing this one. I'll be releasing an update with the fix soon; you can track its progress in bug 475773.

*** This bug has been marked as a duplicate of bug 475773 ***