From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b4) Gecko/20050915 Fedora/1.5-0.5.0.beta1 Firefox/1.4

Description of problem:
During boot with the new rawhide kernel (kernel-smp-2.6.13-1.1567_FC5) received a kernel panic. An earlier msg says:

  mkdev: '/dev/md3' is not a UUID or LABEL spec

then:

  mount: error 6 mounting ext3

/dev/md3 is the root partition. See attached screenshot for all msgs available. Went back to 2.6.13-1.1565_FC5smp, which works fine.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.13-1.1567_FC5

How reproducible:
Always

Steps to Reproduce:
1. boot (cold or warm)

Actual Results:
Kernel panic as above. See attached screen shot.

Expected Results:
Normal boot from current rawhide system.

Additional info:

lspci:
00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82875P Processor to AGP Controller (rev 02)
00:03.0 PCI bridge: Intel Corporation 82875P/E7210 Processor to PCI to CSA Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200] (rev 01)
01:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200] (Secondary) (rev 01)
02:01.0 Ethernet controller: Intel Corporation 82547EI Gigabit Ethernet Controller (LOM)
03:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
03:04.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)
03:0a.0 Mass storage controller: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
03:0d.0 Communication controller: Intel Corporation 536EP Data Fax Modem
Created attachment 119154 [details]
screen shot of kernel panic

Screen shot as indicated.
It's trying to mount IDE (hda2) devices, but it looks like you are using SATA drives that are showing up as SCSI (sda, sdb). SATA drives can show up as hda or sda depending on the drivers. Generally people prefer them to show up as sda, but of course that means you have to change the software RAID configuration around a bit.
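If the member device names did change, one way to resync the RAID configuration with what the kernel now reports is to regenerate /etc/mdadm.conf from the running arrays. A minimal sketch, assuming mdadm is managing the arrays and that /etc/mdadm.conf is actually consulted on this setup (kernel-autodetected arrays may not use one):

  # See which member devices (sdX vs hdX) the kernel currently reports
  cat /proc/mdstat

  # Rewrite the array definitions from the live arrays so the
  # recorded device names match what the kernel sees now
  mdadm --detail --scan > /etc/mdadm.conf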
Same problem with today's 1570 kernel. Are the kernels now expecting a different identification of partitions? See the error msg in the screen shot:

  mkdev: '/dev/md3' is not a UUID or LABEL spec

md3 is the root partition, a raid5 array.

BTW, with a USB multi-function card reader plugged in, booting _ANY_ kernel does not get beyond testing the HLT instruction. Unplug the device and then 1565 and older boot, while 1567 and 1570 then exhibit the "not syncing" problem. Separate bugzilla, or is this a clue?
> Are the kernels now expecting a different identification of partitions?

That's my guess. If you boot with the old kernel and type `fdisk -l`, you see devices like hda and probably hdb or hdc, right? Do you see sda and sdb? With the new kernel you now have sda and sdb devices instead of your hda and hdb devices. The problem is I'm not really familiar with software RAID, so I'm not sure how you go about fixing it up safely.

> BTW, with a USB multi-function card reader plugged in, booting _ANY_ kernel
> does not get beyond testing the HLT instruction.

Separate bugzilla entry.
I have always seen sda and sdb, as these are SATA drives. Also, I have seen the invalid partition msg since starting the use of raid some time ago, but have not had any problems until now. This computer is for testing purposes and has FC2, FC3, FC4, FC Rawhide (using currently), SuSE, Debian, Slack, Gentoo, RHEL 4, WinXP and Mandriva installed. SuSE 9.3, Gentoo, FC3 and FC4 are all on raid partitions. Currently fdisk -l shows:

Disk /dev/hda: 203.9 GB, 203928109056 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1             1       140   1124518+  83  Linux
/dev/hda2           141       402   2104515   82  Linux swap / Solaris
/dev/hda3   *       422      3244  22675747+   7  HPFS/NTFS
/dev/hda4          3245     24792 173084310    f  W95 Ext'd (LBA)
/dev/hda5          3245      5795  20490876    7  HPFS/NTFS
/dev/hda6          5796      8707  23390608+   b  W95 FAT32
/dev/hda7          8708      8720    104391    6  FAT16
/dev/hda8          8721     11908  25607578+  fd  Linux raid autodetect
/dev/hda9         11909     15096  25607578+  fd  Linux raid autodetect
/dev/hda10        15097     18284  25607578+  83  Linux
/dev/hda11        18285     21472  25607578+  83  Linux
/dev/hda12        21473     24792  26667868+  83  Linux

Disk /dev/hdb: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdb1             1     10017  80461521    f  W95 Ext'd (LBA)
/dev/hdb2   *     10018     16609  52950240    7  HPFS/NTFS
/dev/hdb3         16610     19929  26667900   83  Linux
/dev/hdb5             1        13    104359+  83  Linux
/dev/hdb6            14        76    506016   82  Linux swap / Solaris
/dev/hdb7            77      2367  18402426   83  Linux
/dev/hdb8          2368      4917  20482843+  83  Linux
/dev/hdb9          4918      7467  20482843+  fd  Linux raid autodetect
/dev/hdb10         7468     10017  20482843+  fd  Linux raid autodetect

Disk /dev/hde: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hde1   *         1        26    208813+  83  Linux
/dev/hde2            27      2458  19535040   83  Linux
/dev/hde3          2459      5008  20482875   fd  Linux raid autodetect
/dev/hde4          5009     30515 204884977+   f  W95 Ext'd (LBA)
/dev/hde5          5009      7558  20482843+  83  Linux
/dev/hde6          7559     11206  29302528+  83  Linux
/dev/hde7         11207     13639  19543041   fd  Linux raid autodetect
/dev/hde8         13640     16072  19543041   83  Linux
/dev/hde9         16073     19897  30724281   83  Linux
/dev/hde10        19898     23084  25599546   83  Linux
/dev/hde11        23085     25634  20482843+  83  Linux
/dev/hde12        25635     28184  20482843+  fd  Linux raid autodetect
/dev/hde13        28185     30515  18723726   83  Linux

Disk /dev/hdg: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdg1   *         1        26    208813+  83  Linux
/dev/hdg2            27      2458  19535040   fd  Linux raid autodetect
/dev/hdg3          2459      2720   2104515   82  Linux swap / Solaris

Disk /dev/sda: 122.9 GB, 122942324736 bytes
255 heads, 63 sectors/track, 14946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1      2295  18434556   fd  Linux raid autodetect
/dev/sda2          2296      4335  16386300   fd  Linux raid autodetect
/dev/sda3          4336     14946  85232857+   f  W95 Ext'd (LBA)
/dev/sda5          4336      6885  20482843+  fd  Linux raid autodetect
/dev/sda6          6886      9690  22531131   fd  Linux raid autodetect
/dev/sda7          9691     12240  20482843+  fd  Linux raid autodetect
/dev/sda8         12241     14946  21735913+  fd  Linux raid autodetect

Disk /dev/sdb: 122.9 GB, 122942324736 bytes
255 heads, 63 sectors/track, 14946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1   *         1      2295  18434556   fd  Linux raid autodetect
/dev/sdb2          2296      4335  16386300   fd  Linux raid autodetect
/dev/sdb3          4336     14946  85232857+   f  W95 Ext'd (LBA)
/dev/sdb5          4336      6885  20482843+  fd  Linux raid autodetect
/dev/sdb6          6886      9690  22531131   fd  Linux raid autodetect
/dev/sdb7          9691     12240  20482843+  fd  Linux raid autodetect
/dev/sdb8         12241     14946  21735913+  fd  Linux raid autodetect

Disk /dev/md0: 41.9 GB, 41948282880 bytes
2 heads, 4 sectors/track, 10241280 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md4: 40.0 GB, 40007630848 bytes
2 heads, 4 sectors/track, 9767488 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md4 doesn't contain a valid partition table

Disk /dev/md2: 41.9 GB, 41948676096 bytes
2 heads, 4 sectors/track, 10241376 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md1: 40.0 GB, 40024014848 bytes
2 heads, 4 sectors/track, 9771488 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md3: 33.5 GB, 33558888448 bytes
2 heads, 4 sectors/track, 8193088 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md3 doesn't contain a valid partition table

Disk /dev/md5: 37.7 GB, 37753716736 bytes
2 heads, 4 sectors/track, 9217216 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md5 doesn't contain a valid partition table

Disk /dev/md6: 41.9 GB, 41948676096 bytes
2 heads, 4 sectors/track, 10241376 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md6 doesn't contain a valid partition table

Disk /dev/md7: 41.9 GB, 41948676096 bytes
2 heads, 4 sectors/track, 10241376 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md7 doesn't contain a valid partition table
Created attachment 119203 [details]
screen shot of failed rawhide kernel 1570

initrd failure.
Some additional clues, maybe. See attachment above for the latest msgs. Using e2label, labeled /dev/md3 as /rawhide, modified fstab to LABEL=/rawhide etc., changed the kernel boot parameter to root=LABEL=/rawhide, and ran mkinitrd for kernel 1570. Still cannot boot, but the msgs show a difference. 1565 boots fine with the changes to LABEL=/rawhide, but I did see the following difference in one msg that may be significant. Successful boots with 1565 have the msg:

  created path for LABEL=/rawhide: 9/3

The failed boots with 1567 and 1570 have the same msg, but 8/2 vice 9/3. Then the following msgs:

  ...
  Mounting root filesystem
  attempt to access beyond end of device
  sda2: rw=0, want=37349760, limit=32772600
  JBD: IO error reading journal superblock
  EXT3-fs: error loading journal.
  mount: error 22 mounting ext3

...and the rest. BTW, /dev/md3 is raid0, not raid5 as indicated earlier. It is composed of SATA devices /dev/sda2 as device 0 and /dev/sdb2 as device 1 in the 2-device raid0. I wonder if the proper raid modules are being loaded in the initrd?
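For reference, the relabeling steps described above collected in one place. A minimal sketch; the /rawhide label and /dev/md3 are from this report, while the exact initrd file name and kernel version string are assumptions:

  e2label /dev/md3 /rawhide

  # /etc/fstab root entry becomes:  LABEL=/rawhide  /  ext3  defaults  1 1
  # grub.conf kernel line gets:     root=LABEL=/rawhide

  mkinitrd -f /boot/initrd-2.6.13-1.1570_FC5smp.img 2.6.13-1.1570_FC5smp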
Peter, any recent mkinitrd changes that could explain these problems?
Went back to mkinitrd-4.2.21-1, recreated the 1570 initrd, and can now boot normally.
mkinitrd is broken, indeed. In a root-on-LVM-on-raid1 scenario, it will only identify the PVs that contain the root filesystem, but then vgscan fails because not all PVs are available (use vgscan -P, perhaps?). Also, it mis-identifies the raid device to start up: the /init script in the initrd image had raidautorun /dev/hda8, instead of /dev/md8 as it should. Worse yet, it won't bring up the raid members containing the swap LV used for swsusp, so that won't work for sure.

And just to wrap it up, there's a piece of dead code in mkinitrd that probably shouldn't be dead:

if echo $rootdev | cut -d/ -f3 | grep -q loop ; then
    [...]
# check if it's root by label
elif echo $rootdev | cut -c1-6 | grep -q "LABEL=" ; then
    [...]
# check if the root fs is on a logical volume
elif ! echo $rootdev | cut -c1-6 | grep -q "LABEL=" ; then
    [...]
elif [[ "$rootdev" =~ "/dev/md[0-9]+" ]]; then
    [...]

See, the second elif is the exact inverse condition of the first elif, so there's no possibility that the last elif will ever match. And guess what, this is exactly the case in this bug report.
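One way the ordering could be fixed is sketched below. This is an illustration of the logic problem only, not the actual patch that later went into mkinitrd; branch bodies are elided:

if echo $rootdev | cut -d/ -f3 | grep -q loop ; then
    [...]  # loop device handling
# test for root on a software raid device BEFORE the catch-all
# "doesn't start with LABEL=" branch, so it can actually match
elif [[ "$rootdev" =~ "/dev/md[0-9]+" ]]; then
    [...]
# check if it's root by label
elif echo $rootdev | cut -c1-6 | grep -q "LABEL=" ; then
    [...]
# everything else, e.g. root on a plain device or a logical volume
else
    [...]
fi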
Alex, can you file a separate bug for the LVM swsusp case? I'd like to handle the issue at hand here, but I probably won't fix both at once, and don't want to forget about that one.
Will do. FWIW, I tested vgscan/vgchange -P, but that doesn't work: even after you bring up the remaining PVs, the LV will remain read-only, and the other LVs that were not present at the time of the original vgscan will remain zero-sized. That's too bad :-( Maybe we could talk the LVM/DM guys into improving that?
Today's rawhide mkinitrd-4.2.23-2 has not fixed the original problem. I did notice that the usual raid array autodetection msgs weren't present. For now I am using kernel-2.6.13-1.1576_FC5smp, whose initrd was built with mkinitrd-4.2.21-1. Are there any other tests I can do to help? Screenshot of today's panic attached. Sorry about the poor quality.
Created attachment 119295 [details]
Screenshot of kernel panic from mkinitrd-4.2.23-2
Clyde, does your root filesystem have its journal on another device? Can you also attach the contents of /proc/partitions, /etc/fstab, and /etc/grub.conf?
Also, did you remake the initrd after updating mkinitrd? Can you show me what mkinitrd now says if you run it like:

mkinitrd -f -v /boot/initrd-2.6.13-1.1576_FC5smp.img 2.6.13-1.1576_FC5smp
#15: As far as I know, the root journal is on /dev/md3. Will attach.

#16: No, I just booted an older kernel (2.6.13-1.1576_FC5smp). Stand by for:

mkinitrd -f -v /boot/initrd-2.6.13-1.1576_FC5smp.img 2.6.13-1.1576_FC5smp
Created attachment 119313 [details]
contents of /proc/partitions, fstab, grub.conf, output of mkinitrd

Outputs as requested...will be trying to boot 1576 shortly.
Created attachment 119315 [details]
Screenshot of kernel panic after mkinitrd-4.2.23-2 of kernel 1576

The same error has occurred after:

mkinitrd -f -v /boot/initrd-2.6.13-1.1576_FC5smp.img 2.6.13-1.1576_FC5smp

I do not see any RAIDAUTODETECT msgs during init as I normally do with the older mkinitrd. Now running under 2.6.13-1.1574_FC5smp, whose initrd was created with mkinitrd-4.2.21-1, and saw the RAIDAUTODETECT msgs.
OK, reboot into a good kernel and do:

cd /boot
mkdir initrd
cd initrd
zcat ../initrd-2.6.13-1.1576_FC5smp.img | cpio -di

It should say something like "3680 blocks" and exit; afterwards there will be many files in /boot/initrd. Please attach the one named "init". Also, can you show me the output of "e2label /dev/hda2"?
[root@P4C800ED initrd]# zcat ../initrd-2.6.13-1.1576_FC5smp.img | cpio -di
2810 blocks
[root@P4C800ED initrd]# ls
bin  dev  etc  init  lib  loopfs  proc  sbin  sys  sysroot
[root@P4C800ED initrd]# cat init
#!/bin/nash

mount -t proc /proc /proc
setquiet
echo Mounted /proc filesystem
echo Mounting sysfs
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mknod /dev/console c 5 1
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mkdir /dev/pts
mkdir /dev/shm
echo Creating device nodes
echo -n "/sbin/hotplug" > /proc/sys/kernel/hotplug
makedevs
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading libata.ko module"
insmod /lib/libata.ko
echo "Loading sata_promise.ko module"
insmod /lib/sata_promise.ko
echo "Loading raid0.ko module"
insmod /lib/raid0.ko
echo "Loading jbd.ko module"
insmod /lib/jbd.ko
echo "Loading ext3.ko module"
insmod /lib/ext3.ko
makedevs
resume /dev/hda2
echo Creating root device
mkrootdev /dev/root
echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
echo Switching to new root
switchroot --movedev /sysroot
[root@P4C800ED /]# e2label /dev/hda2
e2label: Bad magic number in super-block while trying to open /dev/hda2
Couldn't find valid filesystem superblock.
Just to compare, here is the init for a working FC5 initrd:

[root@P4C800ED initrd2]# cat init
#!/bin/nash

mount -t proc /proc /proc
setquiet
echo Mounted /proc filesystem
echo Mounting sysfs
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mknod /dev/console c 5 1
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mkdir /dev/pts
mkdir /dev/shm
echo Creating device nodes
echo -n "/sbin/hotplug" > /proc/sys/kernel/hotplug
makedevs
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading libata.ko module"
insmod /lib/libata.ko
echo "Loading sata_promise.ko module"
insmod /lib/sata_promise.ko
echo "Loading raid0.ko module"
insmod /lib/raid0.ko
echo "Loading xor.ko module"
insmod /lib/xor.ko
echo "Loading raid5.ko module"
insmod /lib/raid5.ko
echo "Loading jbd.ko module"
insmod /lib/jbd.ko
echo "Loading ext3.ko module"
insmod /lib/ext3.ko
makedevs
raidautorun /dev/md0
raidautorun /dev/md1
raidautorun /dev/md2
raidautorun /dev/md3
raidautorun /dev/md4
raidautorun /dev/md5
raidautorun /dev/md6
raidautorun /dev/md7
resume /dev/hda2
echo Creating root device
mkrootdev /dev/root
echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
echo Switching to new root
switchroot --movedev /sysroot
[root@P4C800ED initrd2]#
No luck for me either. Even though the raid1 PV containing the root LV was detected properly, it still tried to raidautorun one of its members, because the variable holding the raid device name was clobbered while processing the raid device members. Even if it was brought up properly, the other PVs in the same VG were not, so vgscan and vgchange would have failed anyway.
Created attachment 119366 [details]
screenshot of kernel 1580 boot attempt with mkinitrd-5.0.0-1

mkinitrd-5.0.0-1 failed the same as before. See the next attachment for the output of mkinitrd -f -v and a zcat of the image. NOTE: I am still not seeing any "MD: Autodetecting RAID arrays..." messages during initrd.
Created attachment 119367 [details]
output of mkinitrd -f -v and zcat of image

Output of mkinitrd-5.0.0-1 run against the 1580 kernel and output of a zcat of the resulting image. NOTE: would have expected to see

raidautorun /dev/md0
raidautorun /dev/md1
raidautorun /dev/md2
raidautorun /dev/md3
raidautorun /dev/md4
raidautorun /dev/md5
raidautorun /dev/md6
raidautorun /dev/md7
raidautorun /dev/md8
raidautorun /dev/md9
raidautorun /dev/md10

between makedevs and resume /dev/hda2; however, they are not there. Will try to revert mkinitrd and recreate the image for 1580. Stand by. Thanks.
Created attachment 119372 [details]
Output of mkinitrd -f -v of 1580 img and zcat of resulting image

mkinitrd-4.2.21-1 works fine and I can now boot the 1580 kernel. The attached file shows the output of mkinitrd -f -v and a zcat of the resulting image.
Created attachment 119401 [details]
Several fixes and improvements for mkinitrd

This patch fixes mkinitrd such that I can now boot using root on an LV whose VG is scattered across multiple RAID 1 devices. It will correctly bring up all raid devices. It also fixes the swap-on-a-different-VG bug, refactoring a lot of code to simplify the overall handling of devices, while at the same time speeding things up significantly by avoiding a lot of work that was repeated before.
Thanks for the patch, it looks good. This should be fixed in tomorrow's rawhide, but several other changes have been merged as well, so please give it a try.
Pulled the updated mkinitrd from cvs, rebuilt and installed. Removed the 82 kernel before installing mkinitrd, then ran yum update again to 82 and all is good ;) thx
Created attachment 119466 [details]
screenshot of mkinitrd-5.0.2 failure on kernel 1586

mkinitrd-5.0.2 doesn't work for me. Still a kernel panic. Attaching a screenshot and a file containing the output of a zcat of the image, the output of mkinitrd of 1586, fstab, grub.conf and the e2label of the root partition.
Created attachment 119467 [details]
listing of init from zcat of 1586 image, mkinitrd of 1586, fstab, grub.conf, etc
Created attachment 119468 [details]
Successful mkinitrd-5.0.2

Success!! Now running with the 1586 kernel. Changed the fstab root device entry to /dev/md3 vice LABEL=/rawhide, and the same on the kernel parameter in grub.conf. The attached file shows a definite difference from the previous try when fstab had LABEL=/rawhide. Looks like a problem with label detection when the root device is on a raid?
Created attachment 119477 [details]
Fix handling of LABEL= for rootdev, add handling of LABEL= for swsuspdev

This patch, to be applied after the one I posted before (which already got integrated into 5.0.2, which I haven't downloaded yet), fixes LABEL= handling. It was broken before my previous patch, but I didn't know that, so I just left it alone. This seems to fix it for me, as long as /etc/blkid.tab is accurate. It appears to generally be, since that's what mount uses for labels anyway, but I haven't been able to get it updated after running tune2fs to add a label to a pre-existing device, to test it. I edited it by hand, and then tested the patch, with LABEL= for both rootdev and swsuspdev. It worked fine. Except that it's a bit unfortunate not to have the LABEL=s in the resume command. Does resume actually support the LABEL= notation?
Created attachment 119479 [details]
Fix handling of LABEL= for rootdev, add handling of LABEL= for swsuspdev

This supersedes the previous patch, since I've learned that resume actually supports LABEL=.
Hmm... So, it turns out that the LABEL= handling code doesn't work for him, Clyde tells me in private. From the bash -x output, the problem is that /etc/blkid.tab contains labels for both the raid device *and* one of the raid members, and this confuses the current logic. It would be nice if /etc/blkid.tab didn't contain such arguably incorrect info, but we might have to code around that. I guess it would suffice to simply bring up all devices that match the label, and let mount decide which one to use on its own. Comments?
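A sketch of the "bring up everything that matches" idea, in shell for illustration. Assumptions: that a blkid binary supporting -t and -o device is available at that point (nash may have to resolve the label itself), and bring_up_device is a hypothetical helper standing in for the raidautorun/lvm logic:

  for dev in $(blkid -t LABEL=/rawhide -o device); do
      # start whatever stack (raid member, md device, ...) backs $dev;
      # once everything is up, mount can pick the right one by label
      bring_up_device "$dev"
  done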
Updated mkinitrd to 5.0.3 and then ran up2date on today's rawhide offerings. Saw the following msgs during the install of the 1588 kernel:

find: warning: Unix filenames usually don't contain slashes (though pathnames do).  That means that '-name LABEL=/rawhide' will probably evaluate to false all the time on this system.  You might find the '-wholename' test more useful, or perhaps '-samefile'.  Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ LABEL=/rawhide'.

Of course the kernel wouldn't boot. fstab has LABEL=/rawhide. Went back to 1586, whose initrd was created with Mr. Oliva's patches. This initrd works as long as you specify root=/dev/md3 on the grub.conf kernel line, even if LABEL=/rawhide is in fstab.
Testing error on my part: reboot after identifying the raid root partition with LABEL=/somelabelname in /etc/fstab, then run mkinitrd. A normal boot follows with LABEL= in both fstab and the kernel line in grub.conf. Not rebooting confuses mkinitrd, since the initrd image was created while the raid root partition was identified as /dev/someraidset in /etc/fstab. Shall I close this bugzilla? Or does an additional test need to be added to mkinitrd to handle the bizarre case?
There's still a typo in /sbin/mkinitrd that causes mkinitrd to bring up *all* volume groups when it's trying to bring up the VG for swsuspdev:

-    handlelvordev $swsupdev
+    handlelvordev $swsuspdev

Other than that, it appears that the resume code does support labels, but /etc/blkid.tab is not there at that time, so it doesn't work, or something along these lines. Is this why the current mkinitrd code doesn't even bring up swsuspdev when it starts with LABEL=? If so, should it not set noresume if swsuspdev matches this pattern, so as to not issue the resume command that is not going to work anyway?
Installed FC5T1, updated the system, which included kernel 1707 and mkinitrd-5.0.11, and the boot failed with the root partition not found. The root partition is on a logical volume. Uninstalled mkinitrd-5.0.11 and installed mkinitrd-5.0.10, recreated the 1707 initrd, and now the system boots. Also, not all VGs are being seen by the system on boot; I expect all of them to be seen. I am also trying to correct the summary of this bug to reflect the mkinitrd problem.
Had time to look at the good vs bad msgs when booting the mkinitrd-5.0.10 version vs the mkinitrd-5.0.11 version of the 1707 initrd. Significantly, 5.0.11 is using a different name for the raid devices, and thus LVM cannot find the LVs!!

5.0.10:
-------------------------------------------------------------------
md: considering sdd1 ...
md: adding sdd1 ...
md: adding sdc1 ...
md: adding sda1 ...
md: created md0  <============================== Note device name as expected.
md: bind<sda1>
md: bind<sdc1>
md: bind<sdd1>
md: running: <sdd1><sdc1><sda1>
raid5: device sdd1 operational as raid disk 1
raid5: device sdc1 operational as raid disk 0
raid5: device sda1 operational as raid disk 2
raid5: allocated 3165kB for md0
raid5: raid level 5 set md0 active with 3 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:3 fd:0
 disk 0, o:1, dev:sdc1
 disk 1, o:1, dev:sdd1
 disk 2, o:1, dev:sda1
md: ... autorun DONE.
Scanning logical volumes
  Reading all physical volumes. This may take a while...
cdrom: open failed.
  Found volume group "VolGroup0" using metadata type lvm2  <===== Excellent!!!
Activating logical volumes
  8 logical volume(s) in volume group "VolGroup0" now active  <== Excellent!!!

Now, 5.0.11:
----------------------------------------------------------------
md: created md_d0  <============== Not good. Should be md0
md: bind<sda1>
md: bind<sdc1>
md: bind<sdd1>
md: running: <sdd1><sdc1><sda1>
raid5: device sdd1 operational as raid disk 1  <===== raid devices are correct
raid5: device sdc1 operational as raid disk 0  <====/
raid5: device sda1 operational as raid disk 2  <===/
raid5: allocated 3165kB for md_d0
raid5: raid level 5 set md_d0 active with 3 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:3 fd:0
 disk 0, o:1, dev:sdc1
 disk 1, o:1, dev:sdd1
 disk 2, o:1, dev:sda1
md: ... autorun DONE.
Scanning logical volumes
  Reading all physicalcdrom: open failed.
 volumes. This may take a while...
  No volume groups found  <=============== Not good. Due to bad raid device name
---------------------------------------------------------------------------

Should this be a separate bug?
This is still a problem with mkinitrd-5.0.12. Same problem as comment 40, no improvement; the raid device names are being changed from what LVM expects as the PV names (I think). What additional information do you need from me to fix this? Please?
See also bug #174263 - kernel panic on raid system. Also fixed by downgrading mkinitrd to mkinitrd-5.0.10-1.
See also bug #169450
*** Bug 174263 has been marked as a duplicate of this bug. ***
I can't reproduce this with 5.0.12. Clyde, please do the following:

mkdir /tmp/initrd
cd /tmp/initrd
zcat /boot/$BAD_INITRD | cpio -dvi

(where $BAD_INITRD is an initrd showing md_d0)

And then attach a copy of /tmp/initrd/init?
Created attachment 121620 [details]
init from initrd created with mkinitrd-5.0.12

As requested, init from the bad initrd is attached. Thanks for chasing this.
I've looked a bit into this bug, as it affects all of my boxes. Like others, I found out that downgrading to 5.0.10 from 5.0.12, then re-creating initrd.img, fixes the problem. I also investigated differences between the initrd images created by them, and the only actual difference was nash. The generated init script was identical. That's as far as I got for now.
Downgrading to mkinitrd-5.0.10 did not fix the problem on my x86_64 box.
John Ellson, did you actually rebuild initrd after downgrading mkinitrd?
Yes, absolutely. I've spent a lot of time on this bug today while trying to retest a different (but possibly related) bug #174188, and I've been rigorous about the procedure. Is it possible that the downgrade trick isn't working on SMP kernels? The x86_64 kernels are all SMP, I think? I have an i686 SMP software raid box that I'm about to try this on (as soon as I can free it up from some real work that I have to do today ;-)
Hmm, well, let's say it absolutely didn't fix all the problems on the x86_64. My i686 SMP box is ok with kernel-smp-2.6.14-1.1729_FC5 and mkinitrd-5.0.10-1, so I don't think there is any evidence for my SMP theory. My x86_64 (dual core) with kernel-2.6.14-1.1729_FC5 and mkinitrd-5.0.10-1 did manage to switchroot this time, but fails right at the end of the init sequence with an OOPS. (I'll attach the dmesg output next.) The system is usable from the primary text console, but none of the other virtual consoles produce a prompt, and startx results in a total system hang.
Created attachment 121710 [details] dmesg output on x86_64 with kernel-2.6.14-1.1729_FC5 and mkinitrd-5.0.10-1
If switchroot passed, then it's likely a different problem. Reverting to mkinitrd-5.0.10-1 and using an older kernel (say 2.6.14-1.1719_FC5.x86_64) definitely works (except that it sometimes freezes accessing my firewire-connected disk, but that's a different bug as well :-)
Why are the raid device names being changed in mkinitrd versions after 5.0.10? That seems to be the key to the problem. 5.0.10 lists them as /dev/mdn while 5.0.11 and .12 list them as /dev/md_dn.
Another tidbit that may or may not indicate something: I discovered that a failed boot with an initrd created with mkinitrd-5.0.12 leaves a /dev/md_d2 entry in /dev. Rebooting with a mkinitrd-5.0.10 initrd showed this "feature." I discovered it when I couldn't get /dev/md2 to start and found /dev/md_d2 in /proc/mdstat. I removed md_d2 from /dev and rebooted, and the md_d2 node was gone and /dev/md2 started automatically during the boot process. I have no idea what this means, if anything, but it sure is mysterious. Looking ahead, when test2 comes out, how will I install it on this system with the mkinitrd problem?
Installed rawhide using boot.iso and the rawhide of 16 Dec. After overcoming the problems in BZ 174047, the system would not boot because of the problems discussed above: raidautorun names the software raidsets md_d0 and md_d1 instead of md0 and md1. Current Rawhide is not installable on a configuration of pre-existing software raidsets created as /dev/md[0..9]. The software raidsets are used as PVs for LVs. I am changing severity to high and would change priority to high if allowed. Please tell me how to overcome this problem short of starting all over again from scratch. There must be a workaround. If I could use mkinitrd-5.0.10 with a current rawhide, that would be great, but I don't know how to do that.
rpm -Uvh --oldpackage mkinitrd-5.0.10.i386.rpm
Would love to, but how? I can't boot. During the install? How would I do that?
Try this: boot a rescue disk, mount the installed / (and /boot if necessary), install mkinitrd-5.0.10, then run mkinitrd by hand to generate a new initrd image. Or you can reinstall Fedora-4.90 and then update everything except mkinitrd from Fedora-development.
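A sketch of that rescue-disk route, with this report's device names filled in as assumptions (the rescue environment may already have mounted the install under /mnt/sysimage for you, and the mkinitrd package file and kernel version are placeholders):

  # from the rescue shell
  mkdir -p /mnt/sysimage
  mount /dev/md3 /mnt/sysimage          # the installed root
  mount /dev/hda1 /mnt/sysimage/boot    # if /boot is a separate partition
  chroot /mnt/sysimage

  # inside the chroot: downgrade mkinitrd and rebuild the initrd by hand
  rpm -Uvh --oldpackage mkinitrd-5.0.10-1.i386.rpm
  mkinitrd -f /boot/initrd-2.6.14-1.1771_FC5.img 2.6.14-1.1771_FC5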
Thanks for the ideas. Starting from the FC5T1 disks occurred to me, but I didn't want to go thru the dependency hell during updates. Will try the first method. I can't believe you and I are the only folks experiencing this. There must be a logical explanation.
Why is mkinitrd not being fixed or rolled back to mkinitrd-5.0.10? The problem is still present in mkinitrd-5.0.13-1.1.
I just upgraded 2 very similar servers to 2.6.14-1.1653_FC4smp. One came up fine; the other panics during boot-up with errors like:

...
md: ... autorun DONE.
Creating root device
Mounting root filesystem
EXT3-fs: unable to read superblock
mount: error 22 mounting ext3
Switching to new root
ERROR opening /dev/console!!!!: 2
error dup2'ing fd of 0 to 0
error dup2'ing fd of 0 to 1
error dup2'ing fd of 0 to 2
unmounting old /proc
unmounting old /sys
switchroot: mount failed: 22
Kernel panic - not syncing: Attempted to kill init!

Both servers are smp [dual PIII-800mhz], same tyan mobo, with 2 SCSI disks. Both are running software RAID (with similar partitions: /dev/md0 is /boot and /dev/md1 is /). There are several slight variations in their installs. Booting 2.6.14-1.1644_FC4smp on the panicking server works, so I've dropped back to using that. I've looked around, but I don't see what changed in 1653 to trigger this now. Apologies if my issue is not related to this bug. I suspect it is, but it's hard to be sure.
Bill,

Sounds like the same bug; if it is, this should work around it:

wget http://download.fedora.redhat.com/pub/fedora/linux/core/test/4.90/i386/os/Fedora/RPMS/mkinitrd-5.0.10-1.i386.rpm
rpm -Uvh --oldpackage mkinitrd-5.0.10-1.i386.rpm
cd /boot
mv initrd-2.6.14-1.1771_FC5.img initrd-2.6.14-1.1771_FC5.img.OLD
mkinitrd initrd-2.6.14-1.1771_FC5.img 2.6.14-1.1771_FC5

Please report back if that fixes it.
I got this:

# rpm -Uvh --oldpackage mkinitrd-5.0.10-1.i386.rpm
warning: mkinitrd-5.0.10-1.i386.rpm: Header V3 DSA signature: NOKEY, key ID 30c9ecf8
error: Failed dependencies:
        libc.so.6(GLIBC_2.4) is needed by mkinitrd-5.0.10-1.i386

Is it safe to use --nodeps?
I wouldn't. There have been other mkinitrd versions that worked. Can you just back out the most recent version on your system? Or build your own mkinitrd with:

rpmbuild --rebuild mkinitrd-5.0.10-1.src.rpm
I have mkinitrd-4.2.15-1 installed. It's the same version that is on the FC4 install CD. There haven't been any updates for it. I guess you guys are ahead of me, testing what will be FC5? So either my problem is not related to the problem you have, or it has been around for a long time. But the problem only just started for me with the latest batch of yum updates (including upgrading the kernel from 2.6.14-1.1644_FC4smp to 2.6.14-1.1653_FC4). Like I said, my other, similar, server booted up fine. So I assume it's some sort of variation in the config... and somehow the latest kernel update (or maybe some other recently updated package) tickled it. Could my problem be mkinitrd if mkinitrd has not changed?
(In reply to comment #60) > I can't believe you and I are the only folks experiencing this. There must be > a logical explanation. I'm experiencing the same problem, FWIW, and I'm also surprised that there aren't more people reporting the problem. Maybe few people run the development tree on RAID boxes, and will only run into the problem when they try to install the next release of Fedora on their servers... Oh well :-(
Reference my comment 60: I got the 16 Dec rawhide tree to boot by booting what I call my rescue Fedora on the same machine, yum updating it (except for mkinitrd!), mounting the unbootable rawhide root and its boot partition on a tempdir, bind mounting sys, dev and proc, and chrooting to the tempdir. Then I reinstalled mkinitrd-5.0.10 and recreated the initrd. Worked like a charm; thanks, John, for the idea. When FC5T2 comes out, I will be able to do a test install of it using this method, but I am going to try to revert mkinitrd before first boot (HINT!!! it would be nice if mkinitrd-5.0.10 were in FC5T2).

In looking at the differences today between 5.0.10 and 5.0.13 using diff, the most significant differences are in nash. And in nash the most significant differences are the new functions introduced with 5.0.11, i.e., coeOpen etc. I am not a programmer and am struggling to understand the code I see, but I wonder what would happen if nash were reverted to just open, etc. vice coeOpen, etc.? The changing of the raidset device names is just a mystery that doesn't seem to be related to kernel version or anything else except the nash changes.
*** Bug 176179 has been marked as a duplicate of this bug. ***
I had reinstalled the system without raid, so I re-installed it again with raid and proved the problem still existed. Booted from the rescue cd, manually assembled the arrays and mounted / and /boot into /mnt/sysimage, chrooted into /mnt/sysimage, downloaded and installed mkinitrd 5.0.10-1, then ran it as described to create a new initrd.img (about 10K smaller than the original).

Rebooted; the result is different but no better :-( I can't seem to capture the full output on a serial console, so below is the serial console output, and here http://adslpipe.co.uk/initrd.jpg is a screen photo of the remainder.

Is the fstab.sys something that gets rolled into the initrd? When mounted in rescue mode I can't see that file anywhere for it to pick up; anything else?

SCSI subsystem initialized
libata version 1.20 loaded.
ahci 0000:00:1f.2: version 1.2
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 193
input: ImPS/2 Generic Wheel Mouse as /class/input/input1
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq led clo pio slum part
ata1: SATA max UDMA/133 cmd 0xF8828100 ctl 0x0 bmdma 0x0 irq 66
ata2: SATA max UDMA/133 cmd 0xF8828180 ctl 0x0 bmdma 0x0 irq 66
ata3: SATA max UDMA/133 cmd 0xF8828200 ctl 0x0 bmdma 0x0 irq 66
ata4: SATA max UDMA/133 cmd 0xF8828280 ctl 0x0 bmdma 0x0 irq 66
ata1: dev 0 cfg 49:2f00 82:746b 83:7f01 84:4023 85:7469 86:3c01 87:4023 88:207f
ata1: dev 0 ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : ahci
ata2: dev 0 cfg 49:2f00 82:746b 83:7f01 84:4023 85:7469 86:3c01 87:4023 88:207f
ata2: dev 0 ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : ahci
ata3: no device found (phy stat 00000000)
scsi2 : ahci
ata4: no device found (phy stat 00000000)
scsi3 : ahci
  Vendor: ATA       Model: WDC WD2500KS-00M  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: Attached scsi disk sda
  Vendor: ATA       Model: WDC WD2500KS-00M  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2 sdb3 sdb4
sd 1:0:0:0: Attached scsi disk sdb
Kernel panic - not syncing: Attempted to kill init!
 [<c0121184>] panic+0x3c/0x16d
 [<c0123b37>] do_exit+0x6c/0x372
 [<c0123ef4>] sys_exit_group+0x0/0xd
 [<c0103f19>] syscall_call+0x7/0xb
Hmm, raidautorun doesn't seem to be called at all now; /etc/mdadm.conf and /etc/fstab look ok. The md devices get assembled if I add another entry to grub.conf using the initrd.img.old. Nothing about raidautorun looks conditional inside rc.sysinit; not sure what nash actually does though ...
None of my software raid boxes run with mkinitrd-5.0.15-1 today either, but the problem is even worse now. I have one i686 SMP box that won't boot (same error as #70) kernel-smp-2.6.14-1.1776_FC5 with either mkinitrd-5.0.10-1 or mkinitrd-5.0.5.1. So perhaps the problem isn't just mkinitrd? Reverting to kernel-smp-2.6.14-1.1773_FC5 with mkinitrd-5.0.10-1 brings it back up.
System 1: Root partitions on software raid. I can boot with mkinitrd-5.0.15-1, BUT none of the fstab raid mounts worked, because they reference /dev/md0, /dev/md1, etc. and all of the raidsets got identified as /dev/md_d0, /dev/md_d1, etc. in /dev during initrd. Root on this system is on /dev/md3, but it is identified in fstab by LABEL=/RAWHIDE, thus allowing the boot. Reverting to mkinitrd-5.0.10 didn't work either until I booted an older kernel and then recreated the 1776 initrd. However, still running under 1773, since there may be a networking issue with 1776 (bug 176250).

System 2: Root partitions on LVM on software raid5 PVs. Didn't allow the new mkinitrd to be upgraded, so 1776 was created with the existing mkinitrd-5.0.10. Got a kernel oops on reboot; still working on that and will get a terminal trace. May be a different bug.
I've just ungzip'ed and cpio'ed the original initrd and the one generated by mkinitrd-5.0.10-1, and it seems that the init scripts are different: the original init has "insmod /lib/raid1" and "raidautorun /dev/md1" statements; the regenerated init has neither.

Could the fact that I ran the mkinitrd inside a chrooted rescue environment, rather than a "proper" config, e.g. with an older kernel, have influenced what got added to the init script? I'm just going to re-cpio and re-gzip a tweaked initrd and see what I can achieve ...
Hurrah! I extracted the contents of the initrd built by mkinitrd 5.0.10-1, added in the raid1.ko and the init script from the original initrd, and rebuilt it with:

find . | cpio -o -c | gzip > /boot/initrd.img

and the machine boots. Does anyone know *what* is different about the newer mkinitrd versions that breaks raid? Is this going to be treated as a blocker for FC5T2? I've learned more about kernel booting than I expected ;-)
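For anyone repeating this, the whole unpack/patch/repack cycle in one place. A sketch, assuming the initrd is a gzip'ed cpio archive in newc format (which is what "-c" produces) and using this report's file names as placeholders:

  mkdir /tmp/initrd-work
  cd /tmp/initrd-work
  zcat /boot/initrd-2.6.14-1.1776_FC5smp.img | cpio -di

  # drop in the known-good init script and lib/raid1.ko here,
  # then repack over the broken image
  find . | cpio -o -c | gzip -9 > /boot/initrd-2.6.14-1.1776_FC5smp.img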
OK, after getting it working I decided to see if today's rawhide would break it again... yum updated mkinitrd from 5.0.10-1 to 5.0.15-1, then yum updated the kernel from 2.6.14-1.1770_FC5smp to 2.6.14-1.1776_FC5smp. I assume updating the kernel builds a new initrd rather than installing a standard one? Either way, with the new kernel the machine is back to panicking again, as the arrays *ARE* being seen as /dev/md_d0 and md_d1 instead of /dev/md0 and md1.

<panto>He's behind you :-)</panto>
> the original init has "insmod /lib/raid1" and "raidautorun /dev/md1" > statements the regenerated init has neither This indicates that the root filesystem wasn't mounted on the raid... If you run mkinitrd with -v, what's the output say? > coudl the fact that I ran the nkinitrd inside a chrooted rescue environment, > rather than a "proper" config e.g. with an older kernel have influenced what > got added to the init script? I'm just going to re-cpio and re-gzip a tweaked > initrd and see what I can achieve ... Yeah, that could make a big difference if the rescue image doesn't have the same fstab, /dev/root, etc set up, or if /sys/block somehow came out significantly different. In which case this is a problem with the rescue environment's setup.
> This indicates that the root filesystem wasn't mounted on the raid...
> If you run mkinitrd with -v, what's the output say?

Can't say, that incarnation of the machine is long gone ;-) But when I upgraded to mkinitrd-5.0.15-1 and then upgraded the kernel to 1776, it *did* correctly add raid1.ko into the initrd, so I think it happened *because* I used mkinitrd from the rescue cd.

> Yeah, that could make a big difference if the rescue image doesn't
> have the same fstab, /dev/root, etc set up, or if /sys/block somehow
> came out significantly different.

At the time I didn't have any choice, as the machine didn't have an older kernel installed, so I had to use the rescue cd. Even though I was chrooted into /mnt/sysimage, where my /dev/mdX were mounted, can mkinitrd "see through" the chroot and be looking at the rescue cd's *real* root, which obviously wouldn't be raid? I didn't remount anything like /proc or /sys inside the chroot environment; I don't know if that would have helped or hindered ...

> In which case this is a problem with the rescue environment's setup.

Worthy of a separate BZ entry? Or is that "expected" given my explanation above?
I don't have any idea who does the rescue image. So yeah, if mkinitrd isn't working because the rescue image isn't setting things up right, that probably merits a bz entry against it.
Created attachment 122479 [details] console output of kernel 1776 oops
Completing comment #73 WRT System2: Kernel oops regardless of mkinitrd-5.0.10 or .15. Console listing attached.
Should have tried acpi=off before I cluttered up this BZ. Kernel 1776 boots with mkinitrd-5.0.10 ok on System 2 (LV on raid5 PV). Mkinitrd-5.0.15 does not boot because of raidsets being renamed.
OK, I think I know what's happening, but it's a kernel issue. As such, I reassigned bz #176179 to kernel, as it's a much shorter and easier to follow version of the same issue, and I'm going to close this bug as a dupe. *** This bug has been marked as a duplicate of 176179 ***
See my comments over on bug 176179 - I've been having what seems to be the same bug and, having finally found time to look at the problem, I'd say it was a nash bug, not a kernel bug. The following patch to mkinitrd fixes this problem *for me* - it reverses a change which occurred sometime between -10 and -15.

--- mkinitrd-5.0.15/nash/nash.c.orig	2005-12-19 19:22:59.000000000 +0000
+++ mkinitrd-5.0.15/nash/nash.c	2006-01-02 20:13:46.000000000 +0000
@@ -1059,7 +1059,7 @@
         return 1;
     }
 
-    if (ioctl(fd, RAID_AUTORUN)) {
+    if (ioctl(fd, RAID_AUTORUN, 0)) {
         eprintf("raidautorun: RAID_AUTORUN failed: %s\n", strerror(errno));
         close(fd);
         return 1;

RAID_AUTORUN expects an argument, and this is specifically whether it should set up partitioned MD devices. Not supplying the argument in nash has to be wrong, and the expected result exactly fits the observed symptoms. Supplying a fixed argument of 0 is probably wrong as well - it should depend on whether mkinitrd finds partitioned RAID devices.
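For context, a minimal standalone C sketch of the ioctl in question (RAID_AUTORUN comes from <linux/raid/md_u.h>; per the analysis above, the third argument tells the kernel whether to assemble the arrays as partitionable md_dN devices, so omitting it passes stack garbage and can silently request them):

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/raid/md_u.h>

int main(void)
{
    /* Any md node can be used to trigger autorun of all
     * autodetected arrays. */
    int fd = open("/dev/md0", O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        return 1;
    }

    /* 0 = classic /dev/mdN arrays; a nonzero value asks for
     * partitionable /dev/md_dN devices instead. */
    if (ioctl(fd, RAID_AUTORUN, 0)) {
        fprintf(stderr, "RAID_AUTORUN: %s\n", strerror(errno));
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}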