From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b4) Gecko/20050915 Fedora/1.5-0.5.0.beta1 Firefox/1.4

Description of problem:
During boot with the new rawhide kernel (kernel-smp-2.6.13-1.1567_FC5) received a kernel panic. An earlier msg says:

  mkdev: '/dev/md3' is not a UUID or LABEL spec

then:

  mount: error 6 mounting ext3

/dev/md3 is the root partition. See attached screenshot for all msgs available. Went back to 2.6.13-1.1565_FC5smp, which works fine.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.13-1.1567_FC5

How reproducible:
Always

Steps to Reproduce:
1. boot (cold or warm)

Actual Results:
Kernel panic as above. See attached screen shot.

Expected Results:
Normal boot from current rawhide system.

Additional info:

lspci:
00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82875P Processor to AGP Controller (rev 02)
00:03.0 PCI bridge: Intel Corporation 82875P/E7210 Processor to PCI to CSA Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200] (rev 01)
01:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200] (Secondary) (rev 01)
02:01.0 Ethernet controller: Intel Corporation 82547EI Gigabit Ethernet Controller (LOM)
03:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
03:04.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)
03:0a.0 Mass storage controller: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
03:0d.0 Communication controller: Intel Corporation 536EP Data Fax Modem
Created attachment 119154 [details]
screen shot of kernel panic

Screen shot as indicated.
It's trying to mount IDE (hda2) devices, but it looks like you are using SATA drives that are showing up as SCSI (sda, sdb). SATA drives can show up as hda or sda depending on the drivers. Generally people prefer them to show up as sda, but of course that means you have to change the software RAID configuration around a bit.
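If the member device names did change, one way to resync the RAID configuration with what the kernel now reports is to regenerate /etc/mdadm.conf from the running arrays. A minimal sketch, assuming mdadm is managing the arrays and that /etc/mdadm.conf is actually consulted on this setup (kernel-autodetected arrays may not use one):

  # See which member devices (sdX vs hdX) the kernel currently reports
  cat /proc/mdstat

  # Rewrite the array definitions from the live arrays so the
  # recorded device names match what the kernel sees now
  mdadm --detail --scan > /etc/mdadm.conf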
Same problem with today's 1570 kernel. Are the kernels now expecting a different identification of partitions? See the error msg in the screen shot:

  mkdev: '/dev/md3' is not a UUID or LABEL spec

md3 is the root partition, a raid5 array.

BTW, with a USB multi-function card reader plugged in, booting _ANY_ kernel does not get beyond testing the HLT instruction. Unplug the device and then 1565 and older boot, while 1567 and 1570 then exhibit the "not syncing" problem. Separate bugzilla, or is this a clue?
> Are the kernels now expecting a different identification of partitions?

That's my guess. If you boot with the old kernel and type `fdisk -l`, you see devices like hda and probably hdb or hdc, right? Do you see sda and sdb? With the new kernel you now have sda and sdb devices instead of your hda and hdb devices. The problem is I'm not really familiar with software RAID, so I'm not sure how you go about fixing it up safely.

> BTW, with a USB multi-function card reader plugged in, booting _ANY_ kernel
> does not get beyond testing the HLT instruction.

Separate bugzilla entry.
I have always seen sda and sdb, as these are SATA drives. Also, I have seen the invalid partition msg since starting the use of raid some time ago, but have not had any problems until now. This computer is for testing purposes and has FC2, FC3, FC4, FC Rawhide (using currently), SuSE, Debian, Slack, Gentoo, RHEL 4, WinXP and Mandriva installed. SuSE 9.3, Gentoo, FC3 and FC4 are all on raid partitions. Currently fdisk -l shows:

Disk /dev/hda: 203.9 GB, 203928109056 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1             1       140   1124518+  83  Linux
/dev/hda2           141       402   2104515   82  Linux swap / Solaris
/dev/hda3   *       422      3244  22675747+   7  HPFS/NTFS
/dev/hda4          3245     24792 173084310    f  W95 Ext'd (LBA)
/dev/hda5          3245      5795  20490876    7  HPFS/NTFS
/dev/hda6          5796      8707  23390608+   b  W95 FAT32
/dev/hda7          8708      8720    104391    6  FAT16
/dev/hda8          8721     11908  25607578+  fd  Linux raid autodetect
/dev/hda9         11909     15096  25607578+  fd  Linux raid autodetect
/dev/hda10        15097     18284  25607578+  83  Linux
/dev/hda11        18285     21472  25607578+  83  Linux
/dev/hda12        21473     24792  26667868+  83  Linux

Disk /dev/hdb: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdb1             1     10017  80461521    f  W95 Ext'd (LBA)
/dev/hdb2   *     10018     16609  52950240    7  HPFS/NTFS
/dev/hdb3         16610     19929  26667900   83  Linux
/dev/hdb5             1        13    104359+  83  Linux
/dev/hdb6            14        76    506016   82  Linux swap / Solaris
/dev/hdb7            77      2367  18402426   83  Linux
/dev/hdb8          2368      4917  20482843+  83  Linux
/dev/hdb9          4918      7467  20482843+  fd  Linux raid autodetect
/dev/hdb10         7468     10017  20482843+  fd  Linux raid autodetect

Disk /dev/hde: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hde1   *         1        26    208813+  83  Linux
/dev/hde2            27      2458  19535040   83  Linux
/dev/hde3          2459      5008  20482875   fd  Linux raid autodetect
/dev/hde4          5009     30515 204884977+   f  W95 Ext'd (LBA)
/dev/hde5          5009      7558  20482843+  83  Linux
/dev/hde6          7559     11206  29302528+  83  Linux
/dev/hde7         11207     13639  19543041   fd  Linux raid autodetect
/dev/hde8         13640     16072  19543041   83  Linux
/dev/hde9         16073     19897  30724281   83  Linux
/dev/hde10        19898     23084  25599546   83  Linux
/dev/hde11        23085     25634  20482843+  83  Linux
/dev/hde12        25635     28184  20482843+  fd  Linux raid autodetect
/dev/hde13        28185     30515  18723726   83  Linux

Disk /dev/hdg: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdg1   *         1        26    208813+  83  Linux
/dev/hdg2            27      2458  19535040   fd  Linux raid autodetect
/dev/hdg3          2459      2720   2104515   82  Linux swap / Solaris

Disk /dev/sda: 122.9 GB, 122942324736 bytes
255 heads, 63 sectors/track, 14946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1      2295  18434556   fd  Linux raid autodetect
/dev/sda2          2296      4335  16386300   fd  Linux raid autodetect
/dev/sda3          4336     14946  85232857+   f  W95 Ext'd (LBA)
/dev/sda5          4336      6885  20482843+  fd  Linux raid autodetect
/dev/sda6          6886      9690  22531131   fd  Linux raid autodetect
/dev/sda7          9691     12240  20482843+  fd  Linux raid autodetect
/dev/sda8         12241     14946  21735913+  fd  Linux raid autodetect

Disk /dev/sdb: 122.9 GB, 122942324736 bytes
255 heads, 63 sectors/track, 14946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1   *         1      2295  18434556   fd  Linux raid autodetect
/dev/sdb2          2296      4335  16386300   fd  Linux raid autodetect
/dev/sdb3          4336     14946  85232857+   f  W95 Ext'd (LBA)
/dev/sdb5          4336      6885  20482843+  fd  Linux raid autodetect
/dev/sdb6          6886      9690  22531131   fd  Linux raid autodetect
/dev/sdb7          9691     12240  20482843+  fd  Linux raid autodetect
/dev/sdb8         12241     14946  21735913+  fd  Linux raid autodetect

Disk /dev/md0: 41.9 GB, 41948282880 bytes
2 heads, 4 sectors/track, 10241280 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md4: 40.0 GB, 40007630848 bytes
2 heads, 4 sectors/track, 9767488 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md4 doesn't contain a valid partition table

Disk /dev/md2: 41.9 GB, 41948676096 bytes
2 heads, 4 sectors/track, 10241376 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md1: 40.0 GB, 40024014848 bytes
2 heads, 4 sectors/track, 9771488 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md3: 33.5 GB, 33558888448 bytes
2 heads, 4 sectors/track, 8193088 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md3 doesn't contain a valid partition table

Disk /dev/md5: 37.7 GB, 37753716736 bytes
2 heads, 4 sectors/track, 9217216 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md5 doesn't contain a valid partition table

Disk /dev/md6: 41.9 GB, 41948676096 bytes
2 heads, 4 sectors/track, 10241376 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md6 doesn't contain a valid partition table

Disk /dev/md7: 41.9 GB, 41948676096 bytes
2 heads, 4 sectors/track, 10241376 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md7 doesn't contain a valid partition table
Created attachment 119203 [details]
screen shot of failed rawhide kernel 1570

initrd failure.
Some additional clues, maybe. See attachment above for the latest msgs. Using e2label, labeled /dev/md3 as /rawhide, modified fstab to LABEL=/rawhide etc., changed the kernel boot parameter to root=LABEL=/rawhide, and ran mkinitrd for kernel 1570. Still cannot boot, but the msgs show a difference. 1565 boots fine with the changes to LABEL=/rawhide, but I did see the following difference in one msg that may be significant. Successful boots with 1565 have the msg:

  created path for LABEL=/rawhide: 9/3

The failed boots with 1567 and 1570 have the same msg, but 8/2 vice 9/3. Then the following msgs:

  ...
  Mounting root filesystem
  attempt to access beyond end of device
  sda2: rw=0, want=37349760, limit=32772600
  JBD: IO error reading journal superblock
  EXT3-fs: error loading journal.
  mount: error 22 mounting ext3

...and the rest. BTW, /dev/md3 is raid0, not raid5 as indicated earlier. It is composed of SATA devices /dev/sda2 as device 0 and /dev/sdb2 as device 1 in the 2-device raid0. I wonder if the proper raid modules are being loaded in the initrd?
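For reference, the relabeling steps described above collected in one place. A minimal sketch; the /rawhide label and /dev/md3 are from this report, while the exact initrd file name and kernel version string are assumptions:

  e2label /dev/md3 /rawhide

  # /etc/fstab root entry becomes:  LABEL=/rawhide  /  ext3  defaults  1 1
  # grub.conf kernel line gets:     root=LABEL=/rawhide

  mkinitrd -f /boot/initrd-2.6.13-1.1570_FC5smp.img 2.6.13-1.1570_FC5smp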
Peter, any recent mkinitrd changes that could explain these problems?
Went back to mkinitrd-4.2.21-1, recreated the 1570 initrd, and can now boot normally.
mkinitrd is broken, indeed. In a root-on-LVM-on-raid1 scenario, it will only identify the PVs that contain the root filesystem, but then vgscan fails because not all PVs are available (use vgscan -P, perhaps?). Also, it mis-identifies the raid device to start up: the /init script in the initrd image had raidautorun /dev/hda8, instead of /dev/md8 as it should. Worse yet, it won't bring up the raid members containing the swap LV used for swsusp, so that won't work for sure.

And just to wrap it up, there's a piece of dead code in mkinitrd that probably shouldn't be dead:

if echo $rootdev | cut -d/ -f3 | grep -q loop ; then
    [...]
# check if it's root by label
elif echo $rootdev | cut -c1-6 | grep -q "LABEL=" ; then
    [...]
# check if the root fs is on a logical volume
elif ! echo $rootdev | cut -c1-6 | grep -q "LABEL=" ; then
    [...]
elif [[ "$rootdev" =~ "/dev/md[0-9]+" ]]; then
    [...]

See, the second elif is the exact inverse condition of the first elif, so there's no possibility that the last elif will ever match. And guess what, this is exactly the case in this bug report.
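One way the ordering could be fixed is sketched below. This is an illustration of the logic problem only, not the actual patch that later went into mkinitrd; branch bodies are elided:

if echo $rootdev | cut -d/ -f3 | grep -q loop ; then
    [...]  # loop device handling
# test for root on a software raid device BEFORE the catch-all
# "doesn't start with LABEL=" branch, so it can actually match
elif [[ "$rootdev" =~ "/dev/md[0-9]+" ]]; then
    [...]
# check if it's root by label
elif echo $rootdev | cut -c1-6 | grep -q "LABEL=" ; then
    [...]
# everything else, e.g. root on a plain device or a logical volume
else
    [...]
fi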
Alex, can you file a separate bug for the LVM swsusp case? I'd like to handle the issue at hand here, but I probably won't fix both at once, and don't want to forget about that one.
Will do. FWIW, I tested vgscan/vgchange -P, but that doesn't work: even after you bring up the remaining PVs, the LV will remain read-only, and the other LVs that were not present at the time of the original vgscan will remain zero-sized. That's too bad :-( Maybe we could talk the LVM/DM guys into improving that?
Today's rawhide mkinitrd-4.2.23-2 has not fixed the original problem. I did notice that the usual raid array autodetection msgs weren't present. For now I am using kernel-2.6.13-1.1576_FC5smp, whose initrd was built with mkinitrd-4.2.21-1. Are there any other tests I can do to help? Screenshot of today's panic attached. Sorry about the poor quality.
Created attachment 119295 [details]
Screenshot of kernel panic from mkinitrd-4.2.23-2
Clyde, does your root filesystem have its journal on another device? Can you also attach the contents of /proc/partitions, /etc/fstab, and /etc/grub.conf?
Also, did you remake the initrd after updating mkinitrd? Can you show me what mkinitrd now says if you run it like:

mkinitrd -f -v /boot/initrd-2.6.13-1.1576_FC5smp.img 2.6.13-1.1576_FC5smp
#15: As far as I know, the root journal is on /dev/md3. Will attach.

#16: No, I just booted an older kernel (2.6.13-1.1576_FC5smp). Stand by for:

mkinitrd -f -v /boot/initrd-2.6.13-1.1576_FC5smp.img 2.6.13-1.1576_FC5smp
Created attachment 119313 [details]
contents of /proc/partitions, fstab, grub.conf, output of mkinitrd

Outputs as requested...will be trying to boot 1576 shortly.
Created attachment 119315 [details]
Screenshot of kernel panic after mkinitrd-4.2.23-2 of kernel 1576

The same error has occurred after:

mkinitrd -f -v /boot/initrd-2.6.13-1.1576_FC5smp.img 2.6.13-1.1576_FC5smp

I do not see any RAIDAUTODETECT msgs during init as I normally do with the older mkinitrd. Now running under 2.6.13-1.1574_FC5smp, whose initrd was created with mkinitrd-4.2.21-1, and saw the RAIDAUTODETECT msgs.
OK, reboot into a good kernel and do:

cd /boot
mkdir initrd
cd initrd
zcat ../initrd-2.6.13-1.1576_FC5smp.img | cpio -di

It should say something like "3680 blocks" and exit; afterwards there will be many files in /boot/initrd. Please attach the one named "init". Also, can you show me the output of "e2label /dev/hda2"?
[root@P4C800ED initrd]# zcat ../initrd-2.6.13-1.1576_FC5smp.img | cpio -di
2810 blocks
[root@P4C800ED initrd]# ls
bin  dev  etc  init  lib  loopfs  proc  sbin  sys  sysroot
[root@P4C800ED initrd]# cat init
#!/bin/nash

mount -t proc /proc /proc
setquiet
echo Mounted /proc filesystem
echo Mounting sysfs
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mknod /dev/console c 5 1
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mkdir /dev/pts
mkdir /dev/shm
echo Creating device nodes
echo -n "/sbin/hotplug" > /proc/sys/kernel/hotplug
makedevs
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading libata.ko module"
insmod /lib/libata.ko
echo "Loading sata_promise.ko module"
insmod /lib/sata_promise.ko
echo "Loading raid0.ko module"
insmod /lib/raid0.ko
echo "Loading jbd.ko module"
insmod /lib/jbd.ko
echo "Loading ext3.ko module"
insmod /lib/ext3.ko
makedevs
resume /dev/hda2
echo Creating root device
mkrootdev /dev/root
echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
echo Switching to new root
switchroot --movedev /sysroot
[root@P4C800ED /]# e2label /dev/hda2
e2label: Bad magic number in super-block while trying to open /dev/hda2
Couldn't find valid filesystem superblock.
Just to compare, here is the init for a working FC5 initrd:

[root@P4C800ED initrd2]# cat init
#!/bin/nash

mount -t proc /proc /proc
setquiet
echo Mounted /proc filesystem
echo Mounting sysfs
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mknod /dev/console c 5 1
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mkdir /dev/pts
mkdir /dev/shm
echo Creating device nodes
echo -n "/sbin/hotplug" > /proc/sys/kernel/hotplug
makedevs
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading libata.ko module"
insmod /lib/libata.ko
echo "Loading sata_promise.ko module"
insmod /lib/sata_promise.ko
echo "Loading raid0.ko module"
insmod /lib/raid0.ko
echo "Loading xor.ko module"
insmod /lib/xor.ko
echo "Loading raid5.ko module"
insmod /lib/raid5.ko
echo "Loading jbd.ko module"
insmod /lib/jbd.ko
echo "Loading ext3.ko module"
insmod /lib/ext3.ko
makedevs
raidautorun /dev/md0
raidautorun /dev/md1
raidautorun /dev/md2
raidautorun /dev/md3
raidautorun /dev/md4
raidautorun /dev/md5
raidautorun /dev/md6
raidautorun /dev/md7
resume /dev/hda2
echo Creating root device
mkrootdev /dev/root
echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
echo Switching to new root
switchroot --movedev /sysroot
[root@P4C800ED initrd2]#
No luck for me either. Even though the raid1 PV containing the root LV was detected properly, it still tried to raidautorun one of its members, because the variable holding the raid device name was clobbered while processing the raid device members. Even if it was brought up properly, the other PVs in the same VG were not, so vgscan and vgchange would have failed anyway.
Created attachment 119366 [details]
screenshot of kernel 1580 boot attempt with mkinitrd-5.0.0-1

mkinitrd-5.0.0-1 failed the same as before. See the next attachment for the output of mkinitrd -f -v and a zcat of the image. NOTE: I am still not seeing any "MD: Autodetecting RAID arrays..." messages during initrd.
Created attachment 119367 [details]
output of mkinitrd -f -v and zcat of image

Output of mkinitrd-5.0.0-1 run against the 1580 kernel and output of a zcat of the resulting image. NOTE: would have expected to see

raidautorun /dev/md0
raidautorun /dev/md1
raidautorun /dev/md2
raidautorun /dev/md3
raidautorun /dev/md4
raidautorun /dev/md5
raidautorun /dev/md6
raidautorun /dev/md7
raidautorun /dev/md8
raidautorun /dev/md9
raidautorun /dev/md10

between makedevs and resume /dev/hda2; however, they are not there. Will try to revert mkinitrd and recreate the image for 1580. Stand by. Thanks.
Created attachment 119372 [details]
Output of mkinitrd -f -v of 1580 img and zcat of resulting image

mkinitrd-4.2.21-1 works fine and I can now boot the 1580 kernel. The attached file shows the output of mkinitrd -f -v and a zcat of the resulting image.
Created attachment 119401 [details]
Several fixes and improvements for mkinitrd

This patch fixes mkinitrd such that I can now boot using root on an LV whose VG is scattered across multiple RAID 1 devices. It will correctly bring up all raid devices. It also fixes the swap-on-a-different-VG bug, refactoring a lot of code to simplify the overall handling of devices, while at the same time speeding things up significantly by avoiding a lot of work that was repeated before.
Thanks for the patch, it looks good. This should be fixed in tomorrow's rawhide, but several other changes have been merged as well, so please give it a try.
Pulled the updated mkinitrd from cvs, rebuilt and installed. Removed the 82 kernel before installing mkinitrd, then ran yum update again to 82 and all is good ;) thx
Created attachment 119466 [details]
screenshot of mkinitrd-5.0.2 failure on kernel 1586

mkinitrd-5.0.2 doesn't work for me. Still a kernel panic. Attaching a screenshot and a file containing the output of a zcat of the image, the output of mkinitrd of 1586, fstab, grub.conf and the e2label of the root partition.
Created attachment 119467 [details]
listing of init from zcat of 1586 image, mkinitrd of 1586, fstab, grub.conf, etc
Created attachment 119468 [details]
Successful mkinitrd-5.0.2

Success!! Now running with the 1586 kernel. Changed the fstab root device entry to /dev/md3 vice LABEL=/rawhide, and the same on the kernel parameter in grub.conf. The attached file shows a definite difference from the previous try when fstab had LABEL=/rawhide. Looks like a problem with label detection when the root device is on a raid?
Created attachment 119477 [details]
Fix handling of LABEL= for rootdev, add handling of LABEL= for swsuspdev

This patch, to be applied after the one I posted before (which already got integrated into 5.0.2, which I haven't downloaded yet), fixes LABEL= handling. It was broken before my previous patch, but I didn't know that, so I just left it alone. This seems to fix it for me, as long as /etc/blkid.tab is accurate. It appears to generally be, since that's what mount uses for labels anyway, but I haven't been able to get it updated after running tune2fs to add a label to a pre-existing device, to test it. I edited it by hand, and then tested the patch, with LABEL= for both rootdev and swsuspdev. It worked fine. Except that it's a bit unfortunate not to have the LABEL=s in the resume command. Does resume actually support the LABEL= notation?
Created attachment 119479 [details]
Fix handling of LABEL= for rootdev, add handling of LABEL= for swsuspdev

This supersedes the previous patch, since I've learned that resume actually supports LABEL=.
Hmm... So, it turns out that the LABEL= handling code doesn't work for him, Clyde tells me in private. From the bash -x output, the problem is that /etc/blkid.tab contains labels for both the raid device *and* one of the raid members, and this confuses the current logic. It would be nice if /etc/blkid.tab didn't contain such arguably incorrect info, but we might have to code around that. I guess it would suffice to simply bring up all devices that match the label, and let mount decide which one to use on its own. Comments?
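A sketch of the "bring up everything that matches" idea, in shell for illustration. Assumptions: that a blkid binary supporting -t and -o device is available at that point (nash may have to resolve the label itself), and bring_up_device is a hypothetical helper standing in for the raidautorun/lvm logic:

  for dev in $(blkid -t LABEL=/rawhide -o device); do
      # start whatever stack (raid member, md device, ...) backs $dev;
      # once everything is up, mount can pick the right one by label
      bring_up_device "$dev"
  done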
Updated mkinitrd to 5.0.3 and then ran up2date on today's rawhide offerings. Saw the following msgs during the install of the 1588 kernel:

find: warning: Unix filenames usually don't contain slashes (though pathnames do).  That means that '-name LABEL=/rawhide' will probably evaluate to false all the time on this system.  You might find the '-wholename' test more useful, or perhaps '-samefile'.  Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ LABEL=/rawhide'.

Of course the kernel wouldn't boot. fstab has LABEL=/rawhide. Went back to 1586, whose initrd was created with Mr. Oliva's patches. This initrd works as long as you specify root=/dev/md3 on the grub.conf kernel line, even if LABEL=/rawhide is in fstab.
Testing error on my part: reboot after identifying the raid root partition with LABEL=/somelabelname in /etc/fstab, then run mkinitrd. A normal boot follows with LABEL= in both fstab and the kernel line in grub.conf. Not rebooting confuses mkinitrd, since the initrd image was created while the raid root partition was identified as /dev/someraidset in /etc/fstab. Shall I close this bugzilla? Or does an additional test need to be added to mkinitrd to handle the bizarre case?
There's still a typo in /sbin/mkinitrd that causes mkinitrd to bring up *all* volume groups when it's trying to bring up the VG for swsuspdev:

-    handlelvordev $swsupdev
+    handlelvordev $swsuspdev

Other than that, it appears that the resume code does support labels, but /etc/blkid.tab is not there at that time, so it doesn't work, or something along these lines. Is this why the current mkinitrd code doesn't even bring up swsuspdev when it starts with LABEL=? If so, should it not set noresume if swsuspdev matches this pattern, so as to not issue the resume command that is not going to work anyway?
Installed FC5T1, updated the system, which included kernel 1707 and mkinitrd-5.0.11, and the boot failed with the root partition not found. The root partition is on a logical volume. Uninstalled mkinitrd-5.0.11 and installed mkinitrd-5.0.10, recreated the 1707 initrd, and now the system boots. Also, not all VGs are being seen by the system on boot; I expect all of them to be seen. I am also trying to correct the summary of this bug to reflect the mkinitrd problem.
Had time to look at the good vs bad msgs when booting the mkinitrd-5.0.10 version vs the mkinitrd-5.0.11 version of the 1707 initrd. Significantly, 5.0.11 is using a different name for the raid devices, and thus LVM cannot find the LVs!!

5.0.10:
-------------------------------------------------------------------
md: considering sdd1 ...
md: adding sdd1 ...
md: adding sdc1 ...
md: adding sda1 ...
md: created md0  <============================== Note device name as expected.
md: bind<sda1>
md: bind<sdc1>
md: bind<sdd1>
md: running: <sdd1><sdc1><sda1>
raid5: device sdd1 operational as raid disk 1
raid5: device sdc1 operational as raid disk 0
raid5: device sda1 operational as raid disk 2
raid5: allocated 3165kB for md0
raid5: raid level 5 set md0 active with 3 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:3 fd:0
 disk 0, o:1, dev:sdc1
 disk 1, o:1, dev:sdd1
 disk 2, o:1, dev:sda1
md: ... autorun DONE.
Scanning logical volumes
  Reading all physical volumes. This may take a while...
cdrom: open failed.
  Found volume group "VolGroup0" using metadata type lvm2  <===== Excellent!!!
Activating logical volumes
  8 logical volume(s) in volume group "VolGroup0" now active  <== Excellent!!!

Now, 5.0.11:
----------------------------------------------------------------
md: created md_d0  <============== Not good. Should be md0
md: bind<sda1>
md: bind<sdc1>
md: bind<sdd1>
md: running: <sdd1><sdc1><sda1>
raid5: device sdd1 operational as raid disk 1  <===== raid devices are correct
raid5: device sdc1 operational as raid disk 0  <====/
raid5: device sda1 operational as raid disk 2  <===/
raid5: allocated 3165kB for md_d0
raid5: raid level 5 set md_d0 active with 3 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:3 fd:0
 disk 0, o:1, dev:sdc1
 disk 1, o:1, dev:sdd1
 disk 2, o:1, dev:sda1
md: ... autorun DONE.
Scanning logical volumes
  Reading all physicalcdrom: open failed.
 volumes. This may take a while...
  No volume groups found  <=============== Not good. Due to bad raid device name
---------------------------------------------------------------------------

Should this be a separate bug?
This is still a problem with mkinitrd-5.0.12. Same problem as comment 40, no improvement; the raid device names are being changed from what LVM expects as the PV names (I think). What additional information do you need from me to fix this? Please?
See also bug #174263 - kernel panic on raid system. Also fixed by downgrading mkinitrd to mkinitrd-5.0.10-1.
See also bug #169450
*** Bug 174263 has been marked as a duplicate of this bug. ***
I can't reproduce this with 5.0.12. Clyde, please do the following:

mkdir /tmp/initrd
cd /tmp/initrd
zcat /boot/$BAD_INITRD | cpio -dvi

(where $BAD_INITRD is an initrd showing md_d0)

And then attach a copy of /tmp/initrd/init?
Created attachment 121620 [details]
init from initrd created with mkinitrd-5.0.12

As requested, init from the bad initrd is attached. Thanks for chasing this.
I've looked a bit into this bug, as it affects all of my boxes. Like others, I found out that downgrading to 5.0.10 from 5.0.12, then re-creating initrd.img, fixes the problem. I also investigated differences between the initrd images created by them, and the only actual difference was nash. The generated init script was identical. That's as far as I got for now.
Downgrading to mkinitrd-5.0.10 did not fix the problem on my x86_64 box.
John Ellson, did you actually rebuild initrd after downgrading mkinitrd?
Yes, absolutely. I've spent a lot of time on this bug today while trying to retest a different (but possibly related) bug #174188, and I've been rigorous about the procedure. Is it possible that the downgrade trick isn't working on SMP kernels? The x86_64 kernels are all SMP, I think? I have an i686 SMP software raid box that I'm about to try this on (as soon as I can free it up from some real work that I have to do today ;-)
Hmm, well, let's say it absolutely didn't fix all the problems on the x86_64. My i686 SMP box is ok with kernel-smp-2.6.14-1.1729_FC5 and mkinitrd-5.0.10-1, so I don't think there is any evidence for my SMP theory. My x86_64 (dual core) with kernel-2.6.14-1.1729_FC5 and mkinitrd-5.0.10-1 did manage to switchroot this time, but fails right at the end of the init sequence with an OOPS. (I'll attach the dmesg output next.) The system is usable from the primary text console, but none of the other virtual consoles produce a prompt, and startx results in a total system hang.
Created attachment 121710 [details] dmesg output on x86_64 with kernel-2.6.14-1.1729_FC5 and mkinitrd-5.0.10-1
If switchroot passed, then it's likely a different problem. Reverting to mkinitrd-5.0.10-1 and using an older kernel (say 2.6.14-1.1719_FC5.x86_64) definitely works (except that it sometimes freezes accessing my firewire-connected disk, but that's a different bug as well :-)
Why are the raid device names being changed in mkinitrd versions after 5.0.10? That seems to be the key to the problem. 5.0.10 lists them as /dev/mdn while 5.0.11 and .12 list them as /dev/md_dn.
Another tidbit that may or may not indicate something: I discovered that a failed boot with an initrd created with mkinitrd-5.0.12 leaves a /dev/md_d2 entry in /dev. Rebooting with a mkinitrd-5.0.10 initrd showed this "feature." I discovered it when I couldn't get /dev/md2 to start and found /dev/md_d2 in /proc/mdstat. I removed md_d2 from /dev and rebooted, and the md_d2 node was gone and /dev/md2 started automatically during the boot process. I have no idea what this means, if anything, but it sure is mysterious. Looking ahead, when test2 comes out, how will I install it on this system with the mkinitrd problem?
Installed rawhide using boot.iso and the rawhide of 16 Dec. After overcoming the problems in BZ 174047, the system would not boot because of the problems discussed above: raidautorun names the software raidsets md_d0 and md_d1 instead of md0 and md1. Current Rawhide is not installable on a configuration of pre-existing software raidsets created as /dev/md[0..9]. The software raidsets are used as PVs for LVs. I am changing severity to high and would change priority to high if allowed. Please tell me how to overcome this problem short of starting all over again from scratch. There must be a workaround. If I could use mkinitrd-5.0.10 with a current rawhide, that would be great, but I don't know how to do that.
rpm -Uvh --oldpackage mkinitrd-5.0.10.i386.rpm
Would love to, but how? I can't boot. During the install? How would I do that?
Try this: boot a rescue disk, mount the installed / (and /boot if necessary), install mkinitrd-5.0.10, then run mkinitrd by hand to generate a new initrd image. Or you can reinstall Fedora-4.90 and then update everything except mkinitrd from Fedora-development.
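A sketch of that rescue-disk route, with this report's device names filled in as assumptions (the rescue environment may already have mounted the install under /mnt/sysimage for you, and the mkinitrd package file and kernel version are placeholders):

  # from the rescue shell
  mkdir -p /mnt/sysimage
  mount /dev/md3 /mnt/sysimage          # the installed root
  mount /dev/hda1 /mnt/sysimage/boot    # if /boot is a separate partition
  chroot /mnt/sysimage

  # inside the chroot: downgrade mkinitrd and rebuild the initrd by hand
  rpm -Uvh --oldpackage mkinitrd-5.0.10-1.i386.rpm
  mkinitrd -f /boot/initrd-2.6.14-1.1771_FC5.img 2.6.14-1.1771_FC5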
Thanks for the ideas. Starting from the FC5T1 disks occurred to me, but I didn't want to go thru the dependency hell during updates. Will try the first method. I can't believe you and I are the only folks experiencing this. There must be a logical explanation.
Why is mkinitrd not being fixed or rolled back to mkinitrd-5.0.10? The problem is still present in mkinitrd-5.0.13-1.1.
I just upgraded 2 very similar servers to 2.6.14-1.1653_FC4smp. One came up fine; the other panics during boot-up with errors like:

...
md: ... autorun DONE.
Creating root device
Mounting root filesystem
EXT3-fs: unable to read superblock
mount: error 22 mounting ext3
Switching to new root
ERROR opening /dev/console!!!!: 2
error dup2'ing fd of 0 to 0
error dup2'ing fd of 0 to 1
error dup2'ing fd of 0 to 2
unmounting old /proc
unmounting old /sys
switchroot: mount failed: 22
Kernel panic - not syncing: Attempted to kill init!

Both servers are smp [dual PIII-800mhz], same tyan mobo, with 2 SCSI disks. Both are running software RAID (with similar partitions: /dev/md0 is /boot and /dev/md1 is /). There are several slight variations in their installs. Booting 2.6.14-1.1644_FC4smp on the panicking server works, so I've dropped back to using that. I've looked around, but I don't see what changed in 1653 to trigger this now. Apologies if my issue is not related to this bug. I suspect it is, but it's hard to be sure.
Bill,

Sounds like the same bug; if it is, this should work around it:

wget http://download.fedora.redhat.com/pub/fedora/linux/core/test/4.90/i386/os/Fedora/RPMS/mkinitrd-5.0.10-1.i386.rpm
rpm -Uvh --oldpackage mkinitrd-5.0.10-1.i386.rpm
cd /boot
mv initrd-2.6.14-1.1771_FC5.img initrd-2.6.14-1.1771_FC5.img.OLD
mkinitrd initrd-2.6.14-1.1771_FC5.img 2.6.14-1.1771_FC5

Please report back if that fixes it.
I got this:

# rpm -Uvh --oldpackage mkinitrd-5.0.10-1.i386.rpm
warning: mkinitrd-5.0.10-1.i386.rpm: Header V3 DSA signature: NOKEY, key ID 30c9ecf8
error: Failed dependencies:
        libc.so.6(GLIBC_2.4) is needed by mkinitrd-5.0.10-1.i386

Is it safe to use --nodeps?
I wouldn't. There have been other mkinitrd versions that worked. Can you just back out the most recent version on your system? Or build your own mkinitrd with:

rpmbuild --rebuild mkinitrd-5.0.10-1.src.rpm
I have mkinitrd-4.2.15-1 installed. It's the same version that is on the FC4 install CD. There haven't been any updates for it. I guess you guys are ahead of me, testing what will be FC5? So either my problem is not related to the problem you have, or it has been around for a long time. But the problem only just started for me with the latest batch of yum updates (including upgrading the kernel from 2.6.14-1.1644_FC4smp to 2.6.14-1.1653_FC4). Like I said, my other, similar, server booted up fine. So I assume it's some sort of variation in the config... and somehow the latest kernel update (or maybe some other recently updated package) tickled it. Could my problem be mkinitrd if mkinitrd has not changed?
(In reply to comment #60) > I can't believe you and I are the only folks experiencing this. There must be > a logical explanation. I'm experiencing the same problem, FWIW, and I'm also surprised that there aren't more people reporting the problem. Maybe few people run the development tree on RAID boxes, and will only run into the problem when they try to install the next release of Fedora on their servers... Oh well :-(
Reference my comment 60: I got the 16 Dec rawhide tree to boot by booting what I call my rescue Fedora on the same machine, yum updating it (except for mkinitrd!), mounting the unbootable rawhide root and its boot partition on a tempdir, bind mounting sys, dev and proc, and chrooting to the tempdir. Then I reinstalled mkinitrd-5.0.10 and recreated the initrd. Worked like a charm; thanks, John, for the idea. When FC5T2 comes out, I will be able to do a test install of it using this method, but I am going to try to revert mkinitrd before first boot (HINT!!! it would be nice if mkinitrd-5.0.10 were in FC5T2).

In looking at the differences today between 5.0.10 and 5.0.13 using diff, the most significant differences are in nash. And in nash the most significant differences are the new functions introduced with 5.0.11, i.e., coeOpen etc. I am not a programmer and am struggling to understand the code I see, but I wonder what would happen if nash were reverted to just open, etc. vice coeOpen, etc.? The changing of the raidset device names is just a mystery that doesn't seem to be related to kernel version or anything else except the nash changes.
*** Bug 176179 has been marked as a duplicate of this bug. ***
I had reinstalled the system without raid, so I re-installed it again with raid and proved the problem still existed. Booted from the rescue cd, manually assembled the arrays and mounted / and /boot into /mnt/sysimage, chrooted into /mnt/sysimage, downloaded and installed mkinitrd 5.0.10-1, then ran it as described to create a new initrd.img (about 10K smaller than the original).

Rebooted; the result is different but no better :-( I can't seem to capture the full output on a serial console, so below is the serial console output, and here http://adslpipe.co.uk/initrd.jpg is a screen photo of the remainder.

Is the fstab.sys something that gets rolled into the initrd? When mounted in rescue mode I can't see that file anywhere for it to pick up; anything else?

SCSI subsystem initialized
libata version 1.20 loaded.
ahci 0000:00:1f.2: version 1.2
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 193
input: ImPS/2 Generic Wheel Mouse as /class/input/input1
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq led clo pio slum part
ata1: SATA max UDMA/133 cmd 0xF8828100 ctl 0x0 bmdma 0x0 irq 66
ata2: SATA max UDMA/133 cmd 0xF8828180 ctl 0x0 bmdma 0x0 irq 66
ata3: SATA max UDMA/133 cmd 0xF8828200 ctl 0x0 bmdma 0x0 irq 66
ata4: SATA max UDMA/133 cmd 0xF8828280 ctl 0x0 bmdma 0x0 irq 66
ata1: dev 0 cfg 49:2f00 82:746b 83:7f01 84:4023 85:7469 86:3c01 87:4023 88:207f
ata1: dev 0 ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : ahci
ata2: dev 0 cfg 49:2f00 82:746b 83:7f01 84:4023 85:7469 86:3c01 87:4023 88:207f
ata2: dev 0 ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : ahci
ata3: no device found (phy stat 00000000)
scsi2 : ahci
ata4: no device found (phy stat 00000000)
scsi3 : ahci
  Vendor: ATA       Model: WDC WD2500KS-00M  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: Attached scsi disk sda
  Vendor: ATA       Model: WDC WD2500KS-00M  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2 sdb3 sdb4
sd 1:0:0:0: Attached scsi disk sdb
Kernel panic - not syncing: Attempted to kill init!
 [<c0121184>] panic+0x3c/0x16d
 [<c0123b37>] do_exit+0x6c/0x372
 [<c0123ef4>] sys_exit_group+0x0/0xd
 [<c0103f19>] syscall_call+0x7/0xb
Hmm, raidautorun doesn't seem to be called at all now; /etc/mdadm.conf and /etc/fstab look ok. The md devices get assembled if I add another entry to grub.conf using the initrd.img.old. Nothing about raidautorun looks conditional inside rc.sysinit; not sure what nash actually does though ...
None of my software raid boxes run with mkinitrd-5.0.15-1 today either, but the problem is even worse now. I have one i686 SMP box that won't boot (same error as #70) kernel-smp-2.6.14-1.1776_FC5 with either mkinitrd-5.0.10-1 or mkinitrd-5.0.5.1. So perhaps the problem isn't just mkinitrd? Reverting to kernel-smp-2.6.14-1.1773_FC5 with mkinitrd-5.0.10-1 brings it back up.
System 1: Root partitions on software raid. I can boot with mkinitrd-5.0.15-1, BUT none of the fstab raid mounts worked, because they reference /dev/md0, /dev/md1, etc. and all of the raidsets got identified as /dev/md_d0, /dev/md_d1, etc. in /dev during initrd. Root on this system is on /dev/md3, but it is identified in fstab by LABEL=/RAWHIDE, thus allowing the boot. Reverting to mkinitrd-5.0.10 didn't work either until I booted an older kernel and then recreated the 1776 initrd. However, still running under 1773, since there may be a networking issue with 1776 (bug 176250).

System 2: Root partitions on LVM on software raid5 PVs. Didn't allow the new mkinitrd to be upgraded, so 1776 was created with the existing mkinitrd-5.0.10. Got a kernel oops on reboot; still working on that and will get a terminal trace. May be a different bug.
I've just ungzip'ed and cpio'ed the original initrd and the one generated by mkinitrd-5.0.10-1, and it seems that the init scripts are different: the original init has "insmod /lib/raid1" and "raidautorun /dev/md1" statements; the regenerated init has neither.

Could the fact that I ran the mkinitrd inside a chrooted rescue environment, rather than a "proper" config, e.g. with an older kernel, have influenced what got added to the init script? I'm just going to re-cpio and re-gzip a tweaked initrd and see what I can achieve ...
Hurrah! I extracted the contents of the initrd built by mkinitrd 5.0.10-1, added in the raid1.ko and the init script from the original initrd, and rebuilt it with:

find . | cpio -o -c | gzip > /boot/initrd.img

and the machine boots. Does anyone know *what* is different about the newer mkinitrd versions that breaks raid? Is this going to be treated as a blocker for FC5T2? I've learned more about kernel booting than I expected ;-)
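For anyone repeating this, the whole unpack/patch/repack cycle in one place. A sketch, assuming the initrd is a gzip'ed cpio archive in newc format (which is what "-c" produces) and using this report's file names as placeholders:

  mkdir /tmp/initrd-work
  cd /tmp/initrd-work
  zcat /boot/initrd-2.6.14-1.1776_FC5smp.img | cpio -di

  # drop in the known-good init script and lib/raid1.ko here,
  # then repack over the broken image
  find . | cpio -o -c | gzip -9 > /boot/initrd-2.6.14-1.1776_FC5smp.img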
OK, after getting it working I decided to see if today's rawhide would break it again... yum updated mkinitrd from 5.0.10-1 to 5.0.15-1, then yum updated the kernel from 2.6.14-1.1770_FC5smp to 2.6.14-1.1776_FC5smp. I assume updating the kernel builds a new initrd rather than installing a standard one? Either way, with the new kernel the machine is back to panicking again, as the arrays *ARE* being seen as /dev/md_d0 and md_d1 instead of /dev/md0 and md1.

<panto>He's behind you :-)</panto>
> the original init has "insmod /lib/raid1" and "raidautorun /dev/md1" > statements the regenerated init has neither This indicates that the root filesystem wasn't mounted on the raid... If you run mkinitrd with -v, what's the output say? > coudl the fact that I ran the nkinitrd inside a chrooted rescue environment, > rather than a "proper" config e.g. with an older kernel have influenced what > got added to the init script? I'm just going to re-cpio and re-gzip a tweaked > initrd and see what I can achieve ... Yeah, that could make a big difference if the rescue image doesn't have the same fstab, /dev/root, etc set up, or if /sys/block somehow came out significantly different. In which case this is a problem with the rescue environment's setup.
> This indicates that the root filesystem wasn't mounted on the raid...
> If you run mkinitrd with -v, what's the output say?

Can't say, that incarnation of the machine is long gone ;-) But when I upgraded to mkinitrd-5.0.15-1 and then upgraded the kernel to 1776, it *did* correctly add raid1.ko into the initrd, so I think it happened *because* I used mkinitrd from the rescue cd.

> Yeah, that could make a big difference if the rescue image doesn't
> have the same fstab, /dev/root, etc set up, or if /sys/block somehow
> came out significantly different.

At the time I didn't have any choice, as the machine didn't have an older kernel installed, so I had to use the rescue cd. Even though I was chrooted into /mnt/sysimage, where my /dev/mdX were mounted, can mkinitrd "see through" the chroot and be looking at the rescue cd's *real* root, which obviously wouldn't be raid? I didn't remount anything like /proc or /sys inside the chroot environment; I don't know if that would have helped or hindered ...

> In which case this is a problem with the rescue environment's setup.

Worthy of a separate BZ entry? Or is that "expected" given my explanation above?
I don't have any idea who does the rescue image. So yeah, if mkinitrd isn't working because the rescue image isn't setting things up right, that probably merits a bz entry against it.
Created attachment 122479 [details] console output of kernel 1776 oops
Completing comment #73 WRT System2: Kernel oops regardless of mkinitrd-5.0.10 or .15. Console listing attached.
Should have tried acpi=off before I cluttered up this BZ. Kernel 1776 boots with mkinitrd-5.0.10 ok on System 2 (LV on raid5 PV). Mkinitrd-5.0.15 does not boot because of raidsets being renamed.
OK, I think I know what's happening, but it's a kernel issue. As such, I reassigned bz #176179 to kernel, as it's a much shorter and easier to follow version of the same issue, and I'm going to close this bug as a dupe. *** This bug has been marked as a duplicate of 176179 ***
See my comments over on bug 176179 - I've been having what seems to be the same bug and, having finally found time to look at the problem, I'd say it was a nash bug, not a kernel bug. The following patch to mkinitrd fixes this problem *for me* - it reverses a change which occurred sometime between -10 and -15.

--- mkinitrd-5.0.15/nash/nash.c.orig	2005-12-19 19:22:59.000000000 +0000
+++ mkinitrd-5.0.15/nash/nash.c	2006-01-02 20:13:46.000000000 +0000
@@ -1059,7 +1059,7 @@
         return 1;
     }
 
-    if (ioctl(fd, RAID_AUTORUN)) {
+    if (ioctl(fd, RAID_AUTORUN, 0)) {
         eprintf("raidautorun: RAID_AUTORUN failed: %s\n", strerror(errno));
         close(fd);
         return 1;

RAID_AUTORUN expects an argument, and this is specifically whether it should set up partitioned MD devices. Not supplying the argument in nash has to be wrong, and the expected result exactly fits the observed symptoms. Supplying a fixed argument of 0 is probably wrong as well - it should depend on whether mkinitrd finds partitioned RAID devices.
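For context, a minimal standalone C sketch of the ioctl in question (RAID_AUTORUN comes from <linux/raid/md_u.h>; per the analysis above, the third argument tells the kernel whether to assemble the arrays as partitionable md_dN devices, so omitting it passes stack garbage and can silently request them):

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/raid/md_u.h>

int main(void)
{
    /* Any md node can be used to trigger autorun of all
     * autodetected arrays. */
    int fd = open("/dev/md0", O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        return 1;
    }

    /* 0 = classic /dev/mdN arrays; a nonzero value asks for
     * partitionable /dev/md_dN devices instead. */
    if (ioctl(fd, RAID_AUTORUN, 0)) {
        fprintf(stderr, "RAID_AUTORUN: %s\n", strerror(errno));
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}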