Bug 192157

Summary: kernel 2.6.16-1.2204_FC6 refuses to boot - at least on x86_64
Product: [Fedora] Fedora Reporter: Michal Jaegermann <michal>
Component: mkinitrd Assignee: Peter Jones <pjones>
Status: CLOSED DUPLICATE QA Contact: David Lawrence <dkl>
Severity: high Docs Contact:
Priority: medium    
Version: rawhide CC: andreas.ossenbrueggen, davej, hoover, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2006-08-16 21:17:39 UTC Type: ---

Description Michal Jaegermann 2006-05-17 22:52:16 UTC
Description of problem:

The moment user space is supposed to start, init goes into an infinite loop,
printing repeatedly and at a high rate:

init[1] trap divide error rip: 42a1b7 rsp: 7fffb7574940 error: 0

The kernel nominally did not crash, as alt-ctrl-del reboots the machine, and
with an older kernel the same installation still boots.

Things happen so quickly that I am not sure which init really gets
into trouble, but presumably this is /sbin/init from SysVinit-2.86-4.

Version-Release number of selected component (if applicable):
2.6.16-1.2204_FC6

How reproducible:
always

Comment 1 Michal Jaegermann 2006-05-18 18:58:31 UTC
The problem persists with 2.6.16-1.2206_FC6.  A kernel which I can boot is
2.6.16-1.2074_FC6 (I could not touch my test box for a rather long time).

Comment 2 Michal Jaegermann 2006-05-19 20:01:29 UTC
The same problem with 2.6.16-1.2207_FC6.  Things happen really
quickly but it appears that 'trap divide error' shows up in the initrd,
although I am not really sure.  OTOH regenerating a fresh initrd for one
of my still-booting kernels results in a bootable system.

I will skip further comments until there is a new kernel with which I can boot.

Comment 3 Michal Jaegermann 2006-05-24 19:03:28 UTC
I took apart initrd and inserted a bunch of 'echo <something>' and
sleep statements to see where things go haywire.  There is a fragment
there which goes:
....
insmod /lib/dm-snapshot.ko
mkblkdevs
rmparts sdb
dm create pdc_cjfeejidea 0 488397056 linear 8:16 0
....
The moment 'dm create ...' is called I am in a "trap divide error" loop.
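
For reference, the arguments to 'dm create' appear to follow the usual
device-mapper table layout: start sector, length in 512-byte sectors, target
type, then target-specific arguments (for 'linear', a major:minor device and
a starting offset; 8:16 is the conventional device number of sdb).  That nash
uses exactly this layout is inferred from the lines above, not from nash
documentation.  A minimal Python sketch under that assumption; the names
parse_dm_create and DmCreate are made up for illustration and are not part
of nash:

```python
from typing import List, NamedTuple

SECTOR_BYTES = 512  # device-mapper tables count 512-byte sectors


class DmCreate(NamedTuple):
    """Fields of one 'dm create' line (hypothetical helper type)."""
    name: str
    start: int
    length: int
    target: str
    args: List[str]


def parse_dm_create(line: str) -> DmCreate:
    """Split a 'dm create NAME START LENGTH TARGET ARGS...' line into fields.

    The field order is an assumption based on the device-mapper table format.
    """
    words = line.split()
    if words[:2] != ["dm", "create"] or len(words) < 6:
        raise ValueError("not a 'dm create' line: %r" % line)
    return DmCreate(
        name=words[2],
        start=int(words[3]),
        length=int(words[4]),
        target=words[5],
        args=words[6:],
    )


entry = parse_dm_create("dm create pdc_cjfeejidea 0 488397056 linear 8:16 0")
size_bytes = entry.length * SECTOR_BYTES
print(entry.name, entry.target, entry.args, size_bytes)
```

Read this way, the line maps the whole of device 8:16 (about 250 GB) as a
single linear target named pdc_cjfeejidea.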

OTOH when I commented out the following block in 'init' from the initrd
('rmparts sdb' and 'rmparts sdc' can be left uncommented):

rmparts sdb
dm create pdc_cjfeejidea 0 488397056 linear 8:16 0
dm partadd pdc_cjfeejidea
rmparts sdc
dm create pdc_cjhbfdhhaa 0 488397056 linear 8:32 0
dm partadd pdc_cjhbfdhhaa

which is followed by (left intact):

resume /dev/sda5
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro sda11
echo Mounting root filesystem.
mount /sysroot
echo Setting up other filesystems.
setuproot
echo Switching to new root and running init.
switchroot

then I can boot 2.6.16-1.2211_FC6 and I am running it right now.

The point is that I can now create, without any manual intervention,
an initrd for 2.6.16-1.2074_FC6 and it works.  So something happened
in the meantime which changed the mutual expectations between kernel and dm.

'rmparts' and 'dm' are clearly internal 'nash' commands, although
they are not documented as such, and if I feed the commented-out
fragment to 'nash' under gdb, with mkinitrd-debuginfo installed,
I see the following:

Program received signal SIGFPE, Arithmetic exception.
0x000000000042a1b7 in _device_probe_geometry ()
(gdb) where
#0  0x000000000042a1b7 in _device_probe_geometry ()
#1  0x000000000042a3ab in init_generic ()
#2  0x000000000042a82f in linux_new ()
#3  0x000000000041745c in ped_device_get ()
#4  0x0000000000401733 in nashDmCreatePartitions (
    path=0x66e940 "/dev/mapper/pdc_cjfeejidea") at dm.c:287
#5  0x000000000040827c in dmCommand (
    cmd=0x66ea39 "rmparts sdc\ndm create pdc_cjhbfdhhaa 0 488397056 linear 8:32 0\ndm partadd pdc_cjhbfdhhaa\n", end=0x66ea38 "") at nash.c:1799
#6  0x0000000000408ee8 in runStartup (fd=6, name=0x7fff60e20bce "bomb.sh")
    at nash.c:2188
#7  0x0000000000409271 in main (argc=1, argv=0x7fff60e1f7f0) at nash.c:2290
#8  0x000000000047ca80 in __libc_start_main ()
#9  0x00000000004001b9 in _start ()
#10 0x00007fff60e1f7d8 in ?? ()
#11 0x0000000000000000 in ?? ()
(gdb)

'mkinitrd-debuginfo' unfortunately does not provide more than that.
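
One thing the trace does show: the faulting address in frame #0 is the same
as the 'rip' value in the console message from the original description.
That suggests, though it does not prove, that the "init[1]" which traps at
boot is nash running as the initrd's init, assuming the nash binary inside
the initrd is the same build as the one run under gdb.  A small Python
sketch of the comparison; the helper names trap_rip and frame_address are
made up for illustration:

```python
import re

# Both lines are copied verbatim from this report.
TRAP_LINE = "init[1] trap divide error rip: 42a1b7 rsp: 7fffb7574940 error: 0"
GDB_FRAME = "#0  0x000000000042a1b7 in _device_probe_geometry ()"


def trap_rip(line):
    """Return the instruction pointer from a kernel 'trap divide error' line."""
    match = re.search(r"\brip: ([0-9a-f]+)", line)
    if match is None:
        raise ValueError("no rip field in %r" % line)
    return int(match.group(1), 16)


def frame_address(line):
    """Return (address, function name) from a gdb backtrace frame line."""
    match = re.match(r"#\d+\s+0x([0-9a-f]+) in (\S+)", line)
    if match is None:
        raise ValueError("not a gdb frame line: %r" % line)
    return int(match.group(1), 16), match.group(2)


rip = trap_rip(TRAP_LINE)
addr, func = frame_address(GDB_FRAME)
print(hex(rip), func, "match" if rip == addr else "no match")
```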

So ultimately, is this the fault of nash not following kernel changes, or
of the kernel pulling the rug out from under nash?

Comment 4 Michal Jaegermann 2006-05-25 20:38:34 UTC
After the most recent updates (kernel-2.6.16-1.2215_FC6 and
mkinitrd-5.0.41-1) I was able to boot again without modifications
to initrd.

Two things happened.  One is that the incriminated fragment of the 'init'
script, i.e.

rmparts sdb
dm create pdc_cjfeejidea 0 488397056 linear 8:16 0
dm partadd pdc_cjfeejidea
rmparts sdc
dm create pdc_cjhbfdhhaa 0 488397056 linear 8:32 0
dm partadd pdc_cjhbfdhhaa

does not show up anymore.  That means that the corresponding fragment of
my new script now looks as follows:
.....
insmod /lib/dm-snapshot.ko
mkblkdevs
resume /dev/sda5
echo Creating root device.
.....

No idea if this is good or bad.  I still cannot get lvm2 to do anything
with the /dev/sdb and /dev/sdc devices, no matter what (cf. bug #176623).

The second thing is that feeding nash those "lost lines" no longer causes
SIGFPE.  It just returns and I failed to observe any other effects.

Comment 5 Dwaine Garden 2006-06-26 05:47:05 UTC
Here is a good initrd init script and a bad one....  When I add the missing
lines, the kernel finds the raid0 and boots ok.

Good....

#!/bin/nash

mount -t proc /proc /proc
setquiet
echo Mounting proc filesystem
echo Mounting sysfs filesystem
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mkdir /dev/pts
mount -t devpts -o gid=5,mode=620 /dev/pts /dev/pts
mkdir /dev/shm
mkdir /dev/mapper
echo Creating initial device nodes
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mknod /dev/systty c 4 0
mknod /dev/tty c 5 0
mknod /dev/console c 5 1
mknod /dev/ptmx c 5 2
mknod /dev/rtc c 10 135
mknod /dev/tty0 c 4 0
mknod /dev/tty1 c 4 1
mknod /dev/tty2 c 4 2
mknod /dev/tty3 c 4 3
mknod /dev/tty4 c 4 4
mknod /dev/tty5 c 4 5
mknod /dev/tty6 c 4 6
mknod /dev/tty7 c 4 7
mknod /dev/tty8 c 4 8
mknod /dev/tty9 c 4 9
mknod /dev/tty10 c 4 10
mknod /dev/tty11 c 4 11
mknod /dev/tty12 c 4 12
mknod /dev/ttyS0 c 4 64
mknod /dev/ttyS1 c 4 65
mknod /dev/ttyS2 c 4 66
mknod /dev/ttyS3 c 4 67
echo Setting up hotplug.
hotplug
echo Creating block device nodes.
mkblkdevs
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading libata.ko module"
insmod /lib/libata.ko
echo "Loading sata_via.ko module"
insmod /lib/sata_via.ko
echo "Loading jbd.ko module"
insmod /lib/jbd.ko
echo "Loading ext3.ko module"
insmod /lib/ext3.ko
echo "Loading dm-mod.ko module"
insmod /lib/dm-mod.ko
echo "Loading dm-mirror.ko module"
insmod /lib/dm-mirror.ko
echo "Loading dm-zero.ko module"
insmod /lib/dm-zero.ko
echo "Loading dm-snapshot.ko module"
insmod /lib/dm-snapshot.ko
echo Making device-mapper control node
mkdmnod
mkblkdevs
rmparts sdb
rmparts sda
dm create via_ecfdfiehfa 0 312499998 striped 2 128 8:0 0 8:16 0
dm partadd via_ecfdfiehfa
echo Scanning logical volumes
lvm vgscan --ignorelockingfailure
echo Activating logical volumes
lvm vgchange -ay --ignorelockingfailure  VolGroup00
resume /dev/VolGroup00/LogVol01
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro /dev/VolGroup00/LogVol00
echo Mounting root filesystem.
mount /sysroot
echo Setting up other filesystems.
setuproot
echo Switching to new root and running init.
switchroot

Bad.........  PAE kernels.

#!/bin/nash

mount -t proc /proc /proc
setquiet
echo Mounting proc filesystem
echo Mounting sysfs filesystem
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mkdir /dev/pts
mount -t devpts -o gid=5,mode=620 /dev/pts /dev/pts
mkdir /dev/shm
mkdir /dev/mapper
echo Creating initial device nodes
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mknod /dev/systty c 4 0
mknod /dev/tty c 5 0
mknod /dev/console c 5 1
mknod /dev/ptmx c 5 2
mknod /dev/rtc c 10 135
mknod /dev/tty0 c 4 0
mknod /dev/tty1 c 4 1
mknod /dev/tty2 c 4 2
mknod /dev/tty3 c 4 3
mknod /dev/tty4 c 4 4
mknod /dev/tty5 c 4 5
mknod /dev/tty6 c 4 6
mknod /dev/tty7 c 4 7
mknod /dev/tty8 c 4 8
mknod /dev/tty9 c 4 9
mknod /dev/tty10 c 4 10
mknod /dev/tty11 c 4 11
mknod /dev/tty12 c 4 12
mknod /dev/ttyS0 c 4 64
mknod /dev/ttyS1 c 4 65
mknod /dev/ttyS2 c 4 66
mknod /dev/ttyS3 c 4 67
echo Setting up hotplug.
hotplug
echo Creating block device nodes.
mkblkdevs
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading libata.ko module"
insmod /lib/libata.ko
echo "Loading sata_via.ko module"
insmod /lib/sata_via.ko
echo "Loading jbd.ko module"
insmod /lib/jbd.ko
echo "Loading ext3.ko module"
insmod /lib/ext3.ko
echo "Loading dm-mod.ko module"
insmod /lib/dm-mod.ko
echo "Loading dm-mirror.ko module"
insmod /lib/dm-mirror.ko
echo "Loading dm-zero.ko module"
insmod /lib/dm-zero.ko
echo "Loading dm-snapshot.ko module"
insmod /lib/dm-snapshot.ko
echo Making device-mapper control node
mkdmnod
echo Attaching to iSCSI storage
mkblkdevs
echo Scanning logical volumes
lvm vgscan --ignorelockingfailure
echo Activating logical volumes
lvm vgchange -ay --ignorelockingfailure  VolGroup00
resume /dev/VolGroup00/LogVol01
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro /dev/VolGroup00/LogVol00
echo Mounting root filesystem.
mount /sysroot
echo Setting up other filesystems.
setuproot
echo Switching to new root and running init.
switchroot
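
The two scripts are long and nearly identical, so the difference is easy to
miss.  A minimal Python sketch that diffs just the section between
'mkdmnod' and 'echo Scanning logical volumes', with the lines copied from
the scripts above; script_delta is a made-up helper name:

```python
import difflib

# Excerpt from the good script.
GOOD_TAIL = """\
mkdmnod
mkblkdevs
rmparts sdb
rmparts sda
dm create via_ecfdfiehfa 0 312499998 striped 2 128 8:0 0 8:16 0
dm partadd via_ecfdfiehfa
echo Scanning logical volumes
""".splitlines()

# The same section of the bad (PAE kernel) script.
BAD_TAIL = """\
mkdmnod
echo Attaching to iSCSI storage
mkblkdevs
echo Scanning logical volumes
""".splitlines()


def script_delta(good, bad):
    """Return (lines missing from bad, lines extra in bad), in script order."""
    missing, extra = [], []
    for line in difflib.ndiff(good, bad):
        if line.startswith("- "):
            missing.append(line[2:])
        elif line.startswith("+ "):
            extra.append(line[2:])
    return missing, extra


missing, extra = script_delta(GOOD_TAIL, BAD_TAIL)
for line in missing:
    print("only in good:", line)
for line in extra:
    print("only in bad: ", line)
```

The bad script lacks the two 'rmparts' lines and the 'dm create' and
'dm partadd' lines for via_ecfdfiehfa, and has an extra
'echo Attaching to iSCSI storage' line instead.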


Comment 6 Dwaine Garden 2006-06-26 05:50:54 UTC
Here is the output of mkinitrd -v -f /tmp/foo.img $(uname -r) 

Creating initramfs
Looking for deps of module sata_via: libata scsi_mod
Looking for deps of module libata: scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module sd_mod: scsi_mod
Looking for deps of module ide-disk
Looking for deps of module ext3: jbd
Looking for deps of module jbd
Looking for driver for device mapper/via_ecfdfiehfap2
Looking for deps of module dm-mod
Looking for deps of module dm-mirror: dm-mod
Looking for deps of module dm-zero: dm-mod
Looking for deps of module dm-snapshot: dm-mod
Using modules: /lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/scsi/scsi_mod.ko
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/scsi/sd_mod.ko
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/scsi/libata.ko
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/scsi/sata_via.ko 
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/fs/jbd/jbd.ko
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/fs/ext3/ext3.ko
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/md/dm-mod.ko
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/md/dm-mirror.ko
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/md/dm-zero.ko
/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/md/dm-snapshot.ko
/sbin/nash -> /tmp/initrd.qW9233/bin/nash
/sbin/insmod.static -> /tmp/initrd.qW9233/bin/insmod
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/scsi/scsi_mod.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/scsi_mod.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/scsi/sd_mod.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/sd_mod.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/scsi/libata.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/libata.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/scsi/sata_via.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/sata_via.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/fs/jbd/jbd.ko' [elf32-i386]
to `/tmp/initrd.qW9233/lib/jbd.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/fs/ext3/ext3.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/ext3.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/md/dm-mod.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/dm-mod.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/md/dm-mirror.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/dm-mirror.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/md/dm-zero.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/dm-zero.ko' [elf32-i386]
copy from `/lib/modules/2.6.17-1.2307_FC6PAE/kernel/drivers/md/dm-snapshot.ko'
[elf32-i386] to `/tmp/initrd.qW9233/lib/dm-snapshot.ko' [elf32-i386]
/sbin/lvm.static -> /tmp/initrd.qW9233/bin/lvm
/etc/lvm -> /tmp/initrd.qW9233/etc/lvm
`/etc/lvm/lvm.conf' -> `/tmp/initrd.qW9233/etc/lvm/lvm.conf'
Adding module scsi_mod
Adding module sd_mod
Adding module libata
Adding module sata_via
Adding module jbd
Adding module ext3
Adding module dm-mod
Adding module dm-mirror
Adding module dm-zero
Adding module dm-snapshot


Comment 7 Bill Hoover 2006-06-26 17:07:46 UTC
Just to add an additional datapoint to this - I just installed a new AMD64 box
in 64 bit mode using the FC5 respin from Fedora Unity.  The machine is using
nvidia SATA raid and the RAID-1 set showed up using device mapper.  With the
original installation everything worked fine.  Then I yum updated to all of the
latest packages.  On a boot with the new kernel (2.6.17-1.2139_FC5) it got the
same trap divide errors as listed in this bug.  Booting the updated box with the
older kernel (2.6.16-1.2122_FC5) works fine.  (I had seen this same behavior
with a 32 bit install, but just reinstalled it all now in 64 bit to check.)

Comment 8 Michal Jaegermann 2006-06-26 18:38:15 UTC
> On a boot with the new kernel (2.6.17-1.2139_FC5) it got the
> same trap divide errors as listed in this bug

AFAICT this is not a kernel problem but an initrd one.  More precisely,
a problem in nash, which is used by the initrd.  Did you update the
'mkinitrd' package before installing 2.6.17-1.2139_FC5?

FWIW I have here an x86_64 machine running 2.6.17-1.2139_FC5 right now;
but it is not using RAID.

Last time I checked, nash in 'mkinitrd' from the current rawhide did
not suffer from this problem.  Quite possibly that mkinitrd will work
on an FC5 installation but I do not know that for sure (maybe recompilation
is needed?).  You may try but make sure that you can back out.

To re-state the obvious: it is not enough to replace 'mkinitrd'; you also
have to redo the initrd which gives you trouble.

Comment 9 Bill Hoover 2006-06-26 18:42:04 UTC
My point is that in my case I am only installing the released updates from the
FC5 updates using yum and this kernel has this problem.  I don't know how the
release system built the kernel.  I have not built anything myself.

Comment 10 Peter Jones 2006-08-16 21:17:39 UTC

*** This bug has been marked as a duplicate of 199224 ***