Bug 197536

Summary: "trap divide error" x86_64 on kernel 2.6.17 with FC5
Product: [Fedora] Fedora
Component: mkinitrd
Version: 5
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: medium
Reporter: Clint Goudie <clint>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Brian Brock <bbrock>
CC: cweyl, ekolve, fxcoudert, jarod, keith, kengert, wtogami
Fixed In Version: 5.0.32-2
Doc Type: Bug Fix
Last Closed: 2006-09-19 20:24:43 UTC

Description Clint Goudie 2006-07-03 20:45:10 UTC
Description of problem:

I have a Tyan Transport GX28 (B2881) system
( http://www.tyan.com/products/html/gx28b2881.html ) with two dual-core
Opteron 285 processors and 4 GB of ECC RAM (two 1 GB modules per processor).

I recently got the kernel-2.6.17-1.2139_FC5.x86_64.rpm kernel via yum update.

When I boot with this updated kernel, right after the boot process has 
detected the disks (as far as I can tell), the following line scrolls by as 
quickly as possible:

init[1] trap divide error rip:4296d7 rsp:7fff91edd2b0 error:0

As far as I can tell, this error line never changes; it just keeps scrolling as 
quickly as possible.


Version-Release number of selected component (if applicable):
2.6.17 FC5

How reproducible:
Stops system every boot. 

Steps to Reproduce:
1. On a dual Opteron server running FC5, run yum update and reboot
  
Actual results:
Error scrolls by

Expected results:
Normal boot

Additional info:
I searched the web for init[1] trap divide error rip:4296d7

It looks like mine isn't the only machine with this problem:

This post has someone receiving a very similar error running 2.6.17 on a dual 
opteron server
http://forums.fedoraforum.org/showthread.php?t=114304&goto=nextoldest

This similar post appears to be from the linux kernel archive, from someone 
running FC5, x86_64, and trying to run the 2.6.17 kernel

http://www.uwsg.iu.edu/hypermail/linux/kernel/0606.2/1037.html

If it's related to the disks, I'm using the onboard SATA RAID in a RAID 1 
mirror of two 10k rpm drives.

I'll be happy to provide any additional information you may need to diagnose 
this issue, just let me know.

Thanks for your help,

Clint Goudie-Nice

Comment 1 Clint Goudie 2006-07-03 20:48:31 UTC
Also, the box boots fine with 2.6.16 2111

Comment 2 Adam Serbinski 2006-07-05 17:54:30 UTC
It's fakeraid, and is isolated to Fedora.

Temporary workaround:
Extract /boot/initrd-2.6.17-whatever.img,
edit init to comment out the line starting with "dm partadd",
and put it back together.

Then use fakeraid the old way.
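
The workaround above could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the initrd is a gzip-compressed cpio archive in "newc" format, and the function names and the idea of writing to a separate output image are hypothetical, not something mkinitrd provides.

```shell
#!/bin/sh
# Sketch only: function names and the separate output image are hypothetical.

# Comment out every line in the given init script that starts with "dm partadd".
comment_out_partadd() {
    sed -i 's/^dm partadd/#&/' "$1"
}

# Unpack an initrd image, patch its init script, and repack it under a new name.
# Assumes a gzip-compressed cpio archive in "newc" format.
# Both arguments must be absolute paths, because the work happens in a temp dir.
repack_initrd_without_partadd() {
    src="$1"    # e.g. /boot/initrd-2.6.17-whatever.img
    dst="$2"    # new image; the original is left untouched
    work="$(mktemp -d)" || return 1
    (
        cd "$work" || exit 1
        gzip -dc "$src" | cpio -idm --quiet || exit 1
        comment_out_partadd init || exit 1
        find . | cpio -o -H newc --quiet | gzip -9 > "$dst"
    )
    status=$?
    rm -rf "$work"
    return "$status"
}

# Usage (as root), with the real image name substituted for "whatever":
#   repack_initrd_without_partadd /boot/initrd-2.6.17-whatever.img \
#       /boot/initrd-2.6.17-whatever-nopartadd.img
```

Writing to a new file keeps the original image available as a fallback; the boot loader entry would then need to point at the new image.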

Comment 3 Adam Serbinski 2006-07-05 17:59:29 UTC
If you're booting off the fakeraid, this probably won't work.

Comment 4 Clint Goudie 2006-07-06 14:56:09 UTC
I just want to confirm that by fakeraid you mean the onboard SATA RAID. 
If you mean linux software raid, I'm not using the software raid.

Yes, unfortunately in this case I'm booting from it.



Comment 5 Adam Serbinski 2006-07-06 16:16:18 UTC
Confirmed. Called fakeraid because those onboard raid cards are marketed as
"hardware raid", but there is no hardware involved. They're just regular
controllers with a bios tag that will enable their software raid drivers (for
winsucks). Additionally, they tag the DRIVES to identify the type of raid that
they are. You can actually take the fakeraid array and plug them into a regular
(non-RAID) controller, and dmraid will still work for them.

Comment 6 FX 2006-07-11 11:25:25 UTC
I'm seeing this too (with 2.6.17-1.2145_FC5).

Comment 7 Keith Hunt 2006-07-27 12:52:37 UTC
Also a problem for me with 2.6.17-1.2157_FC5.

Comment 8 Keith Hunt 2006-08-07 15:27:21 UTC
Tried 2.6.17-1.2159_FC5 from testing repo. This one produces the same error.

Comment 9 Keith Hunt 2006-08-10 19:54:54 UTC
Also no luck with kernel-2.6.17-1.2174_FC5

Comment 10 Chris Weyl 2006-08-31 17:02:33 UTC
Exact same error for me on a dell e510:  x86_64 (dual core, intel pentium 4),
fc5, dual SATA drives (hardware mirrored).

The last kernel I could boot to without this (and can still boot to) is
2.6.16-1.2122_FC5.

Comment 11 Chris Weyl 2006-08-31 17:04:27 UTC
*** Bug 203016 has been marked as a duplicate of this bug. ***

Comment 12 Chris Weyl 2006-08-31 17:08:40 UTC
This seems like a pretty high-impact bug that's been around for a while now. 
Are there any plans to address it in a FC5 errata?

(and I amend my comment #10 to say it sounds like fakeraid on my end.)

Comment 13 Keith Hunt 2006-09-19 17:56:33 UTC
Tried kernel-2.6.17-1.2187_FC5 with the same results.

Does nobody care about this bug? Is it ever going to be fixed?

Comment 14 Kai Engert (:kaie) (inactive account) 2006-09-19 18:16:57 UTC
I had reported a similar (or same) problem in bug 203016.

I retested with 2.6.17-1.2187_FC5, and my machine now boots up ok, problem is
fixed for me.

Comment 15 Jarod Wilson 2006-09-19 18:31:55 UTC
In reply to comment #13, yes, we do care, and it ought to get fixed at some
point, but unfortunately, there are a lot more bugs than there are devs to look
at them.

Per comment #14, I'm not sure why one system works now and the other doesn't. My
only suspicion would be that it's actually mkinitrd that is somehow at fault, and
the two systems aren't (or weren't at initrd creation time) running the same
version of mkinitrd. Can folks report back with their mkinitrd versions?

Keith, if you could, also try rebuilding your initrd to eliminate the
possibility of the kernel being updated before mkinitrd, and/or try again after
upgrading mkinitrd, if an update is available. You might also want to try the
development tree version... Mostly just shooting in the dark here, but hoping to
hit something. :)


Comment 16 Kai Engert (:kaie) (inactive account) 2006-09-19 20:09:47 UTC
[root@kaiefast ~]# rpm -qi mkinitrd
Name        : mkinitrd                     Relocations: (not relocatable)
Version     : 5.0.32                            Vendor: Red Hat, Inc.
Release     : 2                             Build Date: Wed 30 Aug 2006 09:06:55 PM CEST
Install Date: Tue 12 Sep 2006 03:28:43 PM CEST      Build Host: ls20-bc2-14.build.redhat.com


[root@kaiefast ~]# uname -a
Linux kaiefast 2.6.17-1.2187_FC5 #1 SMP Mon Sep 11 01:16:59 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux


[root@kaiefast ~]# egrep -i "kernel|initrd" /var/log/yum.log
Sep 12 15:28:44 Updated: mkinitrd.x86_64 5.0.32-2
Sep 16 03:34:38 Installed: kernel.x86_64 2.6.17-1.2187_FC5


Comment 17 Keith Hunt 2006-09-19 20:16:08 UTC
Jarod, it seems that you did hit something. You must have extra special night
vision. 

I upgraded mkinitrd to 5.0.32-2 (I was at 5.0.32-1) and recreated the image with
that one. My machine boots up OK now. Hooray and thanks.
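
For anyone else hitting this, the steps described in this comment might look roughly like the sketch below. The helper names are hypothetical, the kernel version is just the one mentioned in comment 13, and the /boot/initrd-<version>.img naming is assumed from the path quoted in comment 2.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print the assumed initrd path for a given kernel version.
initrd_path() {
    printf '/boot/initrd-%s.img\n' "$1"
}

# Back up the existing image, then regenerate it with the installed mkinitrd.
# "mkinitrd -f <image> <kernel-version>" overwrites an existing image.
rebuild_initrd() {
    kver="$1"
    img="$(initrd_path "$kver")"
    cp -p "$img" "$img.bak" || return 1
    mkinitrd -f "$img" "$kver"
}

# Usage (as root, booted into a working kernel such as 2.6.16):
#   yum update mkinitrd        # should bring in 5.0.32-2 or later
#   rpm -q mkinitrd            # confirm the installed version
#   rebuild_initrd 2.6.17-1.2187_FC5
```

The kernel version has to be passed explicitly here, since the running (older) kernel is not the one whose image needs rebuilding.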

Comment 18 Jarod Wilson 2006-09-19 20:24:43 UTC
Go, go, magic night-vision goggles... Glad to hear that did the trick! :)

I'll twiddle the bug a bit -- reassign it to mkinitrd and close as
CURRENTRELEASE = 5.0.32-2.