Bug 197536 - "trap divide error" x86_64 on kernel 2.6.17 with FC5
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: 5
Hardware: x86_64 Linux
Priority: medium
Severity: urgent
Assigned To: Kernel Maintainer List
Brian Brock
Duplicates: 203016
Reported: 2006-07-03 16:45 EDT by Clint Goudie
Modified: 2007-11-30 17:11 EST (History)
Fixed In Version: 5.0.32-2
Doc Type: Bug Fix
Last Closed: 2006-09-19 16:24:43 EDT

Description Clint Goudie 2006-07-03 16:45:10 EDT
Description of problem:

I have a Tyan Transport GX28 (B2881) system
( http://www.tyan.com/products/html/gx28b2881.html ) with two dual-core
Opteron 285 processors and 4 GB of ECC RAM (two 1 GB modules per processor).

I recently got the kernel-2.6.17-1.2139_FC5.x86_64.rpm kernel via yum update.

When I boot with this updated kernel, (as closely as I can tell) right after 
the boot process has detected the disks, I see scrolling by as quickly as 
possible:

init[1] trap divide error rip:4296d7 rsp:7fff91edd2b0 error:0

As far as I can tell, this error line never changes; it just keeps scrolling as
quickly as possible.


Version-Release number of selected component (if applicable):
2.6.17 FC5

How reproducible:
Stops system every boot. 

Steps to Reproduce:
1. On a dual-Opteron server running FC5, run yum update and reboot.
  
Actual results:
Error scrolls by

Expected results:
Normal boot

Additional info:
I searched the web for init[1] trap divide error rip:4296d7

It looks like mine isn't the only machine with this problem:

This post has someone receiving a very similar error running 2.6.17 on a dual 
opteron server
http://forums.fedoraforum.org/showthread.php?t=114304&goto=nextoldest

This similar post appears to be from the linux kernel archive, from someone 
running FC5, x86_64, and trying to run the 2.6.17 kernel

http://www.uwsg.iu.edu/hypermail/linux/kernel/0606.2/1037.html

If it's related to the disks, I'm using the onboard SATA RAID in a RAID 1
mirror of two 10k rpm drives.

I'll be happy to provide any additional information you may need to diagnose 
this issue, just let me know.

Thanks for your help,

Clint Goudie-Nice
Comment 1 Clint Goudie 2006-07-03 16:48:31 EDT
Also, the box boots fine with 2.6.16 2111
Comment 2 Adam Serbinski 2006-07-05 13:54:30 EDT
It's fakeraid, and the problem is isolated to Fedora.

Temporary workaround:
Extract /boot/initrd-2.6.17-whatever.img,
edit init to comment out the line starting with "dm partadd",
and put it back together.

Then use fakeraid the old way.
Comment 3 Adam Serbinski 2006-07-05 13:59:29 EDT
If you're booting off the fakeraid, this probably won't work.
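The workaround from comment 2 could be sketched as the shell helpers below. This is an illustration, not a procedure tested in this thread. It assumes the FC5 initrd is a gzip-compressed cpio archive in newc format and that the init script treats lines starting with "#" as comments. The function names are hypothetical. As comment 3 notes, it probably won't help if you boot from the fakeraid array.

```shell
#!/bin/sh
# Hypothetical helpers; names and layout are illustrative only.

# Unpack a gzip-compressed cpio initrd (assumed format) into a work directory.
# IMAGE must be an absolute path because the extraction runs inside WORKDIR.
unpack_initrd() {
    image=$1
    workdir=$2
    mkdir -p "$workdir" || return 1
    (cd "$workdir" && gzip -dc "$image" | cpio -id)
}

# Comment out every line of the init script that starts with "dm partadd".
# The file is rewritten in place so its mode (executable bit) is preserved.
comment_out_partadd() {
    init_file=$1
    tmp=$(mktemp) || return 1
    if sed 's/^\([[:space:]]*\)dm partadd/\1#dm partadd/' "$init_file" > "$tmp"; then
        cat "$tmp" > "$init_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Repack the work directory into a new image (NEW_IMAGE: absolute path,
# located outside WORKDIR so it is not swept into the archive).
repack_initrd() {
    workdir=$1
    new_image=$2
    (cd "$workdir" && find . | cpio -o -H newc | gzip -9 > "$new_image")
}
```

A typical sequence would be unpack_initrd on a copy of the image, comment_out_partadd on the extracted init file, then repack_initrd to a new file name, keeping the original image and its boot entry as a fallback.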
Comment 4 Clint Goudie 2006-07-06 10:56:09 EDT
I just want to confirm that by fakeraid, you mean the onboard SATA RAID.
If you mean Linux software RAID, I'm not using that.

Yes, unfortunately in this case I'm booting from it.

Comment 5 Adam Serbinski 2006-07-06 12:16:18 EDT
Confirmed. Called fakeraid because those onboard raid cards are marketed as
"hardware raid", but there is no hardware involved. They're just regular
controllers with a bios tag that will enable their software raid drivers (for
winsucks). Additionally, they tag the DRIVES to identify the type of raid that
they are. You can actually take the fakeraid array and plug them into a regular
(non-RAID) controller, and dmraid will still work for them.
Comment 6 FX 2006-07-11 07:25:25 EDT
I'm seeing this too (with 2.6.17-1.2145_FC5).
Comment 7 Keith Hunt 2006-07-27 08:52:37 EDT
Also a problem for me with 2.6.17-1.2157_FC5.
Comment 8 Keith Hunt 2006-08-07 11:27:21 EDT
Tried 2.6.17-1.2159_FC5 from testing repo. This one produces the same error.
Comment 9 Keith Hunt 2006-08-10 15:54:54 EDT
Also no luck with kernel-2.6.17-1.2174_FC5
Comment 10 Chris Weyl 2006-08-31 13:02:33 EDT
Exact same error for me on a Dell E510: x86_64 (dual-core Intel Pentium 4),
FC5, dual SATA drives (hardware mirrored).

The last kernel I could boot without this error (and can still boot) is
2.6.16-1.2122_FC5.
Comment 11 Chris Weyl 2006-08-31 13:04:27 EDT
*** Bug 203016 has been marked as a duplicate of this bug. ***
Comment 12 Chris Weyl 2006-08-31 13:08:40 EDT
This seems like a pretty high-impact bug that's been around for a while now. 
Are there any plans to address it in a FC5 errata?

(and I amend my comment #10 to say it sounds like fakeraid on my end.)
Comment 13 Keith Hunt 2006-09-19 13:56:33 EDT
Tried kernel-2.6.17-1.2187_FC5 with the same results.

Does nobody care about this bug? Is it ever going to be fixed?
Comment 14 Kai Engert (:kaie) 2006-09-19 14:16:57 EDT
I had reported a similar (or same) problem in bug 203016.

I retested with 2.6.17-1.2187_FC5, and my machine now boots up ok, problem is
fixed for me.
Comment 15 Jarod Wilson 2006-09-19 14:31:55 EDT
In reply to comment #13, yes, we do care, and it ought to get fixed at some
point, but unfortunately, there are a lot more bugs than there are devs to look
at them.

Per comment #14, I'm not sure why one system works now and the other doesn't. My
only suspicion would be that it's actually mkinitrd that is somehow at fault, and
the two systems aren't (or weren't at initrd creation time) running the same
version of mkinitrd. Can folks report back with their mkinitrd versions?

Keith, if you could, also try rebuilding your initrd to eliminate the
possibility of the kernel being updated before mkinitrd, and/or try again after
upgrading mkinitrd, if an update is available. You might also want to try the
development tree version... Mostly just shooting in the dark here, but hoping to
hit something. :)
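The two requests above (report the mkinitrd version, then rebuild the initrd) could look like the sketch below. The function name is hypothetical, and it only prints the rebuild command rather than running it. The /boot/initrd-<version>.img path follows the naming used in comment 2. The kernel version has to be passed explicitly, on the assumption that the machine is booted into an older working kernel at that point, so "uname -r" would name the wrong one.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the mkinitrd invocation for a kernel.

# Usage: rebuild_initrd_cmd KERNEL_VERSION
rebuild_initrd_cmd() {
    if [ -z "${1:-}" ]; then
        echo "usage: rebuild_initrd_cmd KERNEL_VERSION" >&2
        return 1
    fi
    # -f lets mkinitrd overwrite an existing image of the same name.
    printf 'mkinitrd -f /boot/initrd-%s.img %s\n' "$1" "$1"
}

# Example session (as root):
#   rpm -q mkinitrd                                  # report the installed version
#   rebuild_initrd_cmd 2.6.17-1.2187_FC5             # review the printed command
#   eval "$(rebuild_initrd_cmd 2.6.17-1.2187_FC5)"   # then actually run it
```

Printing the command first is a deliberate choice: overwriting an initrd is hard to undo, so it is worth reviewing the exact image path before running it.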
Comment 16 Kai Engert (:kaie) 2006-09-19 16:09:47 EDT
[root@kaiefast ~]# rpm -qi mkinitrd
Name        : mkinitrd                     Relocations: (not relocatable)
Version     : 5.0.32                            Vendor: Red Hat, Inc.
Release     : 2                             Build Date: Wed 30 Aug 2006 09:06:55 PM CEST
Install Date: Tue 12 Sep 2006 03:28:43 PM CEST      Build Host: ls20-bc2-14.build.redhat.com


[root@kaiefast ~]# uname -a
Linux kaiefast 2.6.17-1.2187_FC5 #1 SMP Mon Sep 11 01:16:59 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux


[root@kaiefast ~]# egrep -i "kernel|initrd" /var/log/yum.log
Sep 12 15:28:44 Updated: mkinitrd.x86_64 5.0.32-2
Sep 16 03:34:38 Installed: kernel.x86_64 2.6.17-1.2187_FC5
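The yum.log check above can be automated. The sketch below, with a hypothetical function name, reports which mkinitrd version was the most recently installed or updated one at the point a given kernel was installed. It assumes yum.log lines have the layout shown in this comment and appear in chronological order, and that the kernel's initrd was generated when the kernel package was installed.

```shell
#!/bin/sh
# Hypothetical helper: which mkinitrd version was current when a kernel
# was installed, judged purely by the order of lines in a yum.log file.

# Usage: mkinitrd_at_kernel_install LOGFILE KERNEL_VERSION
mkinitrd_at_kernel_install() {
    awk -v kver="$2" '
        # Fields: month day time action package version
        $5 ~ /^mkinitrd\./              { ver = $6 }
        $5 ~ /^kernel\./ && $6 == kver  { at_install = ver; found = 1 }
        END {
            if (!found)                print "kernel-not-found"
            else if (at_install == "") print "unknown"
            else                       print at_install
        }
    ' "$1"
}

# Example:
#   mkinitrd_at_kernel_install /var/log/yum.log 2.6.17-1.2187_FC5
```

For the two log lines shown in this comment, the helper prints 5.0.32-2, since the mkinitrd update precedes the kernel install. It prints "unknown" if no mkinitrd entry precedes the kernel entry in the log.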
Comment 17 Keith Hunt 2006-09-19 16:16:08 EDT
Jarod, it seems that you did hit something. You must have extra special night
vision. 

I upgraded mkinitrd to 5.0.32-2 (I was at 5.0.32-1) and recreated the image with
that one. My machine boots up OK now. Hooray and thanks.
Comment 18 Jarod Wilson 2006-09-19 16:24:43 EDT
Go, go, magic night-vision goggles... Glad to hear that did the trick! :)

I'll twiddle the bug a bit -- reassign it to mkinitrd and close as
CURRENTRELEASE = 5.0.32-2.
