Bug 87550

Summary: Intermittent lockups during kernel load with SuperMicro P4DPE systemboard, Intel E7500 chipset
Product: [Retired] Red Hat Linux
Reporter: Jesse Keating <jkeating>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 8.0
CC: nudea, pfrields
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:40:42 UTC

Description Jesse Keating 2003-03-28 17:54:52 UTC
Description of problem:
Roughly one in every 10~20 reboots of a system using this motherboard results in
a lockup, just after "VFS: Diskquotas version dquot_6.5.0 initialized" is printed
to the screen. A subsequent reboot results in a fully functional system. We're
experiencing this type of lockup on many machines at many different sites. I
don't believe it is an environmental issue, nor is it isolated to one particular
machine. In the last report from our client, 6 out of 50 of these systems hung
the last time they were rebooted.

We also see this across various kernels: all the latest Red Hat-supplied
kernels and various 2.5 kernels.

We have tried disabling ACPI or APIC in the BIOS, but it did not help.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Install Red Hat Linux
2. Reboot 10~15 times
    

Actual Results:  The system locks up just after displaying "VFS: Diskquotas
version dquot_6.5.0 initialized".

Expected Results:  The system should finish loading the kernel and boot the OS.

Additional info:

Any help you can give would be appreciated.  I'll be contacting both SuperMicro
and Intel about this issue, and I'll share any relevant information.

Comment 1 Jesse Keating 2003-03-29 05:05:21 UTC
Well, I've been able to reproduce the problem on another Supermicro board, this
one using the E7501 chipset. I've tried with Hyperthreading both off and on, and
both result in lockups. With Hyperthreading disabled, it stops just after
"apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16)" is printed (or something
close to that; the message was pulled from another system).

I do not get this problem if I use the uniprocessor kernel. It also reproduces
on Red Hat Linux 7.3 with SMP kernels.

Comment 2 Jesse Keating 2003-04-03 01:08:42 UTC
Further info: an Intel tech claims to have seen this on a Supermicro board he
had, but "fixed" it by disabling Hyperthreading. They have been unable to
reproduce the problem on any board they have there.

We got in an Intel SE 7500CW2 board; it locked up once after 240 reboots, but
never again in 200+ reboots after that. We also have another Gigabyte dual-Xeon
board with the E7500 chipset, and it has not locked up either after 100 or so reboots.

Supermicro is playing dumb, saying they have never heard of it and cannot
reproduce it, so we are going to build a system that reliably reproduces the
problem (about once every 10~15 reboots) and ship it to them. This should
happen later this week. I'll update the bug if I get any more info.

Comment 3 Jesse Keating 2003-11-01 18:44:27 UTC
OK, after further (much further) investigation, this seems to be an
issue with 3ware cards and flashing their BIOS. If we flash the 3ware
card BIOS, the system will then randomly lock up at this point. The
workaround is to flash the card, remove it from the PCI bus, boot the
system and shut it down, then re-install the card; after that, everything
works. Talk about a weird situation....

Comment 4 Jesse Keating 2003-11-12 19:27:54 UTC
This seems to have reappeared, and I think a co-worker of mine, Robin
Battey, has tracked down the cause. In drivers/char/pc_keyb.c, line
1216, there is this statement:
kbd_write_command(KBD_CCMD_MOUSE_DISABLE); /* Disable aux device. */

And if we change this to say:

kbd_write_command_w(KBD_CCMD_MOUSE_DISABLE); /* Disable aux device. */

we no longer get the error. Perhaps this is only a workaround, or maybe
it was a typo in the source; either way, it resolves our issue.
Please advise.
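To illustrate the difference we think matters here: as we read the 2.4 driver, kbd_write_command() writes the command byte to the keyboard controller immediately, while kbd_write_command_w() first polls the status register (via kb_wait()) until the controller's input buffer is empty. The sketch below is a user-space mock, not kernel code. The struct mock_kbc type, the mock_* helpers, write_command(), write_command_w() and the KBC_TIMEOUT poll limit are all hypothetical, the constant values are as we recall them from the 2.4 headers, and the real driver's locking and 1 ms delays are left out. It only shows how a command written while the input buffer is still full can be lost, which is the kind of race the _w variant is meant to avoid.

```c
#include <assert.h>
#include <stdbool.h>

/* i8042 "input buffer full" status bit and the "disable aux (mouse) port"
 * command byte; values as we recall them from 2.4 <linux/pc_keyb.h>. */
#define KBD_STAT_IBF 0x02
#define KBD_CCMD_MOUSE_DISABLE 0xA7

/* Hypothetical poll limit; the real driver also sleeps ~1 ms per poll. */
#define KBC_TIMEOUT 250

/* Hypothetical mock of the keyboard controller (not kernel code). */
struct mock_kbc {
    int busy_polls;   /* status reads that will still report IBF set */
    int last_cmd;     /* last command the controller accepted, -1 if none */
    int dropped_cmds; /* commands written while the input buffer was full */
};

/* Each status read lets the mock controller work off part of its backlog. */
static unsigned char mock_read_status(struct mock_kbc *kbc)
{
    if (kbc->busy_polls > 0) {
        kbc->busy_polls--;
        return KBD_STAT_IBF;
    }
    return 0;
}

/* A write that arrives while the input buffer is full is counted as lost. */
static void mock_write_command(struct mock_kbc *kbc, int cmd)
{
    assert(cmd >= 0 && cmd <= 0xFF); /* command is a single byte */
    if (kbc->busy_polls > 0) {
        kbc->dropped_cmds++;
        return;
    }
    kbc->last_cmd = cmd;
}

/* Analogue of kb_wait(): poll until IBF clears or the poll limit runs out. */
static bool wait_input_empty(struct mock_kbc *kbc)
{
    for (int i = 0; i < KBC_TIMEOUT; i++) {
        if (!(mock_read_status(kbc) & KBD_STAT_IBF))
            return true;
    }
    return false;
}

/* Analogue of kbd_write_command(): write without checking the controller. */
void write_command(struct mock_kbc *kbc, int cmd)
{
    mock_write_command(kbc, cmd);
}

/* Analogue of kbd_write_command_w(): wait for IBF to clear, then write.
 * As we read the 2.4 code, the write still happens if the wait times out;
 * here the return value only reports whether the controller became ready. */
bool write_command_w(struct mock_kbc *kbc, int cmd)
{
    bool ready = wait_input_empty(kbc);
    mock_write_command(kbc, cmd);
    return ready;
}
```

With busy_polls set to 3, write_command() counts the mouse-disable command as dropped, while write_command_w() waits out the three busy status reads and the mock controller accepts 0xA7.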


Comment 5 Jesse Keating 2004-04-28 17:07:03 UTC
Dave, which errata release fixes this? We're now seeing this on some X5DPR-8
boards without any extra PCI cards in them, using the latest Fedora Core
1 kernel. We also continue to see it on the X5DPE-G2 boards with the
latest RHL9 and FC1 kernels. This has become a very urgent
issue.

Comment 6 Dave Jones 2004-04-29 13:00:16 UTC
The latest RHL9 errata definitely has this fix. It didn't get into the FC1
tree, though.

So, if you're running the latest RHL9 tree, then you're hitting a
different issue.


Comment 7 Jesse Keating 2004-04-29 15:17:00 UTC
Can you tell me which patch involved the fix for this?  It's not
immediately obvious when looking at the changelog and the patch list.
 Thanks!

Comment 8 Jesse Keating 2004-04-29 16:23:23 UTC
n/m I found the patch: linux-2.4.21-wait-kbd-disable.patch

I seem to remember still having this lockup a week or so ago on a
fully updated RHL9 system. It triggered immediately after flashing
the BIOS on some 3ware cards in an X5DPE-G2. I will apply this
patch to the Fedora Core 1 kernel and see whether the problem still
occurs on a system we have here that reliably reproduces it.

Comment 9 Jesse Keating 2004-04-29 16:32:38 UTC
I just talked to Robin, our tech who found the bug, and he says that
he's tried using the _w version of the command, and it did not solve
the issue.  So this is not fixed.  Thanks!

Comment 10 Andy Anderson 2004-09-04 15:14:25 UTC
I am currently running RH 9, kernel 2.4.20-20.9, on a "home-brew" computer
system. This is an upgrade from the first RH 9 kernel that I installed.

Currently I have this boot issue every time the system reboots, whether
I reboot it by hand or it restarts on its own (power outages, etc.). The hang
recurs on every boot, even when I use a second boot path of RH 7.1. It is
correctable only by leaving the system off for 20 minutes or more.
After this, it comes up cleanly.

Will the current "fix" correct this issue?

Comment 11 Jesse Keating 2004-09-04 18:09:23 UTC
I think we have stumbled across a solution. Try disabling 'USB Legacy
Support' in the BIOS. We did this to resolve a PS/2 issue, but it
seems to have resolved this issue as well. Please let us know if it
helps.

Comment 12 Bugzilla owner 2004-09-30 15:40:42 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/