Bug 123834

Summary: 440GX+ with aic7xxx crashes under SMP w/ 2 CPUs, fine with 1 CPU
Product: [Fedora] Fedora
Reporter: Jeff Maurer <jmaurer>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED NEXTRELEASE
Severity: high
Priority: medium
Version: 2
CC: alan, pfrields
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Last Closed: 2005-04-16 04:56:15 UTC
Attachments:
  Transcript of kernel boot (flags: none)
  Boot log with acpi=force added to kernel options (truncated after error loop begins) (flags: none)

Description Jeff Maurer 2004-05-20 21:35:40 UTC
Description of problem:

Kernel boots fine until it loads the aic7xxx module.  I've tried two
cards: aha-2940uw and aha-2950u2b; both exhibit the same behavior.
The aic7xxx module detects the channel parameters, then reports:
"scsi0:0:0:0: Attempting to queue an ABORT message"
and repeatedly dumps crash info.  Transcript is attached, truncated
after a few repetitions of the error.

Note that the machine boots properly under a) uniproc kernel with one
CPU installed, b) uniproc kernel with two CPUs installed, c) SMP
kernel with only one CPU installed.  It only crashes when two CPUs are
installed and the SMP kernel is running.

I have not tried the Fedora installer on this box since I lack the
console adapter for it; I installed the OS on an equivalent PIII
machine then moved the hard drives across.  I've read similar bug
reports of the installer crashing, but this report regards the normal
kernel, not the installer.  I have also not tried installing Fedora
Core 1 on this box.

The machine is a Micron Netframe 4400R, Intel 440GX chipset, up to two
slot-1 PIII CPUs.  Motherboard manufacturer is 'Network Engines'.


Version-Release number of selected component (if applicable):
kernel-2.6.5-1.358smp

How reproducible: always


Steps to Reproduce:
1. Boot SMP kernel with two CPUs installed
  
Actual results:
Kernel crashes

Expected results:
Kernel boots

Additional info:

Comment 1 Jeff Maurer 2004-05-20 21:37:08 UTC
Created attachment 100391
Transcript of kernel boot

Comment 2 Alan Cox 2004-05-21 23:04:33 UTC
440GX systems may need you to boot with "acpi=force". This is a
generic 2.6.x bug that should now have been fixed upstream by the
Intel guys and so will end up in an errata update.

Does that fix the problem?



Comment 3 Jeff Maurer 2004-05-21 23:51:53 UTC
Nope.  I added that line to the config and booted again.  Same problem
occurs.  I'm attaching the new boot log.

Comment 4 Jeff Maurer 2004-05-21 23:58:47 UTC
Created attachment 100437
Boot log with acpi=force added to kernel options (truncated after error loop begins)

Comment 5 Stephane ODUL 2004-08-10 23:14:56 UTC
I'm having this problem on my servers. The system seems to work for a while (a few
minutes), then the SCSI controller gets an ABORT. The only fix I've found is to use a non-SMP
kernel. It also seems to be "damaging" my SCSI drives, as often the SCSI BIOS won't even see
them after a reboot; I have to leave the system powered down for a while before it will
see the drives again.


Any idea when we will be able to get a real fix?

Comment 6 Alan Cox 2004-08-10 23:19:53 UTC
If it runs for a while, you have a different, unrelated problem. The
fact that the BIOS then doesn't see the drive suggests it's the cables,
or the drive overheating, maybe?


Comment 7 Stephane ODUL 2004-08-11 18:48:36 UTC
I've changed the MP (MultiProcessor) specification version in the BIOS from 1.4 to 1.1, and now
the machine has been running fine with the SMP kernel for 12 hours.

As you say, I suspected the cables or the drives at first, even the controller, but I've
replaced the cables and the drives several times, and even replaced the motherboard, always with
the exact same results: MP 1.4 + SMP kernel + SCSI gives the "Attempting to queue an
ABORT message" error after a while (within an hour).

The single-CPU kernel never gives any problem, and now setting the BIOS to MP 1.1 seems
to help.

Comment 8 Alan Cox 2004-08-11 21:18:38 UTC
Not what I'd have expected, but glad it's now happy.


Comment 9 Dave Jones 2005-04-16 04:56:15 UTC
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora Legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.