Bug 123834 - 440GX+ with aic7xxx crashes under SMP w/ 2 CPUs, fine with 1 CPU
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 2
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-05-20 21:35 UTC by Jeff Maurer
Modified: 2015-01-04 22:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-16 04:56:15 UTC
Type: ---
Embargoed:


Attachments
Transcript of kernel boot (24.15 KB, text/plain)
2004-05-20 21:37 UTC, Jeff Maurer
Boot log with acpi=force added to kernel options (truncated after error loop begins) (13.06 KB, text/plain)
2004-05-21 23:58 UTC, Jeff Maurer

Description Jeff Maurer 2004-05-20 21:35:40 UTC
Description of problem:

The kernel boots fine until it loads the aic7xxx module.  I've tried two
cards, aha-2940uw and aha-2950u2b; both exhibit the same behavior.
The aic7xxx module detects the channel parameters, then reports:
"scsi0:0:0:0: Attempting to queue an ABORT message"
and repeatedly dumps crash info.  A transcript is attached, truncated
after a few repetitions of the error.
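
When triaging a long boot transcript, it can help to count how often the abort message recurs. The following is a minimal sketch, assuming the log is plain text and that each repetition contains the exact string quoted above; the function name `count_abort_messages` and the sample lines are hypothetical and are not taken from the attached transcript.

```python
ABORT_MARKER = "Attempting to queue an ABORT message"


def count_abort_messages(log_text: str) -> int:
    """Return the number of lines in log_text that contain the abort marker."""
    return sum(1 for line in log_text.splitlines() if ABORT_MARKER in line)


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = "\n".join([
        "scsi0:0:0:0: Attempting to queue an ABORT message",
        "some other kernel output",
        "scsi0:0:0:0: Attempting to queue an ABORT message",
    ])
    print(count_abort_messages(sample))
```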

Note that the machine boots properly under a) uniproc kernel with one
CPU installed, b) uniproc kernel with two CPUs installed, c) SMP
kernel with only one CPU installed.  It only crashes when two CPUs are
installed and the SMP kernel is running.

I have not tried the Fedora installer on this box since I lack the
console adapter for it; I installed the OS on an equivalent PIII
machine then moved the hard drives across.  I've read similar bug
reports of the installer crashing, but this report regards the normal
kernel, not the installer.  I have also not tried installing Fedora
Core 1 on this box.

The machine is a Micron Netframe 4400R, Intel 440GX chipset, up to two
slot-1 PIII CPUs.  Motherboard manufacturer is 'Network Engines'.


Version-Release number of selected component (if applicable):
kernel-2.6.5-1.358smp

How reproducible: always


Steps to Reproduce:
1. Boot SMP kernel with two CPUs installed
  
Actual results:
Kernel crashes

Expected results:
Kernel boots

Additional info:

Comment 1 Jeff Maurer 2004-05-20 21:37:08 UTC
Created attachment 100391
Transcript of kernel boot

Comment 2 Alan Cox 2004-05-21 23:04:33 UTC
440GX systems may need you to boot with "acpi=force". This is a
generic 2.6.x bug that should now have been fixed upstream by the
Intel guys and so will end up in an errata.

Does that fix the problem?
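
After rebooting, it is worth confirming that the option actually reached the kernel. The following is a minimal sketch, assuming a Linux system that exposes the booted command line at `/proc/cmdline`; the helper name `has_kernel_option` is hypothetical.

```python
from pathlib import Path


def has_kernel_option(cmdline: str, option: str) -> bool:
    """Return True if option appears as a whole whitespace-separated token."""
    return option in cmdline.split()


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")  # assumed to exist on the target system
    if cmdline_path.exists():
        print(has_kernel_option(cmdline_path.read_text(), "acpi=force"))
```

Matching whole tokens avoids a false positive on a longer option that merely starts with the same text.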



Comment 3 Jeff Maurer 2004-05-21 23:51:53 UTC
Nope.  I added that line to the config and booted again.  Same problem
occurs.  I'm attaching the new boot log.

Comment 4 Jeff Maurer 2004-05-21 23:58:47 UTC
Created attachment 100437
Boot log with acpi=force added to kernel options (truncated after error loop begins)

Comment 5 Stephane ODUL 2004-08-10 23:14:56 UTC
I'm having the same problem on my servers. The system seems to work for a while (a few
minutes), then the SCSI controller gets an ABORT. The only fix I've found is to use a non-SMP
kernel. It also seems to be "damaging" my SCSI drives: often the SCSI BIOS won't even see
them after a reboot, and I have to leave the system powered down for a while before it
sees the drives again.


Any idea when we will be able to get a real fix?

Comment 6 Alan Cox 2004-08-10 23:19:53 UTC
If it runs for a while, you have a different, unrelated problem. The
fact that the BIOS then doesn't see the drive suggests it's the cables,
or maybe the drive overheating?


Comment 7 Stephane ODUL 2004-08-11 18:48:36 UTC
I've changed the MP (MultiProcessor) specification setting in the BIOS from 1.4 to 1.1, and now
the machine has been running fine with the SMP kernel for 12 hours.

As you say, I suspected the cables or the drives at first, even the controller, but I've
replaced the cables and the drives several times, and even replaced the motherboard, always with
exactly the same results: MP 1.4 + SMP kernel + SCSI gives the "Attempting to queue an
ABORT message" error after a while (within an hour).

The single CPU kernel never gives any problem, and now setting the BIOS to MP 1.1 seems 
to help.
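
To check which MP specification revision the kernel actually saw, the boot log can be searched for the revision it reports. The following is a minimal sketch, assuming the kernel prints a line of the form `Intel MultiProcessor Specification v1.N` during boot; that exact wording is an assumption and should be checked against a real log. The function name `mp_spec_version` is hypothetical.

```python
import re
from typing import Optional

# Assumed log format; verify against an actual boot log before relying on it.
MP_SPEC_RE = re.compile(r"Intel MultiProcessor Specification v(\d+\.\d+)")


def mp_spec_version(log_text: str) -> Optional[str]:
    """Return the first MP spec revision found in log_text, or None."""
    match = MP_SPEC_RE.search(log_text)
    return match.group(1) if match else None
```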

Comment 8 Alan Cox 2004-08-11 21:18:38 UTC
Not what I'd have expected, but glad it's now happy.


Comment 9 Dave Jones 2005-04-16 04:56:15 UTC
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.


