Red Hat Bugzilla – Bug 123834
440GX+ with aic7xxx crashes under SMP w/ 2 CPUs, fine with 1 CPU
Last modified: 2015-01-04 17:06:09 EST
Description of problem:
Kernel boots fine until it loads the aic7xxx module. I've tried two
cards: aha-2940uw and aha-2950u2b; both exhibit the same behavior.
aic7xxx module detects the channel parameters then reports:
"scsi0:0:0:0: Attempting to queue an ABORT message"
and repeatedly dumps crash info. Transcript is attached, truncated
after a few repetitions of the error.
Note that the machine boots properly under a) uniproc kernel with one
CPU installed, b) uniproc kernel with two CPUs installed, c) SMP
kernel with only one CPU installed. It only crashes when two CPUs are
installed and the SMP kernel is running.
I have not tried the Fedora installer on this box since I lack the
console adapter for it; I installed the OS on an equivalent PIII
machine then moved the hard drives across. I've read similar bug
reports of the installer crashing, but this report regards the normal
kernel, not the installer. I have also not tried installing Fedora
Core 1 on this box.
The machine is a Micron Netframe 4400R, Intel 440GX chipset, up to two
slot-1 PIII CPUs. Motherboard manufacturer is 'Network Engines'.
Version-Release number of selected component (if applicable):
How reproducible: always
Steps to Reproduce:
1. Boot SMP kernel with two CPUs installed
Created attachment 100391
Transcript of kernel boot
440GX systems may need you to boot with "acpi=force". This is a
generic 2.6.x bug that should now have been fixed upstream by the
Intel developers, so the fix will end up in an errata update.
Does that fix the problem?
Nope. I added acpi=force to the boot config and booted again. The same
problem occurs. I'm attaching the new boot log.
Created attachment 100437
Boot log with acpi=force added to kernel options (truncated after error loop begins)
I'm having the same problem on my servers. The system seems to work for a while (a few
minutes), then the SCSI controller gets an ABORT. The only fix I've found is to use a non-SMP
kernel. It also seems to be "damaging" my SCSI drives: often the SCSI BIOS won't even see
them after a reboot, and I have to leave the system powered down for a while before it will
see the drive again.
Any idea when we will get a real fix?
If it runs for a while, you have a different, unrelated problem. The
fact that the BIOS then doesn't see the drive suggests bad cables or
an overheating drive, maybe?
I've changed the MP (MultiProcessor) specification version in the BIOS from 1.4 to 1.1, and now
the machine has been running fine with the SMP kernel for 12 hours.
As you say, I suspected the cables or the drives at first, even the controller, but I've
replaced the cables and the drives several times, and even replaced the motherboard, always with
the exact same results: MP 1.4 + SMP kernel + SCSI gives the "Attempting to queue an
ABORT message" error after a while (within an hour).
The single CPU kernel never gives any problem, and now setting the BIOS to MP 1.1 seems
Not what I'd have expected, but glad it's now happy.
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat. The Fedora legacy project will be producing further kernel
updates for security problems only.
If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.