The HP Vectra XU 5/90 (Pentium SMP, Neptune chipset) will hang with the
default RH 6.2 installation and most PCI cards in either of its two PCI
slots when the card is accessed at the same time as the onboard SCSI
controller (interrupt conflict problem).
I have investigated this problem at length, found what causes it, and have
a proposal for how to fix it. Unfortunately, I don't have enough experience
modifying the kernel code to fully implement my proposal.
The problem -- simply put, the machine has an empty MP IRQ table. So the
kernel runs "construct_default_ISA_mptable()". The computer is a PCI/ISA
system, so mpc_default_type==5 and the default table created assigns ISA-
style interrupts to everything. This means that the PCI interrupts get
assigned type EDGE.
That is what causes the machine to hang. In my case I have a Netgear
FA310TX ethernet card in one PCI slot. Using the ethernet card during disk
access will cause the onboard SCSI controller to hang. Due to the design
of the MB, both PCI slots and the onboard SCSI controller share an
interrupt so there is no way to reassign interrupts or move the card to
avoid the conflict.
The solution -- the machine, even though it is a PCI/ISA combo, supports
the EISA ELCR (EISA Level/Edge Control Register). As a quick hack, making
the following change to io_apic.c:
< mpc_default_type == 6) ? MP_BUS_EISA : MP_BUS_ISA;
> mpc_default_type == 5 || mpc_default_type == 6) ? MP_BUS_EISA : MP_BUS_ISA;
will read the ELCR and assign the correct interrupt trigger types (EDGE
for everything but the PCI interrupts). In fact, looking back at old
kernel source, it appears the kernel used to do this (it didn't have the
comparison and used EISA for everything) until around 2.2.5 or 2.2.6 (I
don't remember exactly).
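For reference, a minimal sketch of how a trigger type is decoded from the ELCR. It assumes the standard layout (port 0x4d0 covers IRQ 0-7, port 0x4d1 covers IRQ 8-15, a set bit means LEVEL). The helper name elcr_trigger() and the enum are hypothetical, and the two register bytes are passed in as arguments so the sketch runs in user space; in the kernel they would be read with inb().

```c
#include <stdint.h>

/* ELCR I/O ports: 0x4d0 covers IRQ 0-7, 0x4d1 covers IRQ 8-15. */
#define ELCR_PORT_LO 0x4d0
#define ELCR_PORT_HI 0x4d1

enum trigger { TRIGGER_EDGE = 0, TRIGGER_LEVEL = 1 };

/*
 * Hypothetical helper: decode the trigger type of one IRQ from the two
 * ELCR bytes. 'lo' is the value read from ELCR_PORT_LO and 'hi' the
 * value read from ELCR_PORT_HI.
 */
static enum trigger elcr_trigger(uint8_t lo, uint8_t hi, unsigned int irq)
{
	uint8_t reg;

	if (irq >= 16)
		return TRIGGER_EDGE;	/* the ELCR only describes IRQ 0-15 */

	reg = (irq & 8) ? hi : lo;	/* pick the register holding this IRQ */
	return ((reg >> (irq & 7)) & 1) ? TRIGGER_LEVEL : TRIGGER_EDGE;
}
```

With hi = 0x0e, for example, IRQs 9, 10 and 11 decode as LEVEL and every other IRQ as EDGE.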
The proposal -- I propose changing construct_default_ISA_mptable() to do
the following. If mpc_default_type==5 (an ISA/PCI board), then it *should*
have an MP IRQ table. The fact that it doesn't means it's slightly broken
(or really old). If mpc_default_type==5, then there is at least the
possibility of PCI devices in the system, so could the kernel check and
see if a valid ELCR exists and use it if it does?
I don't know what would be a good sanity check for the ELCR. Maybe check a
known IRQ (like 2) and see that it makes sense (EDGE trigger) and then go
from there? Or make sure that the values aren't all 0x00 or 0xff or
something obviously incorrect?
If the ELCR passes the sanity test then use it to assign the interrupt
triggers for the MB. The impact should be minimal since any
mpc_default_type other than 5 would not be affected, and if it really is
ISA/PCI, how else can it communicate the interrupt triggers?
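One possible shape for the proposal, as a rough user-space sketch rather than a kernel patch. It assumes IRQs 0, 1, 2 and 13 (timer, keyboard, cascade, FPU) are always EDGE on PC-compatible hardware, so an ELCR that marks any of them LEVEL is treated as garbage. The names elcr_bit(), elcr_looks_valid() and assign_triggers() are hypothetical, and the ELCR is passed in as a 16-bit value (bit n = IRQ n) instead of being read from ports 0x4d0/0x4d1.

```c
#include <stdint.h>

#define NR_ISA_IRQS 16

/* Bit n of the combined 16-bit ELCR value is IRQ n: 1 = LEVEL, 0 = EDGE. */
static int elcr_bit(uint16_t elcr, unsigned int irq)
{
	return (elcr >> irq) & 1;
}

/*
 * Sanity check (assumption): IRQ 0, 1, 2 and 13 can never be
 * level-triggered, so a register that says otherwise is not trusted.
 */
static int elcr_looks_valid(uint16_t elcr)
{
	return !(elcr_bit(elcr, 0) || elcr_bit(elcr, 1) ||
		 elcr_bit(elcr, 2) || elcr_bit(elcr, 13));
}

/*
 * Fill level[irq] with 1 for LEVEL and 0 for EDGE. The ELCR is only
 * consulted for mpc_default_type 5 (ISA/PCI); every other type, and any
 * ELCR that fails the check, keeps today's all-EDGE default.
 */
static void assign_triggers(int mpc_default_type, uint16_t elcr,
			    int level[NR_ISA_IRQS])
{
	int use_elcr = (mpc_default_type == 5) && elcr_looks_valid(elcr);
	unsigned int irq;

	for (irq = 0; irq < NR_ISA_IRQS; irq++)
		level[irq] = use_elcr ? elcr_bit(elcr, irq) : 0;
}
```

Under this check an all-0xff read is rejected because IRQ 0 would be LEVEL. An all-0x00 read passes, but it yields the same all-EDGE table as the existing fallback, so accepting it changes nothing.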
With the ELCR in use, my XU 5/90 boots the 2.2.17SMP kernel and has been
running for two weeks now without any problems, under (occasionally)
heavy disk and network load.
Thank you for your patience.
Initially I thought 'uggh, can't be sure this workaround is safe'. However, a
question: does the MP table for that board contain valid vendor information
(dmesg on boot will show you it)? If so, I can match the strings and kick your
fix in only on the HP/XU.
I looked through dmesg and there don't appear to be any messages about vendor
information from the MP table. However, since I've never seen one I'm not 100%
sure what I'm looking for. I would be happy to attach/e-mail my dmesg file on
On the subject of breaking other boards: if the board type is reported as
ISA/PCI and it has no MP table IRQ information then it will fail if a PCI
interrupt is shared because the current construct_default_ISA_mptable() assigns
all interrupts as EDGE. Or maybe I'm just unlucky enough that the tulip and
tmscsim modules have some bug that won't let them share an EDGE interrupt. On
any other machine that has separate interrupts for each PCI slot I probably
wouldn't even have noticed.
Looking through past postings (around the time of 2.2.4 or 2.2.5) there was
mention of a couple early Neptune SMP boards that were ISA/PCI and didn't have
MP table IRQ information but did have ELCR information. Apparently, when these
boards were made, the MP specs were unclear enough that HP (among others) put
their IRQ information in the ELCR, even though these are ISA/PCI machines.
There was some discussion about how 2.2.6 broke them when it added the series
of "mpc_default_type ==" checks. But it doesn't seem that anything ever came of
this thread back in 5/99.
I admit my hack is a bit cheap, but I couldn't think of a good way to validate
the ELCR information. As far as I know, reading from a non-existent port (if
the machine doesn't have a valid ELCR either) could return anything. If there
were an (easy) way to validate the ELCR information, it seems to make sense to
try using it before falling back on assigning everything as EDGE. That fallback
doesn't seem right because we've already been told (mpc_default_type=5) that the
machine has PCI, so we know that at least some of the interrupts should be
Thanks for the help!
If you can attach that and an lspci -v that would be great
Created attachment 3440
Copy of dmesg from most recent boot
Created attachment 3441
Output of 'lspci -v', everything but device at 00:0e.0 is onboard
This is believed fixed in the current 2.4 kernel trees. We read and check for a
valid-looking ELCR even if we have no indication one is present.