17493 – Fix to allow HP Vectra XU 5/90 to boot (SMP)

Summary: Fix to allow HP Vectra XU 5/90 to boot (SMP)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	6.2
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Alan Cox
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2000-09-14 07:22 UTC by John William
Modified:	2008-05-01 15:37 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2002-12-15 01:47:42 UTC
Embargoed:

Attachments
Copy of dmesg from most recent boot (5.32 KB, text/plain) 2000-09-17 18:26 UTC, John William
Output of 'lspci -v', everything but device at 00:0e.0 is onboard (1.11 KB, text/plain) 2000-09-17 18:27 UTC, John William

Description John William 2000-09-14 07:22:48 UTC

The HP Vectra XU 5/90 (Pentium SMP, Neptune chipset) will hang with the 
default RH 6.2 installation and most PCI cards in either of its two PCI 
slots when the card is accessed at the same time as the onboard SCSI 
controller (interrupt conflict problem).

I have had a chance to do a lot of investigation into this problem and 
have discovered what causes the problem and I have a proposal for how to 
fix it. Unfortunately, I don't have enough experience modifying the kernel 
code to fully implement my proposal.

The problem -- simply put, the machine has an empty MP IRQ table. So the 
kernel runs "construct_default_ISA_mptable()". The computer is a PCI/ISA 
system, so mpc_default_type==5 and the default table created assigns ISA-
style interrupts to everything. This means that the PCI interrupts get 
assigned type EDGE.

That is what causes the machine to hang. In my case I have a Netgear 
FA310TX ethernet card in one PCI slot. Using the ethernet card during disk 
access will cause the onboard SCSI controller to hang. Due to the design 
of the MB, both PCI slots and the onboard SCSI controller share an 
interrupt so there is no way to reassign interrupts or move the card to 
avoid the conflict.

The solution -- the machine, even though it is a PCI/ISA combo, supports 
the EISA ELCR (EISA Level/Edge Control Register). As a quick hack, making 
the following change to io_apic.c:

938c938
<                             mpc_default_type == 6) ? MP_BUS_EISA : MP_BUS_ISA;
---
>                             mpc_default_type == 5 || mpc_default_type == 6) ? MP_BUS_EISA : MP_BUS_ISA;

will read the ELCR and assign the correct interrupt trigger types (EDGE 
for everything but the PCI interrupts). In fact, looking back at old 
kernel source, it appears the kernel used to do this (it didn't have the 
comparison and used EISA for everything) until around 2.2.5 or 2.2.6 (I 
don't remember exactly).
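
To make the effect of that one-line change concrete, here is a minimal user-space sketch of the decision it alters. It is not the kernel code: `default_bus_kind`, `irq_trigger`, and the `treat_type5_as_eisa` flag are hypothetical names for illustration, and the ELCR is passed in as a 16-bit value (port 0x4d1 in the high byte, port 0x4d0 in the low byte) instead of being read from hardware. The sketch also assumes that default types 2, 3 and 6 are the EISA-based configurations, following the MP specification; only the test for type 6 is visible in the diff above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's MP_BUS_ISA / MP_BUS_EISA. */
enum bus_kind { BUS_ISA, BUS_EISA };
enum trigger  { TRIGGER_EDGE, TRIGGER_LEVEL };

/*
 * Bus type used when building the default MP IRQ table.
 * treat_type5_as_eisa == 1 models the quick hack (type 5 also reads the ELCR);
 * treat_type5_as_eisa == 0 models the behaviour described as broken.
 * Assumption: types 2, 3 and 6 are the EISA-based default configurations.
 */
static enum bus_kind default_bus_kind(int mpc_default_type, int treat_type5_as_eisa)
{
    if (mpc_default_type == 2 || mpc_default_type == 3 || mpc_default_type == 6)
        return BUS_EISA;
    if (treat_type5_as_eisa && mpc_default_type == 5)
        return BUS_EISA;
    return BUS_ISA;
}

/*
 * Trigger mode assigned to one IRQ (0..15).
 * On an ISA-style table everything is edge triggered; on an EISA-style
 * table the ELCR bit for the IRQ decides (1 = level, 0 = edge).
 */
static enum trigger irq_trigger(enum bus_kind bus, uint16_t elcr, int irq)
{
    if (bus == BUS_ISA)
        return TRIGGER_EDGE;
    return ((elcr >> irq) & 1) ? TRIGGER_LEVEL : TRIGGER_EDGE;
}
```

With an ELCR that marks only IRQ 11 as level triggered (0x0800), a type 5 board gets EDGE for IRQ 11 without the hack and LEVEL with it, while IRQ 1 stays EDGE either way.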

The proposal -- I propose changing construct_default_ISA_mptable() to do 
the following. If mpc_default_type==5 (an ISA/PCI board), then it *should* 
have a MP IRQ table. The fact that it doesn't means it's slightly broken 
(or really old). If mpc_default_type==5, then there is at least the 
possibility of PCI devices in the system, so could the kernel check and 
see if a valid ELCR exists and use it if it does?

I don't know what would be a good sanity check for the ELCR. Maybe check a 
known IRQ (like 2) and see that it makes sense (EDGE trigger) and then go 
from there? Or make sure that the values aren't all 0x00 or 0xff or 
something obviously incorrect?

If the ELCR passes the sanity test then use it to assign the interrupt 
triggers for the MB. The impact should be minimal since any 
mpc_default_type other than 5 would not be affected, and if it really is 
ISA/PCI, how else can it communicate the interrupt triggers?
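
One possible shape for the sanity check proposed above is sketched below. It is only an illustration under stated assumptions, not a tested kernel patch: `elcr_is_level` and `elcr_looks_valid` are hypothetical helper names, the ELCR is again passed in as a 16-bit value (port 0x4d1 in the high byte, port 0x4d0 in the low byte), and the list of IRQs that must be edge triggered (0, 1, 2, 8 and 13) is an assumption based on the usual PC assignments for the timer, keyboard, cascade, RTC and FPU.

```c
#include <stdint.h>

/* 1 if the ELCR marks this IRQ (0..15) as level triggered, else 0. */
static int elcr_is_level(uint16_t elcr, int irq)
{
    return (elcr >> irq) & 1;
}

/*
 * Heuristic validity check for an ELCR value.
 * Returns 1 if the value looks plausible, 0 if it should be ignored.
 */
static int elcr_looks_valid(uint16_t elcr)
{
    /* Assumed always-edge IRQs: timer, keyboard, cascade, RTC, FPU. */
    static const int must_be_edge[] = { 0, 1, 2, 8, 13 };
    unsigned i;

    /* All zeros or all ones suggests there is no real register behind the port. */
    if (elcr == 0x0000 || elcr == 0xFFFF)
        return 0;

    /* Any of the fixed system IRQs marked as level means the data is bogus. */
    for (i = 0; i < sizeof(must_be_edge) / sizeof(must_be_edge[0]); i++) {
        if (elcr_is_level(elcr, must_be_edge[i]))
            return 0;
    }
    return 1;
}
```

If the check fails, the caller would fall back to the existing behaviour and mark every interrupt EDGE, so boards without a usable ELCR would see no change. Rejecting an all-zero value is the stricter of the two options floated above; a board with no level-triggered interrupts at all would then simply take the fallback path.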

By using the ELCR on my XU 5/90, it can boot the 2.2.17SMP kernel and has 
been running for two weeks now without any problems and under 
(occasionally) heavy disk and network load.

Thank you for your patience.

Comment 1 Alan Cox 2000-09-16 22:24:59 UTC

Initially I thought 'ugh, can't be sure this workaround is safe'. However, a
question: does the MP table for that board contain valid vendor information
(dmesg on boot will show it)? If so, I can match the strings and kick your
fix in only on the HP/XU.

Comment 2 John William 2000-09-17 06:40:53 UTC

I looked through dmesg and there don't appear to be any messages about vendor 
information from the MP table. However, since I've never seen one I'm not 100% 
sure what I'm looking for. I would be happy to attach/e-mail my dmesg file on 
request.

On the subject of breaking other boards; if the board type is reported as 
ISA/PCI and it has no MP table IRQ information then it will fail if a PCI 
interrupt is shared because the current construct_default_ISA_mptable() assigns 
all interrupts as EDGE. Or maybe I'm just unlucky enough that the tulip and 
tmscsim modules have some bug that won't let them share an EDGE interrupt. On 
any other machine that has separate interrupts for each PCI slot I probably 
wouldn't even have noticed.

Looking through past postings (around the time of 2.2.4 or 2.2.5) there was 
mention of a couple early Neptune SMP boards that were ISA/PCI and didn't have 
MP table IRQ information but did have ELCR information. Apparently, when these 
boards were made, the MP specs were unclear enough that HP (among others) put 
their IRQ information in the ELCR, even though these are ISA/PCI machines. 
There was some discussion about how 2.2.6 broke them when it added the series 
of "mpc_default_type ==" checks. But it doesn't seem that anything ever came of 
this thread back in 5/99.

I admit my hack is a bit cheap, but I couldn't think of a good way to validate 
the ELCR information. As far as I know, reading from a non-existent port (if 
the machine doesn't have a valid ELCR either) could return anything. If there 
were an (easy) way to validate the ELCR information, it seems to make sense to 
try using it before falling back on assigning everything as EDGE. That doesn't 
seem right because we've already been told (mpc_default_type=5) that the 
machine has PCI, so we know that at least some of the interrupts should be 
LEVEL.

Thanks for the help!

Comment 3 Alan Cox 2000-09-17 10:59:46 UTC

If you can attach that and an lspci -v that would be great

Comment 4 John William 2000-09-17 18:26:31 UTC

Created attachment 3440
Copy of dmesg from most recent boot

Comment 5 John William 2000-09-17 18:27:54 UTC

Created attachment 3441
Output of 'lspci -v', everything but device at 00:0e.0 is onboard

Comment 6 Alan Cox 2001-05-05 13:28:09 UTC

This is believed fixed in the current 2.4 kernel trees. We read and check for a
valid looking ELCR even if we have no indication one is present.
