Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Fix to allow HP Vectra XU 5/90 to boot (SMP)|
|Product:||[Retired] Red Hat Linux||Reporter:||John William <jw2357>|
|Component:||kernel||Assignee:||Alan Cox <alan>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2002-12-14 20:47:42 EST||Type:||---|
Description John William 2000-09-14 03:22:48 EDT
The HP Vectra XU 5/90 (Pentium SMP, Neptune chipset) will hang with the default RH 6.2 installation and most PCI cards in either of its two PCI slots when the card is accessed at the same time as the onboard SCSI controller (interrupt conflict problem). I have had a chance to investigate this problem in depth, have found what causes it, and have a proposal for how to fix it. Unfortunately, I don't have enough experience modifying the kernel code to fully implement my proposal.

The problem -- simply put, the machine has an empty MP IRQ table, so the kernel runs "construct_default_ISA_mptable()". The computer is a PCI/ISA system, so mpc_default_type==5 and the default table created assigns ISA-style interrupts to everything. This means that the PCI interrupts get assigned type EDGE, and that is what causes the machine to hang. In my case I have a Netgear FA310TX ethernet card in one PCI slot. Using the ethernet card during disk access will cause the onboard SCSI controller to hang. Due to the design of the MB, both PCI slots and the onboard SCSI controller share an interrupt, so there is no way to reassign interrupts or move the card to avoid the conflict.

The solution -- the machine, even though it is a PCI/ISA combo, supports the EISA ELCR (EISA Level/Edge Control Register). As a quick hack, making the following change to io_apic.c:

938c938
< mpc_default_type == 6) ? MP_BUS_EISA : MP_BUS_ISA;
---
> mpc_default_type == 5 || mpc_default_type == 6) ? MP_BUS_EISA : MP_BUS_ISA;

will read the ELCR and assign the correct interrupt trigger types (EDGE for everything but the PCI interrupts). In fact, looking back at old kernel source, it appears the kernel used to do this (it didn't have the comparison and used EISA for everything) until around 2.2.5 or 2.2.6 (I don't remember exactly).

The proposal -- I propose changing construct_default_ISA_mptable() to do the following. If mpc_default_type==5 (an ISA/PCI board), then it *should* have an MP IRQ table. The fact that it doesn't means it's slightly broken (or really old). If mpc_default_type==5, then there is at least the possibility of PCI devices in the system, so could the kernel check whether a valid ELCR exists and use it if it does?

I don't know what would be a good sanity check for the ELCR. Maybe check a known IRQ (like 2) and see that it makes sense (EDGE trigger) and then go from there? Or make sure that the values aren't all 0x00 or 0xff or something obviously incorrect? If the ELCR passes the sanity test, then use it to assign the interrupt triggers for the MB. The impact should be minimal since any mpc_default_type other than 5 would not be affected, and if it really is ISA/PCI, how else can it communicate the interrupt triggers?

By using the ELCR on my XU 5/90, it can boot the 2.2.17 SMP kernel and has been running for two weeks now without any problems, under (occasionally) heavy disk and network load. Thank you for your patience.
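The sanity check asked about above could look roughly like the following. This is only a minimal sketch in plain C, not the kernel's code: the names `elcr_is_level` and `elcr_looks_valid` are hypothetical, the two ELCR bytes (conventionally read from I/O ports 0x4d0 and 0x4d1) are passed in as one 16-bit value so the logic can be tried without port access, and it assumes that IRQs 0, 1, 2, 8 and 13 can never legitimately be level-triggered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the ELCR marks this IRQ as
 * level-triggered, 0 if edge-triggered. Bit n corresponds to IRQ n. */
int elcr_is_level(uint16_t elcr, unsigned int irq)
{
    assert(irq < 16);
    return (int)((elcr >> irq) & 1u);
}

/* Hypothetical sanity check: reject the ELCR contents if any IRQ that is
 * assumed to be edge-only (timer, keyboard, cascade, RTC, FPU) is
 * reported as level-triggered. A floating bus read of 0xff fails this. */
int elcr_looks_valid(uint16_t elcr)
{
    static const unsigned int edge_only[] = { 0, 1, 2, 8, 13 };
    size_t i;

    for (i = 0; i < sizeof(edge_only) / sizeof(edge_only[0]); i++) {
        if (elcr_is_level(elcr, edge_only[i]))
            return 0;
    }
    return 1;
}
```

With these assumptions, a value such as 0x0E00 (IRQs 9, 10 and 11 level-triggered) is accepted, while 0xFFFF or any value with the IRQ 2 bit set is rejected. An all-zero value passes, but it simply yields EDGE for every IRQ, which is the same result as the existing default.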
Comment 1 Alan Cox 2000-09-16 18:24:59 EDT
Initially I thought 'ugh, can't be sure this workaround is safe'. However, a question: does the MP table for that board contain valid vendor information (dmesg on boot will show it)? If so, I can match the strings and kick your fix in only on the HP/XU.
Comment 2 John William 2000-09-17 02:40:53 EDT
I looked through dmesg and there don't appear to be any messages about vendor information from the MP table. However, since I've never seen one I'm not 100% sure what I'm looking for. I would be happy to attach/e-mail my dmesg file on request.

On the subject of breaking other boards: if the board type is reported as ISA/PCI and it has no MP table IRQ information, then it will fail if a PCI interrupt is shared, because the current construct_default_ISA_mptable() assigns all interrupts as EDGE. Or maybe I'm just unlucky enough that the tulip and tmscsim modules have some bug that won't let them share an EDGE interrupt. On any other machine that has separate interrupts for each PCI slot I probably wouldn't even have noticed.

Looking through past postings (around the time of 2.2.4 or 2.2.5), there was mention of a couple of early Neptune SMP boards that were ISA/PCI and didn't have MP table IRQ information but did have ELCR information. Apparently, when these boards were made, the MP specs were unclear enough that HP (among others) put their IRQ information in the ELCR, even though these are ISA/PCI machines. There was some discussion about how 2.2.6 broke them when it added the series of "mpc_default_type ==" checks. But it doesn't seem that anything ever came of this thread back in 5/99.

I admit my hack is a bit cheap, but I couldn't think of a good way to validate the ELCR information. As far as I know, reading from a non-existent port (if the machine doesn't have a valid ELCR either) could return anything. If there were an (easy) way to validate the ELCR information, it seems to make sense to try using it before falling back on assigning everything as EDGE. Assigning everything as EDGE doesn't seem right, because we've already been told (mpc_default_type=5) that the machine has PCI, so we know that at least some of the interrupts should be LEVEL.

Thanks for the help!
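To make the "try the ELCR first, then fall back to EDGE" idea concrete, here is a rough sketch in plain C. It is an illustration under stated assumptions, not kernel code: `build_default_triggers`, `enum trigger` and `EDGE_ONLY_MASK` are hypothetical names, the ELCR contents are passed in as a 16-bit value instead of being read from the I/O ports, and the validity test assumes IRQs 0, 1, 2, 8 and 13 must always read back as edge-triggered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum trigger { TRIGGER_EDGE = 0, TRIGGER_LEVEL = 1 };

/* Assumption: bits for IRQs 0, 1, 2, 8 and 13 must be clear in a sane ELCR. */
#define EDGE_ONLY_MASK 0x2107u

/* Hypothetical sketch: fill triggers[0..15] for a board with no MP IRQ table.
 * The ELCR is trusted only for default type 5 (ISA/PCI) and only if it
 * passes the sanity test; otherwise every IRQ falls back to EDGE.
 * Returns 1 if the ELCR was used, 0 if the all-EDGE fallback was taken. */
int build_default_triggers(int default_type, uint16_t elcr,
                           enum trigger triggers[16])
{
    int use_elcr;
    unsigned int irq;

    assert(triggers != NULL);

    use_elcr = (default_type == 5) && ((elcr & EDGE_ONLY_MASK) == 0);

    for (irq = 0; irq < 16; irq++) {
        if (use_elcr && ((elcr >> irq) & 1u))
            triggers[irq] = TRIGGER_LEVEL;
        else
            triggers[irq] = TRIGGER_EDGE;
    }
    return use_elcr;
}
```

Under these assumptions, other default types are left exactly as they are today, and a garbage read such as 0xFFFF from a non-existent port fails the test and produces the current all-EDGE table.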
Comment 3 Alan Cox 2000-09-17 06:59:46 EDT
If you can attach that and an lspci -v, that would be great.
Comment 4 John William 2000-09-17 14:26:31 EDT
Created attachment 3440 [details] Copy of dmesg from most recent boot
Comment 5 John William 2000-09-17 14:27:54 EDT
Created attachment 3441 [details] Output of 'lspci -v', everything but device at 00:0e.0 is onboard
Comment 6 Alan Cox 2001-05-05 09:28:09 EDT
This is believed fixed in the current 2.4 kernel trees. We read and check for a valid-looking ELCR even if we have no indication one is present.