I'm using the qa0309 tree.
Hardware: Compaq ProLiant 1600R with Compaq SMART 3200 RAID Array Controller and Compaq Remote Insight Board PCI.
Booting qa0309 from CDROM, the installer tries to detect hardware and thinks it finds a Megaraid controller (but there is none). After the kernel tries to start the megaraid driver, the machine hangs (virtual console switching still works).
Please find attached two files:
- a hand-copy of virtual console 3
- lspci -v -v
Machine currently runs RH 6.1 with kernel 2.2.19pre6. Let me know if you need more info.
Seems like a mixed installer and kernel problem (installer misdetecting a nonexistent megaraid controller and kernel locking up on insmod megaraid.o).
Created attachment 12866: virtual console 3 (F3) hand-copied screen content
Created attachment 12867: lspci -v -v output
BTW, I'm using TUI install mode (not that I think that this matters...)
Bill, could you please make sure we don't have a bad kudzu pcitable entry?
It's the i960; it's apparently *occasionally* a megaraid, for example, on a Dell PE4300. Any ideas?
Matt, any ideas?
<crude_hack> Would the presence of other Compaq equipment be a reasonable indication that a detected i960 is a SMART Controller and not a megaraid? </crude_hack>
We need to check whether the 8086:1960 device has a subid that we can key on for megaraid.
OK, that compaq i960 thing will not be mapped to megaraid as of kudzu-0.98.5-1. Assigning to kernel; the fact that loading megaraid hangs the machine is a kernel problem.
The megaraid driver makes the same assumption as kudzu does. Please verify against the latest kernels, as we put in an improved megaraid driver.
Whom do you mean by "Please verify", Arjan? Red Hat QA labs or me?
Well, if you can try a recent Rawhide kernel, that would be very appreciated.
Same problem with qa0319 which has the same kernel as current rawhide.
In megaraid_detect():
    count += mega_findCard (pHostTmpl, 0x8086, PCI_DEVICE_ID_AMI_MEGARAID3, BOARD_QUARTZ);
That's where we pick up all cards with an i960, at 0x8086:0x1960.
In mega_findCard():
    while ((pdev = pci_find_device (pciVendor, pciDev, pdev))) {
        if (pci_enable_device (pdev))
            continue;
We've enabled the device before we know it's our card, which doesn't seem right.
        pciBus = pdev->bus->number;
        pciDevFun = pdev->devfn;
    #endif
        if ((flag & BOARD_QUARTZ) && (skip_id == -1)) {
            pcibios_read_config_word (pciBus, pciDevFun, PCI_CONF_AMISIG, &magic);
            if ((magic != AMI_SIGNATURE) && (magic != AMI_SIGNATURE_471)) {
                pciIdx++;
                continue; /* not an AMI board */
            }
Now we've read a config space word 0xa0 looking for a signature, but we don't know that it's our card yet. Reading that word probably hangs the Compaq card for some reason. Here's where having the complete table of PCI IDs would be preferable. But, I'm hesitant to suggest changes to this detect algorithm so close to gold...
So, we can't have both an AMI megaraid and a Compaq SMART 3200 in the same system. I don't know how Compaq would feel about this.
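The ordering concern above (enable and probe only after the card is known to be ours) can be sketched in user-space C. This is a minimal illustration, not the megaraid driver: `struct fake_pci_dev`, `claim_if_ours` and the `known_subvendors` list are hypothetical names, and the list holds only subsystem vendor IDs mentioned elsewhere in this thread (0x1028 Dell, 0x101e AMI, 0x103c HP), so it is not assumed to be complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI function; NOT the kernel's struct pci_dev. */
struct fake_pci_dev {
    uint16_t vendor, device;        /* 0x8086:0x1960 for an i960 */
    uint16_t subvendor, subdevice;  /* standard subsystem IDs */
    int enabled;                    /* set once the "driver" enables the device */
    int vendor_specific_reads;      /* counts reads of vendor-specific config space */
};

/* Assumption: subsystem vendors taken from this thread; illustrative only. */
static const uint16_t known_subvendors[] = { 0x1028, 0x101e, 0x103c };

static int subvendor_is_known(uint16_t subvendor)
{
    size_t i;

    for (i = 0; i < sizeof known_subvendors / sizeof known_subvendors[0]; i++)
        if (known_subvendors[i] == subvendor)
            return 1;
    return 0;
}

/* Returns 1 if the device is claimed, 0 if it is left completely untouched. */
static int claim_if_ours(struct fake_pci_dev *dev)
{
    assert(dev != NULL);

    if (dev->vendor != 0x8086 || dev->device != 0x1960)
        return 0;
    if (!subvendor_is_known(dev->subvendor))
        return 0;                   /* foreign i960 board: no enable, no probing */

    dev->enabled = 1;               /* enable only after identification */
    dev->vendor_specific_reads++;   /* models a signature read that happens only now */
    return 1;
}
```

With this ordering, a device carrying subsystem IDs 0e11:c000 is skipped before anything is enabled or read from vendor-specific config space.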
Adding Alan for his insights.
BTW: there is no problem with a machine having an ICP Vortex controller which uses i960 chip.
00:0c.0 SCSI storage controller: ICP Raid Controller GDT 6117RP/6517RP (rev 05)
    Subsystem: ICP Raid Controller GDT 6117RP/6517RP
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 32, cache line size 08
    Interrupt: pin A routed to IRQ 10
    Region 0: Memory at 000c8000 (low-1M, prefetchable) [size=16K]
    Expansion ROM at <unassigned> [disabled] [size=32K]
I guess this is because the i960 chip is not "visible" as a PCI device (I'm not into PCI stuff, so please forgive my diffuse guesswork ;>).
Matt: from what I'm seeing now, there is a clear no-go for Compaq SMARTs (at least 3200) alone in Compaq systems. I can imagine Compaq (and me for our Compaq servers) worries more about that than about having Compaq-SMART and Megaraids together in one machine. :-> Sorry, I have only one compaq machine for testing, which has a SMART 3200 controller so I can't give more datapoints for other SMART controllers. Is Compaq on Bugzilla? Perhaps we should Cc: them too.
There are about ten devices reporting themselves as Intel i960. The older megaraid driver had a broken check for a magic subid. The new one has the check working (I default skip_id to -1) so should have fixed this. The megaraid driver could still arbitrarily lock up machines with an i960, since the magic check is unsafe both because it might not be a unique value and because it reads reserved space that isn't guaranteed not to have a side effect. It does however seem to work with skip_id defaulting to -1. If you have a failing case let me know. We should be switching to subids in the driver too, but this is not for this release. The subvendor ID for my megaraid is (Subsystem: Dell Computer Corporation: Unknown device 1111). So kudzu could simply check the subsystem vendor id, I guess.
This compaq box is apparently a failing case. Currently kudzu is set to ignore this particular card, so the only way it would fail is if one of these *and* a real megaraid are in the same box.
Then we need to switch to using the subvendor ID. This has to happen at some point anyway. Can PeterJ and/or the Dell folks provide a complete list of subvendor id data for the board? Then I'll fix up the driver (the driver bit is easy to do).
Here is a table of Dell RAID cards that use megaraid driver.
    VID     DID     SVID    SID     NAME
    0x101e  0x9010  0x0000  0x0000  PERC
    0x8086  0x1960  0x1028  0x1111  PERC2/SC
    0x8086  0x1960  0x1028  0x0467  PERC2/DC
    0x101e  0x1960  0x1028  0x0493  PERC3/DC
    0x101e  0x1960  0x1028  0x0471  PERC3/QC
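As a rough sketch of how a table like this could drive detection, the rows can be stored as four-ID tuples and matched exactly. The struct and function names below are hypothetical (this is not the driver's code), and the rows are copied from the table above without any additions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row: vendor, device, subsystem vendor, subsystem device, board name. */
struct raid_id {
    uint16_t vid, did, svid, sid;
    const char *name;
};

/* Rows copied from the table in this thread; nothing else is assumed. */
static const struct raid_id dell_megaraid_ids[] = {
    { 0x101e, 0x9010, 0x0000, 0x0000, "PERC" },
    { 0x8086, 0x1960, 0x1028, 0x1111, "PERC2/SC" },
    { 0x8086, 0x1960, 0x1028, 0x0467, "PERC2/DC" },
    { 0x101e, 0x1960, 0x1028, 0x0493, "PERC3/DC" },
    { 0x101e, 0x1960, 0x1028, 0x0471, "PERC3/QC" },
};

/* Returns the matching row for an exact four-ID match, or NULL if none. */
static const struct raid_id *find_raid_id(const struct raid_id *table, size_t n,
                                          uint16_t vid, uint16_t did,
                                          uint16_t svid, uint16_t sid)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < n; i++) {
        const struct raid_id *e = &table[i];

        if (e->vid == vid && e->did == did && e->svid == svid && e->sid == sid)
            return e;
    }
    return NULL;
}
```

Looking up 0x8086 0x1960 0x0e11 0xc000 (the Compaq board reported in this bug) returns NULL, because no row carries that subsystem pair.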
I asked PeterJ about this yesterday. More details to follow.
Date: Wed, 21 Mar 2001 22:21:58 -0500
From: Peter Jarrett <Peterj>
To: Doug Gurney <dougg>, Matt_Domsch.com, peterj, notting, Tesfamariam_Michael.com
Cc: Brian Highers <BrianH>, Atul Mukker. <Atulm>
Subject: RE: Megaraid PCI ids patch
I will acquire a list for the HP and AMI boards - Doug can you supply the updated list!
At a minimum,
    0x101e 0x1960 0x101e 0x0??? are all AMI generic
    0x8086 0x1960 0x101e 0x0??? are all AMI generic
    0x101e 0x1960 0x103c 0x0??? are HP
    0x8086 0x1960 0x103c 0x0??? are HP
Problem persistent in qa0322.
Hm, it *should* be fixed there. Does the pcitable in the installer image have an entry for the i960 you have?
images/boot.img -> initrd.img -> modules/pcitable contains:
    0x8086 0x1960 "megaraid" "Intel Corporation|80960RP [i960RP Microprocessor]"
    0x8086 0x1960 0x1028 0x1111 "megaraid" "Dell|PowerEdge RAID Controller 2/SC"
    0x8086 0x1960 0x1028 0x0467 "megaraid" "Dell|PowerEdge RAID Controller 2/DC"
My card:
    00:10.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 05)
    00:10.1 Memory: Intel Corporation 80960RP [i960RP Microprocessor] (rev 05)
        Subsystem: Unknown device 0e11:c000
So my interpretation (not understanding this PCI stuff) is, that something like this is missing in the PCI table:
    0x8086 0x1960 0x0e11 0xc000 "cpqarray" "[whatever]"
right? This may or may not fix possible collisions with other Compaq SMART controllers.
Actually, that should be:
    0x8086 0x1960 0x0e11 0xc000 "unknown" "[whatever]"
If you look at /usr/share/kudzu/pcitable in the included-in-qa0322 kudzu package, that line is there, correct? Matt/Dr. Mike; are the unknown lines getting pulled into the first stage of the installer?
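The reason a four-ID line matters can be shown with a small lookup sketch in which an exact subsystem match takes precedence over the generic vendor/device line. This is an assumption about the matching order, written for illustration; it is not kudzu's lookup code, and all identifiers are hypothetical. Treating "unknown" as "load nothing" is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One pcitable line; has_subsys is 1 for four-ID lines, 0 for two-ID lines. */
struct pcitable_entry {
    uint16_t vendor, device;
    int has_subsys;
    uint16_t subvendor, subdevice;
    const char *module;
};

/* Entries quoted in this thread, plus the proposed "unknown" override. */
static const struct pcitable_entry sample_pcitable[] = {
    { 0x8086, 0x1960, 0, 0x0000, 0x0000, "megaraid" },
    { 0x8086, 0x1960, 1, 0x1028, 0x1111, "megaraid" },
    { 0x8086, 0x1960, 1, 0x1028, 0x0467, "megaraid" },
    { 0x8086, 0x1960, 1, 0x0e11, 0xc000, "unknown" },
};

/* Returns the module name for a device, or NULL if no line matches. */
static const char *pcitable_module(const struct pcitable_entry *t, size_t n,
                                   uint16_t vendor, uint16_t device,
                                   uint16_t subvendor, uint16_t subdevice)
{
    const char *generic = NULL;
    size_t i;

    assert(t != NULL);
    for (i = 0; i < n; i++) {
        if (t[i].vendor != vendor || t[i].device != device)
            continue;
        if (t[i].has_subsys) {
            if (t[i].subvendor == subvendor && t[i].subdevice == subdevice)
                return t[i].module;   /* exact subsystem match wins */
        } else if (generic == NULL) {
            generic = t[i].module;    /* fall back to the generic line */
        }
    }
    return generic;
}

/* Assumption: "unknown" means no driver should be loaded for the device. */
static int module_is_loadable(const char *module)
{
    return module != NULL && strcmp(module, "unknown") != 0;
}
```

If the "unknown" line is missing from the table, the Compaq board falls through to the generic line and is mapped to megaraid again, which is the behaviour reported against the installer image.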
Actually, I have just made a modified boot disk with this pcitable entry:
    0x8086 0x1960 0x0e11 0xc000 "cpqarray" "Compaq|SMART 3200 RAID Array Controller"
Now, "Found suggestion of megaraid" is changed to "Found suggestion of cpqarray" and the installer proceeds. Something still insmods megaraid.o, but this doesn't hang the machine anymore. As soon as the install is completely done (I'm now burning the disc 2 ISO) I'll get back to you regarding the kudzu RPM pcitable.
BTW: is a standard Server class installation with no sub-functionalities selected supposed to need disc 2?
OK. Installing finished successfully ("Congratulations..."), but after "rebooting system..." the box hung (console switching works, but nothing more). Had to powercycle.
After first booting into the system the fun continues:
    [root@alexis /root]# fgrep hostname /var/log/boot.log
    Mar 23 04:14:36 alexis rc.sysinit: Setting hostname localhost.localdomain: succeeded
    [root@alexis /root]# fgrep HOSTNAME /etc/sysconfig/network
    HOSTNAME=localhost.localdomain
This is despite my having entered a complete FQDN and static IP config, and the box _should_ have had working connectivity to its nameserver (TLAN driver got loaded, IP config still works) while installing. Especially interesting is that the shell prompt contains the right host part of the FQDN. Weird. I guess after the first normal reboot it reverts to "localhost" because of /etc/sysconfig/network.
Next fun: When using ping, I get the following:
    Warning: time of day goes back, taking countermeasures.
I can't see the system clock going backward... No, NTP is not running - not even installed.
Regarding /usr/share/kudzu/pcitable:
    # fgrep 0xc000 /usr/share/kudzu/pcitable
    0x8086 0x1960 0x0e11 0xc000 "unknown" "Intel Corporation|80960RP [i960RP Microprocessor]"
and no, there are no "unknown" entries in the first-stage installer pcitable.
OK, so it's an installer bug that it's not getting pulled into the pcitable. The other stuff is completely unrelated to the megaraid issues; please open other bugs for those if you feel they are problems.
Here are additional PCI IDs which the PERC2/SC card could report itself as. Each of these only occurs on cards with old buggy firmware, which the driver detects and requires upgrading before using the card, but they still need to be detected.
-Matt
From: Doug Gurney [dougg]
Sent: Thursday, March 22, 2001 7:05 PM
To: Peter Jarrett; 'Tesfamariam_Michael'
Cc: 'Matt_Domsch'
Subject: RE: AMI Card
The information looks correct. There were some additional subsystem ids for the PERC2/SC, as follows, but you have it correct for the latest firmware released.
    Dell 466 (prior to v2.10)     101E 09A0
    Dell 466 (v2.10 to v3.00)     1111 1111
    Dell 466 (v3.01 and later)    1028 1111
But these will still get picked up as long as the generic i960 is mapped to megaraid, yes?
There is a problem with the "unknown" mapping: cpqarray driver doesn't get loaded so installing from harddisk is not possible (all disks are connected to the SMART controller), because the cpqarray driver (and megaraid) gets loaded only right _after_ install source selection. Mapping the PCI device to cpqarray fixes that.
This happens only if I boot from the floppy without having the CDROM in the drive (so CDROM mount fails). If CDROM mount succeeds, the cpqarray driver gets loaded.
That doesn't make sense, since you already have something mapped to cpqarray in the machine. I'd say the reason it wouldn't load until after install source selection is because the module isn't on the boot disk.
Of course. Sigh. Need coffee. 5:23am local time.
> But these will still get picked up as long as the generic > i960 is mapped to megaraid, yes? Yes. The vendor/device IDs are i960. Alan thought it proper to have the driver search by vendor/device/subvendor/subdevice, so those are some extra subsystem numbers to search for under the i960 generic vendor/device. So, we're solving two related problems. 1) kudzu shouldn't try loading the megaraid driver on the Compaq controller anymore. This is now fixed in your kudzu tree. 2) megaraid shouldn't hang the system while reading config space bytes on the Compaq controller. This still needs to be fixed, and requires the PCI ID table we've slowly built up in this bugzilla entry.
Problem still persists with qa0327. Same tty3 output as in the attachment on this bug.
Matt/Dr. Mike - are the unknown entries getting stripped from the pcitable used by the installer?
Still no "unknown" PCI entries in modules/pcitable on qa0401 boot.img initrd.
fixing.
fixed trimpcitable to not nuke unknown entries.
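The effect of that fix can be sketched as a filter rule: keep a pcitable line if its module ships on the image, and also keep it if its module is "unknown". The thread does not show how trimpcitable is implemented, so the code below is only a hypothetical model of the rule, with made-up function names, and the shipped-module list in the usage note is an example rather than the real boot disk contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decide whether a line naming `module` survives trimming. */
static int keep_entry(const char *module,
                      const char *const *shipped, size_t nshipped)
{
    size_t i;

    assert(module != NULL);
    if (strcmp(module, "unknown") == 0)
        return 1;               /* overrides must survive, or the generic line wins */
    for (i = 0; i < nshipped; i++)
        if (strcmp(module, shipped[i]) == 0)
            return 1;           /* module is present on the image */
    return 0;
}

/* Compacts `modules` in place and returns the number of entries kept. */
static size_t trim_modules(const char **modules, size_t n,
                           const char *const *shipped, size_t nshipped)
{
    size_t in, out = 0;

    assert(modules != NULL);
    for (in = 0; in < n; in++)
        if (keep_entry(modules[in], shipped, nshipped))
            modules[out++] = modules[in];
    return out;
}
```

For example, trimming the list "megaraid", "cpqarray", "unknown" against an image that ships only megaraid leaves "megaraid" and "unknown", so a device-specific "unknown" line still shadows the generic i960 entry.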
OK, is the just released qa0404 fixed then?
nope, not fixed until qa0405.
OK, 7.1 gold installs perfectly on my test machine. Fix confirmed :-)
great, thanks.