Bug 32082 - installer build scripts strip the "unknown" entries out of pcitable, causing megaraid misdetect.
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: anaconda
Version: 7.1
Hardware: i386
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Matt Wilson
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-03-17 13:52 UTC by Daniel Roesen
Modified: 2005-10-31 22:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-04-30 15:06:34 UTC
Embargoed:


Attachments
virtual console 3 (F3) hand-copied screen content (748 bytes, text/plain)
2001-03-17 13:53 UTC, Daniel Roesen
lspci -v -v output (6.98 KB, text/plain)
2001-03-17 13:54 UTC, Daniel Roesen

Description Daniel Roesen 2001-03-17 13:52:00 UTC
I'm using the qa0309 tree.

Hardware: Compaq ProLiant 1600R
          with Compaq SMART 3200 RAID Array Controller
          and Compaq Remote Insight Board PCI

Booting qa0309 from CDROM, installer tries to detect hardware and thinks it
finds a Megaraid controller (but there is none). After kernel tries to
start the megaraid driver machine hangs (virtual console switching still
works).

Please find attached two files:

- a hand-copy of virtual console 3
- lspci -v -v

Machine currently runs RH 6.1 with kernel 2.2.19pre6.

Let me know if you need more info. Seems like a mixed installer and kernel
problem (installer misdetecting a nonexistent megaraid controller and kernel
locking up on insmod megaraid.o).

Comment 1 Daniel Roesen 2001-03-17 13:53:10 UTC
Created attachment 12866
virtual console 3 (F3) hand-copied screen content

Comment 2 Daniel Roesen 2001-03-17 13:54:02 UTC
Created attachment 12867
lspci -v -v output

Comment 3 Daniel Roesen 2001-03-17 13:57:26 UTC
BTW, I'm using TUI install mode (not that I think that this matters...)

Comment 4 Michael Fulbright 2001-03-20 05:04:56 UTC
Bill could you please make sure we don't have a bad kudzu pcitable entry?

Comment 5 Bill Nottingham 2001-03-20 05:29:19 UTC
It's the i960; it's apparently *occasionally* a megaraid, for example, on
a Dell PE4300. Any ideas?

Comment 6 Michael Fulbright 2001-03-20 16:39:31 UTC
Matt any ideas?

Comment 7 Daniel Roesen 2001-03-20 16:50:49 UTC
<crude_hack>
Would taking the presence of other Compaq equipment be a reasonable indication
that a detected i960 is a SMART Controller and not a megaraid?
</crude_hack>

Comment 8 Matt Wilson 2001-03-20 16:54:44 UTC
we need to check to see if the 8086:1960 device has a subid that we can key on for
megaraid


Comment 9 Bill Nottingham 2001-03-21 19:42:03 UTC
OK, that compaq i960 thing will not be mapped to megaraid as of kudzu-0.98.5-1.

Assigning to kernel; the fact that loading megaraid hangs the machine is a
kernel problem.

Comment 10 Arjan van de Ven 2001-03-21 19:46:17 UTC
The megaraid driver makes the same assumption as kudzu does.
Please verify against the latest kernels, as we put in an improved megaraid
driver.

Comment 11 Daniel Roesen 2001-03-22 14:17:18 UTC
Whom do you mean by "Please verify", Arjan? Red Hat QA labs or me?

Comment 12 Arjan van de Ven 2001-03-22 14:19:03 UTC
Well, if you can try a recent Rawhide kernel, that would be very appreciated.

Comment 13 Daniel Roesen 2001-03-22 14:35:24 UTC
Same problem with qa0319 which has the same kernel as current rawhide.

Comment 14 Matt Domsch 2001-03-22 15:39:47 UTC
In megaraid_detect():

        count += mega_findCard (pHostTmpl, 0x8086,
                                PCI_DEVICE_ID_AMI_MEGARAID3, BOARD_QUARTZ);

That's where we pick up all cards with an i960, at 0x8086:0x1960.

In mega_findCard():

        while ((pdev = pci_find_device (pciVendor, pciDev, pdev))) {
                if (pci_enable_device (pdev))
                        continue;

We've enabled the device before we know it's our card, which doesn't seem right.

                pciBus = pdev->bus->number;
                pciDevFun = pdev->devfn;
#endif
                if ((flag & BOARD_QUARTZ) && (skip_id == -1)) {
                        pcibios_read_config_word (pciBus, pciDevFun,
                                                  PCI_CONF_AMISIG, &magic);
                        if ((magic != AMI_SIGNATURE)
                            && (magic != AMI_SIGNATURE_471)) {
                                pciIdx++;
                                continue;       /* not an AMI board */
                        }

Now we've read the config space word at 0xa0 looking for a signature, but we don't
know that it's our card yet.  Reading that word probably hangs the Compaq card
for some reason.  Here's where having the complete table of PCI IDs would be
preferable.

But, I'm hesitant to suggest changes to this detect algorithm so close to 
gold...  So, we can't have both an AMI megaraid and a Compaq SMART 3200 in the 
same system.  I don't know how Compaq would feel about this.

Comment 15 Matt Domsch 2001-03-22 15:47:23 UTC
Adding Alan for his insights.

Comment 16 Daniel Roesen 2001-03-22 15:54:23 UTC
BTW: there is no problem with a machine having an ICP Vortex controller, which
uses an i960 chip.

00:0c.0 SCSI storage controller: ICP Raid Controller GDT 6117RP/6517RP (rev 05)
	Subsystem: ICP Raid Controller GDT 6117RP/6517RP
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, cache line size 08
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at 000c8000 (low-1M, prefetchable) [size=16K]
	Expansion ROM at <unassigned> [disabled] [size=32K]

I guess this is because the i960 chip is not "visible" as a PCI device (I'm not
into PCI stuff, so please forgive my diffuse guesswork ;>).


Comment 17 Daniel Roesen 2001-03-22 16:01:33 UTC
Matt: from what I'm seeing now, Compaq SMARTs (at least the 3200) are a clear
no-go even on their own in Compaq systems. I can imagine Compaq (and I, for our
Compaq servers) worry more about that than about having Compaq SMARTs and
Megaraids together in one machine. :->

Sorry, I have only one compaq machine for testing, which has a SMART 3200
controller so I can't give more datapoints for other SMART controllers.

Is Compaq on Bugzilla? Perhaps we should Cc: them too.

Comment 18 Alan Cox 2001-03-22 16:34:18 UTC
There are about ten devices reporting themselves as Intel i960. The older
megaraid driver had a broken check for a magic subid. The new one has the check
working (I default skip_id to -1), so it should have fixed this. The megaraid
driver could still arbitrarily lock up machines with an i960, since the magic
check is unsafe, both because it might not be a unique value and because it
reads reserved space that isn't guaranteed to be free of side effects. It does,
however, seem to work with skip_id defaulting to -1. If you have a failing case,
let me know.

We should switch to subids in the driver too, but this is not for this
release.

The subvendor ID for my megaraid is (Subsystem: Dell Computer Corporation:
Unknown device 1111). So kudzu could simply check the subsystem vendor id I
guess



Comment 19 Bill Nottingham 2001-03-22 16:38:27 UTC
This compaq box is apparently a failing case.
Currently kudzu is set to ignore this particular card, so the only way
it would fail is if one of these *and* a real megaraid are in the same
box.

Comment 20 Alan Cox 2001-03-22 16:54:41 UTC
Then we need to switch to using the subvendor ID. This has to happen at some
point anyway. Can PeterJ and/or the Dell folks provide a complete list of
subvendor ID data for the board? I'll fix up the driver (the driver bit is
easy to do).


Comment 21 Tesfamariam Michael 2001-03-22 17:11:14 UTC
Here is a table of Dell RAID cards that use the megaraid driver.
VID     DID     SVID    SID     NAME
0x101e  0x9010  0x0000  0x0000  PERC
0x8086  0x1960  0x1028  0x1111  PERC2/SC
0x8086  0x1960  0x1028  0x0467  PERC2/DC
0x101e  0x1960  0x1028  0x0493  PERC3/DC
0x101e  0x1960  0x1028  0x0471  PERC3/QC
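
The table above is enough to sketch the subsystem-ID check being discussed in this bug. The Python below is only a minimal illustration, not the megaraid driver or kudzu code: the names `DELL_MEGARAID_IDS` and `megaraid_board_name` are made up for this example, and the table holds only the IDs listed in this comment.

```python
# Hypothetical sketch: match on the full (vendor, device, subvendor, subdevice)
# tuple instead of vendor/device alone. IDs are copied from the table above.
DELL_MEGARAID_IDS = {
    (0x101E, 0x9010, 0x0000, 0x0000): "PERC",
    (0x8086, 0x1960, 0x1028, 0x1111): "PERC2/SC",
    (0x8086, 0x1960, 0x1028, 0x0467): "PERC2/DC",
    (0x101E, 0x1960, 0x1028, 0x0493): "PERC3/DC",
    (0x101E, 0x1960, 0x1028, 0x0471): "PERC3/QC",
}


def megaraid_board_name(vid, did, svid, sid):
    """Return the board name if all four IDs match a listed card, else None."""
    return DELL_MEGARAID_IDS.get((vid, did, svid, sid))


if __name__ == "__main__":
    # Dell PERC2/SC: listed, so it would be claimed.
    print(megaraid_board_name(0x8086, 0x1960, 0x1028, 0x1111))
    # The i960 in this report has subsystem 0e11:c000: not listed, so skipped.
    print(megaraid_board_name(0x8086, 0x1960, 0x0E11, 0xC000))
```

Keying on all four IDs means a generic 0x8086:0x1960 device is never claimed just because it is an i960; a real table would also need the AMI and HP subsystem IDs, which are not fully listed here.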



Comment 22 Matt Domsch 2001-03-22 17:20:24 UTC
I asked PeterJ about this yesterday.  More details to follow.

Date: Wed, 21 Mar 2001 22:21:58 -0500
From: Peter Jarrett <Peterj>
To: Doug Gurney <dougg>, Matt_Domsch.com, peterj,
     notting, Tesfamariam_Michael.com
Cc: Brian Highers <BrianH>, Atul Mukker. <Atulm>
Subject: RE: Megaraid PCI ids patch

I will acquire a list for the HP and AMI boards - Doug can you supply the
updated list!

At a minimum, 

0x101e 0x1960 0x101e   0x0??? are all AMI generic
0x8086 0x1960 0x101e   0x0??? are all AMI generic

0x101e 0x1960 0x103c   0x0??? are HP
0x8086 0x1960 0x103c   0x0??? are HP



Comment 23 Daniel Roesen 2001-03-23 00:38:03 UTC
Problem persists in qa0322.

Comment 24 Bill Nottingham 2001-03-23 01:28:31 UTC
Hm, it *should* be fixed there. Does the pcitable in the installer image
have an entry for the i960 you have?

Comment 25 Daniel Roesen 2001-03-23 02:11:21 UTC
images/boot.img -> initrd.img -> modules/pcitable contains:

0x8086  0x1960  "megaraid"      "Intel Corporation|80960RP [i960RP Microprocessor]"
0x8086  0x1960  0x1028  0x1111  "megaraid"      "Dell|PowerEdge RAID Controller 2/SC"
0x8086  0x1960  0x1028  0x0467  "megaraid"      "Dell|PowerEdge RAID Controller 2/DC"

My card:

00:10.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 05)
00:10.1 Memory: Intel Corporation 80960RP [i960RP Microprocessor] (rev 05)
        Subsystem: Unknown device 0e11:c000

So my interpretation (not understanding this PCI stuff) is that something like
this is missing in the PCI table:

0x8086  0x1960  0x0e11  0xc000  "cpqarray"      "[whatever]"

right? This may or may not fix possible collisions with other Compaq SMART
controllers.

Comment 26 Bill Nottingham 2001-03-23 02:22:46 UTC
Actually, that should be:

0x8086  0x1960  0x0e11  0xc000  "unknown"      "[whatever]"

If you look at /usr/share/kudzu/pcitable in the included-in-qa0322 kudzu
package, that line is there, correct?

Matt/Dr. Mike; are the unknown lines getting pulled into the first stage
of the installer?
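
To illustrate why the subsystem-specific "unknown" line matters, here is a minimal Python sketch of a pcitable lookup. It assumes, as the discussion in this bug implies, that an entry matching all four IDs takes precedence over the generic vendor/device entry. The parsing and the names `parse_pcitable` and `suggest_driver` are illustrative only and are not kudzu's actual code.

```python
import shlex

# Entries quoted in this bug (long lines joined; description placeholder kept).
PCITABLE = '''
0x8086  0x1960  "megaraid"  "Intel Corporation|80960RP [i960RP Microprocessor]"
0x8086  0x1960  0x1028  0x1111  "megaraid"  "Dell|PowerEdge RAID Controller 2/SC"
0x8086  0x1960  0x0e11  0xc000  "unknown"  "[whatever]"
'''


def parse_pcitable(text):
    """Map (vid, did) or (vid, did, svid, sid) tuples to a driver name."""
    table = {}
    for line in text.splitlines():
        fields = shlex.split(line, comments=True)
        if len(fields) not in (4, 6):
            continue  # skip blank, comment-only, and malformed lines
        *ids, driver, _description = fields
        table[tuple(int(x, 16) for x in ids)] = driver
    return table


def suggest_driver(table, vid, did, svid, sid):
    """Prefer an exact subsystem match, then fall back to vendor/device."""
    driver = table.get((vid, did, svid, sid))
    if driver is None:
        driver = table.get((vid, did))
    # "unknown" means: the device is recognised, but no module is suggested.
    return None if driver in (None, "unknown") else driver


if __name__ == "__main__":
    table = parse_pcitable(PCITABLE)
    print(suggest_driver(table, 0x8086, 0x1960, 0x1028, 0x1111))
    print(suggest_driver(table, 0x8086, 0x1960, 0x0E11, 0xC000))
```

With the third line removed from the table, the same lookup for subsystem 0e11:c000 falls through to the generic entry and suggests megaraid.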

Comment 27 Daniel Roesen 2001-03-23 02:47:48 UTC
Actually, I just have made a modified boot disk with pcitable entry:

0x8086  0x1960  0x0e11  0xc000  "cpqarray"  "Compaq|SMART 3200 RAID Array Controller"

Now, "Found suggestion of megaraid" is changed to "Found suggestion of cpqarray"
and the installer proceeds, although something still insmods megaraid.o, but
this doesn't hang the machine anymore.

As soon as the install is completely done (I'm now burning the disc 2 ISO), I'll
get back to you regarding the pcitable in the kudzu RPM.

Comment 28 Daniel Roesen 2001-03-23 02:57:35 UTC
BTW: is a standard Server class installation with no sub-functionalities
selected supposed to need disc 2?

Comment 29 Daniel Roesen 2001-03-23 03:27:20 UTC
OK. Installation finished successfully ("Congratulations..."), but after
"rebooting system..." the box hung (console switching works, but nothing more).
Had to power-cycle.

After first booting into the system the fun continues:

[root@alexis /root]# fgrep hostname /var/log/boot.log 
Mar 23 04:14:36 alexis rc.sysinit: Setting hostname localhost.localdomain: succeeded
[root@alexis /root]# fgrep HOSTNAME /etc/sysconfig/network
HOSTNAME=localhost.localdomain

This is although I entered a complete FQDN and a static IP config, and the box
_should_ have had working connectivity to its nameserver (TLAN driver got
loaded, IP config still works) while installing. Especially interesting is that
the shell prompt contains the right host part of the FQDN. Weird. I guess after
the first normal reboot it reverts to "localhost" because of
/etc/sysconfig/network.

Next fun:

When using ping, I get the following:
Warning: time of day goes back, taking countermeasures.

I can't see the system clock going backward... No, NTP is not running - not even
installed.

Regarding /usr/share/kudzu/pcitable:

# fgrep 0xc000 /usr/share/kudzu/pcitable 
0x8086	0x1960	0x0e11	0xc000	"unknown"	"Intel Corporation|80960RP [i960RP Microprocessor]"

and no, there are no "unknown" entries in the first-stage installer pcitable.

Comment 30 Bill Nottingham 2001-03-23 03:30:33 UTC
OK, so it's an installer bug that it's not getting pulled into the pcitable.
The other stuff is completely unrelated to the megaraid issues; please open
other bugs for those if you feel they are problems.

Comment 31 Matt Domsch 2001-03-23 03:37:32 UTC
Here are additional PCI IDs which the PERC2/SC card could report.  Each of
these occurs only on cards with old buggy firmware, which the driver detects and
requires to be upgraded before the card is used, but they still need to be detected.
-Matt


From: Doug Gurney [dougg]
Sent: Thursday, March 22, 2001 7:05 PM
To: Peter Jarrett; 'Tesfamariam_Michael'
Cc: 'Matt_Domsch'
Subject: RE: AMI Card


The information looks correct.  There were some additional subsystem ids for
the PERC2/SC, as follows, but you have it correct for the latest firmware
released.

Dell 466 (prior to v2.10)	101E	09A0	
Dell 466 (v2.10 to v3.00)	1111	1111	
Dell 466 (v3.01 and later)	1028	1111	



Comment 32 Bill Nottingham 2001-03-23 04:03:00 UTC
But these will still get picked up as long as the generic i960 is mapped
to megaraid, yes?

Comment 33 Daniel Roesen 2001-03-23 04:11:36 UTC
There is a problem with the "unknown" mapping: cpqarray driver doesn't get
loaded so installing from harddisk is not possible (all disks are connected to
the SMART controller), because the cpqarray driver (and megaraid) gets loaded
only right _after_ install source selection.

Mapping the PCI device to cpqarray fixes that.

Comment 34 Daniel Roesen 2001-03-23 04:19:04 UTC
This happens only if I boot from the floppy without having the CDROM in the
drive (so CDROM mount fails). If CDROM mount succeeds, the cpqarray driver gets
loaded.

Comment 35 Bill Nottingham 2001-03-23 04:21:00 UTC
That doesn't make sense, since you already have something mapped to cpqarray
in the machine.
I'd say the reason it wouldn't load until after install source selection is because
the module isn't on the boot disk.

Comment 36 Daniel Roesen 2001-03-23 04:23:13 UTC
Of course. Sigh. Need coffee. 5:23am local time.

Comment 37 Matt Domsch 2001-03-23 04:34:58 UTC
> But these will still get picked up as long as the generic 
> i960 is mapped to megaraid, yes?

Yes.  The vendor/device IDs are i960.  Alan thought it proper to have the 
driver search by vendor/device/subvendor/subdevice, so those are some extra 
subsystem numbers to search for under the i960 generic vendor/device.

So, we're solving two related problems.
1) kudzu shouldn't try loading the megaraid driver on the Compaq controller 
anymore.  This is now fixed in your kudzu tree.
2) megaraid shouldn't hang the system while reading config space bytes on the 
Compaq controller.  This still needs to be fixed, and requires the PCI ID table 
we've slowly built up in this bugzilla entry.


Comment 38 Daniel Roesen 2001-03-28 19:01:24 UTC
Problem still persists with qa0327. Same tty3 output as in the attachment on
this bug.

Comment 39 Bill Nottingham 2001-03-28 20:43:33 UTC
Matt/Dr. Mike - are the unknown entries getting stripped from the
pcitable used by the installer?

Comment 40 Daniel Roesen 2001-04-04 15:26:37 UTC
Still no "unknown" PCI entries in modules/pcitable on qa0401 boot.img initrd.

Comment 41 Matt Wilson 2001-04-04 18:29:09 UTC
fixing.


Comment 42 Matt Wilson 2001-04-04 18:33:45 UTC
fixed trimpcitable to not nuke unknown entries.
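
The kind of trimming involved can be sketched as follows. This is a hypothetical Python illustration, not the actual trimpcitable script: it assumes the trimmer keeps only pcitable lines whose driver is among the modules shipped on the boot image, and shows why "unknown" lines have to be kept as well. The function name `trim_pcitable` and its `keep_unknown` parameter are invented for the example.

```python
import shlex


def trim_pcitable(lines, shipped_modules, keep_unknown=True):
    """Keep lines whose driver is shipped; optionally keep "unknown" overrides."""
    kept = []
    for line in lines:
        fields = shlex.split(line, comments=True)
        if len(fields) not in (4, 6):
            continue  # skip blank, comment-only, and malformed lines
        driver = fields[-2]  # layout: IDs..., "driver", "description"
        if driver in shipped_modules or (keep_unknown and driver == "unknown"):
            kept.append(line)
    return kept


if __name__ == "__main__":
    lines = [
        '0x8086\t0x1960\t"megaraid"\t"Intel Corporation|80960RP [i960RP Microprocessor]"',
        '0x8086\t0x1960\t0x0e11\t0xc000\t"unknown"\t"Intel Corporation|80960RP [i960RP Microprocessor]"',
    ]
    # Filtering purely on shipped module names drops the override line.
    print(len(trim_pcitable(lines, {"megaraid"}, keep_unknown=False)))
    # Keeping "unknown" entries preserves it.
    print(len(trim_pcitable(lines, {"megaraid"})))
```

Because "unknown" is not a module name, a filter based only on the shipped module list silently discards the override, leaving the generic i960 entry mapped to megaraid.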


Comment 43 Daniel Roesen 2001-04-05 11:15:56 UTC
OK, is the just released qa0404 fixed then?

Comment 44 Matt Wilson 2001-04-05 13:32:43 UTC
nope, not fixed until qa0405.


Comment 45 Daniel Roesen 2001-04-30 15:06:26 UTC
OK, 7.1 gold installs perfectly on my test machine. Fix confirmed :-)

Comment 46 Matt Wilson 2001-04-30 15:32:41 UTC
great, thanks.


