Bug 205807

Summary: cpqarray disks not being detected on Compaq DL360
Product: Red Hat Enterprise Linux 5 Reporter: Mark Post <mark.post>
Component: kernel Assignee: Chip Coldwell <coldwell>
Status: CLOSED CURRENTRELEASE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.0 CC: coughlan, mike.miller
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.18-1.2718.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-10-10 14:07:15 UTC Type: ---
Attachments:
Description Flags
This is the upstream patch that fixes the problem. none

Description Mark Post 2006-09-08 16:49:43 UTC
Description of problem:
During the installation process, hardware detection is not finding the RAID
controller, and hence not finding the SCSI disks in the system.

Version-Release number of selected component (if applicable):
RHEL5 Beta1

How reproducible:
100% reproducible

Steps to Reproduce:
1. Boot from CD
2. Step through language and keyboard selection screens
3. After the stage 2 file is loaded and Anaconda starts, a dialog box is displayed:
No hard drives have been found.  You probably need to manually choose device
drivers for the installation to succeed.  Would you like to select drivers now?
4. Select Yes
5. Select Add Device
6. Select Compaq Smart/2 RAID Controller (cpqarray)
7. After driver is loaded, switch to virtual console 2, run dmesg

Actual results:
<4>cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned
<6>cpqarray: error sending ID controller
<7>cpqarray: Starting firmware's background processing
<4>cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned
<4>cpqarray: Unable to start background processing

Expected results:
Normal driver startup messages

Additional info:
This system works just fine with RHEL4, SLES10, etc.
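
For anyone triaging similar logs, here is a minimal sketch of how the failure signature above could be picked out of dmesg output. It assumes each line may carry a "<N>" priority prefix as shown under "Actual results". The function names are hypothetical and are not part of any installer or kernel tool.

```python
import re

# Kernel log lines in this report carry a "<N>" priority prefix
# (lower numbers are more severe).
LINE_RE = re.compile(r"^<(\d)>(.*)$")


def parse_dmesg_line(line):
    """Return (priority, message); priority is None when no prefix is present."""
    match = LINE_RE.match(line)
    if match is None:
        return None, line
    return int(match.group(1)), match.group(2)


def cpqarray_timed_out(lines):
    """True if any cpqarray message reports the idaSendPciCmd timeout."""
    for line in lines:
        _, message = parse_dmesg_line(line)
        if message.startswith("cpqarray") and "idaSendPciCmd Timeout" in message:
            return True
    return False


# Lines copied from the "Actual results" section of this report.
sample = [
    "<4>cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned",
    "<6>cpqarray: error sending ID controller",
    "<7>cpqarray: Starting firmware's background processing",
]
```

Calling cpqarray_timed_out(sample) on the lines above flags the timeout. A log without that message would not be flagged.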

Comment 1 Tom Coughlan 2006-09-11 20:42:36 UTC
*** Bug 205653 has been marked as a duplicate of this bug. ***

Comment 2 RHEL Program Management 2006-09-11 20:48:31 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Tom Coughlan 2006-09-12 19:24:40 UTC
Mike,

Any idea why cpqarray stopped working in RHEL 5? 

We already have the patch described here:

http://marc.theaimsgroup.com/?l=linux-scsi&m=115591706804045&w=2
(sym2 claims support for the cpqarray pci id)

Seen or heard of a problem here?

Tom

Comment 4 Mike Miller (OS Dev) 2006-09-12 21:00:20 UTC
Tom,
This problem was reported just a few weeks ago. I suspect that it's been there
for a long time but now that drivers seem to load in a different order the
problem shows up. I didn't see the patch in the link and I can't find where one
has been submitted.

mikem

Comment 5 Tom Coughlan 2006-09-13 15:13:47 UTC
The patch (to sym53c8xx_2/sym_glue.c) is here:

http://www.kernel.org/git/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commit;h=b2b3c121076961333977f485f0d54c22121df920

Unfortunately, this does not fix the problem on RHEL 5. 


Comment 6 Mike Miller (OS Dev) 2006-09-13 16:13:29 UTC
I see why it doesn't fix the problem. Both controllers are storage class
devices. I'm surprised that James sent this upstream. I'll have to dig up an old
dl380 and see what needs to change. Hopefully I won't break the symbios driver.

Comment 7 Mike Miller (OS Dev) 2006-09-13 18:54:58 UTC
After looking again (and again, and again) at Grant's patch, it should work. The
patch is not in rhel5 beta1. Can the reporter apply that patch and try again?
Still haven't dug up that old platform. :(

Comment 8 Mark Post 2006-09-13 18:58:52 UTC
Exactly what are you asking me to do?  Build new kernel modules and somehow
integrate them into the installer?  I would have absolutely no idea how to go
about that.

Comment 9 Chip Coldwell 2006-09-13 19:13:00 UTC
(In reply to comment #8)
> Exactly what are you asking me to do?  Build new kernel modules and somehow
> integrate them into the installer?  I would have absolutely no idea how to go
> about that.

No, we don't need you to fix our installer.  That's our problem.

I think there is a bug that is not addressed by the upstream patch.  I have a
system that boots off an external SCSI drive (using an Adaptec HBA), and when
I manually load the cpqarray driver I get this:

Compaq SMART2 Driver (v 2.6.0)
cpqarray: Device 0x10 has been found at bus 0 dev 14 func 0
cpqarray: Finding drives on ida0<4>cpqarray ida0: idaSendPciCmd Timeout out, No
command list address returned!
cpqarray: error sending ID controller
cpqarray: Starting firmware's background processing
cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned!
cpqarray: Unable to start background processing

And no disk devices are found on the cpqarray.  This is without the conflicting
sym53c8xx module loaded at all, and with the upstream patch applied.

So I think we can put this upstream patch to rest now.  It does not solve our
problem.  I'm trying to figure out what will.

Chip


Comment 10 Mark Post 2006-09-13 19:30:50 UTC
Odd, manually loading the driver is what works for me.  Would you like me to try
manually loading different things, things in a different order, etc?  I have a
little time to "play" today.

Mark Post


Comment 11 Chip Coldwell 2006-09-13 19:35:42 UTC
(In reply to comment #10)
> Odd, manually loading the driver is what works for me.  Would you like me to try
> manually loading different things, things in a different order, etc?  I have a
> little time to "play" today.

Sure, anything you can think of that might shed some light.

I've dug around a little bit and the messages in the kernel log are coming
from the getgeometry function called from cpqarray_register_ctlr which is
itself called at the very end of cpqarray_init_one.  Since that last function
is registered as the .probe method for the pci_driver, it seems likely that
the kernel *IS* identifying the cpqarray on the PCI bus, but the probe
fails because the sendcmd function keeps timing out.

Chip
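
To illustrate the failure mode described above (a polled command that never sees a completion), here is a rough simulation. It is not the driver's code: read_completion, silent_controller, and make_responsive_controller are hypothetical stand-ins for reading the controller's completion register, and the timeout values are arbitrary.

```python
import time


def poll_for_completion(read_completion, timeout_s=1.0, interval_s=0.001):
    """Poll read_completion() until it returns a nonzero value or time runs out.

    read_completion is a hypothetical stand-in for reading the controller's
    completion register; it is not a function from cpqarray.c.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        value = read_completion()
        if value:
            return value
        time.sleep(interval_s)
    return None  # timed out: no command list address came back


def silent_controller():
    """Simulates a controller that never posts a completion."""
    return 0


def make_responsive_controller(address, ready_after):
    """Simulates a controller that posts `address` on the Nth read."""
    calls = {"n": 0}

    def read():
        calls["n"] += 1
        return address if calls["n"] >= ready_after else 0

    return read
```

With silent_controller the poll returns None once the deadline passes, which corresponds to the "No command list address returned" message. No interrupt is involved anywhere in the loop.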





Comment 12 Mike Miller (OS Dev) 2006-09-13 19:39:36 UTC
That's what I'm looking at in cpqarray.c. There are some pretty funky-looking
delays in the code. Not sure why it just now broke, though.

mikem

Comment 13 Chip Coldwell 2006-09-13 19:42:50 UTC
Some more debuginfo stuff:

Compaq SMART2 Driver (v 2.6.0)
cpqarray: Device 0x10 has been found at bus 0 dev 14 func 0
vendor_id = 1000
device_id = 10
command = 153
addr[0] = 2400
addr[1] = f6000000
addr[2] = f5000000
addr[3] = 0
addr[4] = 0
addr[5] = 0
revision = 1
irq = c1
cache_line_size = 8
latency_timer = c0
board_id = 40400e11
cpqarray: Finding drives on ida0<4>cpqarray ida0: idaSendPciCmd Timeout out, No 
command list address returned!
cpqarray: error sending ID controller
cpqarray: Starting firmware's background processing
cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned!
cpqarray: Unable to start background processing
ida_open ida/c0d0

Comment 14 Chip Coldwell 2006-09-13 19:48:35 UTC
board_id = 40400e11
product number 10
c->product_name = Integrated Array

So it looks like it's using the smart4_access methods.

Chip
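
As a sketch of how the board_id above relates to the product lookup: this assumes the high 16 bits are the PCI subsystem device ID and the low 16 bits the subsystem vendor ID (0x0e11 is Compaq's PCI vendor ID). The PRODUCTS table is a hypothetical one-entry stand-in holding only the values quoted in this bug, not the driver's actual table.

```python
def split_board_id(board_id):
    """Split a 32-bit board ID into (subsystem_device, subsystem_vendor).

    Assumption: the high 16 bits hold the subsystem device ID and the low
    16 bits hold the subsystem vendor ID.
    """
    return (board_id >> 16) & 0xFFFF, board_id & 0xFFFF


# Hypothetical table containing only the entry quoted in this bug report.
PRODUCTS = {0x40400E11: ("Integrated Array", "smart4_access")}


def lookup(board_id):
    """Return (product_name, access_methods) or None for an unknown board."""
    return PRODUCTS.get(board_id)


device, vendor = split_board_id(0x40400E11)
```

Note that the vendor_id/device_id pair in comment 13 (1000/10) differs from the subsystem IDs encoded in board_id, so a driver that matches on vendor and device alone could claim the same function.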


Comment 15 Mark Post 2006-09-13 19:53:00 UTC
I just rebooted the installer, then manually inserted the sym53c8xx module.  The
messages that came out of the kernel were these:
<6>ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 19 (level, low) -> IRQ 193
sym53c8xx 0000:00:01.0: device not supported
ACPI: PCI Interrupt for device 0000:00:01.0 disabled.

Then, when I selected the cpqarray module, I got the same timeout errors as
before.  What was very interesting, however, was that it was trying to activate
device 0000:00:01.0 also, on IRQ 193.  But, the kernel had just disabled that. 
So, no big surprise the cpqarray driver is getting timeouts.


Mark Post


Comment 16 Chip Coldwell 2006-09-13 19:57:00 UTC
(In reply to comment #15)
> What was very interesting, however, was that it was trying to activate
> device 0000:00:01.0 also, on IRQ 193.  But, the kernel had just disabled that. 
> So, no big surprise the cpqarray driver is getting timeouts.

The timeouts I'm seeing are while the driver is polling, with interrupts
disabled.  So the lack of interrupts is not the cause of the timeout.

Chip



Comment 17 Mark Post 2006-09-13 20:15:44 UTC
I ran another test.  I booted with "linux text nostorage" and inserted the
cpqarray first.  The output from dmesg looked normal, and my disks were found. 
I then selected the sym53c8xx module, and looked at dmesg.  Nothing had changed
at all.  My disks were still accessible, and I could proceed with the install.


Mark Post


Comment 18 Heath Petty 2006-09-18 15:05:08 UTC
I am also seeing this bug on a ProLiant 8500. FC6 Test 3 installed just fine.

-Heath Petty

Comment 20 Chip Coldwell 2006-10-10 14:05:44 UTC
Created attachment 138135
This is the upstream patch that fixes the problem.

Comment 21 Chip Coldwell 2006-10-10 14:07:15 UTC
Looks like we picked up the upstream patch.  Closing out the bug.