Bug 205807
Summary: | cpqarray disks not being detected on Compaq DL360
---|---
Product: | Red Hat Enterprise Linux 5
Component: | kernel
Version: | 5.0
Hardware: | i386
OS: | Linux
Status: | CLOSED CURRENTRELEASE
Severity: | medium
Priority: | medium
Reporter: | Mark Post <mark.post>
Assignee: | Chip Coldwell <coldwell>
QA Contact: | Brian Brock <bbrock>
CC: | coughlan, mike.miller
Fixed In Version: | 2.6.18-1.2718.el5
Doc Type: | Bug Fix
Last Closed: | 2006-10-10 14:07:15 UTC
Description
Mark Post
2006-09-08 16:49:43 UTC
*** Bug 205653 has been marked as a duplicate of this bug. ***

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Mike,
Any idea why cpqarray stopped working in RHEL 5? We already have the patch described here:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115591706804045&w=2
(sym2 claims support for the cpqarray pci id)
Seen or heard of a problem here?
Tom

Tom,
This problem was reported just a few weeks ago. I suspect that it's been there for a long time, but now that drivers seem to load in a different order the problem shows up. I didn't see the patch in the link and I can't find where one has been submitted.
mikem

The patch (to sym53c8xx_2/sym_glue.c) is here:
http://www.kernel.org/git/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commit;h=b2b3c121076961333977f485f0d54c22121df920
Unfortunately, this does not fix the problem on RHEL 5.

I see why it doesn't fix the problem. Both controllers are storage class devices. I'm surprised that James sent this upstream. I'll have to dig up an old dl380 and see what needs to change. Hopefully I won't break the symbios driver.

After looking again (and again, and again) at Grant's patch, it should work. The patch is not in rhel5 beta1. Can the reporter apply that patch and try again? Still haven't dug up that old platform. :(

Exactly what are you asking me to do? Build new kernel modules and somehow integrate them into the installer? I would have absolutely no idea how to go about that.

(In reply to comment #8)
> Exactly what are you asking me to do? Build new kernel modules and somehow
> integrate them into the installer? I would have absolutely no idea how to go
> about that.

No, we don't need you to fix our installer. That's our problem.

I think there is a bug that is not addressed by the upstream patch. I have a system that boots off an external SCSI drive (using an Adaptec HBA), and when I manually load the cpqarray driver I get this:

    Compaq SMART2 Driver (v 2.6.0)
    cpqarray: Device 0x10 has been found at bus 0 dev 14 func 0
    cpqarray: Finding drives on ida0<4>cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned!
    cpqarray: error sending ID controller
    cpqarray: Starting firmware's background processing
    cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned!
    cpqarray: Unable to start background processing

And no disk devices are found on the cpqarray. This is without the conflicting sym53c8xx module loaded at all, and with the upstream patch applied.

So I think we can put this upstream patch to rest now. It does not solve our problem. I'm trying to figure out what will.
Chip

Odd, manually loading the driver is what works for me. Would you like me to try manually loading different things, things in a different order, etc.? I have a little time to "play" today.
Mark Post

(In reply to comment #10)
> Odd, manually loading the driver is what works for me. Would you like me to try
> manually loading different things, things in a different order, etc? I have a
> little time to "play" today.

Sure, anything you can think of that might shed some light.

I've dug around a little bit, and the messages in the kernel log are coming from the getgeometry function called from cpqarray_register_ctlr, which is itself called at the very end of cpqarray_init_one. Since that last function is registered as the .probe method for the pci_driver, it seems likely that the kernel *IS* identifying the cpqarray on the PCI bus, but the probe fails because the sendcmd function keeps timing out.
Chip

> Mark Post
>

That's what I'm looking at in cpqarray.c. There are some pretty funky looking delays in the code. Not sure why it just now broke, though.
mikem

Some more debuginfo stuff:

    Compaq SMART2 Driver (v 2.6.0)
    cpqarray: Device 0x10 has been found at bus 0 dev 14 func 0
    vendor_id = 1000
    device_id = 10
    command = 153
    addr[0] = 2400
    addr[1] = f6000000
    addr[2] = f5000000
    addr[3] = 0
    addr[4] = 0
    addr[5] = 0
    revision = 1
    irq = c1
    cache_line_size = 8
    latency_timer = c0
    board_id = 40400e11
    cpqarray: Finding drives on ida0<4>cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned!
    cpqarray: error sending ID controller
    cpqarray: Starting firmware's background processing
    cpqarray ida0: idaSendPciCmd Timeout out, No command list address returned!
    cpqarray: Unable to start background processing
    ida_open ida/c0d0
    board_id = 40400e11
    product number 10
    c->product_name = Integrated Array

So it looks like it's using the smart4_access methods.
Chip

I just rebooted the installer, then manually inserted the sym53c8xx module. The messages that came out of the kernel were these:

    <6>ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 19 (level, low) -> IRQ 193
    sym53c8xx 0000:00:01.0: device not supported
    ACPI: PCI Interrupt for device 0000:00:01.0 disabled.

Then, when I selected the cpqarray module, I got the same timeout errors as before. What was very interesting, however, was that it was trying to activate device 0000:00:01.0 also, on IRQ 193. But, the kernel had just disabled that. So, no big surprise the cpqarray driver is getting timeouts.
Mark Post

(In reply to comment #15)
> What was very interesting, however, was that it was trying to activate
> device 0000:00:01.0 also, on IRQ 193. But, the kernel had just disabled that.
> So, no big surprise the cpqarray driver is getting timeouts.

The timeouts I'm seeing are while the driver is polling, with interrupts disabled. So the lack of interrupts is not the cause of the timeout.
Chip

I ran another test. I booted with "linux text nostorage" and inserted the cpqarray first. The output from dmesg looked normal, and my disks were found. I then selected the sym53c8xx module, and looked at dmesg. Nothing had changed at all. My disks were still accessible, and I could proceed with the install.
Mark Post

I am also seeing this bug on a ProLiant 8500. FC6 Test 3 installed just fine.
-Heath Petty

Created attachment 138135
This is the upstream patch that fixes the problem.

Looks like we picked up the upstream patch. Closing out the bug.
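
For readers following the PCI ID discussion: the debug values Chip posted (vendor_id = 1000, device_id = 10, board_id = 40400e11) show why two drivers can contend for this controller. The sketch below, in Python rather than kernel C, decodes board_id and contrasts a match on vendor/device ID alone with a match that also checks the subsystem IDs. It assumes board_id packs the PCI subsystem device ID in the upper 16 bits and the subsystem vendor ID in the lower 16 bits. The table entries and function names are hypothetical illustrations, not the actual cpqarray or sym53c8xx device tables.

```python
PCI_ANY_ID = None  # wildcard, playing the role of the kernel's PCI_ANY_ID


def split_board_id(board_id):
    """Return (subsystem_vendor, subsystem_device) from a 32-bit board ID.

    Assumption: low 16 bits = subsystem vendor, high 16 bits = subsystem device.
    """
    return board_id & 0xFFFF, (board_id >> 16) & 0xFFFF


def matches(entry, dev):
    """True if every non-wildcard field in entry equals the device's field."""
    return all(want is PCI_ANY_ID or want == have for want, have in zip(entry, dev))


# Values from the debug output in the thread (printed there as bare hex).
vendor_id, device_id, board_id = 0x1000, 0x0010, 0x40400E11
sub_vendor, sub_device = split_board_id(board_id)
dev = (vendor_id, device_id, sub_vendor, sub_device)

# Hypothetical entries: (vendor, device, subsystem vendor, subsystem device).
generic_entry = (0x1000, 0x0010, PCI_ANY_ID, PCI_ANY_ID)  # chip IDs only
array_entry = (0x1000, 0x0010, 0x0E11, 0x4040)            # also needs subsystem IDs

print(f"subsystem vendor=0x{sub_vendor:04x} device=0x{sub_device:04x}")
print("generic entry matches:", matches(generic_entry, dev))
print("array entry matches:  ", matches(array_entry, dev))
```

Both entries match this device, so a driver that matches on vendor and device ID alone can claim the controller before a driver that keys on the Compaq subsystem IDs (0x0e11 is Compaq's PCI vendor ID). As Chip notes in the thread, the timeouts also appeared with sym53c8xx not loaded at all, so this ambiguity by itself does not account for the failure reported here.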