From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040614 Firefox/0.8

Description of problem:
Hi,
I have an HP ProLiant DL740 server with two FC2214 (QLA2312) host bus adapters, connected to a SAN with an HP EVA3000 disk array. I am trying the latest kernel update from Red Hat (2.4.21-15.0.3.ELhugemem), which contains the new QLogic driver (v6.07.02-RH2-fo). After rebooting, I am unable to see the existing LUNs.
Attached are the full dmesg output and /etc/modules.conf.

Version-Release number of selected component (if applicable):
2.4.21-15.0.3.ELhugemem

How reproducible:
Always

Steps to Reproduce:
1. Install the new kernel with 'rpm -ivh kernel-hugemem-2.4.21-15.0.3.EL.i686.rpm'
2. Configure /etc/modules.conf to load the qla2300 and qla2300_conf modules
3. Check the dmesg output

Actual Results: errors in the dmesg output; unable to access the existing LUNs

Expected Results: no errors in the dmesg output; access to the LUNs with 'fdisk /dev/sd...' or 'mount'

Additional info:

This is the relevant part of the dmesg output:
################################################
scsi0 : QLogic QLA2312 PCI to Fibre Channel Host Adapter: bus 7 device 1 irq 31
        Firmware version: 3.02.24, Driver version 6.07.02-RH2-fo
scsi1 : QLogic QLA2312 PCI to Fibre Channel Host Adapter: bus 7 device 2 irq 29
        Firmware version: 3.02.24, Driver version 6.07.02-RH2-fo
blk: queue ef7d9a18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi: unknown type 12
  Vendor: HP        Model: HSV100        Rev: 3010
  Type:   Unknown                        ANSI SCSI revision: 02
blk: queue ef7d9818, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
  Vendor: HP        Model: HSV100        Rev: 3010
  Type:   Direct-Access                  ANSI SCSI revision: 02
blk: queue ef7d9618, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
  Vendor: HP        Model: HSV100        Rev: 3010
  Type:   Direct-Access                  ANSI SCSI revision: 02
blk: queue ef7d9418, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi(0:0:0:0): Enabled tagged queuing, queue depth 16.
scsi(0:0:0:1): Enabled tagged queuing, queue depth 16.
scsi(0:0:0:2): Enabled tagged queuing, queue depth 16.
Attached scsi disk sda at scsi0, channel 0, id 0, lun 1
Attached scsi disk sdb at scsi0, channel 0, id 0, lun 2
resize_dma_pool: unknown device type 12
SCSI device sda: 41943040 512-byte hdwr sectors (21475 MB)
 sda: sda1
SCSI device sdb: 419430400 512-byte hdwr sectors (214748 MB)
 sdb:<6>Device 08:10 not ready.
 I/O error: dev 08:10, sector 0
Device 08:10 not ready.
 I/O error: dev 08:10, sector 0
 unable to read partition table
###################################################
Created attachment 101716: full dmesg output
Created attachment 101717: Content of /etc/modules.conf
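(The attached modules.conf is not reproduced inline. For context, a typical RHEL 3 configuration for loading the qla2300 driver in failover mode looks roughly like the sketch below; the alias numbering is an illustrative assumption, not the actual attachment contents, and the ql2xfailover option is the one mentioned later in this report.)

    # /etc/modules.conf -- hypothetical sketch, may differ from the attachment
    alias scsi_hostadapter1 qla2300_conf   # driver configuration/persistent-binding data module
    alias scsi_hostadapter2 qla2300        # QLA2312 low-level driver
    options qla2300 ql2xfailover=1         # enable the driver's built-in failover mode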
The log above indicates that LUN 1 is configured as sda without error, and LUN 2 is configured as sdb but gets a "not ready" error.

Are you able to access sda without any problem? Were there supposed to be more than two disk LUNs configured? Can you see the LUNs in the QLogic BIOS, and do I/O to them using the BIOS "Fibre Disk Utility"?

Unless I am mistaken, the HSV storage device has two controllers, and a logical unit is on-line to only one controller at a time. If you try to do I/O to a logical unit through the wrong controller, you get a "not ready" error. Is it possible that LUN 2 is not on-line to the path you are doing I/O to?

Can you try the QLogic driver without the load parameters, to see if you get the same result in basic non-failover mode?
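(A minimal sketch of the non-failover test suggested above, assuming the driver was loaded through modules.conf with failover options as in the attachment; module names follow the report.)

    # unload the failover-configured driver stack
    rmmod qla2300
    rmmod qla2300_conf
    # reload the driver directly, bypassing the options in modules.conf
    insmod qla2300
    # check what the kernel detected
    dmesg | tail -40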
No, I'm unable to access LUN 1 (sda). For instance:

# fdisk /dev/sda
Unable to open /dev/sda

There were supposed to be two disks configured for this host. If I use driver 7.00.03-fo from qlogic.com, I see the two LUNs as sda and sdb.

If I run 'insmod qla2300' without parameters, I see a long list of devices (sda to sdh), probably because of multipathing, and I'm able to access the two LUNs through:
- sda (first LUN)
- sdd (second LUN)
- sde (first LUN)
- sdh (second LUN)

Does the problem lie in failover/multipath?
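(For reference, a quick way to see how those eight devices map back to SCSI hosts and LUNs, using standard 2.4-era tools; this is a sketch, not taken from the original report.)

    # list every attached SCSI device with its host/channel/id/lun
    cat /proc/scsi/scsi
    # probe each path; passive paths are expected to fail with "not ready"
    for d in /dev/sd[a-h]; do
        echo "== $d =="
        fdisk -l $d 2>&1 | head -3
    done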
Well, it looks like:
- paths sda and sde are connected to the HSV controller that has logical unit one on-line to it, and
- paths sdd and sdh are connected to the HSV controller that has logical unit two on-line to it.

The other paths lead to the HSV controller that the logical unit is not currently on-line to. You can test this by setting the unit on-line to the other HSV controller.

In any case, it does look like a multipath configuration problem to me. Please confirm.
You say: "You can test this by setting the unit online to the other HSV controller." I don't quite understand how I can test this. I'd like to avoid physical changes, but if they are needed...

In any case, I have another server with an identical setup, except that it uses driver 7.00.03-fo from qlogic.com, and there I see only two disks (using ql2xfailover=1).

Also, IMHO this is a problem of this driver with the multipathing of the HP EVA3000: do you know whether multipath/failover is supported in this driver version with this disk array? What is the next step?
You can enter commands at the HSV management console to move the logical unit from one controller to the other; you don't need to make any physical changes. That additional level of verification is not really necessary, though, since it is already clear that we are seeing the effects of the HSV's asymmetric path access.

Red Hat does not test or support the QLogic driver's multipath feature. You need to get QLogic to help with that. The revision.notes for the 7.00.03 version of the driver does indicate a number of changes that were made to support failover on the StorageWorks arrays. We are planning to include the 7.00.03 driver in RHEL 3 U3. It would be best to ask QLogic whether any earlier version of the driver works in multipath mode with the HSV.
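(Once the U3 kernel is installed, the driver version actually in use can be confirmed from the running system; a sketch, assuming the usual /proc layout exposed by this driver family.)

    # per-HBA status, including driver and firmware version, for host 0
    head -5 /proc/scsi/qla2300/0
    # or query the module object on disk
    modinfo qla2300 | grep -i version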
I'm putting this bug back into MODIFIED state. The upgrade of the QLogic driver to 7.00.03 in RHEL 3 U3 has already been committed. When the U3 advisory becomes available to customers, this bug will automatically be put into CLOSED/ERRATA state by the Errata System.
An erratum has been issued that should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html
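(For completeness, a hedged sketch of installing the updated kernel once the advisory packages are available; the exact package version comes from the advisory page above and is left as a placeholder here.)

    # install the U3 hugemem kernel alongside the current one
    # (-i rather than -U, so the old kernel remains bootable)
    rpm -ivh kernel-hugemem-<U3 version>.i686.rpm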