From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; Linux 2.2.17-14smp i686)

Description of problem:
Unable to open() an sg node associated with a disconnected LUN (PQ/PDType 001/00000). /var/log/messages says the kernel is unable to read the partition table. In RH 6.2 (lk 2.2.16-3 and 2.2.14-12) we were able to open the LUN.

RH 6.2 - /var/log/messages would report that it was unable to read the partition table and READ CAPACITY FAILED; however, it would assign a default blocksize of 512 and a disk size of 1GB.

RH 7.0 - /proc/scsi/sg/debug has an entry for the device, but no data, whereas RH 6.2 would have both an entry and data in /proc/scsi/sg/debug. Further, the SCSI targets (i.e. 3:0:0:0) are assigned an sd device node in both RH 6.2 and 7.0; however, 7.0 has no device listed in /proc/partitions.

The disconnected LUN is reported by the disk array's system processor. The array is identified by the SCSI INQUIRY as a DGC RAID ANSI 4 device. Without being able to open() the sg device node, the management software, EMC's Navisphere, is unable to manage the array. This places you in a quandary if the array has no data LUNs existing on both of its system processors to start with.

Prior to RH 7.0 the EMC CLARiiON disk array was not listed in the SCSI scan BLIST. It is a sparse LUN device so it does belong there, and I don't believe it is the reason the problem exists. The issue is that although a device node is assigned by both sd and sg, you cannot open() the device.

How reproducible:
Always

Steps to Reproduce:
1. Load both the clariion-attach and navisphere RPM packages. I can supply these, or you can obtain them from EMC Tech Support.
2. Install the driver and Navisphere per EMC documentation.
3. modprobe qla2x00smp (if SMP system) or modprobe qla2x00 (non-SMP)
4. /etc/rc.d/init.d/naviagent start
5. Examine /proc/scsi/sg/debug for the sg devices associated with the SCSI targets assigned when the Qlogic QLA2x00 driver was initialized; there should be several lines of information.
If there is only a single line, you will be unable to open the device. strace will verify that you were unable to open the device.

Actual Results:
strace shows open() failed with an ENXIO error - No such device or address

Expected Results:
strace shows open() succeeded in opening /dev/sg<alpha device name> as O_RDONLY and /dev/sg<numeric name> as O_RDONLY; O_RDWR | O_NONBLOCK; and O_RDONLY | O_NONBLOCK.

Additional info:
/usr/src/linux/drivers/scsi/sg.c is version 3.00.10, which the EMC Navisphere software is linked against at compile time.

/var/log/messages:
Jul 18 13:50:01 Linux88 kernel: (scsi): Found a QLA2300 @ bus 0, device 0x9, irq 19, iobase 0x2000
Jul 18 13:50:01 Linux88 kernel: scsi(2): Configure NVRAM parameters...
Jul 18 13:50:01 Linux88 kernel: scsi(2): Verifying loaded RISC code...
Jul 18 13:50:01 Linux88 kernel: scsi(2): Verifying chip...
Jul 18 13:50:01 Linux88 kernel: scsi(2): Waiting for LIP to complete...
Jul 18 13:50:01 Linux88 CROND[931]: (root) CMD ( /sbin/rmmod -as)
Jul 18 13:50:30 Linux88 kernel: scsi(2): LOOP UP detected
Jul 18 13:50:30 Linux88 kernel: scsi(2): Waiting for LIP to complete...
Jul 18 13:50:30 Linux88 kernel: scsi2: Topology - (F_Port), Host Loop address 0xffff
Jul 18 13:50:30 Linux88 kernel: qla2100: Performing ISP error recovery - ha= bf440078
Jul 18 13:50:30 Linux88 kernel: scsi(2): Waiting for LIP to complete...
Jul 18 13:50:30 Linux88 kernel: scsi(2): Waiting for LIP to complete...
Jul 18 13:50:30 Linux88 kernel: qla2100_configure_hba: [ERROR] Get host loop ID failed
Jul 18 13:50:30 Linux88 kernel: scsi-qla0-adapter-node=200000e08b04cec4;
Jul 18 13:50:30 Linux88 kernel: scsi-qla0-adapter-port=210000e08b04cec4;
Jul 18 13:50:30 Linux88 kernel: scsi-qla0-target-0=500601608802b398;
Jul 18 13:50:30 Linux88 kernel: (scsi): Found a QLA2300 @ bus 0, device 0xb, irq 18, iobase 0x2400
Jul 18 13:50:30 Linux88 kernel: scsi(3): Configure NVRAM parameters...
Jul 18 13:50:35 Linux88 kernel: scsi(2): LOOP UP detected
Jul 18 13:50:35 Linux88 kernel: scsi(3): Verifying loaded RISC code...
Jul 18 13:50:35 Linux88 kernel: scsi(3): Verifying chip...
Jul 18 13:50:35 Linux88 kernel: scsi(3): Waiting for LIP to complete...
Jul 18 13:50:35 Linux88 kernel: scsi(3): LOOP UP detected
Jul 18 13:50:35 Linux88 kernel: scsi3: Topology - (F_Port), Host Loop address 0xffff
Jul 18 13:50:35 Linux88 kernel: scsi(2): Waiting for LIP to complete...
Jul 18 13:50:35 Linux88 kernel: scsi2: Topology - (F_Port), Host Loop address 0xffff
Jul 18 13:50:35 Linux88 kernel: scsi(3): Waiting for LIP to complete...
Jul 18 13:50:36 Linux88 kernel: scsi3: Topology - (F_Port), Host Loop address 0xffff
Jul 18 13:50:36 Linux88 kernel: scsi-qla1-adapter-node=200000e08b04cfc4;
Jul 18 13:50:36 Linux88 kernel: scsi-qla1-adapter-port=210000e08b04cfc4;
Jul 18 13:50:36 Linux88 kernel: scsi-qla1-target-0=500601688802b398;
Jul 18 13:50:36 Linux88 kernel: scsi2 : QLogic QLA2300 PCI to Fibre Channel Host Adapter: bus 0 device 9 irq 19
Jul 18 13:50:36 Linux88 kernel: Firmware version: 3.00.23, Driver version 4.33b
Jul 18 13:50:36 Linux88 kernel: scsi3 : QLogic QLA2300 PCI to Fibre Channel Host Adapter: bus 0 device 11 irq 18
Jul 18 13:50:36 Linux88 kernel: Firmware version: 3.00.23, Driver version 4.33b
Jul 18 13:50:36 Linux88 kernel: scsi : 4 hosts.
Jul 18 13:50:36 Linux88 kernel: Vendor: DGC Model: Rev: 0524
Jul 18 13:50:36 Linux88 kernel: Type: Direct-Access ANSI SCSI revision: 04
Jul 18 13:50:36 Linux88 kernel: Detected scsi disk sdd at scsi2, channel 0, id 0, lun 0
Jul 18 13:50:36 Linux88 kernel: scsi(2:0:0:0): Enabled tagged queuing, queue depth 16.
Jul 18 13:50:36 Linux88 kernel: Vendor: DGC Model: Rev: 0524
Jul 18 13:50:36 Linux88 kernel: Type: Direct-Access ANSI SCSI revision: 04
Jul 18 13:50:36 Linux88 kernel: Detected scsi disk sde at scsi3, channel 0, id 0, lun 0
Jul 18 13:50:36 Linux88 kernel: scsi(3:0:0:0): Enabled tagged queuing, queue depth 16.
Jul 18 13:50:36 Linux88 kernel: sdd:scsidisk I/O error: dev 08:30, sector 0
Jul 18 13:50:36 Linux88 kernel: unable to read partition table
Jul 18 13:50:36 Linux88 kernel: sde:scsidisk I/O error: dev 08:40, sector 0
Jul 18 13:50:36 Linux88 kernel: unable to read partition table

strace of navisphere start:
1060 [2abd5354] open("/dev/sg4", O_RDONLY) = -1 ENXIO (No such device or address) <0.000018>
1060 [2abd5354] open("/dev/sge", O_RDONLY) = -1 ENXIO (No such device or address) <0.000013>
1060 [2abd5354] open("/dev/sg5", O_RDONLY) = -1 ENXIO (No such device or address) <0.000015>
1060 [2abd5354] open("/dev/sgf", O_RDONLY) = -1 ENXIO (No such device or address) <0.000011>
(The two disconnected LUNs - SPa and SPb of the array)

[root@Linux88 /root]# cat /proc/scsi/sg/debug
dev_max=57 max_active_device=6 (origin 1)
 scsi_dma_free_sectors=144 sg_pool_secs_aval=320 def_reserved_size=32768
 >>> device=0(sga) scsi0 chan=0 id=2 lun=0 em=0 sg_tablesize=128 excl=0
   FD(1): timeout=6000 bufflen=32768 (res)sgat=0 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=6000 bufflen=32768 (res)sgat=0 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=1(sgb) scsi0 chan=0 id=3 lun=0 em=0 sg_tablesize=128 excl=0
   FD(1): timeout=6000 bufflen=32768 (res)sgat=0 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=6000 bufflen=32768 (res)sgat=0 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=2(sgc) scsi0 chan=0 id=4 lun=0 em=0 sg_tablesize=128 excl=0
   FD(1): timeout=6000 bufflen=32768 (res)sgat=0 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=6000 bufflen=32768 (res)sgat=0 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=3(sgd) scsi0 chan=0 id=9 lun=0 em=0 sg_tablesize=128 excl=0
   FD(1): timeout=6000 bufflen=32768 (res)sgat=0 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=6000 bufflen=32768 (res)sgat=0 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=4(sge) scsi2 chan=0 id=0 lun=0 em=0 sg_tablesize=32 excl=0
 >>> device=5(sgf) scsi3 chan=0 id=0 lun=0 em=0 sg_tablesize=32 excl=0
[root@Linux88 /root]#
(sge and sgf are the disconnected LUNs.)
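
For anyone wanting to reproduce the failure without Navisphere, the open() calls listed under Expected Results can be sketched in C roughly as follows. This is only an illustration: probe_open, count_enxio and sg_open_flags are names made up here, not anything taken from Navisphere or sg.c, and the path to pass in is whichever sg node maps to the disconnected LUN (/dev/sg4 or /dev/sg5 in the strace above).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` with `flags`.
 * Returns 0 on success, or the errno value reported by open(). */
int probe_open(const char *path, int flags)
{
    assert(path != NULL);
    int fd = open(path, flags);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* The three flag combinations named under Expected Results. */
static const int sg_open_flags[] = {
    O_RDONLY,
    O_RDWR | O_NONBLOCK,
    O_RDONLY | O_NONBLOCK,
};

/* Hypothetical helper: count how many of the combinations above
 * fail with ENXIO for the given device node. */
int count_enxio(const char *path)
{
    int failures = 0;
    size_t n = sizeof sg_open_flags / sizeof sg_open_flags[0];
    for (size_t i = 0; i < n; i++) {
        if (probe_open(path, sg_open_flags[i]) == ENXIO)
            failures++;
    }
    return failures;
}
```

On an affected system, count_enxio("/dev/sg4") would presumably return 3 if every flag combination fails the same way; only the O_RDONLY failure is actually confirmed by the strace output above.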
Have just completed testing on RH6.2 lk 2.2.16-3, and a disconnected LUN can be opened by sg. Tested RH7.0 lk 2.2.16-22, 2.2.17-14, and 2.2.19-7.0.1, and they all fail when sg tries to open the disconnected LUN. I further tested RH7.1 lk 2.4.2-2 and was unable to open the disconnected LUN. All failures were the same as above. I still suspect the change that is causing the problem occurred in the SCSI midlayer used in RH7.0 and 7.1.

We've turned on SCSI logging in hopes of gathering further information, but can't seem to figure out how to get anything useful out of the debugging output. We're using the scan token, believing the problem exists somewhere in this area of the code. One of the problems we're encountering with SCSI logging is that we have multiple Qlogic QLA/2200FC HBAs in the system, so the information pushed to /var/log/messages from one HBA gets stepped on by the other HBA; it is not complete and, at times, isn't intelligible.

I hope this additional information will provide further insight into the problem.
Doug: any ideas?
Yeah, I'm pretty sure what the problem is, and exactly which patch caused it. The linux-2.4.2-scsi_scan.patch in the 2.4 kernel RPM is the cause of the problem. However, it went in specifically to solve another problem: some devices report lots of offline drives in the sparse space, including the CLARiiON arrays that Wayne is using, so that if you don't include this patch, you end up with 254 offline entries in the SCSI device list on some arrays.

In short, it's an inconsistent usage of the offline status in the SCSI Inquiry data that is causing this problem, and I don't see any good answer. With the patch you have problems, and without the patch you have problems. My preferred choice is to leave the patch and make configuration tools go through whatever device is at LUN0 on the chassis for proper configuration, but I don't know enough about the current setup Wayne is using to say if that's possible.
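
For reference, the PQ/PDType 001/00000 value quoted in the original report comes from byte 0 of the standard INQUIRY response: the top three bits are the peripheral qualifier and the low five bits are the peripheral device type. A qualifier of 001b means the target can support that device type on the LUN but no physical device is currently connected. A minimal sketch of decoding that byte is below; the struct and function names are hypothetical, and this is not the code from linux-2.4.2-scsi_scan.patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the two fields packed into INQUIRY byte 0. */
struct inq_periph {
    unsigned pq;   /* peripheral qualifier, bits 7..5 */
    unsigned pdt;  /* peripheral device type, bits 4..0 */
};

/* Split byte 0 of a standard INQUIRY response into its two fields. */
struct inq_periph decode_inquiry_byte0(const unsigned char *inq)
{
    assert(inq != NULL);
    struct inq_periph p;
    p.pq  = (inq[0] >> 5) & 0x07;
    p.pdt = inq[0] & 0x1f;
    return p;
}

/* Returns 1 when the qualifier is 001b (supported but not connected),
 * which is the case reported for the disconnected LUNs here. */
int lun_is_disconnected(const unsigned char *inq)
{
    return decode_inquiry_byte0(inq).pq == 1;
}
```

With byte 0 equal to 0x20 this yields pq = 1 and pdt = 0 (direct-access), matching the PQ/PDType 001/00000 in the report.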
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/