Description of problem:
With a RHEL5.1 install on any PowerEdge server with an Emulex card (which uses the lpfc driver), I/O errors are seen on a LUN that has not been assigned to the host. The Naviagent service was installed on the server, the switch was zoned, storage groups were created on the storage side, and the host server was connected to the storage group through the Navisphere console, but no LUNs were presented to the OS. After rebooting the system, I/O errors were observed in /var/log/messages.

Version-Release number of selected component (if applicable):
RHEL5.1 kernel-2.6.18-52.EL5

How reproducible:
Often.

Steps to Reproduce:
1. Install RHEL5.1 on a system with an Emulex card.
2. Connect the lp card to the CX storage box through a zoned switch.
3. Install Naviagent on the installed system.
4. Create storage groups on the storage and connect them to the host server through the Naviagent service console, but do not assign any LUNs to the host server.
5. Reboot the system with the Fibre Channel connectivity in place.

Actual results:
1) I/O errors are seen on the storage LUN that has not yet been assigned.

Expected results:
1) No I/O errors should be seen.

Additional info:
1) This issue has also been seen with the RHEL-5 gold release, i.e. kernel-2.6.18-8.el5.
Created attachment 237251: /var/log/messages file

/dev/sdb is the FC storage LUN that has not been assigned to the host server, yet it still throws I/O errors.
I'm not entirely sure what Naviagent is, but from the description of the problem and the messages in the log file, this looks like a race condition in the Naviagent software. From what I can see, the Emulex driver is working properly. It receives an async notification of a new device on the fabric (the device was already present when the fabric came up) and adds the device to the SCSI layer. The SCSI layer successfully gets an INQUIRY through to the device, but then the remaining commands it normally sends during a device scan (READ_CAPACITY and so on) start failing.

Based on what I've seen, and my rather limited knowledge of Naviagent, my guess is that once the Naviagent software starts, it may be adding the devices, then realizing they aren't exported to this machine and removing them, or something along those lines. In the meantime, the Emulex driver sometimes notices the device between the add and the remove and sometimes doesn't, which results in what you see.

To be of any more help, I would need to know more about the Naviagent software and its role in device discovery (or, alternatively, someone inside Red Hat who knows more about it would have to take over from me; Cc:ing Tom Coughlan, since he might know whether someone else is knowledgeable about Naviagent setups).
Thanks for looking at this, Doug. I'll ask Wayne at EMC to take a look.

The errors are on LUNZ. This is a fake LUN that provides a path for in-band communication with the Clariion controller:

Oct 17 19:44:31 aknode5 kernel: lpfc 0000:04:00.0: 0:1303 Link Up Event x1 received Data: x1 xf7 x10 x0
Oct 17 19:44:31 aknode5 kernel: Vendor: DGC Model: LUNZ Rev: 0322
Oct 17 19:44:31 aknode5 kernel: Type: Direct-Access ANSI SCSI revision: 04
Oct 17 19:44:31 aknode5 kernel: sdb : READ CAPACITY failed.
Oct 17 19:44:31 aknode5 kernel: sdb : status=1, message=00, host=0, driver=08
Oct 17 19:44:31 aknode5 kernel: sd: Current: sense key: Illegal Request
Oct 17 19:44:31 aknode5 kernel: Add. Sense: Logical unit not supported

This happens when the WWID of the HBA port is not properly registered with the Clariion. It may also happen when no LUNs are assigned. Wayne?
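To tell LUNZ noise apart from failures on real LUNs when reading /var/log/messages, a small script can pair each INQUIRY line (Vendor/Model) with the sd status line that follows it and decode the status fields. This is a minimal sketch, not an EMC or Red Hat tool. It assumes the log format shown in the excerpt above, assumes a device's INQUIRY line precedes its "status=" line with no other device's INQUIRY in between, and uses hypothetical helper names (classify_read_capacity_errors, is_lunz). The status/driver values follow the Linux 2.6 definitions in include/scsi/scsi.h, where "status=1" is CHECK_CONDITION (the status byte shifted right by one) and "driver=08" is DRIVER_SENSE.

```python
import re

# Linux 2.6-era values as printed by the sd driver (include/scsi/scsi.h).
# The status byte is shown right-shifted by one, so 0x01 == CHECK_CONDITION.
STATUS_NAMES = {0x00: "GOOD", 0x01: "CHECK_CONDITION", 0x04: "BUSY"}
DRIVER_NAMES = {0x00: "DRIVER_OK", 0x08: "DRIVER_SENSE"}

INQUIRY_RE = re.compile(r"Vendor: (\S+)\s+Model: (\S+)")
STATUS_RE = re.compile(
    r"(sd[a-z]+) : status=(\w+), message=(\w+), host=(\w+), driver=(\w+)")


def classify_read_capacity_errors(log_lines):
    """Map each sd device with a status line to (vendor, model, status, driver).

    Hypothetical helper: assumes the INQUIRY line for a device appears
    before that device's status line, as in the lpfc/sd excerpt above.
    """
    results = {}
    last_inquiry = None  # (vendor, model) of the most recently scanned device
    for line in log_lines:
        m = INQUIRY_RE.search(line)
        if m:
            last_inquiry = (m.group(1), m.group(2))
            continue
        m = STATUS_RE.search(line)
        if m and last_inquiry is not None:
            status = int(m.group(2), 16)
            driver = int(m.group(5), 16)
            results[m.group(1)] = (
                last_inquiry[0],
                last_inquiry[1],
                STATUS_NAMES.get(status, hex(status)),
                DRIVER_NAMES.get(driver, hex(driver)),
            )
    return results


def is_lunz(vendor, model):
    # Clariion arrays report vendor "DGC"; LUNZ is the placeholder LUN.
    return vendor == "DGC" and model == "LUNZ"


sample = [
    "kernel: Vendor: DGC Model: LUNZ Rev: 0322",
    "kernel: sdb : READ CAPACITY failed.",
    "kernel: sdb : status=1, message=00, host=0, driver=08",
]

for dev, (vendor, model, st, drv) in classify_read_capacity_errors(sample).items():
    tag = "LUNZ placeholder" if is_lunz(vendor, model) else "real LUN"
    print(dev, st, drv, tag)  # prints: sdb CHECK_CONDITION DRIVER_SENSE LUNZ placeholder
```

A CHECK_CONDITION with DRIVER_SENSE on a DGC/LUNZ device matches the behavior described in this report; the same status on any other vendor/model would deserve a closer look.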
Hey Wayne, any updates to this? Thanks!
This is expected behavior. As Tom pointed out, this is a fake LUN used for in-band communication via sg() between the host (Naviagent) and the array. Once LUNs are assigned to the storage group on the array, the LUNZ will no longer be seen.