Bug 352001 - I/O errors are thrown on FC storage lun not assigned to the host server.
Summary: I/O errors are thrown on FC storage lun not assigned to the host server.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Tom Coughlan
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 217106
 
Reported: 2007-10-25 10:17 UTC by Shyam kumar Iyer
Modified: 2007-12-19 22:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-12-19 22:57:53 UTC
Target Upstream Version:
Embargoed:


Attachments:
/var/log/messages file (227.05 KB, text/plain)
2007-10-25 10:42 UTC, Shyam kumar Iyer

Description Shyam kumar Iyer 2007-10-25 10:17:22 UTC
Description of problem:
With RHEL5.1 installed on any PowerEdge server with an Emulex card (which uses 
the lpfc driver), I/O errors are seen on a LUN that is not assigned.

The Naviagent service was installed on the server, the switch was zoned, storage 
groups were formed on the storage side, and the host server was connected through 
the Navisphere console to the storage group, but the LUNs were not 
presented to the OS.

On rebooting the system, I/O errors were observed in /var/log/messages.

Version-Release number of selected component (if applicable):
RHEL5.1 kernel-2.6.18-52.EL5

How reproducible:
Often.

Steps to Reproduce:
1. Install RHEL5.1 with an Emulex card.
2. Connect the lp card to the CX storage box with a zoned switch connection.
3. Install naviagent on the installed system.
4. Form storage groups on the storage and connect them to the host server 
through the naviagent service console but do not assign LUNs to the host 
server.
5. Reboot the system with the fibre channel connectivity.

  
Actual results:
1) I/O errors are seen on the storage lun which has not yet been assigned.

Expected results:
1) No I/O errors should be seen.

Additional info:
1) This issue has also been seen with the RHEL-5 gold release, i.e.
kernel-2.6.18-8.el5

Comment 1 Shyam kumar Iyer 2007-10-25 10:42:22 UTC
Created attachment 237251 [details]
/var/log/messages file

/dev/sdb is the FC storage lun which has not been assigned to the host server
but still throws I/O errors.

Comment 3 Doug Ledford 2007-11-30 16:57:17 UTC
I'm not entirely sure what Naviagent is, but from the sounds of the problem and
the messages in the log file, it certainly looks like a race condition in the
Naviagent software.  The Emulex driver is actually working properly from what I
can see.  It is getting an async notification of a new device on the fabric
(when the fabric came up, the device was there) and it adds the device to the
SCSI layer, the SCSI layer successfully gets an INQUIRY through to the device,
then it starts getting failures when it attempts to send the remaining commands
it normally sends during device scan (READ_CAPACITY and so on).  Based on what
I've seen, and a rather limited knowledge of Naviagent, I would guess that once
the Naviagent software is brought up, it is possibly adding the devices, then
realizing they aren't exported to this machine and removing them, or something
like that.  In the meantime, sometimes the Emulex driver notices the device
between the add/remove, and sometimes it doesn't, resulting in what you see.

In order to be any more help than this, I would need to know more about the
Naviagent software and its role in device discovery (or alternatively, someone
inside Red Hat that knows more about it would have to take over for me...Cc:ing
Tom Coughlan since he might know if someone else is knowledgeable in Naviagent
setups).

Comment 4 Tom Coughlan 2007-12-05 20:14:27 UTC
Thanks for looking at this Doug. I'll ask Wayne at EMC to take a look.

The errors are on LUNZ. This is a fake LUN that provides a path for in-band
communication with the Clariion controller:

Oct 17 19:44:31 aknode5 kernel: lpfc 0000:04:00.0: 0:1303 Link Up Event x1 received Data: x1 xf7 x10 x0
Oct 17 19:44:31 aknode5 kernel:   Vendor: DGC       Model: LUNZ              Rev: 0322
Oct 17 19:44:31 aknode5 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Oct 17 19:44:31 aknode5 kernel: sdb : READ CAPACITY failed.
Oct 17 19:44:31 aknode5 kernel: sdb : status=1, message=00, host=0, driver=08 
Oct 17 19:44:31 aknode5 kernel: sd: Current: sense key: Illegal Request
Oct 17 19:44:31 aknode5 kernel:     Add. Sense: Logical unit not supported

This happens when the WWID of the HBA port is not properly registered with the
Clariion. This may also happen when there are no LUNs assigned. 

Wayne?
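
A minimal sketch, assuming the log format shown in the excerpt above, of how
READ CAPACITY failures on LUNZ could be told apart from failures on real LUNs
when scanning /var/log/messages. The function name and the rule of pairing each
failure with the most recent inquiry line are assumptions made for
illustration; this is not part of any Red Hat or EMC tooling.

```python
import re

# Inquiry line printed by the SCSI layer when a device is found.
VENDOR_RE = re.compile(r"Vendor:\s*(\S+)\s+Model:\s*(\S+)")
# Failure line printed by the sd driver; captures the disk name (e.g. "sdb").
READ_CAP_RE = re.compile(r"kernel:\s*(sd[a-z]+)\s*:\s*READ CAPACITY failed")


def find_lunz_read_capacity_failures(lines):
    """Return disk names whose READ CAPACITY failure follows a DGC LUNZ inquiry.

    Assumption: a failure belongs to the most recently logged inquiry line.
    """
    last_inquiry = None
    hits = []
    for line in lines:
        match = VENDOR_RE.search(line)
        if match:
            last_inquiry = (match.group(1), match.group(2))
            continue
        match = READ_CAP_RE.search(line)
        if match and last_inquiry == ("DGC", "LUNZ"):
            hits.append(match.group(1))
    return hits
```

Usage: pass an open file, for example
`find_lunz_read_capacity_failures(open("/var/log/messages"))`. Disks reported
here are likely the LUNZ placeholder rather than a failing data LUN.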

Comment 5 Andrius Benokraitis 2007-12-11 20:08:15 UTC
Hey Wayne, any updates to this? Thanks!

Comment 6 Wayne Berthiaume 2007-12-18 19:53:52 UTC
This is expected behavior. As Tom pointed out, this is a fake LUN used for
in-band communications via sg() between the host (Naviagent) and the array. Once
LUNs are assigned to the storage group on the array, the LUNZ will no longer be
seen.
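
To check whether a given sd device is the LUNZ placeholder rather than an
assigned LUN, one option is to read the vendor and model strings that the SCSI
layer exposes under sysfs. The sketch below assumes the usual
/sys/block/<dev>/device/vendor and /sys/block/<dev>/device/model attributes are
present; the function name and the `sys_block` parameter are hypothetical and
exist only for this example.

```python
import os


def find_lunz_devices(sys_block="/sys/block"):
    """Return names of block devices whose SCSI vendor/model is DGC/LUNZ."""
    found = []
    for name in sorted(os.listdir(sys_block)):
        dev = os.path.join(sys_block, name, "device")
        try:
            with open(os.path.join(dev, "vendor")) as f:
                vendor = f.read().strip()
            with open(os.path.join(dev, "model")) as f:
                model = f.read().strip()
        except OSError:
            # Not a SCSI disk, or the attributes are not available.
            continue
        if vendor == "DGC" and model == "LUNZ":
            found.append(name)
    return found
```

If this returns a device such as sdb, the READ CAPACITY errors logged for it at
boot match the expected LUNZ behavior described above.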

