RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 678275 - [NetApp 6.0 Bug] RHEL 6.0 Emulex 8G FC host hangs during fabric faults
Summary: [NetApp 6.0 Bug] RHEL 6.0 Emulex 8G FC host hangs during fabric faults
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: All
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Rob Evers
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 660278
Blocks: 5.6-Known_Issues
Reported: 2011-02-17 11:38 UTC by Rajashekhar M A
Modified: 2011-05-08 23:18 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 660278
Environment:
Last Closed: 2011-04-12 12:52:45 UTC
Target Upstream Version:
Embargoed:


Description Rajashekhar M A 2011-02-17 11:38:25 UTC
+++ This bug was initially created as a clone of Bug #660278 +++


We are seeing the same behavior with a RHEL6.0 FC host with inbox drivers - v8.3.5.17.

---
Description of problem:
RHEL 5.6 root on dm-multipath Emulex FC host (using an 8G adapter) hangs during fabric faults. This invariably occurs during the first iteration of faults and is seen consistently. The only way out is then to hard boot the host.

Version-Release number of selected component (if applicable):
RHEL 5.6 Snap3 (2.6.18-233.el5)
lpfc driver v8.2.0.87
LPe12002 running fw v2.00A3 (U3D2.00A3)

How reproducible:
Always

Additional info:
The lpfc heartbeat was kept disabled due to bug 599487.

--- Additional comment from andriusb on 2010-12-06 09:23:19 EST ---

Thank you for reporting this, Martin, but I imagine we need more information than what is here now, unless Emulex already knows of an issue (adding them to the bug).

--- Additional comment from marting on 2010-12-06 09:52:11 EST ---

Created attachment 464989 [details]
Logs

The logs contain the sysrq dumps collected during the hang and the /var/log/messages with lpfc log verbose set to 0x1004.

The host hung at approximately 16:25 and was hard booted at 19:22 following the hang (as seen in the messages file).

--- Additional comment from andriusb on 2010-12-07 09:12:41 EST ---

Emulex - any updates per chance?

--- Additional comment from dick.kennedy on 2010-12-07 10:56:59 EST ---

We are still evaluating the logs from NetApp. If NetApp could run the same test while making sure that OneCommand Manager is not running, it would be helpful, because there would be fewer lpfc-related threads active in the kernel when it hangs. To stop OneCommand Manager, run /usr/sbin/hbanyware/stop_ocmanager.

--- Additional comment from marting on 2010-12-08 07:23:06 EST ---

Created attachment 467468 [details]
sysrq dumps & messages file after stopping the onecommand manager

Logs contain:

1) Sysrq dumps after hitting the hang, taken 2 hours apart. Collected memory, registers, current tasks & blocked tasks info.

2) messages file with lpfc log verbose set to 0x1004. The host froze at around 14:14:59 and was hard booted at 17:09:52.

--- Additional comment from dick.kennedy on 2010-12-08 14:24:31 EST ---

I looked over the new logs, the lpfc_worker threads are sleeping on their respective events waiting for something to do. They are not holding any locks and there are no threads that are waiting for our driver locks.

The next time you run this test you can turn the lpfc logging off, as there was nothing useful in there anyway.

Would it be possible to get a dump of this?
If the machine is set up to dump, you could just use SysRq 'c' to generate a dump.

Do you know where this message came from?
Dec  8 14:14:59 IBMx346-200-114 **NATE**: Stopping the FCP service on Filer (fas3170-201-78)

Also, there used to be a syslog bug that would hang the system if you were doing too much logging. I'll try to find the reference to it for you.

--- Additional comment from marting on 2010-12-09 13:27:19 EST ---

(In reply to comment #6)
> 
> Would it be possible to get this to dump?
> If the machine is set up to dump then you could just do the sysrq 'c' to
> generate a dump. 

Will do.

> 
> Do you know where this message came from?
> Dec  8 14:14:59 IBMx346-200-114 **NATE**: Stopping the FCP service on Filer
> (fas3170-201-78)

Those messages are logged from our automation scripts running on the host. Please ignore them.

> 
> Also there used to be a syslog bug that would hang the system if you were doing
> too much logging. I'll try to find the reference to it for you.

Ok. 

But just an FYI: there were no problems running RHEL 5.5.z (lpfc driver v8.2.0.63.3p) on the same Emulex FC host. So it does look like the lpfc driver has regressed in 5.6.

--- Additional comment from coughlan on 2010-12-09 18:15:32 EST ---

(In reply to comment #7)

> But just an FYI - there were no problems running RHEL 5.5.z (lpfc driver
> v8.2.0.63.3p) on the same Emulex FC host. So it does look like the lpfc driver has
> regressed in 5.6.

Unless there is a low-risk fix for this within a day or so, this will miss 5.6.

Emulex: would we be better off reverting the lpfc driver in 5.6 to the 5.5.z version, so we avoid this problem, or shipping as-is and fixing it later? Have you tested the 5.6 lpfc driver enough to believe this is a low-impact issue?

Tom

--- Additional comment from pm-rhel on 2010-12-09 18:19:44 EST ---

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

--- Additional comment from rajashekhar.a on 2010-12-10 08:30:47 EST ---

>> Would it be possible to get this to dump?
>> If the machine is set up to dump then you could just do the sysrq 'c' to
>> generate a dump. 

>Will do.

Here is the link to the vmcore (128MB compressed) -
ftp://ftp.netapp.com/pub/home/rajs/pub/rh-bug660278_vmcore.gz

--- Additional comment from dick.kennedy on 2010-12-10 15:11:56 EST ---

I need the /boot/System.map for that kernel, can you attach it to the bugzilla?

--- Additional comment from rajashekhar.a on 2010-12-11 15:11:13 EST ---

Created attachment 468166 [details]
System map for the vmcore

> I need the /boot/System.map for that kernel, can you attach it to the bugzilla?

Attached is the system map for the vmcore in comment 10.

--- Additional comment from revers on 2010-12-13 19:32:25 EST ---


> But just an FYI - there were no problems running RHEL 5.5.z (lpfc driver
> v8.2.0.63.3p) on the same Emulex FC host. So it does look like the lpfc driver has
> regressed in 5.6.

Hi Martin,

Is this problem known to be specific to lpfc?

Would it be possible for you to try the rhel5.5z version of the driver in the rhel5.6 environment where you are seeing this problem?

Thanks, Rob

--- Additional comment from andriusb on 2010-12-13 22:00:39 EST ---

Release Candidate kernel has been spun, deferring to 5.7.

--- Additional comment from rajashekhar.a on 2010-12-14 02:45:21 EST ---

> Is this problem known to be specific to lpfc?

Yes, this is specific to lpfc and not seen with any other drivers.

> Would it be possible for you to try the rhel5.5z version...

Yes, we can try the 5.5.z version of the driver on rhel5.6 kernel. Please provide us the patch for testing.

--- Additional comment from dick.kennedy on 2010-12-14 09:24:36 EST ---

I cannot get crash to look at the core. Is this the standard rh 2.6.18-233 or did you make your own kernel?

[root@batty dk]# file /usr/lib/debug/lib/modules/2.6.18-233.el5debug/vmlinux
/usr/lib/debug/lib/modules/2.6.18-233.el5debug/vmlinux: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), statically linked, not stripd
[root@batty dk]# crash System.map-2.6.18-233.el5 /usr/lib/debug/lib/modules/2.6.18-233.el5debug/vmlinux rh-bug660278_vmcore

crash 4.1.2-7.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: questionable node_start_pfn: c085ffff50b3e8c0
WARNING: sparsemem: invalid section number: 51539517799
WARNING: sparsemem: invalid section number: 137438888108
WARNING: sparsemem: invalid section number: 137438888108
WARNING: sparsemem: invalid section number: 137438888108
please wait... (gathering module symbol data)
WARNING: invalid kernel module size: 0

crash: cannot determine idle task addresses from init_tasks[] or runqueues[]

crash: cannot resolve "init_task_union"

--- Additional comment from revers on 2010-12-14 09:48:15 EST ---

(In reply to comment #16)

> Yes, we can try the 5.5.z version of the driver on rhel5.6 kernel. Please
> provide us the patch for testing.

I can create a patch, but I believe the straightforward approach would be for you to replace the contents of your rhel5.6 kernel/driver/lpfc directory with the contents of the rhel5.5z kernel/driver/lpfc.

If this doesn't work for you, let me know and I'll work on getting a patch together.

--- Additional comment from revers on 2010-12-15 08:55:08 EST ---

(In reply to comment #17)
> I cannot get crash to look at the core. Is this the standard rh 2.6.18-233 or
> did you make your own kernel?
> 

Thanks, Rob
(In reply to comment #18)
> (In reply to comment #16)
> I can create a patch but I believe the straightforward approach would be for
> you to replace the contents of your rhel5.6 kernel/driver/lpfc directory with
> the contents of the rhel5.5z kernel/driver/lpfc.
> 
> If this doesn't work for you, let me know and I'll work on getting a patch
> together.

Netapp,

Any progress on either of these?

Thanks, Rob

--- Additional comment from marting on 2010-12-15 09:02:25 EST ---

(In reply to comment #19)
> Netapp,
> 
> Any progress on either of these?
> 

Working on this. Will update the bug asap.

--- Additional comment from marting on 2010-12-15 12:36:21 EST ---

Please use the following vmcore (taken during the hang) & System.map files:

1) ftp://ftp.netapp.com/pub/home/marting/pub/rh-bz660278/vmcore.bz2

2) ftp://ftp.netapp.com/pub/home/marting/pub/rh-bz660278/System.map-2.6.18-233.el5debug

I have used the standard RHEL 5.6 Snap3 debug kernel itself here (2.6.18-233.el5debug):

# crash /boot/System.map-2.6.18-233.el5debug /usr/lib/debug/lib/modules/2.6.18-233.el5debug/vmlinux tmp/vmcore

crash 4.1.2-7.el5

  SYSTEM MAP: /boot/System.map-2.6.18-233.el5debug
DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.18-233.el5debug/vmlinux (2.6.18-233.el5debug)
    DUMPFILE: tmp/vmcore
        CPUS: 4
        DATE: Wed Dec 15 20:45:52 2010
      UPTIME: 00:37:11
LOAD AVERAGE: 26.91, 22.40, 13.30
       TASKS: 228
    NODENAME: IBMx346-200-114
     RELEASE: 2.6.18-233.el5debug
     VERSION: #1 SMP Mon Nov 22 18:24:20 EST 2010
     MACHINE: x86_64  (3200 Mhz)
      MEMORY: 2 GB
       PANIC: "SysRq : Trigger a crashdump"
         PID: 0
     COMMAND: "swapper"
        TASK: ffffffff80325cc0  (1 of 4)  [THREAD_INFO: ffffffff8047a000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 0      TASK: ffffffff80325cc0  CPU: 0   COMMAND: "swapper"
 #0 [ffffffff80528d60] crash_kexec at ffffffff800b9f4b
 #1 [ffffffff80528e20] sysrq_handle_crashdump at ffffffff801c6a26
 #2 [ffffffff80528e30] __handle_sysrq at ffffffff801c6813
 #3 [ffffffff80528e70] receive_chars at ffffffff801d57f6
 #4 [ffffffff80528ec0] serial8250_interrupt at ffffffff801d6a14
 #5 [ffffffff80528f20] handle_IRQ_event at ffffffff80011781
 #6 [ffffffff80528f50] __do_IRQ at ffffffff800c781b
 #7 [ffffffff80528f90] do_IRQ at ffffffff800715e0
--- <IRQ stack> ---
 #8 [ffffffff8047bed8] ret_from_intr at ffffffff80060652
    [exception RIP: mwait_idle_with_hints+102]
    RIP: ffffffff8006f8e9  RSP: ffffffff8047bf80  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000090000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffffff804613c0   R8: 0000000000000002   R9: 0000000000000001
    R10: 0000000000000000  R11: ffffffff80065feb  R12: 0000000000000001
    R13: ffffffff80012ffa  R14: ffffffff8047a000  R15: 0000000000000001
    ORIG_RAX: fffffffffffffffb  CS: 0010  SS: 0018
 #9 [ffffffff8047bf80] mwait_idle at ffffffff80059b3b
#10 [ffffffff8047bf90] cpu_idle at ffffffff8004bad0
crash>

--- Additional comment from marting on 2010-12-16 09:18:35 EST ---

(In reply to comment #13)
> Would it be possible for you to try the rhel5.5z version of the driver in the
> rhel5.6 environment where you are seeing this problem?
> 

Running lpfc driver v8.2.0.63.3p (from the 5.5.z kernel 2.6.18-194.26.1.el5) on 5.6 (Snap3 kernel 2.6.18-233.el5) works fine. No hangs seen yet during IO with faults for more than 6 hours now.

So that should confirm the suspicion that the 5.6 lpfc driver 8.2.0.87 has indeed regressed.

--- Additional comment from marting on 2010-12-20 04:36:53 EST ---

(In reply to comment #22)
> Running lpfc driver v8.2.0.63.3p (from the 5.5.z kernel 2.6.18-194.26.1.el5) on
> 5.6 (Snap3 kernel 2.6.18-233.el5) works fine. No hangs seen yet during IO with
> faults for more than 6 hours now.
> 

The tests have now run successfully for 48 hours without any issues.

Just an FYI: the hang is also reproducible with the latest lpfc driver, v8.2.0.87.1p, available in the 5.6 Snap5 kernel (2.6.18-236.el5).

So how do we proceed with this bug, since it is clearly a blocker for 5.6 Emulex FC?

--- Additional comment from rajashekhar.a on 2010-12-22 14:13:00 EST ---

The dumps are available at - ftp://ftp.netapp.com/pub/home/rajs/pub/660278_dumps.tar.gz (626MB)

The compressed file has the serial logs, syslog messages with lpfc verbosity set to 0x1007 and the vmcore (expands to more than 1.6G).

--- Additional comment from coughlan on 2010-12-22 15:39:24 EST ---

(In reply to comment #23)

> So how do we proceed ahead for this bug, since this obviously is a blocker for
> 5.6 Emulex FC?

It will have to go in 5.6.z.

Emulex, please make this a priority, so we can turn it around quickly.

--- Additional comment from dick.kennedy on 2010-12-22 16:03:13 EST ---

Martin,
I have been looking at the dumps that you provided. The two lpfc worker threads are just sleeping on their events, which wake them up when we get a new IO from the mid-layer or an interrupt from the device. From the driver's point of view, it is just waiting for something to do. If the mid-layer had IO outstanding with the driver and the driver was not completing it, I would expect aborts to be sent down for those IOs. I do not see any threads trying to obtain an lpfc-related lock, and the task structures for the 2 lpfc worker threads do not indicate that either thread is holding any locks.

I did notice these messages, which concern me, while your machine was coming up. Has anyone at RH looked at them?

sd 1:0:1:39: alua: port group 00 state A supports ToUsNA
sd 0:0:1:37: alua: port group 00 state A supports ToUsNA
sd 1:0:1:37: alua: port group 00 state A supports ToUsNA
sd 1:0:0:17: alua: port group 02 state A supports ToUsNA
sd 0:0:0:17: alua: port group 02 state A supports ToUsNA

=============================================
[ INFO: possible recursive locking detected ]
2.6.18-233.el5debug #1
---------------------------------------------
lvm/863 is trying to acquire lock:
 (&md->io_lock){----}, at: [<ffffffff881949ad>] dm_request+0x41/0x130 [dm_mod]

but task is already holding lock:
 (&md->io_lock){----}, at: [<ffffffff881949ad>] dm_request+0x41/0x130 [dm_mod]

other info that might help us debug this:
1 lock held by lvm/863:
 #0:  (&md->io_lock){----}, at: [<ffffffff881949ad>] dm_request+0x41/0x130 [dm_mod]


Could you remake your initrd after adding "options lpfc lpfc_log_verbose=0x1007" to your /etc/modprobe.conf, and then reboot?
After the reboot, run the failover and then force the dump.


Thanks,
Dick
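As an editor's sketch of the steps requested above, the Python below only builds the modprobe.conf line and the initrd rebuild command as strings; it does not edit /etc/modprobe.conf or run anything. It assumes the usual RHEL 5 form `mkinitrd -f /boot/initrd-<release>.img <release>`, and the helper names `modprobe_options_line` and `rhel5_mkinitrd_cmd` are hypothetical.

```python
import os
from typing import List, Optional


def modprobe_options_line(module: str, **params: object) -> str:
    """Build an /etc/modprobe.conf 'options' line: module name first, then param=value pairs."""
    if not params:
        raise ValueError("at least one module parameter is required")
    pairs = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {pairs}"


def rhel5_mkinitrd_cmd(release: Optional[str] = None) -> List[str]:
    """Return the argv that would rebuild the initrd for a kernel release (RHEL 5 style).

    Defaults to the running kernel, i.e. the value `uname -r` prints.
    """
    release = release or os.uname().release
    return ["mkinitrd", "-f", f"/boot/initrd-{release}.img", release]


if __name__ == "__main__":
    # Pass the verbosity mask as a string so the hex form 0x1007 is kept as written.
    print(modprobe_options_line("lpfc", lpfc_log_verbose="0x1007"))
    print(" ".join(rhel5_mkinitrd_cmd("2.6.18-233.el5")))
```

The printed mkinitrd command would then be run as root, followed by a reboot, so that the lpfc driver loaded from the initrd picks up the option.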

-----Original Message-----
From: George, Martin [Martin.George] 
Sent: Tuesday, December 21, 2010 7:06 AM
To: Papadimitriou, Vaios
Cc: Weber, Dan; Bakken, Barry; Wheeler, Greg; Kennedy, Dick; Tapp, Buddy; Veeraraghavan, Kugesh; Mann, Joseph
Subject: RE: Open Emulex bugs in RHEL

Vaios,

Do you have any updates on this?

Now we've hit the same hang on SLES10 SP4 Emulex FC as well (using lpfc
v8.2.0.85) - tracked under Novell bz#660763.

Thanks,
-Martin

-----Original Message-----
From: George, Martin 
Sent: Monday, December 06, 2010 8:31 PM
To: Vaios.Papadimitriou
Cc: Dan.Weber; Barry.Bakken;
Greg.Wheeler; Dick.Kennedy; Tapp, Buddy;
Veeraraghavan, Kugesh; joseph.mann
Subject: Re: Open Emulex bugs in RHEL


Hi Vaios,

On 11/15/2010 1:38 PM, Martin George wrote:
>
> On 11/11/2010 1:48 AM, Vaios.Papadimitriou wrote:
>> Martin,
>>
>> The last LPFC driver version we submitted to Red Hat for RHEL5.6 
>> in-box inclusion is 8.2.0.86. I found out today this driver version 
>> has already been accepted to the Red Hat z-stream.
>>
>> The new issue you found has been fixed in this later 8.2.0.86 driver 
>> version. Could you please retest w/ that?
>
> With this latest 8.2.0.86 driver, the host (root on dm-multipath 
> SANbooted) hangs during the 1st iteration of fabric faults itself. And
> this is seen consistently. Does look like there is some problem with 
> this latest driver. Attaching the sysrq dumps collected during the hang.
>

Did you check the sysrq dumps that I had attached here for the hang? 
Seems I'm hitting the same hang now on RHEL 5.6 Snap3 kernel containing 
lpfc driver v8.2.0.87, as tracked in RH bz #660278.

Thanks,
-Martin
All four CPUs are running the swapper when the dump is taken.
From the last run time in the task structures of the lpfc_worker threads, both threads have run recently but are waiting for something to do at the time of the dump.

--- Additional comment from dick.kennedy on 2010-12-22 16:04:31 EST ---

I have been running 5.6 and I cannot reproduce this in our lab. I do not have NetApp storage in our lab to test against, so I have been using CLARiiON LUNs.

I have looked at the dumps and our lpfc_worker threads are just sleeping on the events waiting for something (irq or command from the mid-layer) to wake them up. We are not holding any locks and not trying to acquire any.

The active process on all 4 CPUs is the swapper?

-----Original Message-----
From: Tom Coughlan [coughlan]
Sent: Wednesday, December 22, 2010 3:42 PM
To: Kennedy, Dick
Cc: Barry, Laurie; revers
Subject: This one is pretty hot: Bug 660278 - [NetApp 5.7 Bug] RHEL 5.6 Emulex FC host hangs during fabric faults

https://bugzilla.redhat.com/show_bug.cgi?id=660278

It seems we are shipping an lpfc regression in 5.6. We'll need to get a fix in 5.6.z asap. 

Have you tested RHEL 5.6 extensively? Can you reproduce this? It would be good to know how widespread the problem will be. (We did some similar tests on some different storage hardware and we did not see it.)

Tom

--- Additional comment from dick.kennedy on 2010-12-23 11:01:18 EST ---

I was talking with the VMware driver team, and they have just fixed your problem.
The Saturn firmware that you are using will not forward ELS frames to the driver when NPIV is enabled. The difference between the 8.2.0.63.3p and 8.2.0.85 drivers is that the 8.2.0.85 driver enables NPIV by default, and the firmware fails only when it is configured for NPIV.
Could you please try updating your LPe12002 firmware to 2.00.A4?
Alternatively, you could set lpfc_enable_npiv=0 and then remake the initrd.
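The failing combination described here (this firmware drops ELS frames only when NPIV is enabled, and the newer driver enables NPIV by default) can be expressed as a small check. This is a hedged Python sketch: it assumes the firmware revision is available as a string like the "2.00A3 (U3D2.00A3)" quoted in the original report, it treats 2.00a3 as the only affected revision (as Emulex states later in this bug), and the function names `normalize_fw` and `is_affected` are hypothetical.

```python
import re

# Firmware revision that Emulex identifies in this bug as failing when NPIV is on.
AFFECTED_FW = {"2.00a3"}


def normalize_fw(fwrev: str) -> str:
    """Reduce strings like '2.00A3 (U3D2.00A3)' or '2.00.A4' to the form '2.00a3'."""
    match = re.match(r"\s*v?(\d+)\.(\d+)\.?([A-Za-z])(\d+)", fwrev)
    if not match:
        raise ValueError(f"unrecognized firmware revision: {fwrev!r}")
    major, minor, letter, build = match.groups()
    return f"{major}.{minor}{letter.lower()}{build}"


def is_affected(fwrev: str, npiv_enabled: bool) -> bool:
    """True only for the failing combination: affected firmware with NPIV enabled."""
    return npiv_enabled and normalize_fw(fwrev) in AFFECTED_FW
```

For example, `is_affected("2.00A3 (U3D2.00A3)", npiv_enabled=True)` is true, while the same firmware with NPIV disabled is not, which matches the two workarounds discussed in this bug.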

--- Additional comment from dick.kennedy on 2010-12-23 12:36:47 EST ---

Sorry for the confusion: the LPe12002 fw 2.00.A4 is not available. You would have to go back to the previous release, 1.11.a5.

--- Additional comment from revers on 2010-12-23 15:05:45 EST ---

(In reply to comment #29)
> Sorry for the confusion, the lpe12002 fw 2.00.A4 is not available. You would
> have to go back to the previous release 1.11.a5.

Dick,

Can you provide a release note for this issue and enter it as a comment in this bz?

Rob

--- Additional comment from revers on 2010-12-23 15:08:07 EST ---


Netapp, please confirm comment 29

--- Additional comment from marting on 2011-01-04 07:27:54 EST ---

(In reply to comment #31)
> Netapp, please confirm comment 29

Disabling lpfc_enable_npiv seems to work. We have not hit any issues so far in our tests (running for more than 24 hours now) after disabling it.

Now that we know this causes problems, shouldn't this setting be disabled by default in 5.6 lpfc?

--- Additional comment from revers on 2011-01-04 13:33:42 EST ---

(In reply to comment #32)
> (In reply to comment #31)
> > Netapp, please confirm comment 29
> 
> Disabling lpfc_enable_npiv seems to work. We have not hit any issues so far in
> our tests (running for more than 24 hours now) after disabling it.
> 
> Now that we know this causes problems, shouldn't this setting be disabled by
> default in 5.6 lpfc?

Doubtful any more updates will be taken in rhel5.6 at this point.  GA is very close at hand.

How about a z-stream update to revert enabling npiv by default?  If this makes sense to Emulex, I will open a bug, where a 5.6z patch can be attached.  Richard, please indicate.

Thanks, Rob

--- Additional comment from vaios.papadimitriou on 2011-01-04 13:48:19 EST ---

No, we do not recommend disabling NPIV by default, as this issue applies __ONLY__ to the 8Gb/s LPe12K HBAs running w/ FW rev 2.00a3. This issue will be fixed w/ the next GA firmware release.

Instead, we recommend adding the following release note:

...
It has been found that possible issues may be encountered when the RHEL5.5 or RHEL5.6 distribution kernel is used with the in-box LPFC driver, on a system with 8Gb/s LPe1200x HBAs and firmware version 2.00a3. 
These issues could be encountered during fabric faults with multipathing, and visible symptoms include loss of LUNs and/or FC host hang.

If these issues are encountered it is recommended to do one of the following:

1.	Downgrade the firmware revision of the 8Gb/s LPe1200x HBA to revision 1.11a5:
http://www.emulex.com/files/downloads/hardware/lpe1200x/prev_fw.html

Or,

2.	Modify the LPFC driver’s “lpfc_enable_npiv” module parameter to zero:

lpfc_enable_npiv=0

You can accomplish this by doing either of the following:
•	When loading the LPFC driver from the initrd image (that is at system boot time), add the following line in the /etc/modprobe.conf file:

      options lpfc_enable_npiv=0

and then re-build the initrd image.
•	When loading the LPFC driver dynamically, include the “lpfc_enable_npiv=0” option in the insmod or modprobe command line.

For additional information on how to set the LPFC driver module parameters, refer to the Emulex Drivers for Linux User Manual.
…

-Vaios-

--- Additional comment from revers on 2011-01-04 17:09:54 EST ---

Ryan,

Any chance the release note here could get into rhel5.6?  If not, what is the alternative?

Rob

--- Additional comment from revers on 2011-01-04 17:09:55 EST ---


    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
It has been found that possible issues may be encountered when the RHEL5.5 or
RHEL5.6 distribution kernel is used with the in-box LPFC driver, on a system
with 8Gb/s LPe1200x HBAs and firmware version 2.00a3. 
These issues could be encountered during fabric faults with multipathing, and
visible symptoms include loss of LUNs and/or FC host hang.

If these issues are encountered it is recommended to do one of the following:

1. Downgrade the firmware revision of the 8Gb/s LPe1200x HBA to revision
1.11a5:
http://www.emulex.com/files/downloads/hardware/lpe1200x/prev_fw.html

Or,

2. Modify the LPFC driver’s “lpfc_enable_npiv” module parameter to zero:

lpfc_enable_npiv=0

You can accomplish this by doing either of the following:
• When loading the LPFC driver from the initrd image (that is at system boot
time), add the following line in the /etc/modprobe.conf file:

      options lpfc_enable_npiv=0

and then re-build the initrd image.
• When loading the LPFC driver dynamically, include the “lpfc_enable_npiv=0”
option in the insmod or modprobe command line.

For additional information on how to set the LPFC driver module parameters,
refer to the Emulex Drivers for Linux User Manual.

--- Additional comment from rlerch on 2011-01-04 21:24:28 EST ---


    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,30 +1,14 @@
-It has been found that possible issues may be encountered when the RHEL5.5 or
-RHEL5.6 distribution kernel is used with the in-box LPFC driver, on a system
-with 8Gb/s LPe1200x HBAs and firmware version 2.00a3. 
-These issues could be encountered during fabric faults with multipathing, and
-visible symptoms include loss of LUNs and/or FC host hang.
+Issues might be encountered on a system with 8Gb/s LPe1200x HBAs and firmware version 2.00a3 when the Red Hat Enterprise Linux 5.6 kernel is used with the in-box LPFC driver. Such issues include loss of LUNs and/or fiber channel host hangs during fabric faults with multipathing.
 
-If these issues are encountered it is recommended to do one of the following:
+To work around these issues, it is recommended to either:
+Downgrade the firmware revision of the 8Gb/s LPe1200x HBA to revision 1.11a5, or
+Modify the LPFC driver’s lpfc_enable_npiv module parameter to zero.
+When loading the LPFC driver from the initrd image (i.e. at system boot time), add the line
 
-1. Downgrade the firmware revision of the 8Gb/s LPe1200x HBA to revision
-1.11a5:
-http://www.emulex.com/files/downloads/hardware/lpe1200x/prev_fw.html
+options lpfc_enable_npiv=0
 
-Or,
+to /etc/modprobe.conf and re-build the initrd image.
 
-2. Modify the LPFC driver’s “lpfc_enable_npiv” module parameter to zero:
+When loading the LPFC driver dynamically, include the lpfc_enable_npiv=0 option in the insmod or modprobe command line.
 
-lpfc_enable_npiv=0
+For additional information on how to set the LPFC driver module parameters, refer to the Emulex Drivers for Linux User Manual.
-
-You can accomplish this by doing either of the following:
-• When loading the LPFC driver from the initrd image (that is at system boot
-time), add the following line in the /etc/modprobe.conf file:
-
-      options lpfc_enable_npiv=0
-
-and then re-build the initrd image.
-• When loading the LPFC driver dynamically, include the “lpfc_enable_npiv=0”
-option in the insmod or modprobe command line.
-
-For additional information on how to set the LPFC driver module parameters,
-refer to the Emulex Drivers for Linux User Manual.

--- Additional comment from marting on 2011-01-13 07:25:23 EST ---

(In reply to comment #36)
> When loading the LPFC driver from the initrd image (that is at system boot
> time), add the following line in the /etc/modprobe.conf file:
> 
>       options lpfc_enable_npiv=0
> 

Shouldn't this be 'options lpfc lpfc_enable_npiv=0' ?

--- Additional comment from dick.kennedy on 2011-01-13 08:20:19 EST ---

Yes. options lpfc lpfc_enable_npiv=0
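The correction matters because, in modprobe.conf, the first word after `options` is the module name. With `options lpfc_enable_npiv=0`, modprobe would take `lpfc_enable_npiv=0` as the module name, so the setting would never reach lpfc. A minimal Python sketch of a checker for such lines follows; `parse_options_line` is a hypothetical helper that handles only the simple whitespace-separated form, not quoting or backslash line continuations.

```python
from typing import Dict, Tuple


def parse_options_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a modprobe.conf 'options' line into (module, {param: value}).

    Raises ValueError when the line is not an 'options' line or when the
    module name is missing, as in 'options lpfc_enable_npiv=0'.
    """
    tokens = line.split("#", 1)[0].split()  # drop a trailing comment, then split on whitespace
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError(f"not an options line: {line!r}")
    module = tokens[1]
    if "=" in module:
        # A param=value in the module-name position means the module name was left out.
        raise ValueError(f"missing module name before {module!r}")
    params: Dict[str, str] = {}
    for token in tokens[2:]:
        name, _, value = token.partition("=")
        params[name] = value  # a bare flag without '=' maps to ''
    return module, params
```

Run against the two variants in this bug, the corrected line parses to module `lpfc` with `lpfc_enable_npiv` set to `"0"`, while the original line is rejected.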

--- Additional comment from andriusb on 2011-01-13 09:13:25 EST ---


    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -5,7 +5,7 @@
 Modify the LPFC driver’s lpfc_enable_npiv module parameter to zero.
 When loading the LPFC driver from the initrd image (i.e. at system boot time), add the line
 
-options lpfc_enable_npiv=0
+options lpfc lpfc_enable_npiv=0
 
 to /etc/modprobe.conf and re-build the initrd image.

--- Additional comment from revers on 2011-01-26 12:50:20 EST ---

(In reply to comment #39)
> (In reply to comment #36)
> > When loading the LPFC driver from the initrd image (that is at system boot
> > time), add the following line in the /etc/modprobe.conf file:
> > 
> >       options lpfc_enable_npiv=0
> > 
> 
> Shouldn't this be 'options lpfc lpfc_enable_npiv=0' ?

Martin,

Will you be able to verify the updated firmware with NPIV on when it is available?

--- Additional comment from marting on 2011-02-01 05:23:49 EST ---

(In reply to comment #42)
> Martin,
> 
> Will you be able to verify updated firmware when it is available w/ npiv on?

Yes, I'll do that. Meanwhile, can someone specify the firmware version which would contain the fix?

--- Additional comment from pm-rhel on 2011-02-01 12:03:13 EST ---

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

--- Additional comment from revers on 2011-02-03 08:47:28 EST ---

(In reply to comment #43)
> (In reply to comment #42)
> > Martin,
> > 
> > Will you be able to verify updated firmware when it is available w/ npiv on?
> 
> Yes, I'll do that. Meanwhile, can someone specify the firmware version which
> would contain the fix?

Hi Vaios,

Do we have an updated fw version to address the NPIV issue with 8Gb/s LPe1200x HBAs and firmware version 2.00a3?

Rob

--- Additional comment from vaios.papadimitriou on 2011-02-03 13:53:02 EST ---

A test build firmware (rev 2.00x4) was sent today to Netapp for evaluation.

Comment 3 Vaios Papadimitriou 2011-03-10 18:56:23 UTC
Netapp has notified Emulex that they verified the test build firmware 2.00x4
has fixed the issue they were seeing.

This fix will appear in the next FW version 2.00a4 that will be released by Emulex.
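Checking whether an installed revision is at or beyond the release that Emulex says carries the fix requires comparing revisions. Emulex does not document an ordering in this bug, so the Python sketch below assumes one: compare major, minor, and the trailing build number, ignoring the letter. Under that assumption, the test build 2.00x4 compares equal to 2.00a4, so this check cannot tell a test build from the GA release. `parse_fw` and `at_least` are hypothetical helpers.

```python
import re
from typing import NamedTuple


class FwRev(NamedTuple):
    major: int
    minor: int
    letter: str  # e.g. 'a' in 2.00a4, or 'x' in the 2.00x4 test build
    build: int


def parse_fw(rev: str) -> FwRev:
    """Parse strings such as '2.00a4', '1.11.a5' or '2.00A3 (U3D2.00A3)'."""
    match = re.match(r"\s*v?(\d+)\.(\d+)\.?([A-Za-z])(\d+)", rev)
    if not match:
        raise ValueError(f"unrecognized firmware revision: {rev!r}")
    major, minor, letter, build = match.groups()
    return FwRev(int(major), int(minor), letter.lower(), int(build))


def at_least(rev: str, minimum: str = "2.00a4") -> bool:
    """Assumed ordering: compare (major, minor, build) and ignore the letter."""
    r, m = parse_fw(rev), parse_fw(minimum)
    return (r.major, r.minor, r.build) >= (m.major, m.minor, m.build)
```

With this ordering, the 1.11a5 downgrade target also reports false. That only means it predates the fix, not that it is affected; whether a given revision is affected is a separate question from whether it is new enough.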

Comment 4 Vaios Papadimitriou 2011-04-06 16:21:43 UTC
The Emulex Firmware version 2.00a4 was released, and is available on the Emulex website.

I request that we close this BZ.

-Vaios-

Comment 5 Rob Evers 2011-04-06 19:19:24 UTC
(In reply to comment #4)
> The Emulex Firmware version 2.00a4 was released, and is available on the Emulex
> website.
> 
> I request that we close this BZ.
> 
> -Vaios-

Thanks for the update Vaios.

Is Netapp ready to close this or is more testing required?

Rob

Comment 6 Rajashekhar M A 2011-04-12 11:53:29 UTC
Thanks. We have verified the new firmware. This bugzilla can be closed.

