Description of problem:

Host setup: RHEL 5.1 RC1 (2.6.18-52.el5) using Emulex LPe11002 HBA cards and the 8.1.10.9 driver. Along with the root lun, 2 additional data luns are mapped to the RHEL host. Each lun has 4 FCP paths - 2 primary and 2 secondary.

Now fault injections are performed where the FCP paths alternately go offline and then online as follows:
1) First the primary FCP paths are taken offline. After an interval of 10 minutes, the primary paths are brought back online.
2) After the next 10 minutes, the secondary paths are taken offline. These paths are brought back online after 10 minutes.

The above cycle is repeated in a loop, i.e. at any given point of time either the primary or the secondary paths are available for each lun on the RHEL host.

Within a couple of hours of these iterations, the host becomes unresponsive and freezes. The freeze is always reproducible for the above scenario. The only way out is to hard boot the machine. This happens for both IO and non-IO runs on the data luns.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-12.el5

How reproducible:
Always.

Steps to Reproduce:
1. Along with the root lun, map 2 data luns to the RHEL 5.1 root-device-multipathed host. Each lun has 4 FCP paths - 2 primary and 2 secondary.
2. Perform fault injections on the FCP paths as described above.

Actual results:
The host freezes within a couple of hours of the path fault injections, necessitating a hard boot.

Expected results:
The host should survive the above fault injections.

Additional info:
1) A normal RHEL 5.1 host (non-SANbooted, root device not multipathed) survives these path fault injections even for longevity runs of 72 hours, i.e. the freeze is seen only on root-device-multipathed hosts during path faults.
2) We also tried tweaking the disk timeout values of the underlying SCSI paths as well as the lpfc_devloss_tmo values of the Emulex driver, but the freeze was still reproducible. The freeze is also seen irrespective of whether the default mpath_prio_netapp prio callout or the modified mpath_prio_ontap callout is used.
Created attachment 240921 [details] Config and sysrq dumps of the host

The attachment contains 4 files:
1) Config.txt - Configuration file of the RHEL 5.1 root device multipathed host.
2) RHEL5.1-WithoutIO-SANbootfreeze.txt - Sysrq dumps of the host during the freeze for a non-IO scenario.
3) RHEL5.1-WithIO-SANbootfreeze.txt - Sysrq dumps of the host during the freeze for an IO scenario on the data luns.
4) RHEL5.1-Debug-SANboot-1.txt - Sysrq dumps of the host during the freeze, again for a non-IO scenario. Here, the debug kernel was used (2.6.18-52.el5debug).
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
NetApp states this is blocking SAN boot support on RHEL 5.1.
Adding Emulex - just as FYI for them...
Emulex is reviewing this bugzilla now.
I tried a similar test in the following environment, but I can't reproduce the problem.

o HBA: QLA2340 (2Gbps * 1port) * 2
o Storage: NEC iStorage (2Gbps * 2port)
o FC-Connection: FC-AL (no FC switch between the HBAs and the storage)
o Configurations:
  - 3 LUNs from the storage (each LUN has 2 paths)
  - Failover configuration (1 path in each priority group)
  - Directio path checker
o Fault injection:
  - Disk offline/online using sysfs (like the attached script)
  - Remove/restore FC cables alternately manually

According to the console log attached in comment#1, path checking by multipathd looks stopped from a certain point of time. If that is true, the paths would not be reinstated and all paths would eventually become down. Are there any related logs in /var/log/messages?

Also, I think answers to the following questions would help isolate the problem.

o What was the path status from dm's view at each point of time?
  You can check it with "dmsetup status".
  If paths in one priority group are failed before you inject a fault to paths of the other priority group, all paths will be down.
o What fault injection method was used? Is it NetApp specific or generic?
  I attached a test script using sysfs to offline/online the paths. Could you try this to see if the problem can be reproduced?
o Does it happen if other path checkers like "tur" or "readsector0" are used?
o Does it happen if the 2 data LUNs aren't used? (Create only 1 multipath device for the root volume.)
o Does it happen if 2 paths (1 path in each priority group) are used?
o Does it happen if other HBAs like QLogic are used?
Created attachment 248471 [details] Path fail testing script using sysfs to offline/online paths

You need to modify some variables in the script for your environment.
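For readers without attachment access, a minimal sketch of the same sysfs-based approach follows (the H:C:T:L values and the interval are placeholders - adjust them to match the paths in one priority group on your host):

#!/bin/bash
# Alternately offline and online the listed SCSI paths via sysfs.
PATHS="3:0:1:0 4:0:0:0"   # H:C:T:L of the paths to cycle (placeholder values)
INTERVAL=600              # seconds between state changes

while true; do
    for p in $PATHS; do
        echo offline > /sys/bus/scsi/devices/$p/state
    done
    sleep $INTERVAL
    for p in $PATHS; do
        echo running > /sys/bus/scsi/devices/$p/state
    done
    sleep $INTERVAL
done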
The RHEL5.1-Debug-SANboot-1.txt file shows "possible recursive locking" in the device-mapper code:

[ INFO: possible recursive locking detected ]
2.6.18-52.el5debug #1
---------------------------------------------
lvm/457 is trying to acquire lock:
 (&md->io_lock){----}, at: [<f88b078f>] dm_request+0x15/0xe6 [dm_mod]
but task is already holding lock:
 (&md->io_lock){----}, at: [<f88b078f>] dm_request+0x15/0xe6 [dm_mod]

other info that might help us debug this:
1 lock held by lvm/457:
 #0:  (&md->io_lock){----}, at: [<f88b078f>] dm_request+0x15/0xe6 [dm_mod]

stack backtrace:
 [<c043be58>] __lock_acquire+0x70c/0x922
 [<c043c5bb>] lock_acquire+0x4b/0x68
 [<f88b078f>] dm_request+0x15/0xe6 [dm_mod]

Could this indicate a real issue in dm?

Also, is nmi_watchdog enabled? Can you invoke kdb and get a backtrace of any uninterruptible processes a few minutes apart to see if they stay in the same place? (Or, without kdb, invoke the sysrq a few minutes apart.)
Please find my replies below:

o What was the path status from dm's view at each point of time?
  You can check it with "dmsetup status".
  If paths in one priority group are failed before you inject a fault to paths of the other priority group, all paths will be down.

-- We did run "dmsetup status --target=multipath -v" in a loop. It properly displayed the corresponding path status during the faults. But during faults, SCSI error messages cluttered the console screen, making it difficult to check the dmsetup status. And once the freeze is hit, it was impossible to get the dmsetup status as the host was not responsive. Also, the path faults were run in such a manner that first the paths in one priority group were offlined and then onlined; only after this was it repeated for the paths in the next priority group, i.e. either the primary or the secondary paths (or both) were available for each lun at any given point of time.

o What fault injection method was used? Is it NetApp specific or generic?
  I attached a test script using sysfs to offline/online the paths. Could you try this to see if the problem can be reproduced?

-- This is NetApp specific in the sense that a dual clustered NetApp controller setup was used as the target. Each controller head has 2 target ports, totalling 4 target ports altogether, i.e. for each lun on any one controller head, 4 FCP paths were available on the host - 2 primary (through the local head) and 2 secondary (through the partner head). Now faults are run in such a manner that when one controller head goes down, the partner head takes over, and then vice versa. This is repeated in a loop. On the host, this corresponds to paths getting offlined and then onlined as described above (in the first point).

We did take a Finisar trace during the freeze. On analyzing it, we could see the whole sequence of RSCNs, GID_FTs, PLOGIs and PRLIs, which is normal during faults. But subsequently, the initiator ports never proceed beyond the REPORT LUNS commands for the respective luns during these fault injections. The target properly responds to the initiator commands, but it is the initiator ports that remain idle after receiving a GOOD STATUS reply to the corresponding REPORT LUNS commands. So the target can be ruled out in this case.

We also enabled HBA error logging, SCSI error logging and multipathd logging (by running multipathd -v4) and even used the debug kernel - but to no avail. No relevant messages were seen in /var/log/messages (after rebooting the host) or on the serial console during the freeze.

o Does it happen if other path checkers like "tur" or "readsector0" are used?

-- Yes.

o Does it happen if the 2 data LUNs aren't used? (Create only 1 multipath device for the root volume.)

-- Not tried this yet.

o Does it happen if 2 paths (1 path in each priority group) are used?

-- Not tried this yet.

o Does it happen if other HBAs like QLogic are used?

-- Not tried this yet.

Could this indicate a real issue in dm? Also, is nmi_watchdog enabled?

-- No, nmi_watchdog is not enabled.

Can you invoke kdb and get a backtrace of any uninterruptible processes a few minutes apart to see if they stay in the same place? (Or, without kdb, invoke the sysrq a few minutes apart.)

-- I think the "recursive locking" message was displayed during the host reboot and not during the freeze. Anyway, I can collect successive sysrq dumps (a few minutes apart) during the freeze if that's what you want.
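(To keep the status readable despite the console clutter, a simple capture loop along these lines could be left running in the background - the log path is arbitrary:)

while true; do
    echo "=== $(date) ===" >> /var/log/dmsetup-status.log
    dmsetup status --target=multipath -v >> /var/log/dmsetup-status.log 2>&1
    sleep 30
done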
Hi Emulex,

> We did take a Finisar trace during the freeze. On analyzing them,
> we could see the whole sequence of RSCNs, GID_FTs, PLOGIs, PRLIs
> which is normal during faults. But subsequently, the initiator ports
> never proceed beyond the REPORT LUNs commands for the respective luns
> during these fault injections. The target properly responds to
> the initiator commands, but it is the initiator ports that remain idle
> after receiving a GOOD STATUS reply to the corresponding REPORT LUNs
> commands. So the target can be ruled out in this case.

Do you think that this means the device driver is stalling and is the first suspect? Or do you think that it is a result of another component's fault or something else?
> o What was the path status from dm's view at each point of time?
>   You can check it with "dmsetup status".
>   If paths in one priority group are failed before you inject a fault
>   to paths of the other priority group, all paths will be down.
>
> -- We did run the "dmsetup status --target=multipath -v" in a
>    loop. It properly displayed the corresponding path status during
>    the faults. But during faults, SCSI error messages cluttered the
>    console screen making it difficult to check the dmsetup
>    status. And once the freeze is hit, it was impossible to detect
>    the dmsetup status as the host was not responsive. Also, the path
>    faults were run in such a manner that first paths in one priority
>    group were offlined and then onlined. Only after this was it
>    repeated for the paths in the next priority group i.e. either
>    primary/secondary paths (or both) were available for each lun at
>    any given point of time.

I meant: what was the path status BEFORE injecting a fault? In this testing loop, all paths should be active from dm's view just before injecting a fault. But if something went wrong in multipathd, the onlined paths might not have been activated yet when the paths in the next priority group were offlined.

Is it possible for you to check 'dmsetup status' on the testing host before every fault injection? Also, is it possible to stop injecting faults if any of the paths are marked 'F' in the output of dmsetup? Then you would see the crucial state where all paths should be available but dm sees some of them as failed.

Anyway, could you attach all console logs and /var/log/messages to this bugzilla if possible? Those might include helpful information to isolate whether the cause is in multipathd, the device driver or another kernel component.

Also, results of the trials in comment#7 (especially the result of the sysfs fault injection test attached in comment#8) would help the investigation very much.
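(For reference, a check along these lines could be run just before each fault injection - the grep is deliberately loose since the exact column layout of the multipath status line depends on the kernel version:)

#!/bin/bash
# Log the multipath status and abort the test run if any path is still
# marked failed ('F') from the previous cycle.
STATUS=$(dmsetup status --target=multipath)
echo "$(date) $STATUS" >> /var/log/prefault-status.log
if echo "$STATUS" | grep -q ' F '; then
    echo "Failed path(s) present before injection - stopping the test" >&2
    exit 1
fi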
Add Mukesh @Emulex to CC
What I meant was that the issue seemed to be purely a host-related one, going by the Finisar trace. But identifying which layer in the storage stack triggered the freeze is the tough part. As requested, I'll try checking the dmsetup status on the host before every fault. I will also attach the relevant console logs and /var/log/messages for this scenario.
>>Hi Emulex,
>>Do you think that it means the device driver is stalling and
>>the first suspect?
>>Or do you think that it is a result of other components' fault
>>or something?

We don't have enough data to suspect the device driver at this point. The device driver is completing Fibre Channel discovery without any error. There is no device driver thread in an uninterruptible state in any of the sysrq task dumps. Additional test data (backtraces of any uninterruptible processes taken a few minutes apart) on the DEBUG KERNEL with NMI WATCHDOG enabled would help.

>>-- I think the "recursive locking" message was displayed during the host
>>reboot and not during the freeze.

It's worth debugging why we see this "recursive locking" message during system boot.
> >>-- I think the "recursive locking" message was displayed during the host
> >>reboot and not during the freeze.
>
> It's worth debugging why we see this "recursive locking" message during system
> boot.

It's a known problem.
http://marc.info/?l=dm-devel&m=116322663022361&w=2
http://marc.info/?l=dm-devel&m=116379258818712&w=2
Created attachment 249531 [details] Sysrq dumps + /var/log/messages

The attachment contains the following:
1) Sysrq dumps - After the freeze, the process state was collected twice, 10 minutes apart (nmi_watchdog not yet enabled).
2) /var/log/messages - Collected after rebooting the host. One can see the last multipathd message at 21:48:25, due to the path faults, before the freeze. The subsequent messages correspond to the host reboot.
The system is in a hung state because the root device is losing all paths. The HBA driver is returning DID_ERROR on the active paths (3:0:1:0 and 4:0:0:0) while the other two paths remain dead. We need additional information to figure out why the lpfc driver is returning DID_ERROR.

Please run the test with NO IO and the lpfc driver log verbose set to 0x40. You can turn on the lpfc driver log level by adding the following line to /etc/modprobe.conf:

options lpfc lpfc_log_verbose=0x40
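(A rough sequence for setting this - whether the initrd rebuild is strictly required depends on whether lpfc is loaded from the initrd with its options baked in at mkinitrd time, which is an assumption here, so it is included only as a precaution:)

echo 'options lpfc lpfc_log_verbose=0x40' >> /etc/modprobe.conf
# assumption: on a SAN-booted host lpfc is loaded from the initrd, so rebuild it
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
reboot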
Created attachment 250041 [details] Sysrq dumps + messages with lpfc verbose logging

The attachment contains the following:
1) rhel5.1-lpfc-verbose-1.TXT - Sysrq dumps of the host (with the debug kernel) with no IO run. The process state was collected twice, 10 minutes apart, after the freeze. The lpfc log verbose was set to 0x40 and nmi_watchdog was enabled.
2) messages-new - The /var/log/messages file after rebooting the host following the freeze.
The console log and /var/log/messages files are missing some crucial information. I am not seeing any SCSI error messages on the console. The initial driver loading messages are missing from the /var/log/messages file. These messages are very important for debugging this issue. The previous log files did have all that information; it looks like the console log level got changed.

Can you please re-run the test with:
1. console log level set to 8
   echo 8 > /proc/sys/kernel/printk
2. lpfc driver verbose set to 0x40
3. NO I/O
4. the old /var/log/messages cleaned up
Created attachment 251291 [details] New dumps as requested

As requested, I have attached the new logs as follows:
1) rhel5.1-lpfc-verbose-2.TXT - Sysrq dumps (10 minutes apart) of the debug kernel after the freeze. This is for the non-IO scenario with lpfc log verbose = 0x40, console log level = 8 and nmi_watchdog enabled.
2) messages-debug-2 - /var/log/messages for the above scenario.
As Mukesh commented, all paths for the root filesystem went down, although that isn't expected to happen in this test scenario. As a result, one multipathd thread (PID=2812 in RHEL5.1-debug-1.TXT) is stalling in the exec() system call, waiting for the inode write-out of the updated access time. The stalling thread should be the path checker (checkerloop()), which is trying to execute the priority callout, mpath_prio_netapp. So onlined paths aren't activated any more, since only the path checker activates onlined paths in a system.

Not using priority callouts, or specifying the "noatime" mount option (if the callout-related files are in the page cache), would work around the stall.

# mount -o remount,nodiratime,noatime /
We are losing all paths to the root device. It looks like commands on the active path are timing out and the mid layer is aborting those commands. The Emulex driver returns the aborted commands with DID_ERROR. The console log indicates that device-mapper marks a path down when it receives a DID_ERROR.

Kiyoshi, what is DM's behavior when it gets a command back with DID_ERROR?

Martin, I would like you to run the test one more time with the lpfc driver log level changed to 0x43.
Re: Comment#23

dm-multipath marks the path down and doesn't use it until multipathd activates it again. (But multipathd gets stalled when all paths go down, so no paths are activated any more.)
A command which fails with DID_ERROR should be retried.

More observations:
- Commands on the active path get aborted.
- Commands on the same path return UNIT ATTENTION with ASC/ASCQ 29 00. That means a bus reset is happening.

A command can be aborted in the following cases:
1. The command is timing out.
2. A bus reset was issued.
3. A lun reset was issued.

Since there is no I/O going on the system, it is less likely that the command is timing out. Is any application issuing a bus reset?

Output from rhel5.1-lpfc-verbose-2.TXT (the first word is the line number) indicates path 3-0-1-0 sees a command aborted first, and then the next command fails with UNIT ATTENTION and ASC/ASCQ 29 00, which is bus reset / power-on reset.

68920 lpfc 0000:02:00.0: 0:0729 FCP cmd x28 failed <1/0> status: x3 result: x16 Data: x76 x966
68921 lpfc 0000:02:00.0: 0:0710 Iodone <1/0> cmd e7d3f3c0, error x70000 SNS x0 x0 Data: x0 x0

68924 lpfc 0000:02:00.0: 0:0749 SCSI Layer I/O Abort Request Status x2002 ID 1 LUN 0 snum 0xee97

3-0-1-0 getting abort

68925 lpfc 0000:02:00.0: 0:0729 FCP cmd x0 failed <1/0> status: x1 result: x0 Data: x6b x98a
68926 lpfc 0000:02:00.0: 0:0730 FCP command x0 failed: x2 SNS xf0000600 x29000000 Data: x2 x0 x16 x0 x0
68933 lpfc 0000:02:00.0: 0:0710 Iodone <1/0> cmd e7d3f3c0, error x2 SNS x600f0 x29 Data: x0 x0

BUS RESET ON lun 3-0-1-0
Created attachment 252931 [details] New dumps

The attachment contains the following:
1) rhel5.1-lpfc-0x43.TXT - Sysrq dumps (10 minutes apart) of the debug kernel after the freeze. This is for the non-IO scenario with lpfc log verbose = 0x43, console log level = 8 and nmi_watchdog enabled.
2) messages-debug-3 - /var/log/messages for the above scenario.
Re: Comment#25

> A command which fails with DID_ERROR should be retried.

The current kernel doesn't propagate the error code to the device-mapper layer.
https://bugzilla.redhat.com/show_bug.cgi?id=168536
So dm-multipath doesn't retry any errors using the same path.

Is the DID_ERROR retried by the SCSI mid layer if dm-multipath doesn't set FAILFAST?
(related bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=304521)

> More observation
> - Command on active path get aborted.
> - Commands on same path returns UNIT ATTENTION attention and ASC ASQ as
>   29 00. It means bus reset is happening.
> A command can be aborted in following case
>   1. command is timing out.
>   2. bus reset issued.
>   3. Lun reset issued.
>
> Since there is no I/O going on system its less likely command is timing
> out.

multipathd is submitting I/Os periodically to check paths. (In this test case, multipathd uses direct I/O for it.)
I have a few queries on the above:

1) Why is the freeze seen only on SANbooted, root-device dm-multipathed hosts during fault injections - and that too for a non-IO scenario? Apparently the same non-SANbooted host successfully survives these fault injections even for intensive IO longevity runs on the data luns (of 72 hours and more).

2) Curiously, the SANbooted host without data luns mapped (i.e. only with the root lun) survives these fault injections, i.e. the freeze is seen only on a SANbooted host with additional data luns mapped to it during fault injections. Why?

3) The fault injection scripts are automated and perform offlining/onlining of the FCP paths every 10 minutes. If the unavailability of paths to the root lun is the cause of this issue, why does the freeze always set in only after 2-3 hours of fault iterations (and not at the beginning)?

4) As described before, the FC traces (I can share them if requested) reveal that the host initiators remain idle after the successful response to the REPORT LUNS commands during the freeze. Why don't the initiators go ahead with the subsequent SCSI queries like READ CAPACITY etc. in the above scenario (which is what occurs in a normal non-SANbooted environment)? Apparently they remain idle despite the target properly responding to the corresponding initiator queries, triggering the freeze.
Here is what I found out:

- During testing, we are losing all paths to the lun occasionally.

/var/log/messages >>
Nov 9 17:33:37 rhel5-1-rc1 multipathd: mpath2: remaining active paths: 0

- The system hangs when it loses all paths to the root device.

- Active luns are failing for two reasons:

1. The command is aborted by the SCSI mid layer because it is timing out. We are seeing UNIT ATTENTION with 29 00, which indicates the target is restarting. The target may be recovering from some error or may have received a bus reset.

CONSOLE LOG >>
44445 lpfc 0000:02:00.0: 0:0729 FCP cmd x2a failed <1/0> status: x3 result: x16 Data: x208 x8eb
44446 lpfc 0000:02:00.0: 0:0710 Iodone <1/0> cmd f0b3dce0, error x70000 SNS x0 x0 Data: x0 x0
44447 lpfc 0000:02:00.0: 0:0749 SCSI Layer I/O Abort Request Status ID 1 LUN 0 snum 0x81a7

2. The command is failing with sense key 2 (NOT READY) and ASC/ASCQ 40 10.

CONSOLE LOG >>
41782 lpfc 0000:02:00.1: 1:0729 FCP cmd x28 failed <1/2> status: x1 result: x1000 Data: x8a x957
41783 lpfc 0000:02:00.1: 1:0730 FCP command x28 failed: x2 SNS xf0000200 x4010000 Data: xa x1000 x16 x0 x0
41785 lpfc 0000:02:00.1: 1:0710 Iodone <1/2> cmd f09b472c, error x2 SNS x200f0 x104 Data: x0 x1000

The second reason makes me think we are seeing 29 00 because the target is going through some error recovery.

- Even though the system seems to be hung, the Emulex driver is processing RSCNs for the failed target when it comes back. We do see the REPORT LUNS completing fine but no further activity. The mid layer scans, sending the REPORT LUNS. This implies that the REPORT LUNS showed nothing different from what the mid layer already sees as present, and implies the SCSI status of the devices is still good too - thus it doesn't send any further I/O, as it has already done that.

CONSOLE LOG >>
45066 lpfc 0000:02:00.0: 0:0212 DSM out state 6 on NPort x10e00 Data: x7

From the Emulex driver's point of view, we are handling everything correctly. I don't see any DM activity after that.
To answer Martin's questions:

Q 1 and 2

MUKESH>> As Kiyoshi suggested in his comment 22, DM can cause a system hang when there is no path to the root device. I think he will explain better why this happens only with the root device.

Q 3

3) The fault injection scripts are automated and perform offlining/onlining of FCP paths every 10 minutes. If the non-availability of paths to the root lun is the cause of this issue, why does the freeze always set in only after 2-3 hours of fault iterations (and not in the beginning)?

MUKESH>> It takes that long to fail the active path to the root device. It may be because the target is going into some kind of error recovery mode after that long. As I mentioned in my previous comment (comment 29), the active path fails because:
- it is timing out and is aborted by the mid layer
- the target is reporting NOT READY with ASC/ASCQ 40 10

I am curious to know why the target is reporting the following errors and what they mean.

- The command is failing with sense key 2 (NOT READY) and ASC/ASCQ 40 10.

CONSOLE LOG >>
41782 lpfc 0000:02:00.1: 1:0729 FCP cmd x28 failed <1/2> status: x1 result: x1000 Data: x8a x957
41783 lpfc 0000:02:00.1: 1:0730 FCP command x28 failed: x2 SNS xf0000200 x4010000 Data: xa x1000 x16 x0 x0
41785 lpfc 0000:02:00.1: 1:0710 Iodone <1/2> cmd f09b472c, error x2 SNS x200f0 x104 Data: x0 x1000

Q 4

4) As described before, the FC traces (I can share them if requested) reveal that the host initiators remain idle after the successful response to the REPORT LUNS commands during the freeze. Why don't the initiators go ahead with subsequent SCSI queries like READ CAPACITY etc. in the above scenario (which is what occurs in a normal non-SANbooted environment)? Apparently they remain idle despite the target properly responding to the corresponding initiator queries, triggering the freeze.

MUKESH>> Please see my comment (comment #29, last bullet item). DM sends the following 3 commands after REPORT LUNS on each lun, after the Emulex driver discovers the remote ports:
28h READ
c0h VENDOR SPECIFIC
12h INQUIRY
Since DM seems to be in a hung state, we don't see any command after the REPORT LUNS.
Re: Comment#28

> 1) Why is the freeze seen only on SANbooted root device
>    dm-multipathed hosts during fault injections - and that too for a
>    non IO scenario? Apparently the same non SANbooted host successfully
>    survives these fault injections even for intensive IO longevity runs
>    on the data luns (of 72 hours & more).

I assume "non SANbooted" means dm-multipath is not used for the root device. If that's incorrect, please let me know.

The mpath_prio_netapp binary which is used by multipathd is on the root device. So if the root device is on dm-multipath and all of its paths go down, multipathd stalls when trying to execute mpath_prio_netapp. However, if the root device isn't on dm-multipath, multipathd can execute it and continue to work. So even if all paths of the data luns go down temporarily, those paths are activated by multipathd when they are onlined again. I/O on the data luns is irrelevant to the freeze.

> 2) Curiously, the SANbooted host without data luns mapped (i.e. only
>    with the root lun) survives these fault injections i.e. the freeze
>    is seen only on a SANbooted host with additional data luns mapped to
>    it during fault injections. Why?

I'm not sure, but I guess the no-path situation (error returns for the active paths from the device driver) doesn't happen in that case. (But if that is true, why does the device driver detect errors only when maps for data luns exist?) Anyway, if you attach the console log and /var/log/messages of that test, I'll check.

> 3) The fault injection scripts are automated and perform
>    offlining/onlining of FCP paths every 10 minutes. If the non
>    availability of paths to the root lun is the cause of this issue,
>    why does the freeze always set in only after 2-3 hours of fault
>    iterations (and not in the beginning)?

It's a timing issue. The freeze doesn't always happen when all paths for the root device go down. The behavior of multipathd is:

for (all paths) {
    1. check the path
    2. execute the priority callout, if the path is up
       (don't execute it if the path is down)
}

So when all paths go down before the path checking and multipathd detects the paths as down, the freeze doesn't happen. But when all paths go down after the path checking and before executing the priority callout, the freeze happens. I'm not sure why it always happens within almost the same time window.
By "non SANbooted", I did mean the root device not mounted on a dm-multipath partition.

Actually, you are right about there being no paths available to the root lun during the fault injections. But this is only for a small window, and the target does respond to basic SCSI commands like INQUIRY, REPORT LUNS etc. during this period. So it's never an issue with non-SANbooted scenarios, but apparently that's not so for root-device dm-multipath SANboot scenarios, as per your explanation.

I saw your proposal regarding building all the priority callouts into multipathd as library functions, like the path checkers. That does make sense and would hopefully resolve this issue. But in the current state, I don't think we can support the root device dm-multipath feature on RHEL 5.1.
Re: Comment#32

I agree with you. Unavailability of all paths must be avoided on RHEL 5.1's root multipath, even if it is temporary and within a very small window, if the storage uses a priority callout. That is a very hard limitation and the information should be made available to customers. Ben, or somebody from Red Hat, please provide such information via a knowledge base article or something.

The noatime mount in Comment#22 doesn't work if the callout isn't in the page cache. A possibly better workaround might be copying all callouts to a ramdisk and modifying multipath.conf to use them.
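A rough sketch of that workaround, assuming /dev/shm (tmpfs) is acceptable as the RAM-backed location and that mpath_prio_netapp is the only callout in use; note that a dynamically linked callout still needs its shared libraries reachable (or already in the page cache) for exec() to complete, so this is only a partial mitigation:

mkdir -p /dev/shm/mpath-callouts
cp /sbin/mpath_prio_netapp /dev/shm/mpath-callouts/

Then point the callout at the copy in /etc/multipath.conf (the argument format shown follows the shipped default and should be checked against the installed configuration):

    prio_callout "/dev/shm/mpath-callouts/mpath_prio_netapp /dev/%n"

and restart multipathd (service multipathd restart) so the new path is picked up.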
Since I don't want to drastically change how the priority callouts work in an update release, I'm not going to pull the libprio work into RHEL 5. Instead, I'm going to add the ramdisk code back into multipath. It was pulled out because it didn't work well with the pthread code in RHEL 4. However, pthreads work fine with it in RHEL 5.
I've committed the fix for this. The way it works, there should be no need to change the configuration. multipathd now creates a ramfs cache for all of the getuid and prio callouts that it uses. Once multipathd starts up, it doesn't matter if you lose access to the callout binaries, because multipathd has its own copies.

There is only one minor restriction. multipathd only adds callout programs to its cache on startup, not on reconfigure. This means that if you edit /etc/multipath.conf and add a "getuid_callout" or "prio_callout" line that needs a callout program which was not previously needed by any configuration in either /etc/multipath.conf or the default configs, you will need to restart multipathd for the binary to be loaded into multipathd's cache. Simply running

# multipath -k"reconfigure"

will not work. In the very rare case where a customer needs a prio_callout not supplied by the device-mapper-multipath package, or a specialized getuid_callout, and they have already started multipathd before they edited /etc/multipath.conf to include this information, the customer simply needs to run

# service multipathd restart

after editing /etc/multipath.conf, and everything should be fine.
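For example (illustrative values only - the vendor/product strings and the callout argument should match the array's actual configuration), adding a device section like the following after multipathd is already running means the newly referenced binary is not yet in the cache, so a restart rather than a reconfigure is needed:

devices {
    device {
        vendor       "NETAPP"
        product      "LUN"
        # a callout binary not previously used by any configuration
        prio_callout "/sbin/mpath_prio_ontap /dev/%n"
    }
}

# service multipathd restart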
added to RHEL5.2 release notes under "Resolved Issues":

<quote>
Root devices on multipathed hosts no longer freeze during fibre-channel path faults. multipathd now creates a ramfs cache for all getuid and prio callouts used. multipathd uses these cached callouts, making them persistent across possible path faults.

Note that these callouts are only cached during startup, and not during reconfiguration. If you add a callout to /etc/multipath.conf after startup, this callout will not be cached even if you run multipath -k"reconfigure". To ensure that callout additions to /etc/multipath.conf are cached, restart multipathd using "service multipathd restart".
</quote>

please advise if any revisions are in order. thanks!
*** Bug 431119 has been marked as a duplicate of this bug. ***
Partner NetApp has tested the 5.1 erratum package and the issue is still not resolved. See bug 428338, Comment #7 and Comment #8. https://bugzilla.redhat.com/show_bug.cgi?id=428338#c7
See if using the priority callout /sbin/mpath_prio_netapp.static fixes the problem. This is just a statically compiled version of the regular binary.
I did as you suggested and restarted the multipathd daemon. Then I ran the test script to simulate the faults (not the actual FC path faults yet), and the results do look promising. But with the new setting, the host does not boot after recreating the initrd, perhaps because the current RHEL 5.1 mkinitrd includes the statically linked prio binaries and renames them without the .static extension.
I'm glad to have this finally nailed down. For the actual fix, I was planning on simply making the non-static callouts symbolic links to the static ones. This seems like the easiest way to solve the problem, without forcing people to mess with their configuration. Could you make /sbin/mpath_prio_netapp a symlink to /sbin/mpath_prio_netapp.static and verify that everything works for you? This fixes everything for me.
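In other words, something like the following (mpath_prio_netapp is the callout in use here; the initrd rebuild mirrors the earlier SAN-boot procedure and is listed as an assumption rather than a requirement):

ln -sf /sbin/mpath_prio_netapp.static /sbin/mpath_prio_netapp
service multipathd restart
# assumption: rebuild the initrd so the boot-time setup sees the same layout
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)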
Making /sbin/mpath_prio_netapp a symlink to /sbin/mpath_prio_netapp.static solves the boot issue. Will update the results of the FC path faults later.
All the non-static callouts now are just symlinks to the static ones.
OK - it looks like the follow-up fix to this is now in bug 431947.
Actually, bz 431947 is now the stream version of this issue. This bug is being used for the entire solution in both the errata and rpm spec file. Sorry for the confusion.
added to RHEL5.2 release notes under "Resolved Issues":

<quote>
The priority callouts of dm-multipath are now statically compiled. This fixes a problem that occurs when running dm-multipath on devices containing the root filesystem, which caused such devices to freeze during fibre-channel path faults.
</quote>

please advise if any further revisions are required. thanks!
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot1--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
Hi, the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at which point no further additions or revisions will be entertained. a mockup of the RHEL5.2 release notes can be viewed at the following link: http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html please use the aforementioned link to verify if your bugzilla is already in the release notes (if it needs to be). each item in the release notes contains a link to its original bug; as such, you can search through the release notes by bug number. Cheers, Don
This fix is broken because it causes a regression reported in bz 439030, for some reason that I don't quite understand. If I clone the multipathd process and it gets its own stack, selinux in either permissive or enforcing mode causes multipathd to crash when device-mapper tries to create the multipath device nodes.
*** Bug 439030 has been marked as a duplicate of this bug. ***
Fixed. I use fork() and unshare(CLONE_NEWNS) to get my own namespace, which avoids the clone() call and means I don't have to create my own stack space, which was causing the problem with selinux.
Note to NetApp: this regression was originally reported in bug 438150, so assuming we'll be using this bug (bug 355961) in RHEL 5.2, then bug 438150 may be used for 5.1.z.
Ben, I can test on IA64
Confirmed the following 2 problems in device-mapper-multipath-0.4.7-16.el5 are fixed in device-mapper-multipath-0.4.7-17.el5. o multipathd stalls on IA64 + SELinux (Reported in bug 439030) o multipathd gets segfault on x86_64 + SELinux (Reported in bug 438150)
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot3--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
hi guys, does the release note for this bug (quoted in Comment# 54) still stand? please advise before April 15 if any revisions are required. thanks!
Don - probably not. Another fix went in as of Comment #61, so I'd assume this isn't up to date. The release notes would need to be updated to include this - not sure if you can do this or if Ben needs to supply it to you.
Setting to VERIFIED based on NEC's gracious testing, which seems to include the most recent item. Correct me if I'm wrong here.
ach. in that case, can somebody post the necessary edits to the RHEL5.2 release note for this bug? at present, it still appears as quoted in Comment#54. note that the deadline for the RHEL5.2 release notes is on April 15, at which point no further revisions will be entertained.
Ben's fix in Comment#61 doesn't affect the release note, so no change is required. But I think we could make it better, like:

<quote>
The priority callouts of dm-multipath are now statically compiled and copied into the memory of the monitoring daemon, multipathd. So multipathd doesn't require access to the root filesystem to execute the priority callouts. This fixes a multipathd stall problem that occurs when running dm-multipath on devices containing the root filesystem and all paths of the devices fail, which caused such devices to remain unavailable even after the failed paths are restored.
</quote>

It's just my comment. If it doesn't suit Red Hat, ignoring it is no problem for me.
thanks Kiyoshi. revising as follows:

<quote>
The priority callouts of dm-multipath are now statically compiled and copied into the memory of multipathd. As such, multipathd no longer requires access to the root filesystem in order to execute priority callouts. This fixes a problem that occurs when running dm-multipath on devices containing the root file system, which caused such devices to freeze during fibre-channel path faults.
</quote>

please advise before April 15 if any further revisions are required. thanks!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0337.html