500580 – [NetApp 4.9 bug]Target controller faults are resulting in underlying FC paths getting offlined/onlined on the host

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	device-mapper-multipath
Sub Component:
Version:	4.8
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	4.9
Assignee:	Ben Marzinski
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	626414

Reported:	2009-05-13 11:21 UTC by Naveen Reddy
Modified:	2011-02-16 14:24 UTC
CC List:	17 users
Fixed In Version:	device-mapper-multipath-0.4.5-40.el4
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-02-16 14:24:20 UTC
Target Upstream Version:
Embargoed:

Attachments
Attaching the /etc/multipath.conf and /var/log/messages (1.09 MB, application/x-tar) 2009-05-13 11:28 UTC, Naveen Reddy	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0243	0	normal	SHIPPED_LIVE	device-mapper-multipath bug fix and enhancement update	2011-02-15 16:34:54 UTC

Description Naveen Reddy 2009-05-13 11:21:14 UTC

Description of Problem:
On a RHEL4.8 host, target controller faults cause the underlying FC paths to be offlined and onlined on the host.

Version-Release number of selected component (if applicable):
OS : RHEL4.8
device-mapper-multipath-0.4.5-33.el4
device-mapper-1.02.28-2.el4

Steps to Reproduce:
1. Map a few LUNs to the RHEL4.8 host, configure the LUNs, and start I/O.
2. Trigger a target controller fault (for example, "takeover" and "giveback") on the NetApp storage controller.
3. All paths are reinstated within 2 minutes.
4. Wait for around 30 to 35 minutes.
5. A few paths to the LUNs are dropped and then reinstated.

Actual Results:
Paths drop 30 to 35 minutes after the target controller faults.

Expected Results:
Paths should not drop.

Additional Info:
This issue is not seen when "path_checker" is set to "readsector0" in /etc/multipath.conf.
Attaching the /etc/multipath.conf and /var/log/messages.
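
For reference, the workaround setting mentioned above would look roughly like the fragment below. This is a minimal sketch that assumes the checker is set in the defaults section; the attached /etc/multipath.conf may set it in a per-device section instead, and the rest of that file is not reproduced here.

```
# Assumed placement: global defaults. A devices { device { ... } } entry
# for the NetApp LUNs could carry the same keyword instead.
defaults {
        path_checker    readsector0
}
```

multipathd has to be restarted (or told to re-read its configuration) before a change like this takes effect.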

Comment 1 Naveen Reddy 2009-05-13 11:28:50 UTC

Created attachment 343746 [details]
Attaching the /etc/multipath.conf and /var/log/messages

Comment 2 Andrius Benokraitis 2009-05-13 12:40:56 UTC

Setting target for RHEL 4.9, since 4.8 is pretty well baked already.

Comment 4 Ben Marzinski 2010-05-04 21:34:29 UTC

Is this not a problem with directio in RHEL5?  Also, would it be possible to get
the results of starting multipathd with

# multipathd -v3

and reproducing the issue, instead of using the initscripts?

Comment 7 Andrius Benokraitis 2010-10-25 13:24:24 UTC

NetApp: Red Hat is hoping to have this completed this week.

Comment 8 Ben Marzinski 2010-10-28 04:01:16 UTC

This issue appears to be caused by multipathd not checking the path state asynchronously with directio in RHEL4. That means it can't wait very long for the path to respond, because the check stalls multipathd while it waits. I backported the code that makes multipathd wait asynchronously for the IO to complete with the directio checker, which allows it to wait up to 30 seconds for either a success or failure response. This fixed a similar problem on RHEL5.
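
The asynchronous wait described above can be modelled with a small sketch. This is not the multipathd code (which is C); it is a hypothetical Python model in which the class name, state names, and the `poll` signature are all assumptions made for illustration. It only shows the idea: each checker pass returns immediately, the read stays outstanding across passes, and the path is declared down only when the read fails or the 30-second limit expires.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative state names, not taken from the multipathd sources.
PATH_UP, PATH_DOWN, PATH_PENDING = "up", "down", "pending"


@dataclass
class AsyncPathCheck:
    """Hypothetical model of a non-blocking path check with a timeout."""
    timeout: float = 30.0               # seconds, per the comment above
    started_at: Optional[float] = None  # None means no read is in flight

    def poll(self, now: float, completed: Optional[bool]) -> str:
        """Run one checker pass without blocking.

        now:       current time in seconds
        completed: None if the read is still outstanding,
                   True/False if it finished with success/failure
        """
        if self.started_at is None:
            # A real checker would submit the asynchronous read here.
            self.started_at = now
        if completed is not None:
            self.started_at = None
            return PATH_UP if completed else PATH_DOWN
        if now - self.started_at >= self.timeout:
            # No answer within the limit: give up on this read.
            self.started_at = None
            return PATH_DOWN
        # Still waiting; the daemon is free to service other paths.
        return PATH_PENDING
```

In this model a slow controller that answers after, say, 12 seconds yields a pending result on the earlier passes and then an up result, instead of being marked down because a blocking check had to give up early.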

Comment 10 errata-xmlrpc 2011-02-16 14:24:20 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0243.html
