Bug 500580 - [NetApp 4.9 bug]Target controller faults are resulting in underlying FC paths getting offlined/onlined on the host
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.8
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 4.9
Assignee: Ben Marzinski
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 626414
Reported: 2009-05-13 11:21 UTC by Naveen Reddy
Modified: 2011-02-16 14:24 UTC (History)

Fixed In Version: device-mapper-multipath-0.4.5-40.el4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-16 14:24:20 UTC
Target Upstream Version:
Embargoed:


Attachments
Attaching the /etc/multipath.conf and /var/log/messages (1.09 MB, application/x-tar)
2009-05-13 11:28 UTC, Naveen Reddy


Links
System: Red Hat Product Errata
ID: RHBA-2011:0243
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: device-mapper-multipath bug fix and enhancement update
Last Updated: 2011-02-15 16:34:54 UTC

Description Naveen Reddy 2009-05-13 11:21:14 UTC
Description of Problem:
On a RHEL4.8 host, target controller faults cause the underlying FC paths to be taken offline and brought back online on the host.

Version-Release number of selected component (if applicable):
OS : RHEL4.8
device-mapper-multipath-0.4.5-33.el4
device-mapper-1.02.28-2.el4

Steps to Reproduce:
1. Map a few LUNs to the RHEL4.8 host, configure the LUNs, and start I/O.
2. Trigger a target controller fault (for example, "takeover" and "giveback") on the NetApp storage controller.
3. All paths are reinstated within 2 minutes.
4. Wait for around 30 to 35 minutes.
5. A few paths to the LUNs are dropped and then reinstated.

Actual Results:
Paths drop 30 to 35 minutes after the target controller faults.

Expected Results:
Paths should not drop.

Additional Info:
This issue is not seen when "path_checker" is set to "readsector0" in /etc/multipath.conf.
Attaching the /etc/multipath.conf and /var/log/messages.
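
For reference, the workaround mentioned above could look roughly like the fragment below. This is only a sketch: it assumes the checker is set globally in the defaults section, whereas the attached multipath.conf may set it per device instead.

```
defaults {
    # Assumption: applied to all devices; a devices { device { ... } } entry
    # for the array could carry the same keyword instead.
    path_checker    readsector0
}
```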

Comment 1 Naveen Reddy 2009-05-13 11:28:50 UTC
Created attachment 343746
Attaching the /etc/multipath.conf and /var/log/messages

Comment 2 Andrius Benokraitis 2009-05-13 12:40:56 UTC
Setting target for RHEL 4.9, since 4.8 is pretty well baked already.

Comment 4 Ben Marzinski 2010-05-04 21:34:29 UTC
Is this not a problem with directio in RHEL5?  Also, would it be possible to get
the results of starting multipathd with

# multipathd -v3

and reproducing the issue, instead of using the initscripts?

Comment 7 Andrius Benokraitis 2010-10-25 13:24:24 UTC
NetApp: Red Hat is hoping to have this completed this week.

Comment 8 Ben Marzinski 2010-10-28 04:01:16 UTC
This issue appears to be caused by multipathd not checking the path state asynchronously with the directio checker in RHEL4. That means it can't wait very long for the path to respond, because multipathd stalls while it waits. I backported the code that makes multipathd wait asynchronously for the IO to complete with the directio checker, which allows it to wait up to 30 seconds for either a success or a failure response. This fixed a similar problem on RHEL5.
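
To illustrate the difference described above, here is a small Python model of a blocking check versus an asynchronous one. It is not the multipathd code: the function names, the 1-second blocking wait, and the 1-second poll interval are assumptions made for the sketch; only the 30-second limit comes from this comment.

```python
import enum


class PathState(enum.Enum):
    UP = "up"
    DOWN = "down"


def sync_check(response_delay, max_wait=1.0):
    """Blocking check: the daemon stalls while waiting, so it can only
    afford a short wait (max_wait is an assumed value)."""
    return PathState.UP if response_delay <= max_wait else PathState.DOWN


def async_check(response_delay, timeout=30.0, poll_interval=1.0):
    """Asynchronous check: the I/O is submitted once and polled on later
    passes of the daemon loop, until it completes or the timeout expires."""
    elapsed = 0.0
    while elapsed < timeout:
        if response_delay <= elapsed:
            return PathState.UP      # the I/O completed before the timeout
        elapsed += poll_interval     # daemon is free to do other work here
    return PathState.DOWN            # no response within the timeout


if __name__ == "__main__":
    # A path that takes 5 seconds to answer, e.g. shortly after a giveback.
    print("blocking:", sync_check(5.0).value)
    print("async:   ", async_check(5.0).value)
```

In this model a path that answers slowly is failed by the blocking check but kept by the asynchronous one, while a path that never answers is still failed once the timeout runs out.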

Comment 10 errata-xmlrpc 2011-02-16 14:24:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0243.html

