Red Hat Bugzilla – Bug 500580
[NetApp 4.9 bug] Target controller faults are causing underlying FC paths to be offlined/onlined on the host
Last modified: 2011-02-16 09:24:20 EST
Description of Problem:
On a RHEL 4.8 host, target controller faults cause the underlying FC paths to be offlined and then onlined.
Version-Release number of selected component (if applicable):
OS : RHEL4.8
Steps to Reproduce:
1. Map a few LUNs to the RHEL 4.8 host, configure the LUNs, and start I/O.
2. Trigger a target controller fault (e.g. a "takeover" followed by a "giveback") on the NetApp storage controller.
3. All paths are reinstated within 2 minutes.
4. Wait around 30 to 35 minutes.
5. A few paths to the LUNs are dropped and then reinstated.
Actual results:
Paths drop 30 to 35 minutes after the target controller fault.
Expected results:
Paths should not drop.
This issue is not seen when "path_checker" is set to "readsector0" in /etc/multipath.conf.
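The workaround mentioned above can be expressed as a minimal /etc/multipath.conf fragment; the surrounding defaults section is assumed, and a real configuration would carry the site's other settings alongside it:

```
defaults {
        # Workaround observed in this report: the issue does not
        # reproduce with readsector0, only with the directio checker.
        path_checker    readsector0
}
```

After editing the file, the multipathd service must be restarted (or the maps reloaded) for the new checker to take effect.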
Attaching the /etc/multipath.conf and /var/log/messages.
Created attachment 343746 [details]
Attaching the /etc/multipath.conf and /var/log/messages
Setting target for RHEL 4.9, since 4.8 is pretty well baked already.
Is this not a problem with directio in RHEL5? Also, would it be possible to get
the results of starting multipathd with
# multipathd -v3
and reproducing the issue, instead of using the initscripts?
NetApp: Red Hat is hoping to have this completed this week.
This issue appears to be due to the fact that multipathd doesn't asynchronously check the path state with directio in RHEL4. That means it can't wait very long for the path to respond, because the check stalls multipathd while it runs. I backported the code that makes multipathd wait asynchronously for the I/O to complete with the directio checker; this allows it to wait up to 30 seconds for either a success or a failure response. This fixed a similar problem on RHEL5.
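The difference between the old synchronous check and the backported asynchronous one can be sketched as follows. This is a minimal Python illustration of the idea, not the actual multipathd C code: the test read is issued in the background so the main loop keeps running, which lets the checker afford a long (~30 s) deadline before declaring the path down. The helper name and the use of a plain file instead of a block device opened with O_DIRECT are assumptions for the sketch.

```python
import os
import queue
import threading


def check_path_async(path, timeout_s):
    """Issue a small read in a worker thread and wait up to timeout_s
    seconds for it to finish.  Returns True if the read completed
    (path up), False on I/O error or timeout (path down)."""
    result = queue.Queue(maxsize=1)

    def do_read():
        try:
            fd = os.open(path, os.O_RDONLY)
            try:
                os.pread(fd, 512, 0)  # read "sector 0" of the device
            finally:
                os.close(fd)
            result.put(True)
        except OSError:
            result.put(False)

    # The read runs in the background; the caller (the daemon's main
    # loop in the real code) is never blocked by a hung device.
    threading.Thread(target=do_read, daemon=True).start()
    try:
        return result.get(timeout=timeout_s)
    except queue.Empty:
        return False  # no answer within the deadline: treat path as down


if __name__ == "__main__":
    import tempfile
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * 512)
        f.flush()
        # A healthy "path": the read completes well inside the deadline.
        print(check_path_async(f.name, 30))  # True
```

A synchronous checker would instead call os.pread directly on the daemon's own thread, so its timeout has to stay short enough not to stall every other path check, which is exactly the limitation described above.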
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.