Description of Problem:
On a RHEL4.8 host, target controller faults result in the underlying FC paths being offlined and onlined on the host.

Version-Release number of selected component (if applicable):
OS: RHEL4.8
device-mapper-multipath-0.4.5-33.el4
device-mapper-1.02.28-2.el4

Steps to Reproduce:
1. Map a few LUNs to the RHEL4.8 host, configure the LUNs, and start I/O.
2. Trigger a target controller fault (say "takeover" and "giveback") on the NetApp storage controller.
3. All paths are reinstated within 2 minutes.
4. Wait for around 30 to 35 minutes.
5. A few paths to the LUNs are dropped and then reinstated.

Actual Results:
Paths drop 30 to 35 minutes after the target controller faults.

Expected Results:
Paths should not drop.

Additional Info:
This issue is not seen when "path_checker" is set to "readsector0" in /etc/multipath.conf.
Attaching the /etc/multipath.conf and /var/log/messages.
Created attachment 343746 [details] Attaching the /etc/multipath.conf and /var/log/messages
Setting target for RHEL 4.9, since 4.8 is pretty well baked already.
Is this not a problem with directio in RHEL5? Also, would it be possible to get the results of starting multipathd with # multipathd -v3 and reproducing the issue, instead of using the initscripts?
NetApp: Red Hat is hoping to have this completed this week.
This issue appears to be due to the fact that multipathd doesn't asynchronously check the path state with directio in RHEL4. That means that it can't wait very long for the path to respond, because it is stalling multipathd while it does. I backported the code to make multipathd wait asynchronously for the IO to complete with the directio checker. This allows it to wait for 30 seconds for either a success or failure response. This fixed a similar problem on RHEL5.
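For readers following along, the idea of waiting asynchronously can be illustrated with a small model. This is only a sketch in Python, not the backported multipathd code (which is C): the names `AsyncPathChecker`, `PathState`, and `check()` are hypothetical, and the I/O itself is simulated by the caller passing in its current status. What it shows is that each check call returns immediately with "pending" while the I/O is outstanding, and the path is only failed once the 30 second window has passed without a response.

```python
import time
from enum import Enum


class PathState(Enum):
    UP = "up"
    DOWN = "down"
    PENDING = "pending"


class AsyncPathChecker:
    """Hypothetical model of a non-blocking path checker with a timeout."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout    # seconds to wait for the I/O to complete
        self.clock = clock        # injectable clock, so the model can be tested
        self.started_at = None    # time the outstanding I/O was first seen

    def check(self, io_result):
        """Return the path state without blocking.

        io_result is None while the I/O is still outstanding,
        True if it completed successfully, False if it failed.
        """
        now = self.clock()
        if self.started_at is None:
            self.started_at = now
        if io_result is not None:
            # The I/O finished: report the result and reset for the next check.
            self.started_at = None
            return PathState.UP if io_result else PathState.DOWN
        if now - self.started_at >= self.timeout:
            # No answer within the timeout window: treat the path as failed.
            self.started_at = None
            return PathState.DOWN
        # Still waiting; the caller can go on to check other paths meanwhile.
        return PathState.PENDING
```

A synchronous checker, by contrast, would have to block inside `check()` until the I/O returned, which is why it could not afford a long timeout.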
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0243.html