Bug 677821
Summary: | multipathd occasionally doesn't stop queuing after no_path_retry times out. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Ben Marzinski <bmarzins> |
Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
Status: | CLOSED ERRATA | QA Contact: | Gris Ge <fge> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 5.6 | CC: | agk, bdonahue, bmarzins, bmr, dwa, dwysocha, fge, guy.legac, heinzm, mbroz, prajnoha, prockai, qcai, zkabelac |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
If a device's last path was deleted while the multipathd daemon was trying to reload the device map, or if a ghost path failed, multipathd did not always switch into the recovery mode. As a result, multipath devices could not recover I/O operations in setups that were supposed to temporarily queue I/O if all paths were down. Multipath now correctly recovers I/O operations as configured.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2011-07-21 08:23:12 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 683447 |
Description
Ben Marzinski
2011-02-15 23:27:03 UTC
Test packages are available here:
http://people.redhat.com/bmarzins/device-mapper-multipath/rpms/i386/
and
http://people.redhat.com/bmarzins/device-mapper-multipath/rpms/x86_64/

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
If a device's last path was deleted while the multipathd daemon was trying to reload the device map, or if a ghost path failed, multipathd did not always switch into the recovery mode. As a result, multipath devices could not recover I/O operations in setups that were supposed to temporarily queue I/O if all paths were down. Multipath now correctly recovers I/O operations as configured.

This is a RHEL5 bug, and this fix is built into the RHEL5 tarball. However, now that I look at RHEL6, part of this fix is missing there. The rest of it is already in the RHEL6 tarball. I'm going to open a bug for the missing piece.

The queue_if_no_path issue has been fixed by device-mapper-multipath-0.4.7-46.el5.
Tested with emc_clariion_checker, with no_path_retry set to 5 in the devices section of multipath.conf, using dd to generate I/O. These commands were used to bring the disks offline:
===
for X in sdg sdi sdw sdy; do
    echo offline > "/sys/block/$X/device/state"
done
===
Before the patch, we got an incorrect no_path_retry_nr:
===
mpath5: Entering recovery mode: max_retries=60
===
After the patch:
===
mpath5: Entering recovery mode: max_retries=5
===
For the PATH_GHOST issue, I was not able to reproduce it for lack of the specified path checkers: "rdac, tur, and hp_sw path checkers". I also tried changing the path_checker for DGC, but still could not reproduce it.
Can we get a partner to test the PATH_GHOST issue?

I opened a ticket, Case 00484908, about a similar issue detected on the RHEL 5.6 x86_64 platform.

Please see the Qlogic forum:
http://solutions.qlogic.com/KanisaSupportSite/browse.do?BROWSE_forum.NodeType=leaf&WidgetName=BROWSE_forum&BROWSE_forum.thisPageUrl=%2Fforum%2Fforumshome.do&NodeType=leaf&NodeName=Fibre+Channel+Linux&TaxoName=FB_ForumBrowse&BROWSE_forum.NodeId=FB_HBA_LINUX_1_2&BROWSE_forum.IsForum=true&NodeId=FB_HBA_LINUX_1_2&id=m3&BROWSE_forum.TaxoName=FB_ForumBrowse&AppContext=AC_ForumCategoryPage

(In reply to comment #17)
> I opened a ticket Case 00484908 about a similar issue detected on RHEL 5.6
> X86_64 platform.
>
> please See Qlogic forum :
> http://solutions.qlogic.com/KanisaSupportSite/browse.do?BROWSE_forum.NodeType=leaf&WidgetName=BROWSE_forum&BROWSE_forum.thisPageUrl=%2Fforum%2Fforumshome.do&NodeType=leaf&NodeName=Fibre+Channel+Linux&TaxoName=FB_ForumBrowse&BROWSE_forum.NodeId=FB_HBA_LINUX_1_2&BROWSE_forum.IsForum=true&NodeId=FB_HBA_LINUX_1_2&id=m3&BROWSE_forum.TaxoName=FB_ForumBrowse&AppContext=AC_ForumCategoryPage

Does the rpm in comment #1 solve your problem?

Tried 100 times on EMC CX (active/passive setup) using the tur path_checker. Could not hit the "active paths: <negative_number>" issue.
Basic function test will be noted in errata. Sanity Only.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-1032.html
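
For anyone repeating the queue_if_no_path verification described in this report, the log comparison could be scripted roughly as follows. This is only a sketch: the function names recovery_max_retries and check_no_path_retry are hypothetical helpers, not part of device-mapper-multipath, and the only input they rely on is the "Entering recovery mode: max_retries=N" message quoted in the test comment.

```shell
# Hypothetical helper (not shipped with device-mapper-multipath):
# read log lines on stdin and print the max_retries value from the
# last "Entering recovery mode" message for the given map name.
recovery_max_retries() {
    map=$1
    sed -n "s/.*${map}: Entering recovery mode: max_retries=\([0-9][0-9]*\).*/\1/p" | tail -n 1
}

# Hypothetical check: succeed only if a recovery message was found and
# its value equals the no_path_retry value set in multipath.conf.
check_no_path_retry() {
    map=$1
    expected=$2
    actual=$(recovery_max_retries "$map")
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A possible invocation, assuming multipathd logs through syslog to /var/log/messages on the test host, is `check_no_path_retry mpath5 5 < /var/log/messages`. It exits non-zero if the last logged value differs from the configured one, or if no recovery message is present.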