Bug 683447 - multipathd occasionally doesn't stop queuing after no_path_retry times out.

Summary: multipathd occasionally doesn't stop queuing after no_path_retry times out.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	device-mapper-multipath
Sub Component:
Version:	5.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Ben Marzinski
QA Contact:	Gris Ge
Docs Contact:
URL:
Whiteboard:
Depends On:	677821
Blocks:

Reported:	2011-03-09 12:32 UTC by RHEL Program Management
Modified:	2011-03-29 03:00 UTC
CC List:	17 users
Fixed In Version:	device-mapper-multipath-0.4.7-42.el5_6.2
Doc Type:	Bug Fix
Doc Text:	If a device's last path was deleted while the multipathd daemon was trying to reload the device map, or if a ghost path failed, multipathd did not always switch into the recovery mode. As a result, multipath devices could not recover I/O operations in setups that were supposed to temporarily queue I/O if all paths were down. This update resolves both of these issues; multipath now correctly recovers I/O operations as configured.
Clone Of:
Environment:
Last Closed:	2011-03-24 07:08:52 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0379	0	normal	SHIPPED_LIVE	device-mapper-multipath bug fix update	2011-03-24 07:08:47 UTC

Description RHEL Program Management 2011-03-09 12:32:27 UTC

This bug has been copied from bug #677821 and has been proposed
to be backported to 5.6 z-stream (EUS).

Comment 4 Ben Marzinski 2011-03-14 22:56:02 UTC

Fix ported.

Comment 6 Martin Prpič 2011-03-22 11:21:31 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
If a device's last path was deleted while the multipathd daemon was trying to reload the device map, or if a ghost path failed, multipathd did not always switch into the recovery mode. As a result, multipath devices could not recover I/O operations in setups that were supposed to temporarily queue I/O if all paths were down. This update resolves both of these issues; multipath now correctly recovers I/O operations as configured.

Comment 7 Gris Ge 2011-03-23 06:41:03 UTC

Reproduced both issues. The problems have been fixed in
device-mapper-multipath-0.4.7-42.el5_6.2

Environment:
EMC CX RAID 5 LUN in an Active/Passive configuration.
The disk was removed with this command:
echo 1 > /sys/block/sdj/device/delete

multipath.conf:
================================
defaults {
    user_friendly_names     yes
    log_checker_err         once
    verbosity               3
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    wwid 360a98000486e584c5334615938372d51
    wwid 200173800233600d0
    wwid 36001438005deb4710000500001700000
    wwid 360a98000486e584c53346159384a4963
}
devices {
    device {
        vendor  "DGC"
        product "RAID 5"
        no_path_retry 1
    }
}

For the first issue:
Before the fix, we got this message when all paths had been removed:
Mar 23 01:00:06 storageqe-08 multipathd: mpath1: Entering recovery mode: max_retries=10
After the fix:
Mar 23 02:35:25 storageqe-08 multipathd: mpath1: Entering recovery mode: max_retries=1 
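
The comparison above can be automated. The following is only a minimal sketch, assuming the syslog message format is exactly as quoted in this comment; the helper names `recovery_retries` and `matches_config` are hypothetical and not part of multipath-tools.

```python
import re

# Matches the multipathd message quoted in this comment, e.g.
# "multipathd: mpath1: Entering recovery mode: max_retries=1".
# Assumption: the message format is the same on the system under test.
RECOVERY_RE = re.compile(
    r"multipathd: (?P<map>\S+): Entering recovery mode: "
    r"max_retries=(?P<retries>\d+)"
)


def recovery_retries(log_line):
    """Return (map_name, max_retries) from a log line, or None if no match."""
    match = RECOVERY_RE.search(log_line)
    if match is None:
        return None
    return match.group("map"), int(match.group("retries"))


def matches_config(log_line, no_path_retry):
    """True if the logged max_retries equals the configured no_path_retry."""
    parsed = recovery_retries(log_line)
    return parsed is not None and parsed[1] == no_path_retry
```

With no_path_retry set to 1 as in the multipath.conf above, `matches_config` would reject the line logged before the fix and accept the one logged after it.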


For the second issue:
When all paths failed (with the ghost path removed first), the queued I/O was never failed. The dd command stayed in the S state.
After patching, I tried 10 times and the issue did not recur.
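
One possible way to watch the process state mentioned above during a retest is to read it from /proc. This is a rough sketch, assuming a Linux system where /proc/<pid>/stat has the usual "pid (comm) state ..." layout; the function names are hypothetical and the PID of the dd process has to be supplied by the tester.

```python
def proc_state(stat_text):
    """Extract the one-letter process state from /proc/<pid>/stat content.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' instead of on whitespace.
    """
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def read_state(pid):
    """Read the current state letter (for example 'S' or 'D') of a PID."""
    with open("/proc/%d/stat" % pid) as stat_file:
        return proc_state(stat_file.read())
```

Polling `read_state` for the dd PID while the paths are removed shows whether the process leaves the waiting state once no_path_retry expires.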

Comment 8 errata-xmlrpc 2011-03-24 07:08:52 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0379.html
