Description of problem:
When an attempt is made to read a bad sector via a multipath device, the multipath driver fails the path it hit the medium error through and retries through the next path. Meanwhile, the multipathd daemon re-enables the failed path. As a result the I/O hangs forever (even if you unset the queue_if_no_path feature and set no_path_retry to something sensible: because the paths keep getting re-enabled, we never reach the point where there are no live paths).

Scenario: we are using an EMC Clariion CX4-240 effectively as JBOD for this host; individual backend disks are in raid groups of type "Disk", with a single lun on each of these raid groups presented to the host. We have configured the Red Hat multipath driver to disable the queue_if_no_path feature and set the no_path_retry option to 60. The idea is that in the event of a disk failure, the IO will eventually fail at the application level, so the application can know the disk has failed and take appropriate action (the application in this instance being Oracle ASM).

Version-Release number of selected component (if applicable):
Tried with both 2.6.18-128.1.6.el5 x86_64 and 2.6.18-157.el5 x86_64.

How reproducible:
Every time (if you have access to a disk with a medium error).

Steps to Reproduce:
1. Configure an EMC Clariion CX4-240 with a disk with a known medium error into a raid group of type 'Disk', bind a lun to the resulting raid group and present it to a host (this gives a one-to-one mapping between the lun presented by the array and the backend disk, with no redundancy - effectively using the array as JBOD).
2. Attempt to read the portion of the disk containing the medium error via the multipath device.

Actual results:
The IO hangs forever, as the multipath driver retries the request down different paths and multipathd keeps re-enabling the paths behind it.
Expected results:
The IO will terminate with an I/O error (just as it does if you attempt a read from one of the /dev/sdX devices making up the multipath device). Reading the same sector via a different path isn't going to make it magically get better.

Additional info:
You will see messages like this in /var/log/messages:

Aug 7 15:11:41 omnitrix kernel: sd 0:0:0:0: SCSI error: return code = 0x08070002
Aug 7 15:11:41 omnitrix kernel: sda: Current: sense key: Medium Error
Aug 7 15:11:41 omnitrix kernel: Add. Sense: Unrecovered read error
Aug 7 15:11:41 omnitrix kernel:
Aug 7 15:11:41 omnitrix kernel: end_request: I/O error, dev sda, sector 23904512
Aug 7 15:11:41 omnitrix kernel: device-mapper: multipath: Failing path 8:0.
Aug 7 15:11:41 omnitrix multipathd: 8:0: mark as failed
Aug 7 15:11:41 omnitrix multipathd: plaza_0_2_6_mediumerr_asm: remaining active paths: 3
Aug 7 15:11:41 omnitrix multipathd: dm-6: add map (uevent)
Aug 7 15:11:41 omnitrix multipathd: dm-6: devmap already registered
Aug 7 15:11:46 omnitrix multipathd: sda: emc_clariion_checker: Path healthy
Aug 7 15:11:46 omnitrix multipathd: 8:0: reinstated
Aug 7 15:11:46 omnitrix multipathd: plaza_0_2_6_mediumerr_asm: remaining active paths: 4
Aug 7 15:11:46 omnitrix multipathd: dm-6: add map (uevent)
Aug 7 15:11:46 omnitrix multipathd: dm-6: devmap already registered
Aug 7 15:13:41 omnitrix kernel: sd 1:0:0:0: SCSI error: return code = 0x08070002
Aug 7 15:13:41 omnitrix kernel: sdg: Current: sense key: Medium Error
Aug 7 15:13:41 omnitrix kernel: Add. Sense: Unrecovered read error
Aug 7 15:13:41 omnitrix kernel:
Aug 7 15:13:41 omnitrix kernel: end_request: I/O error, dev sdg, sector 23904512
Aug 7 15:13:41 omnitrix kernel: device-mapper: multipath: Failing path 8:96.
Aug 7 15:13:41 omnitrix multipathd: 8:96: mark as failed
Aug 7 15:13:41 omnitrix multipathd: plaza_0_2_6_mediumerr_asm: remaining active paths: 3
Aug 7 15:13:41 omnitrix multipathd: dm-6: add map (uevent)
Aug 7 15:13:41 omnitrix multipathd: dm-6: devmap already registered
Aug 7 15:13:46 omnitrix multipathd: sdg: emc_clariion_checker: Path healthy
Aug 7 15:13:46 omnitrix multipathd: 8:96: reinstated
Aug 7 15:13:46 omnitrix multipathd: plaza_0_2_6_mediumerr_asm: remaining active paths: 4
Aug 7 15:13:46 omnitrix multipathd: plaza_0_2_6_mediumerr_asm: remaining active paths: 4
Aug 7 15:13:46 omnitrix multipathd: dm-6: add map (uevent)
Aug 7 15:13:46 omnitrix multipathd: dm-6: devmap already registered
Aug 7 15:14:35 omnitrix ntpd[3722]: synchronized to 10.40.2.1, stratum 3
Aug 7 15:14:43 omnitrix kernel: sd 0:0:0:0: SCSI error: return code = 0x08070002
Aug 7 15:14:43 omnitrix kernel: sda: Current: sense key: Medium Error
Aug 7 15:14:43 omnitrix kernel: Add. Sense: Unrecovered read error
Aug 7 15:14:43 omnitrix kernel:
Aug 7 15:14:43 omnitrix kernel: end_request: I/O error, dev sda, sector 23904512
Aug 7 15:14:43 omnitrix kernel: device-mapper: multipath: Failing path 8:0.
Aug 7 15:14:43 omnitrix multipathd: 8:0: mark as failed

So you can see it is cycling back and forth between devices 0:0:0:0 (sda) and 1:0:0:0 (sdg), which are the two active paths to the device:

# /sbin/multipath -ll
plaza_0_2_6_mediumerr_asm (36006016061b0220089ab8d27bd33de11) dm-6 DGC,DISK
[size=134G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][active]
 \_ 0:0:0:0 sda 8:0   [active][ready]
 \_ 1:0:0:0 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:2:0 sdd 8:48  [active][ready]
 \_ 1:0:2:0 sdj 8:144 [active][ready]

Each time it fails a path, multipathd correctly re-enables it (since of course you can do IO to the rest of the disk happily). In my view the multipath driver shouldn't retry the IO, just return the IO error up the stack to the application.
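For reference, the configuration the reporter describes (queueing disabled, a bounded number of retries) corresponds to a multipath.conf fragment along these lines. This is an illustrative sketch only, not the reporter's actual file; the vendor/product strings match the DGC,DISK device shown above, and the other values are assumptions:

devices {
    device {
        vendor          "DGC"
        product         "DISK"
        path_checker    emc_clariion
        features        "0"    # no queue_if_no_path: don't queue I/O when all paths are down
        no_path_retry   60     # keep retrying for 60 checker intervals after the last path fails, then error out
    }
}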
Created attachment 361701 [details] use scsi_debug driver to emulate multipath

The problem is confirmed and reproducible with a small code change to the scsi_debug driver. The patch makes multipath think the debug disks on different debug hosts are the same disk. To reproduce, load the modified scsi_debug driver with parameters add_host=2 opts=2 to get two debug disks and enable the medium error option. Configure the devices in multipath.conf with no_path_retry (I used a value of 15) and start multipath. Then read sector 0x1234 on the multipath device; scsi_debug will return a medium error for this sector. The read request will repeatedly bounce between the two scsi_debug devices and never complete.
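The livelock in this reproduction can be illustrated with a toy model. This is plain Python, not the kernel code; the function, path names, and timing are invented for illustration. The point it demonstrates: the driver fails a path on each medium error, but the checker reinstates healthy-looking paths before they are all gone, so the no_path_retry countdown (which only runs while zero paths are active) never starts.

```python
# Toy model of the dm-multipath medium-error livelock (illustrative only).

def run_io(paths, no_path_retry, checker_period, max_steps=1000):
    """Simulate one read of a bad sector. Returns 'EIO' if the error
    reaches the application, or 'hang' if the retry loop never ends."""
    state = {p: "active" for p in paths}
    retries_left = no_path_retry
    for step in range(max_steps):
        active = [p for p in paths if state[p] == "active"]
        if not active:
            # Only with no live paths does no_path_retry count down.
            if retries_left == 0:
                return "EIO"
            retries_left -= 1
        else:
            # The read hits the medium error; multipath fails that path
            # and retries on the next one instead of returning the error.
            state[active[0]] = "failed"
        # Meanwhile the path checker sees the paths as healthy
        # (the rest of the disk is fine) and reinstates them.
        if step % checker_period == checker_period - 1:
            for p in paths:
                state[p] = "active"
    return "hang"

# Checker reinstates paths faster than they can all fail:
print(run_io(["sda", "sdg"], no_path_retry=60, checker_period=2))  # → hang
```

With the checker disabled (a very large period in the model), the paths stay failed, the countdown runs, and the I/O finally errors out; that is the behavior the reporter expected from no_path_retry.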
We have witnessed this also on HP EVA storage arrays with qla2xxx FC HBAs under RHEL 5.3 x86_64. We have implemented the same workaround as the original reporter (disabling queue_if_no_path) for that map.
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
(In reply to comment #0)
> Each time it fails a path, multipathd correctly re-enables it (since of course
> you can do IO to the rest of the disk happily). In my view the multipath driver
> shouldn't retry the IO, just return the IO error up the stack to the
> application.

Right, and this is how RHEL 6.2 should behave, because it includes these upstream commits (all in Linux 2.6.39) to immediately propagate the error:

http://git.kernel.org/linus/ad63082
http://git.kernel.org/linus/63583cc
http://git.kernel.org/linus/751b2a7
http://git.kernel.org/linus/7977556

I'll need to scope how difficult it would be to backport those changes to 5.8. It could be that pulling in the SCSI patches listed above depends on other SCSI patches that RHEL5 doesn't have, making for a backport that snowballs and ultimately isn't doable (due to kABI or some such).

Setting Conditional NAK: Design
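Conceptually, the upstream fix teaches the stack to distinguish target-side errors (the device itself rejected the I/O, e.g. a medium error) from path-side errors (the transport to the device failed), and dm-multipath stops retrying the former. A rough sketch of that decision, as a toy Python model rather than the actual kernel code; the error names and return strings here are invented for illustration:

```python
# Toy classification of I/O errors, modeled loosely on the upstream
# target-error distinction; all names are illustrative.

TARGET_ERRORS = {"medium_error", "data_protect", "blank_check"}
PATH_ERRORS = {"transport_timeout", "host_unreachable"}

def multipath_end_io(error):
    """Decide what dm-multipath should do with a failed request."""
    if error in TARGET_ERRORS:
        # The device itself rejected the I/O: every path leads to the
        # same bad sector, so propagate the error to the application.
        return "return EIO to application"
    if error in PATH_ERRORS:
        # The path, not the device, failed: fail the path and retry
        # the request on another one.
        return "fail path and retry on next path"
    return "retry"

print(multipath_end_io("medium_error"))
```

This is exactly the behavior the reporter asked for: a medium error is returned up the stack instead of triggering a path failover.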
Patch(es) available in kernel-2.6.18-290.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Reproduced on RHEL 5.7 GA: multipath tries all paths before returning an I/O error to the application. On kernel -300, multipath returns the I/O error to the application right after the first path gets a "Medium Error".

Commands to set up multipath:
====
modprobe scsi_debug dev_size_mb=100 num_tgts=1 \
    vpd_use_hostno=0 add_host=4 delay=20 max_luns=2 no_lun_0=1 opts=2
====

Command to hit the medium error sector:
====
dd if=/dev/mapper/mpath0 of=/dev/null bs=512 skip=4659 count=1
====

dmesg will show whether multipath tried all paths or only one path. VERIFIED.

Regression tests on other storage for this change will be reported in the errata (so far, no issues found on kernel -300).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0150.html