Bug 215975 - kernel dm: application visible I/O errors with dm-multipath and queue_if_no_path when adding new path
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks: 184189
Reported: 2006-11-16 17:26 UTC by Kiyoshi Ueda
Modified: 2010-01-12 02:36 UTC
CC List: 21 users

Fixed In Version: 5.0.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-01-03 15:01:45 UTC
Target Upstream Version:
Embargoed:


Description Kiyoshi Ueda 2006-11-16 17:26:16 UTC
The noflush feature has been included in the kernel and device-mapper,
but it has not been included in device-mapper-multipath.
So I cloned the Bugzilla entry for device-mapper-multipath.


+++ This bug was initially created as a clone of Bug #169302 +++

Description of problem:
If queue_if_no_path is set, you lose all paths to a device, and you then try to
add a new path to the same device by running the multipath tool again, you'll get
I/O errors visible to the application.  This has been discussed before, including
in the 9/22/2005 multipath concall, where Alasdair confirmed that this is the
expected behavior as of today and that there are a few possible approaches to
solving this problem.

Version-Release number of selected component (if applicable):
RHEL4 U2
- kernel: 2.6.9-22.ELsmp
- device-mapper-1.01.04-1.0.RHEL4
- device-mapper-multipath-0.4.5-5.2.RHEL4
- udev-039-10.11.EL4
- hotplug-2004_04_01-7.6
- lvm2-2.01.14-1.0.RHEL4


How reproducible:
Every time

Steps to Reproduce:
1. Map one LUN, with 2 paths to the LUN, and queue_if_no_path set, and start I/O
running over the LUN
2. Pull cables (or make other disruption) so that both paths go down and note
that I/O gets queued
3. Add a 3rd path to the same LUN, and run multipath again to pick up the new path
4. Observe that application running I/O over the LUN gets I/O errors  

Actual results:
Application in step #4 receives I/O errors

Expected results:
Application in step #4 _not_ to receive I/O errors


Additional info:
Note that if you don't lose all paths, you can add a new path in via the
multipath tool and not get application visible I/O errors.

-- Additional comment from agk on 2005-10-03 18:14 EST --
Either:
  It needs to be possible for a dm table to pass context information to its
successor;

Or:
  The queueing needs to happen at the dm level rather than with the multipath
target.
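
Both options above come down to keeping the queued I/O alive while one table is swapped for its successor. A minimal sketch in C of that difference, assuming a heavily simplified model: this is not kernel code, and `mp_table`, `reload_result` and `reload_table` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Simplified model, not kernel code. All names here are hypothetical. */

struct mp_table {
    int usable_paths;  /* paths that can currently carry I/O */
    int queued;        /* I/Os held back by queue_if_no_path */
};

struct reload_result {
    int completed;     /* I/Os that finished successfully */
    int errored;       /* I/Os returned to the application as errors */
};

/*
 * Replace old_tbl with next_tbl.
 * noflush == 0: the old table has to drain its queue before the swap.
 * noflush != 0: the queued I/O is handed over to the successor table.
 */
struct reload_result reload_table(struct mp_table *old_tbl,
                                  struct mp_table *next_tbl,
                                  int noflush)
{
    struct reload_result r = { 0, 0 };

    assert(old_tbl->queued >= 0 && next_tbl->queued >= 0);

    if (noflush) {
        /* Carry the queued I/O over to the successor table. */
        next_tbl->queued += old_tbl->queued;
    } else if (old_tbl->usable_paths > 0) {
        /* Flushing works while the old table still has a path. */
        r.completed += old_tbl->queued;
    } else {
        /* No path left: the only way to drain the queue is to fail it. */
        r.errored += old_tbl->queued;
    }
    old_tbl->queued = 0;

    /* The successor goes live and dispatches what it holds, if it can. */
    if (next_tbl->usable_paths > 0) {
        r.completed += next_tbl->queued;
        next_tbl->queued = 0;
    }
    return r;
}
```

In this model, reloading a table that has no usable paths and five queued I/Os into a table with one usable path reports five errors when `noflush` is 0 and five completions when it is non-zero, which mirrors the actual and expected results in the report.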

-- Additional comment from andriusb on 2006-01-09 10:26 EST --
This has been moved to being proposed for RHEL4 U4.

-- Additional comment from agk on 2006-03-03 13:05 EST --
*** Bug 180437 has been marked as a duplicate of this bug. ***

-- Additional comment from agk on 2006-03-15 12:09 EST --
Will be tricky to do this without affecting kABI - may have to wait until RHEL5.

-- Additional comment from j-nomura.nec.com on 2006-03-16 10:36 EST --
While there could be different approaches,
NEC have proposed a solution by pushing back target bios to dm core.
https://www.redhat.com/archives/dm-devel/2006-March/msg00053.html

We would appreciate feedback in dm-devel and are happy to discuss
other possible approaches.


Our proposal doesn't change exported structures or exported symbols.
The only change that affects kABI (API, actually) is the extension
of the return values of the target's map() and endio().

(Please read the following as diff output)

map() returns:
  * < 0: error
  * = 0: The target will handle the io by resubmitting it later
- * > 0: simple remap complete
+ * = 1: simple remap complete
+ * = 2: The target wants to push back the io

endio() returns:
  * 0   : ended successfully
  * 1   : for some reason the io has still not completed (eg,
  *       multipath target might want to requeue a failed io).
+ * 2   : The target wants to push back the io

No in-box dm target drivers are affected by these changes.
But we cannot rule out the existence of out-of-box target drivers
which use positive values for other purposes.

A disadvantage of this approach may be that target drivers have to be
modified to activate this feature.
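
The return-value change listed above can be sketched in C as follows. This is an illustration only, not the posted patch: the enum and function names are hypothetical, only the numeric values are taken from the lists above, and treating a map() value greater than 2 as an error is an assumption, since the proposal does not define it.

```c
#include <assert.h>

/* Hypothetical names; only the numeric values come from the proposal. */
enum map_result {
    MAP_SUBMITTED = 0,  /* target will resubmit the io later */
    MAP_REMAPPED  = 1,  /* simple remap complete */
    MAP_PUSHBACK  = 2   /* target wants to push back the io */
};

enum core_action {
    CORE_FAIL_IO,
    CORE_WAIT_FOR_TARGET,
    CORE_DISPATCH,
    CORE_PUSH_BACK
};

/* Old rule: any positive value means "simple remap complete". */
enum core_action handle_map_old(int r)
{
    if (r < 0)
        return CORE_FAIL_IO;
    return r == 0 ? CORE_WAIT_FOR_TARGET : CORE_DISPATCH;
}

/* Proposed rule: 1 and 2 now have distinct meanings. */
enum core_action handle_map_new(int r)
{
    switch (r) {
    case MAP_SUBMITTED: return CORE_WAIT_FOR_TARGET;
    case MAP_REMAPPED:  return CORE_DISPATCH;
    case MAP_PUSHBACK:  return CORE_PUSH_BACK;
    default:
        /* r < 0 is an error; failing r > 2 as well is an assumption. */
        return CORE_FAIL_IO;
    }
}

/* endio() under the proposal: 0 done, 1 not yet completed, 2 push back. */
int endio_wants_push_back(int r)
{
    return r == 2;
}

/* A few checks that document the behavior of the sketch. */
void self_check(void)
{
    assert(handle_map_old(1) == handle_map_new(1));
    /* A target returning 2 used to get a plain remap; now it is pushed back. */
    assert(handle_map_old(2) == CORE_DISPATCH);
    assert(handle_map_new(2) == CORE_PUSH_BACK);
    assert(endio_wants_push_back(2) && !endio_wants_push_back(1));
}
```

The `self_check()` function spells out the compatibility concern: a driver that returns 2 from map() would see different handling under the old and the proposed rule.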


-- Additional comment from dkl on 2006-04-18 16:37 EST --
NEEDINFO_ENG has been deprecated in favor of NEEDINFO or ASSIGNED. Changing
status to ASSIGNED for ENG review.

-- Additional comment from andriusb on 2006-06-26 12:02 EST --
Moving this bug to be proposed for RHEL5 GA based on comments from Alasdair; we
will then look at 4.5 afterwards. How big a deal is this for 4.5 for NetApp?

-- Additional comment from dzickus on 2006-10-10 21:50 EST --
in kernel-2.6.18-1.2725.el5

-- Additional comment from jturner on 2006-10-30 10:57 EST --
Did the device-mapper and device-mapper-multipath changes get in as well?
There's nothing really obvious in the changelogs to indicate it.

Comment 2 Ben Marzinski 2006-12-05 17:09:26 UTC
Oops. I missed this one.  This was fixed in the
device-mapper-multipath-0.4.7-6.el5 package.

Comment 3 Larry Troan 2006-12-22 03:10:43 UTC
Looks like this is already in RHEL5 beta per comment #2 above. 

/mnt/curly/nightly/RHEL5-Server-20061220.nightly/tree-i386/Server has
device-mapper-multipath-0.4.7-7.el5.i386.rpm

Escalating as an exception for management concurrence since it doesn't have any
ACKs.

Comment 4 RHEL Program Management 2006-12-22 03:20:58 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 5 Kiersten (Kerri) Anderson 2006-12-22 16:43:44 UTC
Devel ACK - Already in the build

Comment 6 Jay Turner 2007-01-03 15:01:45 UTC
device-mapper-multipath-0.4.7-6.el5 included in the 20061218.1 trees.

