Bug 463389

Summary:	[LTC 6.0 FEAT] 201313:Block layer I/O cancel (abort) capability
Product:	Red Hat Enterprise Linux 6	Reporter:	IBM Bug Proxy <bugproxy>
Component:	kernel	Assignee:	Ameet Paranjape <aparanja>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Martin Jenner <mjenner>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.0	CC:	cward, ejratl, kmonroe, mgahagan, notting, peterm, snagar
Target Milestone:	alpha	Keywords:	FutureFeature
Target Release:	6.0
Hardware:	ppc64
OS:	All
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-11-12 23:24:55 UTC	Type:	---
Bug Depends On:
Bug Blocks:	356741, 465489

Description IBM Bug Proxy 2008-09-23 04:01:02 UTC

=Comment: #0=================================================
Emily J. Ratliff <emilyr.com> - 2008-09-16 18:31 EDT
1. Feature Overview:
Feature Id:	[201313]
a. Name of Feature:	Block layer I/O cancel (abort) capability
b. Feature Description
A mechanism to request that a previously submitted block IO request be aborted or canceled has been
discussed in the IO community before. This capability would not only be useful for the xDR failover
capability, but has previously been requested for high availability device mapper RAID mirroring.

Additional Comments:	This feature will not make 2.6.27. Jens indicated that he would try to make
2.6.27, but then appears to have been out during the merge window. Ref
http://permalink.gmane.org/gmane.linux.scsi/42575

2. Feature Details:
Sponsor:	PPC
Architectures:
x86_64
ppc64
s390x

Arch Specificity: Purely Common Code
Affects Core Kernel: Yes
Affects Kernel Modules: Yes
Delivery Mechanism: Direct from community
Category:	Device Drivers and IO
Request Type:	Kernel - Enhancement from IBM
d. Upstream Acceptance:	In Progress
Sponsor Priority	1
f. Severity: High
IBM Confidential:	no
Code Contribution:	IBM code
g. Component Version Target:	2.6.27

3. Business Case
Certain environments, such as real-time applications or HA/DR solutions, have specific deadlines
for I/O, after which they consider an I/O to have failed according to their specific QoS. These
deadlines may conflict with the timeout behavior of current device drivers. Since this timeout
behavior usually cannot be changed ad hoc to suit the current situation, a mechanism is requested
to cancel outstanding I/O, in addition to the SCSI-specific abort mechanism. There are currently
two potential exploiters for such a function: the IBM xDR disaster recovery solution, and the IBM
device-mapper based implementation of RAID1+ real-time enhancements. The md based RAID1 solution
included in SLES could also be augmented with real-time capabilities if such an infrastructure
were in place.
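
The deadline behavior described above can be modeled in user space. The following is a minimal sketch, not the kernel interface: `Request`, `DeadlineTracker`, `submit`, `complete`, and `cancel_expired` are hypothetical names chosen for illustration, and the current time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """One outstanding I/O request in this toy model (hypothetical type)."""
    req_id: int
    submitted_at: float
    deadline: float          # seconds the caller is willing to wait
    state: str = "pending"   # "pending", "done", or "aborted"


@dataclass
class DeadlineTracker:
    """Tracks outstanding requests and aborts those past their deadline."""
    requests: dict = field(default_factory=dict)

    def submit(self, req_id, now, deadline):
        self.requests[req_id] = Request(req_id, now, deadline)

    def complete(self, req_id):
        req = self.requests[req_id]
        # A request that was already aborted stays aborted.
        if req.state == "pending":
            req.state = "done"

    def cancel_expired(self, now):
        """Abort every pending request whose deadline has passed; return their ids."""
        expired = []
        for req in self.requests.values():
            if req.state == "pending" and now - req.submitted_at > req.deadline:
                req.state = "aborted"
                expired.append(req.req_id)
        return sorted(expired)


tracker = DeadlineTracker()
tracker.submit(1, now=0.0, deadline=2.0)   # tight application deadline
tracker.submit(2, now=0.0, deadline=30.0)  # content to wait for a long driver timeout
tracker.submit(3, now=0.0, deadline=2.0)
tracker.complete(3)                        # finished in time
aborted = tracker.cancel_expired(now=5.0)  # only request 1 is pending and overdue
```

The point of the model is that the deadline belongs to the submitter rather than to the device driver, which is the gap this feature request describes.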

4. Primary contact at Red Hat: 
John Jarvis
jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Stephanie Glass, sglass.com, 512-838-9284

Technical contact(s):
Mike Anderson, mike.anderson.com
Michael Anderson

IBM Manager:
Wendel Voigt, wvoigt.com

Comment 1 Bill Nottingham 2008-10-02 17:40:29 UTC

Putting in NEEDINFO pending upstream code acceptance.

Comment 2 IBM Bug Proxy 2009-03-04 07:11:01 UTC

Code is upstream since v2.6.28

Commit 242f9dcb8ba6f68fcd217a119a7648a4f69290e9
There were fix commits after the initial commit.

Comment 3 Bill Nottingham 2009-03-04 15:51:26 UTC

OK, marking as MODIFIED.

The feature requested has already been accepted into the upstream code base
planned for the next major release of Red Hat Enterprise Linux.

When the next milestone release of Red Hat Enterprise Linux 6 is available,
please verify that the feature requested is present and functioning as
desired.

Comment 4 Kevin W Monroe 2009-11-12 23:24:55 UTC

Closing as support is available in current release.

Comment 5 IBM Bug Proxy 2010-05-13 19:30:38 UTC

------- Comment From andmike.ibm.com 2010-05-13 15:27 EDT-------
1.) During testing on a Power system, a race between blk_abort_queue and scsi_request_fn was discovered. Using a debug patch that injects blk_abort_queue calls every 5 seconds, I am able to see the race on my lab system. I am working on a patch and will post it to linux-scsi shortly.
Here is the post on the issue.
http://article.gmane.org/gmane.linux.scsi/58865

I am currently working this issue on linux-scsi.

2.) I also posted a bug fix to address a SCSI command leak in the patch series listed below. I will try to backport just this change and attach it to this bug.

http://thread.gmane.org/gmane.linux.scsi/58669

Comment 6 IBM Bug Proxy 2010-06-06 22:01:07 UTC

------- Comment From malahal.com 2010-06-06 17:52 EDT-------
Tested with 'dmsetup message' and 'multipathd -k, followed by "fail path <path>"' commands to simulate failures. From the messages in syslog, I see the SCSI layer aborting commands immediately.
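
The failure injection described in this comment could be scripted roughly as follows. This is a sketch under assumptions, not the tester's actual procedure: the map name `mpatha`, the devices `8:16` and `sdb`, and all helper names are hypothetical. The comment does not say which device-mapper message was sent, so the multipath target's `fail_path` message is assumed, and the multipathd call assumes the `-k'command'` form. The helpers only build argument lists; actually running them requires root and a configured multipath device.

```python
import subprocess


def dmsetup_fail_path(map_name, path_dev):
    """Build 'dmsetup message <map> 0 "fail_path <dev>"' (assumed message)."""
    return ["dmsetup", "message", map_name, "0", f"fail_path {path_dev}"]


def multipathd_fail_path(path_name):
    """Build the one-shot form of the interactive 'fail path <path>' command."""
    return ["multipathd", f"-kfail path {path_name}"]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


dm_cmd = dmsetup_fail_path("mpatha", "8:16")   # hypothetical map and major:minor
mp_cmd = multipathd_fail_path("sdb")           # hypothetical path name
run(dm_cmd)   # dry run: prints the command without executing it
run(mp_cmd)
```

After a real (non-dry) run, the syslog would be the place to check for the abort messages mentioned in the comment.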