Bug 794803

Summary:	Confirm xen-blkfront deferred close works with RHEL5 xen tools
Product:	Red Hat Enterprise Linux 7	Reporter:	Andrew Jones <drjones>
Component:	kernel	Assignee:	Radim Krčmář <rkrcmar>
Status:	CLOSED CANTFIX	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	7.0	CC:	bfan, jingli, leiwang, mrezanin, qguan, qwan, vkuznets, wshi
Target Milestone:	rc	Keywords:	EC2, Reopened
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	xen
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	794806		Environment:
Last Closed:	2013-10-09 15:30:45 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	794806
Bug Blocks:	741684

Description Andrew Jones 2012-02-17 17:20:58 UTC

If a host attempts to detach a block device from a guest while the device is in use, xen-blkfront says "no, in use" and then does some setup for a deferred detach. The detach then occurs on unmount. With RHEL5 tools, the tools first see the "no, in use" and give up. After the unmount, the block device goes to Closed (state==6), viewable with 'xm block-list'. Once the device is in this state it gets stuck: the host can no longer completely detach it or reattach it. We may want to fix this in the xen tools (I'm cloning this bug to xen as well), but even if we do, we need to make sure that when rhel7 guests run on older hosts without the tools fix, we don't cause problems, particularly problems that may lead to disk corruption. This bug will be TestOnly for starters.
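As an illustration only, the deferred close described above can be modelled in a few lines of Python. This is a toy sketch, not driver code: the class ToyBlkfront and its methods are hypothetical names invented for this example. Only the numeric state values follow the xenbus state numbering, in which Closed is 6.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    """Xenbus device states; the numbers are what 'state==6' refers to."""
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6
    RECONFIGURING = 7
    RECONFIGURED = 8


class ToyBlkfront:
    """Hypothetical model of a frontend that defers a close while in use."""

    def __init__(self):
        self.state = XenbusState.CONNECTED
        self.open_count = 0          # number of current users (e.g. a mount)
        self.close_deferred = False  # set when a detach was refused

    def open(self):
        self.open_count += 1

    def backend_requests_close(self):
        """Host asks to detach. Returns True if the device closed at once."""
        if self.open_count > 0:
            # "no, in use": remember the request and finish it later.
            self.close_deferred = True
            return False
        self.state = XenbusState.CLOSED
        return True

    def release(self):
        """Last user goes away (e.g. umount); complete a deferred close."""
        self.open_count -= 1
        if self.open_count == 0 and self.close_deferred:
            self.close_deferred = False
            self.state = XenbusState.CLOSED
```

In this model, a detach request made while the device is open returns False and leaves the state at CONNECTED. The later release() moves it to CLOSED (6), which is the state the RHEL5 tools then fail to clean up.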

Comment 1 Miroslav Rezanina 2012-07-03 13:52:12 UTC

With the fix for 794806, the device is properly removed from the guest without any freeze. Unfortunately, it stays in xenstore (in state==6), so the disk can't be re-attached. This requires an xend restart.

Comment 3 Andrew Jones 2013-04-19 11:42:51 UTC

Vitaly tested this over EC2 with the following steps:

0) attach volume
1) mount volume
2) detach volume
3) see if volume still works
4) umount volume
5) see if you can mount it again - shouldn't be able to
6) attach volume
7) mount volume
8) confirm no data corruption
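The steps above could be scripted roughly as follows. This is a sketch under assumptions, not the script that was actually used: the ops object and all of its methods (attach, detach, mount, umount, try_mount, volume_works, data_intact) are hypothetical hooks that a tester would implement with their own EC2 and shell tooling. The check in step 5 polls with a timeout, on the assumption that the detach may take a few seconds to take effect after the unmount.

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def run_detach_cycle(ops, timeout=30.0, interval=1.0):
    """Run steps 0-8 against a hypothetical 'ops' helper object."""
    ops.attach()                          # 0) attach volume
    ops.mount()                           # 1) mount volume
    ops.detach()                          # 2) detach volume (while mounted)
    if not ops.volume_works():            # 3) volume should still work
        raise RuntimeError("volume stopped working after detach request")
    ops.umount()                          # 4) umount volume

    def mount_refused():
        # try_mount() is assumed to return True if the mount succeeded.
        if ops.try_mount():
            ops.umount()                  # undo and let the caller retry
            return False
        return True

    # 5) mounting again should fail once the deferred detach completes
    if not wait_until(mount_refused, timeout, interval):
        raise RuntimeError("volume was still mountable after umount")

    ops.attach()                          # 6) attach volume
    ops.mount()                           # 7) mount volume
    if not ops.data_intact():             # 8) confirm no data corruption
        raise RuntimeError("data corruption detected")
```

Passing the operations in as an object keeps the sequence itself independent of how volumes are attached and checked, so the same cycle can be run against different guests or hosts.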

For a rhel6 guest he found that it worked, but a delay of several seconds was needed between steps (4) and (5), likely because xenstore gets polled somewhat slowly for this feature. Then he tried with an F18 kernel and found problems:

(01:33:18 PM) vkuznets: with f18 (3.6.10-4.fc18.x86_64) the behavior is different: "umount /mnt/ && mount /dev/xvdk /mnt/ " just hanged
(01:33:37 PM) vkuznets: with nothing in dmesg
(01:35:07 PM) vkuznets: umount hanged
(01:35:25 PM) vkuznets: the volume was unmounted
(01:35:37 PM) vkuznets: and I can see /dev/xvdk present
(01:40:03 PM) vkuznets: re-checked after reboot, the behavior is the same - 'umount' hangs

So I'm removing the TestOnly keyword as it looks like we'll need to write a patch for upstream to get this working.

Comment 11 Radim Krčmář 2013-10-09 15:30:45 UTC

Amazon is going to ship the fix with the next xen-3 host kernel.