Bug 794803

Summary: Confirm xen-blkfront deferred close works with RHEL5 xen tools
Product: Red Hat Enterprise Linux 7
Reporter: Andrew Jones <drjones>
Component: kernel
Assignee: Radim Krčmář <rkrcmar>
Status: CLOSED CANTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: unspecified
Version: 7.0
CC: bfan, jingli, leiwang, mrezanin, qguan, qwan, vkuznets, wshi
Target Milestone: rc
Keywords: EC2, Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: xen
Doc Type: Bug Fix
Clones: 794806
Last Closed: 2013-10-09 15:30:45 UTC
Bug Depends On: 794806    
Bug Blocks: 741684    

Description Andrew Jones 2012-02-17 17:20:58 UTC
If a host attempts to detach a block device from a guest while the device is in use, xen-blkfront will say "no, in use" and set up a deferred detach, which then happens on unmount. With RHEL5 tools, the tools first see the "no, in use" response and give up. After the unmount, the block device goes to Closed (state==6), viewable with 'xm block-list'. Once the device is in this state it gets stuck: the host can no longer completely detach it or reattach it. We may want to fix this in the xen tools (I'm cloning this bug to xen as well), but even if we do, we need to make sure that when RHEL7 guests run on older hosts without the tools fix, we don't cause problems, particularly problems that may lead to disk corruption. This bug will be TestOnly for starters.
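The deferred close described above can be pictured as a small state machine. The sketch below is a deliberately simplified Python model, not xen-blkfront code: `ToyFrontend`, its methods, and `rhel5_style_detach` are hypothetical, and the real driver's intermediate states may differ. Only the state numbering is taken from Xen's xenbus protocol (`enum xenbus_state`), where Closed is 6, matching the state==6 seen in 'xm block-list'.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    # Numbering of Xen's xenbus_state; state==6 in 'xm block-list' is CLOSED.
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6
    RECONFIGURING = 7
    RECONFIGURED = 8


class ToyFrontend:
    """Hypothetical, simplified model of one blkfront device (not kernel code)."""

    def __init__(self):
        self.state = XenbusState.CONNECTED
        self.open_count = 0         # e.g. incremented when the volume is mounted
        self.close_pending = False  # set when a detach had to be deferred

    def open(self):
        self.open_count += 1

    def backend_requests_close(self):
        """Host asks to detach; returns True only if the close completed now."""
        if self.open_count > 0:
            # "No, in use": remember the request and finish it on last release.
            self.close_pending = True
            return False
        self.state = XenbusState.CLOSED
        return True

    def release(self):
        if self.open_count == 0:
            raise ValueError("release() without matching open()")
        self.open_count -= 1
        if self.open_count == 0 and self.close_pending:
            # The deferred detach completes on the final unmount.
            self.close_pending = False
            self.state = XenbusState.CLOSED


def rhel5_style_detach(frontend):
    # Models the behaviour reported here: old tools take the first
    # "in use" answer as final and never return to finish the teardown.
    return frontend.backend_requests_close()
```

In this toy model nothing moves the device out of CLOSED after the final `release()`; that mirrors the report, where the RHEL5 tools have already given up by the time the deferred detach completes, so the device stays stuck.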

Comment 1 Miroslav Rezanina 2012-07-03 13:52:12 UTC
With the fix for 794806, the device is properly removed from the guest without any freeze. Unfortunately, it stays in xenstore (in state==6), so the disk can't be re-attached. This requires a xend restart.

Comment 3 Andrew Jones 2013-04-19 11:42:51 UTC
Vitaly tested this on EC2 with the following steps:

0) attach volume
1) mount volume
2) detach volume
3) see if volume still works
4) umount volume
5) see if you can mount it again (you shouldn't be able to)
6) attach volume
7) mount volume
8) confirm no data corruption
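The procedure can also be written down as a small harness, so that each run records which steps deviated from the expected result, in particular step (5), which is expected to fail. This is a hypothetical Python sketch: `detach_test_plan`, `run_plan`, and all the callables passed in (`attach`, `detach`, `mount`, and so on) are placeholders for whatever EC2 and guest commands are actually used; none of them come from this bug. The outcome of step (2) is not checked, since the report does not say what the detach call itself should return.

```python
from typing import Callable, List, Optional, Tuple

# (step description, action returning True on success, expected result or None)
Step = Tuple[str, Callable[[], bool], Optional[bool]]


def detach_test_plan(
    attach: Callable[[], bool],
    detach: Callable[[], bool],
    mount: Callable[[], bool],
    umount: Callable[[], bool],
    io_works: Callable[[], bool],
    data_intact: Callable[[], bool],
) -> List[Step]:
    """Steps 0-8 from the report; every callable is a hypothetical hook."""
    return [
        ("0 attach volume", attach, True),
        ("1 mount volume", mount, True),
        ("2 detach volume", detach, None),  # not checked: guest should defer it
        ("3 volume still works", io_works, True),
        ("4 umount volume", umount, True),
        ("5 mount again", mount, False),  # should fail: deferred detach has run
        ("6 attach volume", attach, True),
        ("7 mount volume", mount, True),
        ("8 no data corruption", data_intact, True),
    ]


def run_plan(plan: List[Step]) -> List[str]:
    """Run the steps in order and return the names of steps that misbehaved."""
    failures = []
    for name, action, expected in plan:
        try:
            ok = bool(action())
        except Exception:
            ok = False  # treat an exception as a failed step
        if expected is not None and ok != expected:
            failures.append(name)
    return failures
```

An empty list from `run_plan` means every checked step behaved as described; a guest without working deferred close would typically show up as "5 mount again" in the result.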

For a RHEL6 guest he found that it worked, but a delay of several seconds was needed between steps (4) and (5), likely because xenstore is polled somewhat slowly for this feature. Then he tried an F18 kernel and found problems:

(01:33:18 PM) vkuznets: with f18 (3.6.10-4.fc18.x86_64) the behavior is different: "umount /mnt/ && mount /dev/xvdk /mnt/ " just hung
(01:33:37 PM) vkuznets: with nothing in dmesg
(01:35:07 PM) vkuznets: umount hung
(01:35:25 PM) vkuznets: the volume was unmounted
(01:35:37 PM) vkuznets: and I can see /dev/xvdk present
(01:40:03 PM) vkuznets: re-checked after reboot, the behavior is the same - 'umount' hangs

So I'm removing the TestOnly keyword as it looks like we'll need to write a patch for upstream to get this working.
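Regarding the several-second delay Vitaly needed on RHEL6 between steps (4) and (5): a guest-side test script could poll for the device node to disappear rather than sleep for a fixed time. A minimal Python sketch follows; the function name and the 10-second default timeout are assumptions, not values from this bug, and /dev/xvdk is only the device name from the chat log above. It does not help with the F18 case, where umount itself never returned.

```python
import os
import time


def wait_for_device_removal(path, timeout=10.0, interval=0.5):
    """Poll until `path` no longer exists; return True if it vanished in time.

    The default timeout is an assumed value based on the "several seconds"
    observed with a RHEL6 guest; it is not documented anywhere.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, after step (4) a script could call `wait_for_device_removal("/dev/xvdk")` and only then attempt step (5), treating a False result as "deferred detach did not complete in time".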

Comment 11 Radim Krčmář 2013-10-09 15:30:45 UTC
Amazon is going to ship the fix with the next xen-3 host kernel.