Red Hat Bugzilla – Bug 794803
Confirm xen-blkfront deferred close works with RHEL5 xen tools
Last modified: 2013-10-09 11:30:45 EDT
If a host attempts to detach a block device from a guest while the device is in use, xen-blkfront will say "no, in use" and then do some setup for a deferred detach. On unmount that detach will occur. With RHEL5 tools, the tools see the "no, in use" and give up. After the unmount the block device goes to Closed (state==6), viewable with 'xm block-list'. Once the device is in this state it gets stuck: the host can no longer completely detach it or reattach it. We may want to fix this in the xen tools (I'm cloning this bug to xen as well), but even if we do, we need to make sure that rhel7 guests running on older hosts without the tools fix don't cause problems, particularly problems that may lead to disk corruption. This bug will be TestOnly for starters.
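For reference when reading the 'xm block-list' output, here is a minimal sketch that maps the numeric state column to a name. The numeric values follow the XenbusState enumeration in Xen's public xenbus.h header; the helper names (describe_state, is_stuck_closed) and the still_listed flag are hypothetical and are not part of the xen tools.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    """Xenbus connection states, as numbered in Xen's public xenbus.h."""
    Unknown = 0
    Initialising = 1
    InitWait = 2
    Initialised = 3
    Connected = 4
    Closing = 5
    Closed = 6
    Reconfiguring = 7
    Reconfigured = 8


def describe_state(value):
    """Return the state name for a numeric state (int or numeric string)."""
    try:
        return XenbusState(int(value)).name
    except ValueError:
        return "unrecognised (%s)" % value


def is_stuck_closed(state, still_listed):
    """Hypothetical check for the situation described in this bug.

    Assumption: 'still_listed' means the device entry is still present
    in xenstore (i.e. it still shows up in 'xm block-list') even though
    its state has reached Closed.
    """
    return describe_state(state) == "Closed" and bool(still_listed)
```

With this sketch, describe_state(6) gives "Closed", which is the state the device is left in after the deferred detach.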
With the fix for 794806, the device is properly removed from the guest without any freeze. Unfortunately, it stays in xenstore (in state==6), so the disk can't be re-attached. This requires a xend restart.
Vitaly tested this over EC2 with the following steps:
0) attach volume
1) mount volume
2) detach volume
3) see if volume still works
4) umount volume
5) see if you can mount it again - shouldn't be able to
6) attach volume
7) mount volume
8) confirm no data corruption
For a rhel6 guest he found that it worked, but a delay of several seconds was needed between steps (4) and (5), likely because xenstore gets polled somewhat slowly for this feature. Then he tried with an F18 kernel and found problems:
(01:33:18 PM) vkuznets: with f18 (3.6.10-4.fc18.x86_64) the behavior is different: "umount /mnt/ && mount /dev/xvdk /mnt/ " just hanged
(01:33:37 PM) vkuznets: with nothing in dmesg
(01:35:07 PM) vkuznets: umount hanged
(01:35:25 PM) vkuznets: the volume was unmounted
(01:35:37 PM) vkuznets: and I can see /dev/xvdk present
(01:40:03 PM) vkuznets: re-checked after reboot, the behavior is the same - 'umount' hangs
So I'm removing the TestOnly keyword as it looks like we'll need to write a patch for upstream to get this working.
Amazon is going to ship the fix with the next xen-3 host kernel.