Description of problem:
Executing "xm block-detach" in Dom0 to detach the boot device of a guest DomU does not actually detach the boot device from the guest, which is expected. However, after doing this, the guest domain can no longer be shut down (via "xm shutdown" or "virsh shutdown") and no longer appears in the output of "xm list". The guest domain must be restarted from within the guest.

Version-Release number of selected component (if applicable):
xen-3.0.3-80.el5
kernel-xen-2.6.18-128.el5

How reproducible:
Always

Steps to Reproduce:
1. Create a guest domain.
2. Run "xm block-list <domain>" to determine the device ID.
3. Run "xm block-detach <domain> <device-ID>".
4. Run "xm block-list <domain>" to verify that the device is still attached.
5. Run "xm list". Note that <domain> is not in the list.
6. Run "xm shutdown <domain>".

Actual results:
The domain is not listed in the output of "xm list", and "xm shutdown <domain>" fails with an error:

# xm block-list rhel47-test
Vdev  BE handle state evt-ch ring-ref BE-path
51712 0  0      4     6      8        /local/domain/0/backend/vbd/5/51712
# xm block-detach rhel47-test 51712
# xm block-list rhel47-test
Vdev  BE handle state evt-ch ring-ref BE-path
51712 0  0      4     6      8        /local/domain/0/backend/vbd/5/51712
# xm list
Name      ID Mem(MiB) VCPUs State  Time(s)
Domain-0  0  702      1     r----- 1149.2
# xm shutdown rhel47-test
Error: Device 51712 not connected

Expected results:
The boot device should not be detached. The domain should be listed by "xm list" and should shut down in response to "xm shutdown".

Additional info:
The domain is still shown in the output of "virsh list". Running "virsh shutdown" results in the following error:

# virsh shutdown rhel47-test
libvir: Xen Daemon error : internal error failed to parse Xend domain information
error: failed to get domain 'rhel47-test'

It is suspected that the fix for Bug 473882 may have introduced this issue.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
I've confirmed that removing xen-xm-block-detach.patch (introduced by Bug 473882) resolves this issue. After installing xen and xen-libs packages built without the patch and restarting xend, I was able to shut down a DomU on which I had just run "xm block-detach" against the boot device:

[root@why xen]# rpm -qa | grep '^xen'
xen-libs-3.0.3-80.el5.bz484110.1.x86_64
xen-3.0.3-80.el5.bz484110.1.x86_64
xen-libs-3.0.3-80.el5.bz484110.1.i386
[root@why xen]# xm block-list rhel52-test
Vdev  BE handle state evt-ch ring-ref BE-path
51712 0  0      4     6      8        /local/domain/0/backend/vbd/8/51712
[root@why xen]# xm block-detach rhel52-test 51712
[root@why xen]# xm list
Name         ID Mem(MiB) VCPUs State  Time(s)
Domain-0     0  702      1     r----- 1277.4
rhel52-test  8  255      1     -b---- 15.0
[root@why xen]# xm shutdown rhel52-test
[root@why xen]# xm console rhel52-test
[...]
Turning off swap:
Turning off quotas:
Unmounting pipe file systems:
Halting system...
md: stopping all md devices.
System halted.
Well, of course removing that patch resolves the problem. Removing that patch removes all the code that actually detaches the block device!
Please re-try with:

xm block-detach --force ...rest of args...

XenD often gets confused during block detach if the guest is not actively responding to the xenbus handshake required for a clean detach. In those cases, a force detach makes XenD ignore the guest. Of course, XenD should also be resilient to non-responsive guests, and that should be fixed. The xen-xm-block-detach.patch is not relevant in this scenario. This problem has existed ever since GA; although it has improved with various updates, some problems are still outstanding.
Setting needinfo for the question in comment #5. It also sounds like we need to determine whether or not this is a regression.
(In reply to comment #5)
> Please re-try with
>
> xm block-detach --force ...rest of args...

That appears to have worked:

[root@why ~]# xm list
Name         ID Mem(MiB) VCPUs State  Time(s)
Domain-0     0  702      1     r----- 1492.5
rhel47-test  13 255      1     ------ 2.5
[root@why ~]# xm block-list rhel47-test
Vdev  BE handle state evt-ch ring-ref BE-path
51712 0  0      4     6      8        /local/domain/0/backend/vbd/13/51712
[root@why ~]# xm block-detach rhel47-test 51712 --force
[root@why ~]# xm block-list rhel47-test
[root@why ~]# xm list
Name         ID Mem(MiB) VCPUs State  Time(s)
Domain-0     0  702      1     r----- 1493.4
rhel47-test  13 255      1     -b---- 6.2
[root@why ~]# xm shutdown rhel47-test; xm console rhel47-test
xen_start_info @ffffffff806fa000
[...]
Starting system message bus: [ OK ]
Starting HAL daemon: [ OK ]
INFO: rhel47-test-why.usersys.redhat.com updated
Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
Kernel 2.6.9-78.ELxenU on an x86_64

rhel47-test-why.usersys.redhat.com login:
vbd vbd-51712: 16 Device in use; refusing to close
Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
Kernel 2.6.9-78.ELxenU on an x86_64

rhel47-test-why.usersys.redhat.com login: root

And now the guest is hung . . . which I guess is what happens when you remove its boot device. :^)
Well, it only works when you use syntax like:

# xm shutdown <domainId> -a

which is not good behaviour, because this parameter should not be necessary. In any case, the domain not appearing in the list of running VMs seems to be connected to this bug: the domain may not be seen by the shutdown command because Xen no longer knows about it. Detaching a device that is the boot device of a VM should not be possible, so I'll create a patch that prevents disconnecting the boot device of a VM. Alternatively, detaching the boot device could trigger an automatic domain shutdown, but I think that solution is less clean than disallowing the detach.
Created attachment 332942 [details]
Fix block-detach behaviour

I have created a patch for the domain shutdown issue that occurs after trying to block-detach the guest boot device, as described in BZ #484110. The patch deletes the code that removed the device in the destroy-device context, since the device is already removed at another location; this modification makes it work correctly. Before the modification, `xm block-list` showed two devices there. I tried to detach the boot device; it failed and the device was still there. `xm list` showed that my test domain was still present, so I tried to detach the second device. That worked fine and the device was detached, and `xm list` still showed the domain. Finally, I ran `xm shutdown` on the domain: no problem was reported, the domain started shutting down, and it eventually disappeared from `xm list` and was no longer running in virt-manager either. Virsh also showed the expected results, and everything works fine.
I've been able to reproduce the results reported in Comment #13. I'm continuing to run the modified xen packages and will report if I see any regressive behavior.
Created attachment 333145 [details]
Corrected fix for block-detach operation

This is an updated (corrected) version of the block-detach fix. A new function, getBootDevice(), has been added in XendDomainInfo to get the boot device of the desired domain. It is used in the destroyDevice() routine to check that we are not trying to unmount the boot device. I've tested it on RHEL 5.3's Xen using the following sequence:

1) xm create DOMAIN
2) xm list
3) xm block-list DOMAIN
4) xm block-detach DOMAIN DEVICE
5) xm list ==> the domain is still there; it wasn't without this patch
6) xm block-attach DOMAIN BACKEND FRONTEND MODE (using the same FRONTEND detached in step 4)
7) xm list ==> still showing correct results
8) xm shutdown DOMAIN ==> the domain starts shutting down

I also tried mounting and unmounting devices in the guest machine after the attach/detach operations, and that worked fine too. The force option has been preserved, so it can still detach the boot device; if it is not used, the error message (VmError) "Cannot unmount domain DOMAIN_ID boot device (DEV_ENTRY)" is returned. The getBootDevice() function is defined to return the first device, which, from what I have been told and also from what I have tested, is used as the boot device for the domain (it can be either a VBD or a TAP device).
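The check described above can be sketched roughly as follows. This is a hedged illustration, not the actual xend patch: the class structure, device representation, and error text here are assumptions, with only the names getBootDevice(), destroyDevice(), and VmError taken from the comment above.

```python
# Illustrative sketch of the boot-device guard described in the comment.
# All structure below is assumed for illustration; xend's real
# XendDomainInfo is far more involved.

class VmError(Exception):
    pass

class DomainInfo:
    def __init__(self, devices):
        # devices: ordered list of (dev_class, dev_id), e.g. ('vbd', 51712)
        self.devices = devices

    def getBootDevice(self):
        # Per the comment: the first VBD or TAP device is treated as the
        # boot device of the domain.
        for dev_class, dev_id in self.devices:
            if dev_class in ('vbd', 'tap'):
                return dev_id
        return None

    def destroyDevice(self, dev_id, force=False):
        # Refuse to detach the boot device unless --force was given.
        if not force and dev_id == self.getBootDevice():
            raise VmError("Cannot unmount boot device (%s)" % dev_id)
        self.devices = [d for d in self.devices if d[1] != dev_id]

dom = DomainInfo([('vbd', 51712), ('vbd', 51728)])
dom.destroyDevice(51728)              # non-boot device: detaches fine
try:
    dom.destroyDevice(51712)          # boot device without --force: refused
except VmError as e:
    print(e)
dom.destroyDevice(51712, force=True)  # --force overrides the check
```

The point of the sketch is the ordering: the guard runs before any device state is torn down, so a refused detach leaves the domain fully intact.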
Created attachment 334121 [details]
Detach fix for mounted guest block devices

Another update: the problem occurred not only when detaching the guest boot device but also when detaching any mounted device. This patch solves the issue in general for any device mounted in the guest.
*** Bug 476164 has been marked as a duplicate of this bug. ***
*** Bug 476093 has been marked as a duplicate of this bug. ***
*** Bug 475789 has been marked as a duplicate of this bug. ***
Comment on attachment 334121 [details]
Detach fix for mounted guest block devices

The fix to xen-hotplug-cleanup is not robust enough, causing the whole script to fail under some conditions.
Created attachment 334942 [details]
Fixed patch to fix this bug

A more robust fix to the xen-hotplug-cleanup script.
Comment on attachment 334942 [details]
Fixed patch to fix this bug

This patch removes the /vm/UUID/... path only when --force is used for detaching. Otherwise, the path is removed by new code added to the xen-hotplug-cleanup script; this way, it only gets removed once the device has really been detached. BTW, both backend and frontend paths are handled in exactly the same way. This new version of the patch also makes xen-hotplug-cleanup more robust: because of "trap sigerr ERR" in xen-hotplug-common.sh, the xen-hotplug-cleanup from the previous version of this patch would exit when reading /local/domain/ID/vm failed, thus skipping all the xenstore-rm lines in the rest of the script. The original patch was accepted by upstream as c/s 19250 and reverted by c/s 19314. This new version has been accepted as c/s 19342.
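The failure mode described above (a failed xenstore read tripping the ERR trap and skipping the remaining xenstore-rm lines) can be modeled with a small control-flow sketch. This is a hedged simulation, not the actual shell script: the function, path strings, and return values are all illustrative assumptions.

```python
# Simulation of the cleanup ordering described in the comment above.
# ScriptAborted stands in for the "trap sigerr ERR" handler terminating
# the shell script; guard_read=True models tolerating a failed read
# (e.g. an `|| true`-style guard) so later removals still run.

class ScriptAborted(Exception):
    """Stands in for the ERR trap killing the whole script."""

def cleanup(vm_path_exists, guard_read):
    removed = []

    def xenstore_read_vm():
        if not vm_path_exists:
            raise ScriptAborted("xenstore-read /local/domain/ID/vm failed")
        return "/vm/UUID"

    try:
        if guard_read:
            # robust version: a missing /vm path is tolerated
            try:
                vmpath = xenstore_read_vm()
            except ScriptAborted:
                vmpath = None
        else:
            vmpath = xenstore_read_vm()  # failure aborts the whole script

        if vmpath is not None:
            removed.append(vmpath + "/device")  # xenstore-rm on /vm path
        removed.append("backend path")          # remaining xenstore-rm
        removed.append("frontend path")         # lines in the script
    except ScriptAborted:
        pass  # script exited early; nothing below the failing line ran

    return removed

# Fragile version: one failed read skips every later removal.
print(cleanup(vm_path_exists=False, guard_read=False))
# Robust version: backend/frontend paths are still cleaned up.
print(cleanup(vm_path_exists=False, guard_read=True))
```

The fragile variant returns an empty list (nothing was cleaned up), while the guarded variant still removes the backend and frontend paths, which is the behavior difference the comment describes.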
*** Bug 476656 has been marked as a duplicate of this bug. ***
*** Bug 479215 has been marked as a duplicate of this bug. ***
Fix built into xen-3.0.3-86.el5
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1328.html