Bug 484110 - Unable to shut down guest after attempting block-detach of guest boot device
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jiri Denemark
QA Contact: Virtualization Bugs
Duplicates: 475789 476093 476164 476656 479215
Depends On:
Blocks: 486157
Reported: 2009-02-04 15:40 EST by Bryan Mason
Modified: 2010-10-23 03:30 EDT

Fixed In Version: xen-3.0.3-86.el5
Doc Type: Bug Fix
Last Closed: 2009-09-02 06:07:41 EDT


Attachments
Fix block-detach behaviour (775 bytes, patch)
2009-02-23 11:20 EST, Michal Novotny
Corrected fix for block-detach operation (2.35 KB, patch)
2009-02-25 04:54 EST, Michal Novotny
Detach fix for mounted guest block devices (1.86 KB, patch)
2009-03-05 06:13 EST, Michal Novotny
Fixed patch to fix this bug (1.95 KB, patch)
2009-03-12 11:06 EDT, Jiri Denemark

Description Bryan Mason 2009-02-04 15:40:10 EST
Description of problem:

    Executing "xm block-detach" in Dom0 to attempt to detach the boot
    device of a guest DomU does not actually detach the boot device
    from the guest, which is expected.  However, after doing this, you
    can't shut down the guest domain (via "xm shutdown" or "virsh
    shutdown") or display the status of the guest domain (using "xm
    list").  The guest domain must be restarted from within the guest.
    
Version-Release number of selected component (if applicable):

    xen-3.0.3-80.el5
    kernel-xen-2.6.18-128.el5

How reproducible:

    Always

Steps to Reproduce:

    1. Create a guest domain
    2. run "xm block-list <domain>" to determine the device ID
    3. run "xm block-detach <domain> <device-ID>"
    4. run "xm block-list <domain>" to verify that the device is still attached.
    5. run "xm list".  Note that <domain> is not in the list.
    6. run "xm shutdown <domain>".
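
    The checks in steps 4 and 5 can be scripted. The following is a minimal
    sketch, assuming the column layout that "xm block-list" and "xm list"
    print in this report. The helper names first_vdev and domain_listed are
    made up for illustration and are not part of the Xen tools; they only
    parse text read from standard input.

```bash
#!/bin/bash
# Hypothetical helpers: they only parse text that "xm" has printed.

# Print the Vdev number of the first device row in "xm block-list"
# output (read from stdin). The header row is skipped because its
# first field is not numeric.
first_vdev() {
    awk '$1 ~ /^[0-9]+$/ { print $1; exit }'
}

# Succeed if the named domain appears in the Name column of "xm list"
# output (read from stdin); fail otherwise.
domain_listed() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}
```

    On a Dom0, these could be used as `xm block-list <domain> | first_vdev`
    to pick the device ID for step 3, and `xm list | domain_listed <domain>`
    to detect the missing domain in step 5.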
  
Actual results:

    The domain is missing from the output of "xm list", and
    "xm shutdown <domain>" fails with a "Device ... not connected" error:

        # xm block-list rhel47-test
        Vdev  BE handle state evt-ch ring-ref BE-path
        51712    0    0     4      6      8     /local/domain/0/backend/vbd/5/51712  

        # xm block-detach rhel47-test 51712
        # xm block-list rhel47-test
        Vdev  BE handle state evt-ch ring-ref BE-path
        51712    0    0     4      6      8     /local/domain/0/backend/vbd/5/51712  
        # xm list 
        Name                                  ID Mem(MiB) VCPUs State   Time(s)
        Domain-0                               0      702     1 r-----   1149.2

        # xm shutdown rhel47-test
        Error: Device 51712 not connected

Expected results:

    The boot device should not be detached.  The domain should be listed in 
    "xm list" and be shut down as a result of the "xm shutdown" command.

Additional info:

    The domain is still shown in the results of "virsh list".  Running
    "virsh shutdown" results in the following error:

        # virsh shutdown rhel47-test
        libvir: Xen Daemon error : internal error failed to parse Xend domain 
        information
        error: failed to get domain 'rhel47-test'

    It is suspected that the fix to Bug 473882 may have introduced this issue.
Comment 2 RHEL Product and Program Management 2009-02-04 15:47:56 EST
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 3 Bryan Mason 2009-02-04 18:07:31 EST
I've confirmed that removing xen-xm-block-detach.patch, introduced by Bug 473882, resolves this issue.  After installing xen and xen-libs packages built without the patch and restarting xend, I was able to shut down a DomU on which I had just run "xm block-detach" to detach the boot device:

[root@why xen]# rpm -qa | grep '^xen'
xen-libs-3.0.3-80.el5.bz484110.1.x86_64
xen-3.0.3-80.el5.bz484110.1.x86_64
xen-libs-3.0.3-80.el5.bz484110.1.i386

[root@why xen]# xm block-list rhel52-test
Vdev  BE handle state evt-ch ring-ref BE-path
51712    0    0     4      6      8     /local/domain/0/backend/vbd/8/51712

[root@why xen]# xm block-detach rhel52-test 51712
[root@why xen]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      702     1 r-----   1277.4
rhel52-test                                8      255     1 -b----     15.0

[root@why xen]# xm shutdown rhel52-test

[root@why xen]# xm console rhel52-test
[...]
Turning off swap:  
Turning off quotas:  
Unmounting pipe file systems:  
Halting system...
md: stopping all md devices.
System halted.
Comment 4 Bryan Mason 2009-02-04 18:47:25 EST
Well, of course removing that patch resolves the problem.  Removing that patch removes all the code that actually detaches the block device!
Comment 5 Daniel Berrange 2009-02-05 06:31:08 EST
Please re-try with 

  xm block-detach --force  ...rest of args...

XenD often gets confused during block detach if the guest is not actively responding to the xenbus handshake required for clean detach. In those cases, force detach makes it ignore the guest.

Of course XenD should also be resilient to non-responsive guests, and that should be fixed.  The 'xen-xm-block-detach.patch' is not relevant in this scenario. This problem has existed ever since GA; although it has improved in successive updates, some problems are still outstanding.
Comment 6 Bill Burns 2009-02-05 09:18:08 EST
Put in needinfo for question in comment #5.
Also, it sounds like we need to resolve whether this is a regression or not.
Comment 7 Bryan Mason 2009-02-05 17:38:22 EST
(In reply to comment #5)
> Please re-try with 
> 
>   xm block-detach --force  ...rest of args...

That appears to have worked:

  [root@why ~]# xm list
  Name                                  ID Mem(MiB) VCPUs State   Time(s)
  Domain-0                               0      702     1 r-----   1492.5
  rhel47-test                           13      255     1 ------      2.5

  [root@why ~]# xm block-list rhel47-test
  Vdev  BE handle state evt-ch ring-ref BE-path
  51712    0    0     4      6      8     /local/domain/0/backend/vbd/13/51712  

  [root@why ~]# xm block-detach rhel47-test 51712 --force
  [root@why ~]# xm block-list rhel47-test
  [root@why ~]# xm list
  Name                                      ID Mem(MiB) VCPUs State   Time(s)
  Domain-0                                   0      702     1 r-----   1493.4
  rhel47-test                               13      255     1 -b----      6.2

  [root@why ~]# xm shutdown rhel47-test; xm console rhel47-test
  xen_start_info @ffffffff806fa000
  [...]
  Starting system message bus: [  OK  ]
  Starting HAL daemon: [  OK  ]
  INFO: rhel47-test-why.usersys.redhat.com updated

  Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
  Kernel 2.6.9-78.ELxenU on an x86_64

  rhel47-test-why.usersys.redhat.com login: vbd vbd-51712: 16 Device in use;
  refusing to close

  Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
  Kernel 2.6.9-78.ELxenU on an x86_64

  rhel47-test-why.usersys.redhat.com login: root

And now the guest is hung . . . which I guess is what happens when you remove its boot device.  :^)
Comment 12 Michal Novotny 2009-02-23 04:04:26 EST
Well, it works only when you provide syntax like:

# xm shutdown <domainId> -a

which is not good behaviour, because this parameter should not be necessary. The domain not showing up in the list of running VMs is presumably connected to this bug: maybe the shutdown command does not see the domain because Xen doesn't know about it. Detaching a device that is the boot device of a VM should not be possible, so I'll create a patch that prevents disconnecting the boot device of a VM. Alternatively, detaching the boot device could result in an automatic domain shutdown, but I think that solution is less clear than disallowing the detach.
Comment 13 Michal Novotny 2009-02-23 11:20:29 EST
Created attachment 332942 [details]
Fix block-detach behaviour

I have created a patch for the domain shutdown issue that occurs after trying to block-detach the guest boot device, as described in BZ #484110. The code that removed the device in the destroy-device path has been deleted, because the device is removed at another location too; with this modification it works correctly.

I tried it before that modification, and `xm block-list` showed two devices. I tried to detach the boot device, which did not work; the device was still there. `xm list` showed that my test domain was still there, so I tried to detach the second device. This worked fine and the device got detached; `xm list` still showed the domain. Finally, I ran `xm shutdown` on that domain: no problem was reported, the domain started shutting down, and it finally disappeared from `xm list`. It was not shown as running in virt-manager either. Virsh also showed the expected results, so everything is working fine.
Comment 14 Bryan Mason 2009-02-23 15:58:54 EST
I've been able to duplicate the results reported in Comment #13.  I'm continuing to run the modified xen packages and will report if I see any regressive behavior.
Comment 15 Michal Novotny 2009-02-25 04:54:00 EST
Created attachment 333145 [details]
Corrected fix for block-detach operation

This is an updated (corrected) version of the block-detach fix. A new function, "getBootDevice()", has been added to XendDomainInfo to get the boot device of the desired domain. It is used in the destroyDevice() routine to check whether we are trying to unmount the boot device. I've tested it on RHEL 5.3's Xen using the following steps:

1) xm create DOMAIN
2) xm list
3) xm block-list DOMAIN
4) xm block-detach DOMAIN DEVICE
5) xm list ==> domain is still there, it wasn't without this patch
6) xm block-attach DOMAIN BACKEND FRONTEND MODE (using the same FRONTEND as the one detached in step 4)
7) xm list ==> still showing correct results
8) xm shutdown DOMAIN ==> DOMAIN started shutting down

I also tried mounting and unmounting devices in the guest machine after attach/detach operations, and that worked fine too. The force option has been preserved and can still detach the boot device; if it is not used, the error message (VmError) "Cannot unmount domain DOMAIN_ID boot device (DEV_ENTRY)" is returned. The getBootDevice() function returns the first device, which, from what I have been told and from what I have tested, is used as the boot device for the domain (it can be either a VBD or a TAP device).
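
The rule described above (treat the first device as the boot device and refuse to detach it unless force is given) can be approximated outside xend. The following is only a wrapper-level sketch in shell, not the patch itself, which changes xend's own code. The function names boot_vdev and detach_allowed are hypothetical, and the sketch assumes the "xm block-list" column layout shown earlier in this report.

```bash
#!/bin/bash
# Hypothetical wrapper-level approximation of the guard; the real check
# described in this comment lives inside xend, not in a shell script.

# Print the first Vdev listed in "xm block-list" output read from stdin.
# Following the comment above, the first device is assumed to be the
# boot device.
boot_vdev() {
    awk '$1 ~ /^[0-9]+$/ { print $1; exit }'
}

# Usage: detach_allowed DOMAIN BLOCK_LIST_TEXT VDEV [--force]
# Succeeds if detaching VDEV should go ahead. Otherwise prints an error
# modelled on the message quoted above and fails.
detach_allowed() {
    local domain=$1 listing=$2 vdev=$3 force=${4:-}
    local boot
    boot=$(printf '%s\n' "$listing" | boot_vdev)
    if [ "$vdev" = "$boot" ] && [ "$force" != "--force" ]; then
        echo "Cannot unmount domain $domain boot device ($vdev)" >&2
        return 1
    fi
    return 0
}
```

A caller might capture the listing with `listing=$(xm block-list DOMAIN)` and run `xm block-detach` only when `detach_allowed DOMAIN "$listing" DEVICE` succeeds.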
Comment 17 Michal Novotny 2009-03-05 06:13:40 EST
Created attachment 334121 [details]
Detach fix for mounted guest block devices

Another update: the problem occurred not only when detaching the guest boot device but also when detaching any mounted device. This patch solves the issue in general for any device mounted in the guest.
Comment 18 Michal Novotny 2009-03-05 06:18:03 EST
*** Bug 476164 has been marked as a duplicate of this bug. ***
Comment 19 Michal Novotny 2009-03-05 07:35:33 EST
*** Bug 476093 has been marked as a duplicate of this bug. ***
Comment 20 Michal Novotny 2009-03-11 07:39:18 EDT
*** Bug 475789 has been marked as a duplicate of this bug. ***
Comment 21 Jiri Denemark 2009-03-12 05:24:22 EDT
Comment on attachment 334121 [details]
Detach fix for mounted guest block devices

The fix to xen-hotplug-cleanup is not robust enough, causing the whole script to fail in some conditions.
Comment 23 Jiri Denemark 2009-03-12 11:06:14 EDT
Created attachment 334942 [details]
Fixed patch to fix this bug

More robust fix to xen-hotplug-cleanup script
Comment 24 Jiri Denemark 2009-03-13 06:31:53 EDT
Comment on attachment 334942 [details]
Fixed patch to fix this bug

This patch removes the /vm/UUID/... path only when --force is used for detaching. Otherwise, the path is removed by new code added to the xen-hotplug-cleanup script. This way, it only gets removed when the device has really been detached. BTW, both backend and frontend paths are handled in exactly the same way.

This new version of the patch makes xen-hotplug-cleanup more robust. Because of "trap sigerr ERR" in xen-hotplug-common.sh, the xen-hotplug-cleanup from the previous version of this patch would exit when reading /local/domain/ID/vm failed, thus skipping all the xenstore-rm lines in the rest of the script.

The original patch was accepted by upstream as c/s 19250 and reverted by c/s 19314. This new version has been accepted as c/s 19342.
Comment 25 Jiri Denemark 2009-04-03 04:25:12 EDT
*** Bug 476656 has been marked as a duplicate of this bug. ***
Comment 26 Jiri Denemark 2009-04-27 10:53:43 EDT
*** Bug 479215 has been marked as a duplicate of this bug. ***
Comment 27 Jiri Denemark 2009-05-21 11:40:26 EDT
Fix built into xen-3.0.3-86.el5
Comment 31 Chris Ward 2009-07-03 14:23:31 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.
Comment 34 errata-xmlrpc 2009-09-02 06:07:41 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1328.html
