Bug 500158

Summary: 'virsh destroy' destroys multiple VMs
Product: Red Hat Enterprise Linux 5
Component: libvirt
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Reporter: Daniel Berrangé <berrange>
Assignee: Daniel Veillard <veillard>
QA Contact: Virtualization Bugs <virt-bugs>
CC: berrange, clalance, crobinso, itamar, jlaska, lili, markmc, matt, nzhang, veillard, virt-maint, virt-maint, xen-maint
Doc Type: Bug Fix
Clone Of: 499698
Bug Depends On: 499698
Last Closed: 2009-09-02 09:20:42 UTC

Description Daniel Berrangé 2009-05-11 13:09:12 UTC
+++ This bug was initially created as a clone of Bug #499698 +++

Sometimes, using 'virsh destroy' or the 'Force off' button in virt-manager will cause multiple running VMs to be destroyed. Here's an example:

[wwoods@metroid ~]$ sudo virsh list --all
 Id Name                 State
----------------------------------
  1 Ubuntu_Jaunty        running
  4 Fedora_10_clone      running
  5 F10_2                running
  6 F10                  running

[wwoods@metroid ~]$ sudo virsh destroy F10_2
Domain F10_2 destroyed

[wwoods@metroid ~]$ sudo virsh list --all
 Id Name                 State
----------------------------------
  1 Ubuntu_Jaunty        running
  4 Fedora_10_clone      running
  - F10                  shut off
  - F10_2                shut off

Note that 'F10' is now also shut off, even though I didn't destroy it. This doesn't seem to happen every time, and it doesn't seem to be related to the names of the hosts being similar:

[wwoods@metroid ~]$ sudo virsh list --all
 Id Name                 State
----------------------------------
  1 Ubuntu_Jaunty        running
  4 Fedora_10_clone      running
  8 F10_2                running
  - F10                  shut off

[wwoods@metroid ~]$ sudo virsh start F10
Domain F10 started

[wwoods@metroid ~]$ sudo virsh destroy Fedora_10_clone
Domain Fedora_10_clone destroyed

[wwoods@metroid ~]$ sudo virsh list --all
 Id Name                 State
----------------------------------
  1 Ubuntu_Jaunty        running
 10 F10                  running
  - F10_2                shut off
  - Fedora_10_clone      shut off

There are no relevant messages in syslog, other than the expected ones (e.g. "kernel: virbr0: port 3(vnet2) entering disabled state" as the host comes down).

--- Additional comment from berrange on 2009-05-07 14:01:06 EDT ---

Can you run 'strace -f -p $PID-OF-LIBVIRTD -s 1000 -ff -o' and then try to reproduce the destroy problem?

Also, can you turn on full debug logging of libvirtd and capture the results? See http://libvirt.org/logging.html

--- Additional comment from markmc on 2009-05-07 14:04:14 EDT ---

Nasty. I've asked wwoods for a libvirtd log à la:

https://fedoraproject.org/wiki/Reporting_virtualization_bugs#libvirt

--- Additional comment from wwoods on 2009-05-07 14:29:41 EDT ---

Created an attachment (id=342908)
very verbose log from libvirtd

This is the full log from libvirtd with log_level set to 1. It follows these basic steps:

service libvirtd restart 
virsh list --all
 Id Name                 State
----------------------------------
  1 Ubuntu_Jaunty        running
 13 F10                  running
  - F10_2                shut off
  - F10_RAID             shut off
  - F9                   shut off
  - Fedora_10_clone      shut off
  - Rawhide              shut off

virsh start F10_2
virsh start Fedora_10_clone
virsh list --all
virsh destroy F10_2
virsh list --all
virsh start F10_2
virsh destroy F10
virsh list --all
virsh start F10
virsh list --all
virsh list --all
virsh start F10_2
virsh list --all
# problem is triggered here - F10 dies as well
virsh destroy F10_2
virsh list --all
 Id Name                 State
----------------------------------
  1 Ubuntu_Jaunty        running
 15 Fedora_10_clone      running
  - F10                  shut off
  - F10_2                shut off
  - F10_RAID             shut off
  - F9                   shut off
  - Rawhide              shut off

service libvirtd stop

Hope you can make some sense of it - it's 4MB uncompressed.

--- Additional comment from markmc on 2009-05-07 14:50:37 EDT ---

Interesting, you only issued a destroy for F10_2, but yet:

14:12:43.927: debug : virDomainDestroy:1750 : domain=0x7fb270001270
14:12:43.927: debug : qemudShutdownVMDaemon:1518 : Shutting down VM 'F10_2'

14:12:43.927: debug : virEventRemoveHandleImpl:165 : Remove handle 20
14:12:43.927: debug : virEventRemoveHandleImpl:172 : mark delete 11 24
14:12:43.983: debug : virEventRunOnce:544 : Poll got 1 event
14:12:43.983: debug : virEventDispatchHandles:416 : Skip deleted 24
14:12:43.983: debug : virEventDispatchHandles:425 : Dispatch 24 32 0x13f7c00
14:12:44.036: info : Setting SELinux context on '/var/lib/libvirt/images/F10_2.img' to 'system_u:object_r:virt_image_t:s0'
14:12:44.036: debug : virEventUpdateTimeoutImpl:233 : Updating timer 0 timeout with 0 ms freq
14:12:44.036: debug : qemudShutdownVMDaemon:1518 : Shutting down VM 'F10'

--- Additional comment from berrange on 2009-05-07 16:11:32 EDT ---

This is an event loop dispatcher bug. The destroy call kills the QEMU process, so we then get a HANGUP event on the FD associated with the guest monitor. The callback for this FD is already marked as deleted in the event loop, so it gets skipped, and we then mistakenly dispatch the next callback in the loop. That makes us think another VM has died and triggers cleanup of that guest.
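
To make the failure mode above concrete, here is a minimal Python sketch of that kind of dispatcher mistake. It is a simplified model written for illustration, not the libvirt code and not the attached patch: the `Handle` class, both dispatch functions, the `POLLHUP` constant value, and fd 25 for the second guest are all assumptions. Only the guest names and fd 24 come from the log excerpt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

POLLHUP = 0x10  # assumed flag value, used only inside this model


@dataclass
class Handle:
    fd: int
    callback: Callable[[int, int], None]
    deleted: bool = False


def dispatch_buggy(handles: List[Handle], results: List[Tuple[int, int]]) -> None:
    """Skip deleted handles but keep the current poll result.

    The event that belonged to the deleted handle is then delivered to
    the next live handle, which is the mistake described in the comment.
    """
    j = 0
    for fd, revents in results:
        while j < len(handles) and handles[j].deleted:
            j += 1  # handle cursor advances, result cursor does not
        if j >= len(handles):
            break
        if revents:
            handles[j].callback(fd, revents)
        j += 1


def dispatch_fixed(handles: List[Handle], results: List[Tuple[int, int]]) -> None:
    """Keep each handle paired with its own poll result."""
    for handle, (fd, revents) in zip(handles, results):
        if handle.deleted or handle.fd != fd:
            continue  # drop the event together with the deleted handle
        if revents:
            handle.callback(fd, revents)


def simulate(dispatch) -> List[str]:
    """Return the guests whose cleanup the event loop triggered."""
    cleaned_up: List[str] = []

    def monitor_callback(vm_name: str) -> Callable[[int, int], None]:
        def on_event(fd: int, revents: int) -> None:
            if revents & POLLHUP:
                cleaned_up.append(vm_name)
        return on_event

    handles = [
        Handle(fd=24, callback=monitor_callback("F10_2")),
        Handle(fd=25, callback=monitor_callback("F10")),  # fd 25 is hypothetical
    ]
    handles[0].deleted = True              # destroy removed the F10_2 handle
    results = [(24, POLLHUP), (25, 0)]     # QEMU exit causes HANGUP on fd 24
    dispatch(handles, results)
    return cleaned_up


print("buggy:", simulate(dispatch_buggy))
print("fixed:", simulate(dispatch_fixed))
```

In this model the buggy dispatcher reports a cleanup for F10 even though only the F10_2 monitor FD hung up, while the fixed dispatcher reports none.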

--- Additional comment from berrange on 2009-05-08 05:43:39 EDT ---

*** Bug 499788 has been marked as a duplicate of this bug. ***

--- Additional comment from berrange on 2009-05-08 11:19:24 EDT ---

Created an attachment (id=343109)
Fix event loop handling of deletes & test functionality

This patch fixes the event loop handling of deletes and adds a test case which validates that the various important scenarios actually work.

--- Additional comment from berrange on 2009-05-11 08:48:48 EDT ---

*** Bug 500089 has been marked as a duplicate of this bug. ***

Comment 1 Daniel Berrangé 2009-05-11 13:09:32 UTC
*** Bug 500089 has been marked as a duplicate of this bug. ***

Comment 2 Daniel Veillard 2009-05-14 15:24:29 UTC
libvirt-0.6.3-3.el5 has been built in dist-5E-qu-candidate with the fix.

Daniel

Comment 4 Matthew Farrellee 2009-05-27 19:43:23 UTC
Preliminary tests with 0.6.3-3.el5 from ovirt-latest suggest BZ500089 was fixed as well.

Comment 5 Nan Zhang 2009-06-02 07:02:59 UTC
Start up all guest domains.

[root@dhcp-66-70-85 ~]# virsh list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
 13 foo1                 idle
 14 foo2                 idle
 15 foo3                 idle
 16 test1                idle
 17 test2                idle
 18 test3                idle

[root@dhcp-66-70-85 ~]# virsh destroy foo1
Domain foo1 destroyed

[root@dhcp-66-70-85 ~]# virsh list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
 14 foo2                 idle
 15 foo3                 idle
 16 test1                idle
 17 test2                idle
 18 test3                idle
  - foo1                 shut off

[root@dhcp-66-70-85 ~]# virsh destroy foo2
Domain foo2 destroyed

[root@dhcp-66-70-85 ~]# virsh list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
 15 foo3                 idle
 16 test1                idle
 17 test2                idle
 18 test3                idle
  - foo1                 shut off
  - foo2                 shut off

[root@dhcp-66-70-85 ~]# virsh destroy test2
Domain test2 destroyed

[root@dhcp-66-70-85 ~]# virsh destroy test3
Domain test3 destroyed

[root@dhcp-66-70-85 ~]# virsh list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
 15 foo3                 idle
 16 test1                idle
  - foo1                 shut off
  - foo2                 shut off
  - test2                shut off
  - test3                shut off

[root@dhcp-66-70-85 ~]# virsh start foo1
Domain foo1 started

[root@dhcp-66-70-85 ~]# virsh destroy foo3
Domain foo3 destroyed

[root@dhcp-66-70-85 ~]# virsh list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
 16 test1                idle
 19 foo1                 idle
  - foo2                 shut off
  - foo3                 shut off
  - test2                shut off
  - test3                shut off

This bug has been verified as fixed in libvirt 0.6.3-3 on RHEL 5.4.
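
The manual check above (destroy one domain, then confirm that no other domain changed state) could be scripted along the following lines. This is only a sketch: `parse_virsh_list` and `unexpected_changes` are hypothetical helpers, and the parser assumes the three-column 'virsh list --all' layout shown in this report, with domain names that contain no spaces.

```python
from typing import Dict, List


def parse_virsh_list(output: str) -> Dict[str, str]:
    """Map domain name to state from 'virsh list --all' text output."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like "<id or -> <name> <state...>"; this skips
        # the header row, the dashed separator and blank lines.
        if len(parts) < 3 or not (parts[0] == "-" or parts[0].isdigit()):
            continue
        states[parts[1]] = " ".join(parts[2:])
    return states


def unexpected_changes(before: Dict[str, str], after: Dict[str, str],
                       target: str) -> List[str]:
    """Names of domains other than `target` whose state changed."""
    return sorted(name for name in before
                  if name != target and before[name] != after.get(name))


# Listings copied from the original report, around 'virsh destroy F10_2'.
BEFORE = """\
 Id Name                 State
----------------------------------
  1 Ubuntu_Jaunty        running
  4 Fedora_10_clone      running
  5 F10_2                running
  6 F10                  running
"""

AFTER = """\
 Id Name                 State
----------------------------------
  1 Ubuntu_Jaunty        running
  4 Fedora_10_clone      running
  - F10                  shut off
  - F10_2                shut off
"""

print(unexpected_changes(parse_virsh_list(BEFORE),
                         parse_virsh_list(AFTER), "F10_2"))
```

Run against the listings from the original report, the helper flags F10 as having changed state unexpectedly. With the fix in place, the same comparison should come back empty for every destroy.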

Comment 7 errata-xmlrpc 2009-09-02 09:20:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1269.html