Bug 1150505 - Domain is out of control from libvirt when running some concurrent define/undefine/start/destroy jobs rapidly
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Martin Kletzander
QA Contact: Virtualization Bugs
Keywords: Upstream
Depends On:
Blocks:
 
Reported: 2014-10-08 07:37 EDT by Hu Jianwei
Modified: 2015-03-05 02:46 EST

See Also:
Fixed In Version: libvirt-1.2.8-11.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-05 02:46:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Error log for scratch build (13.64 MB, text/plain)
2014-10-15 04:03 EDT, Hu Jianwei
log for libvirtd on 1.2.8-7 build (17.76 MB, text/plain)
2014-11-24 21:50 EST, Hu Jianwei


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2015:0323
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: libvirt security, bug fix, and enhancement update
Last Updated: 2015-03-05 07:10:54 EST

Description Hu Jianwei 2014-10-08 07:37:59 EDT
Description
After rapidly running concurrent define/undefine/start/destroy jobs, the domain ends up outside libvirt's control.

Version:
libvirt-1.2.8-4.el7.x86_64
qemu-kvm-1.5.3-73.el7.x86_64
kernel-3.10.0-123.el7.x86_64
libcgroup-0.41-6.el7.x86_64
libcgroup-tools-0.41-6.el7.x86_64

How reproducible:
95%

Steps to Reproduce:
1. In the first terminal:
[root@ibm-x3850x5-06 ~]# while true; do virsh undefine test1;virsh define test1.xml; done

2. In the second terminal:
[root@ibm-x3850x5-06 libvirt-1.2.8]# while true; do virsh destroy test1;virsh start test1; done

3. After running the stress loops, check the domain state:
[root@ibm-x3850x5-06 machine.slice]# ps aux | grep test1
qemu       748 46.4  0.8 1639612 282100 ?      Sl   16:25   0:31 /usr/libexec/qemu-kvm -name test1 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 4309adb4-30f0-4f23-9109-a3a2c3877868 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/test.img,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:46:9d:f0,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on

[root@ibm-x3850x5-06 libvirt-1.2.8]# virsh start test1
error: Failed to start domain test1
error: error from service: CreateMachine: File exists

[root@ibm-x3850x5-06 machine.slice]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     test1                          shut off

[root@ibm-x3850x5-06 machine.slice]# pwd
/sys/fs/cgroup/systemd/machine.slice
[root@ibm-x3850x5-06 machine.slice]# ll
total 0
-rw-r--r--. 1 root root 0 Sep 29 21:06 cgroup.clone_children
--w--w--w-. 1 root root 0 Sep 29 21:06 cgroup.event_control
-rw-r--r--. 1 root root 0 Sep 29 21:06 cgroup.procs
drwxr-xr-x. 2 root root 0 Sep 30 15:35 machine-qemu\x2dtest1.scope
-rw-r--r--. 1 root root 0 Sep 29 21:06 notify_on_release
-rw-r--r--. 1 root root 0 Sep 29 21:06 tasks
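
The two loops from steps 1 and 2 can also be driven from a single script with a bounded number of rounds, which makes a run easier to repeat. This is only a sketch: the function name `stress_domain`, the `VIRSH` override, and the round count are assumptions introduced here and are not part of the original report.

```shell
# Hypothetical helper: run the loops from steps 1 and 2 concurrently for a
# fixed number of rounds instead of forever.
# VIRSH can be overridden (an assumption of this sketch) to dry-run the loops.
VIRSH=${VIRSH:-virsh}

stress_domain() {
    dom=$1      # domain name, e.g. test1
    xml=$2      # path to the domain XML, e.g. test1.xml
    rounds=$3   # iterations per loop

    # Loop 1: undefine/define, as in the first terminal.
    (
        i=0
        while [ "$i" -lt "$rounds" ]; do
            "$VIRSH" undefine "$dom" || true
            "$VIRSH" define "$xml" || true
            i=$((i + 1))
        done
    ) &
    pid_define=$!

    # Loop 2: destroy/start, as in the second terminal.
    (
        i=0
        while [ "$i" -lt "$rounds" ]; do
            "$VIRSH" destroy "$dom" || true
            "$VIRSH" start "$dom" || true
            i=$((i + 1))
        done
    ) &
    pid_start=$!

    # Wait for both loops; individual virsh failures are expected and ignored.
    wait "$pid_define" "$pid_start" || true
}
```

For example, `stress_domain test1 test1.xml 500` runs 500 rounds of each loop and then returns, after which the state can be inspected as in step 3.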

Actual results:
As the steps above show, the domain's qemu process was left running but detached from libvirt, and libvirt could no longer start the domain.

2014-09-30 07:22:41.715+0000: 2757: debug : virEventPollDispatchHandles:494 : i=0 w=1
2014-09-30 07:22:41.715+0000: 2761: error : virDBusCall:1429 : error from service: CreateMachine: File exists
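
The symptom above (libvirt reports the domain as shut off while its systemd scope still exists) can be checked for directly. The following is a minimal sketch, assuming the cgroup layout shown in step 3. The function name `scope_left_behind` and the `CGROUP_ROOT` override are hypothetical, and the name handling only covers simple domain names such as `test1`.

```shell
# Hypothetical check: does a machine scope for the domain still exist?
# Assumes the layout from the report: the scope directory is named
# machine-qemu\x2d<domain>.scope (the "-" after "qemu" appears as \x2d).
# Domain names containing characters that systemd would escape are not handled.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup/systemd/machine.slice}

scope_left_behind() {
    dom=$1
    # Single quotes keep the backslash in the directory name literal.
    [ -d "$CGROUP_ROOT/"'machine-qemu\x2d'"$dom"'.scope' ]
}
```

If `scope_left_behind test1` succeeds while `virsh list --all` shows `test1` as shut off, the domain is likely in the state described in this report.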

Expected results:
libvirt should start the domain.

Additional info:
Comment 1 Daniel Berrange 2014-10-08 07:43:02 EDT
Probably need this upstream commit

commit 4882618ed13b469d92fa8b2b4a158fdb17dbe9f1
Author: Guido Günther <agx@sigxcpu.org>
Date:   Thu Sep 25 13:32:58 2014 +0200

    qemu: use systemd's TerminateMachine to kill all processes
    
    If we don't properly clean up all processes in the
    machine-<vmname>.scope systemd won't remove the cgroup and subsequent vm
    starts fail with
    
      'CreateMachine: File exists'
    
    Additional processes can e.g. be added via
    
      echo $PID > /sys/fs/cgroup/systemd/machine.slice/machine-${VMNAME}.scope/tasks
    
    but there are other cases like
    
      http://bugs.debian.org/761521
    
    Invoke TerminateMachine to be on the safe side since systemd tracks the
    cgroup anyway. This is a noop if all processes have terminated already.
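
For a host that is already stuck, the same cleanup can be requested by hand. The sketch below wraps `machinectl terminate`, which is documented to kill all processes of a registered machine and should correspond to the TerminateMachine operation named in the commit. The function name `terminate_machine` and the `MACHINECTL` override are hypothetical, and the machine name `qemu-<domain>` is an assumption inferred from the scope name `machine-qemu\x2dtest1.scope` in the description.

```shell
# Hypothetical manual cleanup: ask systemd-machined to terminate the machine,
# which kills every process left in its scope.
# Assumption: the machine is registered as "qemu-<domain>", inferred from the
# scope name machine-qemu\x2dtest1.scope seen in the description.
MACHINECTL=${MACHINECTL:-machinectl}

terminate_machine() {
    dom=$1
    "$MACHINECTL" terminate "qemu-$dom"
}
```

Running `terminate_machine test1` as root and then retrying `virsh start test1` is one way to test whether leftover processes in the scope are what blocks the start.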
Comment 5 Martin Kletzander 2014-10-15 03:25:44 EDT
Could you please provide libvirt debug logs captured while reproducing the issue? Thank you.
Comment 6 Hu Jianwei 2014-10-15 04:03:56 EDT
Created attachment 947129
Error log for scratch build

Please check the error log for the scratch build.
Comment 7 Martin Kletzander 2014-11-04 04:58:41 EST
Fixed upstream with v1.2.10-9-gb629c64:

commit b629c64e5e0a32ef439b8eeb3a697e2cd76f3248
Author:     Martin Kletzander <mkletzan@redhat.com>
AuthorDate: Thu Oct 30 14:38:35 2014 +0100

    qemu: avoid rare race when undefining domain
Comment 10 Hu Jianwei 2014-11-24 21:46:52 EST
I can still reproduce it.

[root@ibm-x3850x5-06 ~]# rpm -q libvirt
libvirt-1.2.8-7.el7.x86_64

After running the concurrent jobs rapidly:

[root@ibm-x3850x5-06 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     test                           shut off

[root@ibm-x3850x5-06 ~]# virsh start test
error: Failed to start domain test
error: error from service: CreateMachine: File exists

[root@ibm-x3850x5-06 ~]# ps aux | grep qemu-kvm
qemu       377  7.1  0.8 1661472 290980 ?      Sl   10:34   0:38 /usr/libexec/qemu-kvm -name test -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 2ce8d663-981e-416e-8760-a21216481992 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/test.img,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9d:96:2a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
root       858  0.0  0.0 112644   972 pts/0    S+   10:43   0:00 grep --color=auto qemu-kvm
Comment 11 Hu Jianwei 2014-11-24 21:50:35 EST
Created attachment 960995
log for libvirtd on 1.2.8-7 build
Comment 12 Martin Kletzander 2014-12-08 08:06:27 EST
If this is still not fixed, I need to investigate further. Moving back to ASSIGNED.
Comment 14 vivian zhang 2014-12-23 03:08:21 EST
I can reproduce this bug on build
libvirt-1.2.8-10.el7.x86_64

and verified the fix on build
libvirt-1.2.8-11.el7.x86_64

Verification steps:

1. Prepare a guest XML on the host.
In the first terminal:
# while true; do virsh undefine vm1;virsh define vm1.xml; done

In the second terminal:
# while true;do virsh destroy vm1;virsh start vm1;done


2. Run the stress loops for more than 2 hours. The guest still works normally,
and no stray qemu-kvm process is left behind.

 # virsh start vm1
Domain vm1 started

[root@intel-e31225-16-2 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 12824 vm1                            running


Moving to VERIFIED.
Comment 16 errata-xmlrpc 2015-03-05 02:46:15 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html
