Red Hat Bugzilla – Bug 1150505
Domain is out of control from libvirt when running some concurrent define/undefine/start/destroy jobs rapidly
Last modified: 2015-03-05 02:46:15 EST
Description

Domain is out of control from libvirt when running some concurrent define/undefine/start/destroy jobs rapidly

Version:
libvirt-1.2.8-4.el7.x86_64
qemu-kvm-1.5.3-73.el7.x86_64
kernel-3.10.0-123.el7.x86_64
libcgroup-0.41-6.el7.x86_64
libcgroup-tools-0.41-6.el7.x86_64

How reproducible:
95%

Steps to Reproduce:
1. In the first terminal:
[root@ibm-x3850x5-06 ~]# while true; do virsh undefine test1;virsh define test1.xml; done

2. In the second terminal:
[root@ibm-x3850x5-06 libvirt-1.2.8]# while true; do virsh destroy test1;virsh start test1; done

3. After the rapid stress scripts:
[root@ibm-x3850x5-06 machine.slice]# ps aux | grep test1
qemu 748 46.4 0.8 1639612 282100 ? Sl 16:25 0:31 /usr/libexec/qemu-kvm -name test1 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 4309adb4-30f0-4f23-9109-a3a2c3877868 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/test.img,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:46:9d:f0,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on

[root@ibm-x3850x5-06 libvirt-1.2.8]# virsh start test1
error: Failed to start domain test1
error: error from service: CreateMachine: File exists

[root@ibm-x3850x5-06 machine.slice]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     test1                          shut off

[root@ibm-x3850x5-06 machine.slice]# pwd
/sys/fs/cgroup/systemd/machine.slice
[root@ibm-x3850x5-06 machine.slice]# ll
total 0
-rw-r--r--. 1 root root 0 Sep 29 21:06 cgroup.clone_children
--w--w--w-. 1 root root 0 Sep 29 21:06 cgroup.event_control
-rw-r--r--. 1 root root 0 Sep 29 21:06 cgroup.procs
drwxr-xr-x. 2 root root 0 Sep 30 15:35 machine-qemu\x2dtest1.scope
-rw-r--r--. 1 root root 0 Sep 29 21:06 notify_on_release
-rw-r--r--. 1 root root 0 Sep 29 21:06 tasks

Actual results:
As the steps above show, the domain's qemu process was left running and detached from libvirt, and libvirt cannot start the domain anymore.

2014-09-30 07:22:41.715+0000: 2757: debug : virEventPollDispatchHandles:494 : i=0 w=1
2014-09-30 07:22:41.715+0000: 2761: error : virDBusCall:1429 : error from service: CreateMachine: File exists

Expected results:
libvirt should start the domain.

Additional info:
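The state shown above (virsh reports the domain as shut off while its scope directory is still present under machine.slice) can be checked with a small helper. This is only a sketch: the function names scope_dir_name and has_stale_scope are hypothetical, the naming pattern is taken from the "machine-qemu\x2dtest1.scope" entry in the listing above, and only "-" is escaped here, whereas systemd's real escaping rules cover more characters.

```shell
#!/bin/sh
# Hypothetical helper: build the scope directory name for a domain, following
# the "machine-qemu\x2dtest1.scope" pattern seen in this report.
# Assumption: only '-' needs escaping (true for simple names such as test1).
scope_dir_name() {
    escaped=$(printf '%s' "qemu-$1" | sed 's/-/\\x2d/g')
    printf 'machine-%s.scope\n' "$escaped"
}

# Succeed if a scope directory for the domain exists under the cgroup root.
# $1 = domain name, $2 = cgroup root (defaults to the path from this report).
has_stale_scope() {
    root=${2:-/sys/fs/cgroup/systemd/machine.slice}
    [ -d "$root/$(scope_dir_name "$1")" ]
}
```

A possible use, once virsh list --all shows the domain as shut off: has_stale_scope test1 && echo "scope for test1 is still present". A scope that remains while the domain is shut off matches the situation in which CreateMachine fails with "File exists".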
Probably need this upstream commit

commit 4882618ed13b469d92fa8b2b4a158fdb17dbe9f1
Author: Guido Günther <agx@sigxcpu.org>
Date:   Thu Sep 25 13:32:58 2014 +0200

    qemu: use systemd's TerminateMachine to kill all processes

    If we don't properly clean up all processes in the
    machine-<vmname>.scope systemd won't remove the cgroup and subsequent vm
    starts fail with

      'CreateMachine: File exists'

    Additional processes can e.g. be added via

      echo $PID > /sys/fs/cgroup/systemd/machine.slice/machine-${VMNAME}.scope/tasks

    but there are other cases like

      http://bugs.debian.org/761521

    Invoke TerminateMachine to be on the safe side since systemd tracks the
    cgroup anyway. This is a noop if all processes have terminated already.
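For manual cleanup on an affected host, roughly the same effect can be approximated from the command line with machinectl terminate. This is a sketch, not the code from the commit (which calls TerminateMachine from inside libvirt). It assumes the guest is registered with systemd-machined as "qemu-<domain>", which is inferred from the scope name machine-qemu\x2dtest1.scope in the description; the function names and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Assumption: the machine name is "qemu-<domain>", inferred from the scope
# directory name shown in this report.
machine_name() {
    printf 'qemu-%s\n' "$1"
}

# Ask systemd-machined to terminate the machine and all of its processes.
# With DRY_RUN set, only print the command that would be run.
terminate_machine() {
    name=$(machine_name "$1")
    if [ -n "${DRY_RUN:-}" ]; then
        echo "machinectl terminate $name"
    else
        machinectl terminate "$name"
    fi
}
```

Running DRY_RUN=1 terminate_machine test1 first shows which machine would be terminated. Note that this kills the leftover qemu process, so it should only be used when the guest is already out of libvirt's control.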
Could you please provide debug logs from libvirt captured while reproducing the issue? Thank you.
Created attachment 947129
Error log for scratch build

Please check the error log for the scratch build.
Fixed upstream with v1.2.10-9-gb629c64:

commit b629c64e5e0a32ef439b8eeb3a697e2cd76f3248
Author:     Martin Kletzander <mkletzan@redhat.com>
AuthorDate: Thu Oct 30 14:38:35 2014 +0100

    qemu: avoid rare race when undefining domain
I can still reproduce it.

[root@ibm-x3850x5-06 ~]# rpm -q libvirt
libvirt-1.2.8-7.el7.x86_64

After running the concurrent jobs rapidly:

[root@ibm-x3850x5-06 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     test                           shut off

[root@ibm-x3850x5-06 ~]# virsh start test
error: Failed to start domain test
error: error from service: CreateMachine: File exists

[root@ibm-x3850x5-06 ~]# ps aux | grep qemu-kvm
qemu 377 7.1 0.8 1661472 290980 ? Sl 10:34 0:38 /usr/libexec/qemu-kvm -name test -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 2ce8d663-981e-416e-8760-a21216481992 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/test.img,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9d:96:2a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
root 858 0.0 0.0 112644 972 pts/0 S+ 10:43 0:00 grep --color=auto qemu-kvm
Created attachment 960995
log for libvirtd on 1.2.8-7 build
I need to investigate more if this is still not fixed. Moving back to assigned.
I can reproduce this bug on build libvirt-1.2.8-10.el7.x86_64 and have verified the fix on build libvirt-1.2.8-11.el7.x86_64.

Verification steps:
1. Prepare a guest XML on the host.
In the first terminal:
# while true; do virsh undefine vm1;virsh define vm1.xml; done
In the second terminal:
# while true;do virsh destroy vm1;virsh start vm1;done

2. Run the stress scripts for more than 2 hours. The guest still works normally and no stale qemu-kvm process is left behind.
# virsh start vm1
Domain vm1 started

[root@intel-e31225-16-2 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 12824 vm1                            running

Moving to verified.
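The two-terminal verification above could also be driven from a single script with a time limit. The following is a minimal sketch under stated assumptions: the function name stress and the VIRSH, DOMAIN, XML and DURATION variables are hypothetical, and the final define/destroy/start sequence is one possible way to bring the domain into a known state before the last start check, not something taken from the original steps.

```shell
#!/bin/sh
# Hypothetical settings; the defaults mirror the names used in the steps above.
VIRSH=${VIRSH:-virsh}
DOMAIN=${DOMAIN:-vm1}
XML=${XML:-vm1.xml}
DURATION=${DURATION:-7200}   # seconds; "more than 2 hours" in the steps above

# Run the undefine/define and destroy/start loops concurrently for DURATION
# seconds, then check that the domain can still be started.
stress() {
    end=$(( $(date +%s) + DURATION ))
    ( while [ "$(date +%s)" -lt "$end" ]; do
          $VIRSH undefine "$DOMAIN"; $VIRSH define "$XML"
      done ) >/dev/null 2>&1 &
    ( while [ "$(date +%s)" -lt "$end" ]; do
          $VIRSH destroy "$DOMAIN"; $VIRSH start "$DOMAIN"
      done ) >/dev/null 2>&1 &
    wait
    # Bring the domain into a known state; errors here are expected and ignored.
    $VIRSH define "$XML" >/dev/null 2>&1
    $VIRSH destroy "$DOMAIN" >/dev/null 2>&1
    # The exit status of the final start is the result of the check.
    $VIRSH start "$DOMAIN"
}
```

Calling stress runs the test with the defaults. If it returns non-zero, the final virsh start failed, which on the affected builds would show up as the "CreateMachine: File exists" error.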
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0323.html