Description of problem:

Sometimes after resuming a VM, the VM is unusable:
- The qemu-kvm process consumes 100% CPU.
- The VM console is frozen.
- The VM is not pingable.

Version-Release number of selected component (if applicable): RHEL 5.6.

[root@hb06b07 ~]# rpm -qa | grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-224.el5_6.1
kvm-qemu-img-83-224.el5
kvm-83-224.el5
kmod-kvm-83-224.el5

[root@hb06b07 ~]# rpm -qa | grep libvirt
libvirt-python-0.8.2-15.el5
libvirt-0.8.2-15.el5
libvirt-devel-0.8.2-15.el5
libvirt-debuginfo-0.8.2-15.el5_6.4
libvirt-0.8.2-15.el5
libvirt-debuginfo-0.8.2-15.el5_6.4
libvirt-devel-0.8.2-15.el5

How reproducible:

I have a program that integrates with libvirt; it suspends and resumes VMs. In response to some condition a VM is saved; at a later time it is resumed, possibly on a different host. My current cluster has one AMD machine and one Intel machine. After letting the test program run for a few hours, a VM will resume into this unusable state. Most of the time the resume is successful.

Here is the command line of the problematic VM:

root 6615 1 97 16:24 ? 00:21:29 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__364 -uuid c6de70f9-0449-4877-9005-d60c3739a136 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__364.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364_1.img,if=ide,bus=0,unit=0,boot=on,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:29:15:f1,vlan=0 -net tap,fd=26,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:1 -k en-us -vga cirrus -incoming exec:cat -balloon virtio

And its config file:
[root@hb06b07 ~]# cat /etc/libvirt/qemu/_vm_lsf_dyn__364.xml
<domain type='kvm'>
  <name>_vm_lsf_dyn__364</name>
  <uuid>c6de70f9-0449-4877-9005-d60c3739a136</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel5.4.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364_1.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'/>
    <interface type='bridge'>
      <mac address='00:16:3e:29:15:f1'/>
      <source bridge='br0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
    </video>
  </devices>
</domain>

top:

top - 16:47:40 up 11 days, 21:26, 2 users, load average: 1.34, 1.25, 1.24
Tasks: 149 total, 1 running, 148 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 28.4%sy, 0.0%ni, 71.2%id, 0.1%wa, 0.1%hi, 0.1%si, 0.0%st
Mem: 4045524k total, 3867892k used, 177632k free, 11288k buffers
Swap: 2097144k total, 124k used, 2097020k free, 2288848k cached

  PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM   TIME+ COMMAND
 6615 root  15  0  692m 539m 2424 S 100.1 13.6 22:59.54 qemu-kvm
 6648 root  15  0  702m 539m 2448 S   7.3 13.7  0:59.03 qemu-kvm
 6201 root  15  0  702m 204m 2448 S   6.3  5.2  1:32.45 qemu-kvm
10641 root  15  0 12764 1136  824 R   0.3  0.0  0:00.01 top
    1 root  15  0 10372  696  584 S   0.0  0.0  0:01.76 init

gdb:

[root@hb06b07 ~]# gdb -p 6615
GNU gdb (GDB)
Red Hat Enterprise Linux (7.0.1-32.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 6615
Reading symbols from /usr/libexec/qemu-kvm...
warning: the debug information found in "/usr/lib/debug//usr/libexec/qemu-kvm.debug" does not match "/usr/libexec/qemu-kvm" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug/usr/libexec/qemu-kvm.debug" does not match "/usr/libexec/qemu-kvm" (CRC mismatch).
(no debugging symbols found)...done.
[... "Reading symbols"/"Loaded symbols" lines for ~50 shared libraries trimmed; no debugging symbols were found for any of them ...]
[Thread debugging using libthread_db enabled]
[New Thread 0x41d34940 (LWP 6708)]
[New Thread 0x40ee7940 (LWP 6617)]
[New Thread 0x427f5940 (LWP 6616)]
Loaded symbols for /lib64/libsepol.so.1
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff6e986000
0x000000361a4cd372 in select () from /lib64/libc.so.6
(gdb) thread apply all bt full

(Note: gdb printed "No symbol table info available." after every frame; those lines are omitted below for brevity. The symbol names for frames inside qemu-kvm are unreliable because of the CRC mismatch warnings above.)

Thread 4 (Thread 0x427f5940 (LWP 6616)):
#0 0x000000361a431744 in do_sigwaitinfo () from /lib64/libc.so.6
#1 0x000000361a4317fd in sigwaitinfo () from /lib64/libc.so.6
#2 0x000000000041a991 in snd_pcm_hw_params_set_channels_near ()
#3 0x000000361b00673d in start_thread () from /lib64/libpthread.so.0
#4 0x000000361a4d40cd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x40ee7940 (LWP 6617)):
#0 0x000000361a4cc9f7 in ioctl () from /lib64/libc.so.6
#1 0x000000000052bd8a in snd_pcm_hw_params_set_channels_near ()
#2 0x0000000000500759 in snd_pcm_hw_params_set_channels_near ()
#3 0x00000000005009e3 in snd_pcm_hw_params_set_channels_near ()
#4 0x000000361b00673d in start_thread () from /lib64/libpthread.so.0
#5 0x000000361a4d40cd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x41d34940 (LWP 6708)):
#0 0x000000361a431744 in do_sigwaitinfo () from /lib64/libc.so.6
#1 0x000000361a4317fd in sigwaitinfo () from /lib64/libc.so.6
#2 0x000000000041a991 in snd_pcm_hw_params_set_channels_near ()
#3 0x000000361b00673d in start_thread () from /lib64/libpthread.so.0
#4 0x000000361a4d40cd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2ba6893aa150 (LWP 6615)):
#0 0x000000361a4cd372 in select () from /lib64/libc.so.6
#1 0x00000000004095e1 in snd_pcm_hw_params_set_channels_near ()
#2 0x00000000005005fa in snd_pcm_hw_params_set_channels_near ()
#3 0x000000000040e795 in snd_pcm_hw_params_set_channels_near ()
#4 0x000000361a41d994 in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000406da9 in snd_pcm_hw_params_set_channels_near ()
#6 0x00007fff6e8f2018 in ?? ()
#7 0x0000000000000000 in ?? ()
(gdb)

strace shows that qemu-kvm keeps doing the following:

read(19, 0x7fff6e8f0650, 128) = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512) = 1
read(4, 0x7fff6e8f04d0, 512) = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1) = 1
read(19, 0x7fff6e8f0650, 128) = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512) = 1
read(4, 0x7fff6e8f04d0, 512) = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1) = 1
[... the same five-call pattern repeats indefinitely ...]

[root@hb06b07 ~]# ls -l /proc/6615/fd
total 0
lr-x------ 1 root root 64 Jun 20 16:50 0 -> /VMOxen/storage/share/SR/vmonfs1/export/data/vmodev/mclosson2/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/suspend/.nfs0000000000f4ccb800000455
l-wx------ 1 root root 64 Jun 20 16:50 1 -> /var/log/libvirt/qemu/_vm_lsf_dyn__364.log
l-wx------ 1 root root 64 Jun 20 16:50 10 -> pipe:[4636407]
lrwx------ 1 root root 64 Jun 20 16:50 11 -> /VMOxen/storage/share/SR/vmonfs1/export/data/vmodev/mclosson2/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364_1.img
lr-x------ 1 root root 64 Jun 20 16:50 12 -> /VMOxen/storage/share/SR/vmonfs1/export/data/vmodev/mclosson2/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364.iso
lrwx------ 1 root root 64 Jun 20 16:50 13 -> socket:[4636409]
lrwx------ 1 root root 64 Jun 20 16:50 14 -> socket:[4636410]
lrwx------ 1 root root 64 Jun 20 16:50 15 -> /dev/ptmx
lrwx------ 1 root root 64 Jun 20 16:50 16 -> anon_inode:kvm-vcpu
lrwx------ 1 root root 64 Jun 20 16:50 17 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jun 20 16:50 18 -> anon_inode:[eventfd]
lr-x------ 1 root root 64 Jun 20 16:50 19 -> pipe:[4636554]
l-wx------ 1 root root 64 Jun 20 16:50 2 -> /var/log/libvirt/qemu/_vm_lsf_dyn__364.log
l-wx------ 1 root root 64 Jun 20 16:50 20 -> pipe:[4636554]
lrwx------ 1 root root 64 Jun 20 16:50 21 -> socket:[4636555]
lrwx------ 1 root root 64 Jun 20 16:50 26 -> /dev/net/tun
lrwx------ 1 root root 64 Jun 20 16:50 3 -> /dev/kvm
lr-x------ 1 root root 64 Jun 20 16:50 4 -> pipe:[4636405]
l-wx------ 1 root root 64 Jun 20 16:50 5 -> pipe:[4636405]
lrwx------ 1 root root 64 Jun 20 16:50 6 -> anon_inode:kvm-vm
lrwx------ 1 root root 64 Jun 20 16:50 7 -> /dev/ksm
lrwx------ 1 root root 64 Jun 20 16:50 8 -> anon_inode:ksm-sma
lr-x------ 1 root root 64 Jun 20 16:50 9 -> pipe:[4636407]
Guest S3/S4 with virtio devices isn't yet supported, and is unlikely to be supported in RHEL 5 releases. From your command line, it looks like you have only the virtio balloon device. If you're not using it, can you disable it and then check whether suspend/resume works as expected?
Amit, what is Guest S3/S4? Also, I used virsh edit to remove <memballoon model='virtio'/>, but when I power on the VM it is added back automatically. I'll find out why.
http://fossplanet.com/f13/%5Blibvirt%5D-%5Bpatch%5D-docs-document-how-disable-memballoon-63997/ Google knows everything.
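For reference, the approach described at that link is to set the balloon model to 'none' in the domain XML rather than deleting the element (which libvirt re-adds as model='virtio'). A minimal sketch; whether model='none' is honored depends on the libvirt version, and a later comment in this bug notes that the libvirt shipped with RHEL 5.6 appears to force the balloon device on regardless:

```xml
<devices>
  <!-- explicitly disable the balloon device; simply deleting the
       <memballoon/> element makes libvirt re-add it as model='virtio' -->
  <memballoon model='none'/>
</devices>
```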
Just to confirm that the balloon param was removed:

[root@blue07 qemu]# ps -ef | grep kvm
root 25297 1 27 00:12 ? 00:00:07 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__385 -uuid 5c7bb027-acb9-4e41-9ea9-48576466a7f7 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__385.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__385_1.img,if=ide,bus=0,unit=0,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__385.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:55:59:82,vlan=0 -net tap,fd=20,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus

root 25334 1 27 00:12 ? 00:00:07 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__392 -uuid 828606b9-fcc4-4159-873e-bcc132e1869e -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__392.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__392_1.img,if=ide,bus=0,unit=0,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__392.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:ff:dc:ce,vlan=0 -net tap,fd=20,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:1 -k en-us -vga cirrus

root 25374 1 29 00:12 ? 00:00:07 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__394 -uuid cd9c1617-2d8a-4efe-afc3-31d3e56696f2 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__394.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__394_1.img,if=ide,bus=0,unit=0,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__394.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:d8:b9:86,vlan=0 -net tap,fd=22,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:2 -k en-us -vga cirrus

root 25405 1 28 00:12 ? 00:00:07 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__395 -uuid 5121750a-96d7-4bf3-80f3-da85f428152d -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__395.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__395_1.img,if=ide,bus=0,unit=0,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__395.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:68:88:47,vlan=0 -net tap,fd=22,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:3 -k en-us -vga cirrus
(In reply to comment #2)
> Amit. What is Guest s3/s4?

It means suspend-to-memory or suspend-to-disk initiated from within the guest. After removing the virtio-balloon device, does suspend/resume work fine?
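For context, guest-initiated S3/S4 is what happens when, inside the guest, something writes to the standard Linux power-management sysfs interface (a generic example, not specific to this bug; which states are available depends on the guest kernel):

```
# inside the guest:
echo mem  > /sys/power/state   # S3, suspend to RAM
echo disk > /sys/power/state   # S4, hibernate to disk
```

This is distinct from a host-side "virsh save", which snapshots the VM from outside without the guest's involvement.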
It seems like the libvirt that comes with RHEL 5.6 _always_ enables the balloon device. The output in comment #4 is with libvirt-0.9.1. In that test environment the problem occurred. I rolled back the libvirt RPMs and installed the standard libvirt that comes with RHEL 5.6, but I cannot get libvirt to disable the balloon device (unless I make a code change and rebuild the RPMs). I think the test with libvirt 0.9.2 is still valid. The same kvm RPMs without the balloon device cause the same behaviour. As before, I had to let the stress test run for an hour before the bug happened.
Just blacklisting the virtio-balloon module in the guest will work as well.
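As a sketch of what that blacklist would look like inside the guest (the file name below is arbitrary, not mandated; any modprobe configuration file read by RHEL 5's module-init-tools works):

```
# /etc/modprobe.d/blacklist-virtio-balloon   (hypothetical file name)
blacklist virtio_balloon
```

After adding it, reboot the guest and confirm with "lsmod | grep virtio_balloon" that the module is no longer loaded.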
I disabled the virtio_balloon module by renaming the file and then rebooting.

In the VM:

[root@localhost ~]# uptime
09:08:44 up 1 min, 2 users, load average: 0.48, 0.22, 0.08
[root@localhost ~]# ls -l /lib/modules/2.6.18-238.el5/kernel/drivers/virtio/
total 192
-rwxr--r-- 1 root root 44608 Dec 19 2010 virtio_balloon.ko.XXX
-rwxr--r-- 1 root root 43024 Dec 19 2010 virtio.ko
-rwxr--r-- 1 root root 45944 Dec 19 2010 virtio_pci.ko
-rwxr--r-- 1 root root 40808 Dec 19 2010 virtio_ring.ko
[root@localhost ~]# lsmod | grep virtio
virtio_net 48193 0
virtio_blk 41673 3
virtio_pci 41545 0
virtio_ring 37953 1 virtio_pci
virtio 39365 3 virtio_net,virtio_blk,virtio_pci

On the hypervisor:

[root@delamd06 ~]# ps -ef | grep kvm
root 24905 1 25 08:57 ? 00:03:18 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name tmpl -uuid 6dca4b19-e48b-04ae-d3d8-f206e57a75a8 -monitor unix:/var/lib/libvirt/qemu/tmpl.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/template/images/RHEL56_32G_1.img,if=virtio,boot=on,format=qcow2,cache=none -net nic,macaddr=54:52:00:74:78:9f,vlan=0,model=virtio -net tap,fd=18,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio
[root@delamd06 ~]# rpm -qa | grep libvirt
libvirt-0.8.2-15.el5
libvirt-python-0.8.2-15.el5
libvirt-0.8.2-15.el5

I made the change in the VM template, then set up the test case and let it run. After about 6 hours I haven't seen the bug again. I'll continue to monitor it.
You need to disable all virtio devices -- net, blk. No virtio devices can handle hibernate yet.
Amit, I want to make sure we're on the same page here. I'm suspending the VM by running the command "virsh save <domain id> <state file>", not by the suspend-to-memory/suspend-to-disk features of Linux/Windows that also work on a physical machine. Does virtio support this?
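For clarity, the operation in question is host-side save/restore (the state-file path below is illustrative, not from this bug):

```
virsh save _vm_lsf_dyn__364 /path/to/statefile   # pause the VM and write its full state to a file
virsh restore /path/to/statefile                 # start a new qemu with "-incoming" and load that state
```

This matches the "-incoming exec:cat" visible on the stuck process's command line above: libvirt on RHEL 5 restores saved state by feeding the file to qemu through an exec-based incoming migration.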
(In reply to comment #10)
> Amit, I want to make sure we're on the same page here. I'm suspending the VM
> by running the command "virsh save <domain id> <state file>". Not by the save
> to memory/save to disk features of Linux/Windows that also work on a PM.

Aha, I hadn't seen that mentioned anywhere yet, or I missed it. That indeed shouldn't cause a problem with virtio. Do you have the logs for the qemu process corresponding to the guest that gets stuck, in /var/log/libvirt/qemu/? Please upload them here.
Can you re-try with -M rhel5.6? The rhel5.4 machine type had issues with not saving some kvmclock fields.
Dor, I started testing the case you suggested. I restored the virtio_balloon module in the template and changed the config from:

<os>
  <type arch='x86_64' machine='rhel5.4.0'>hvm</type>
  <boot dev='hd'/>
</os>

to:

<os>
  <type arch='x86_64' machine='rhel5.6.0'>hvm</type>
  <boot dev='hd'/>
</os>

I'll post the results later.

[root@delamd05 qemu]# ps -ef | grep kvm
root 14012 1 36 12:31 ? 00:01:16 /usr/libexec/qemu-kvm -S -M rhel5.6.0 -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__74 -uuid 5c3872c2-5538-485a-a603-1bda5ca59598 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__74.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__74_1.img,if=virtio,boot=on,format=qcow2,cache=none -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__74.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:f8:66:e6,vlan=0,model=virtio -net tap,fd=54,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:3 -k en-us -vga cirrus -balloon virtio

root 14599 1 11 12:33 ? 00:00:08 /usr/libexec/qemu-kvm -S -M rhel5.6.0 -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__70 -uuid 50e7a6d8-baf5-4c54-99ea-13ac1ee08066 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__70.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__70_1.img,if=virtio,boot=on,format=qcow2,cache=none -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__70.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:c8:6c:d3,vlan=0,model=virtio -net tap,fd=55,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -incoming exec:cat -balloon virtio

root 14799 1 12 12:33 ? 00:00:09 /usr/libexec/qemu-kvm -S -M rhel5.6.0 -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__68 -uuid 3f9a2f77-7601-4424-8302-43070f8943f2 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__68.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__68_1.img,if=virtio,boot=on,format=qcow2,cache=none -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__68.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:50:a6:6a,vlan=0,model=virtio -net tap,fd=63,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:2 -k en-us -vga cirrus -incoming exec:cat -balloon virtio

root 14989 4726 0 12:35 pts/0 00:00:00 grep kvm
Sometimes virt-manager freezes up.

(gdb) info threads
* 1 Thread 0x2b59462ef170 (LWP 4461) 0x0000003a802cb2e6 in poll () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003a802cb2e6 in poll () from /lib64/libc.so.6
#1 0x000000333feb05d2 in remoteIOEventLoop (conn=0x1ffcdb20, priv=0x1ffd1020, in_open=0, thiscall=0x20556db0) at remote/remote_driver.c:9657
#2 0x000000333feb10bd in remoteIO (conn=0x1ffcdb20, priv=0x1ffd1020, flags=0, thiscall=0x20556db0) at remote/remote_driver.c:9901
#3 0x000000333feb178b in call (conn=0x1ffcdb20, priv=0x1ffd1020, flags=0, proc_nr=16, args_filter=0x333feb3d0e <xdr_remote_domain_get_info_args>, args=0x7fff1d64eb50 "@\206\001 ", ret_filter=0x333feb3d44 <xdr_remote_domain_get_info_ret>, ret=0x7fff1d64eb20 "") at remote/remote_driver.c:9990
#4 0x000000333fea06e0 in remoteDomainGetInfo (domain=0x2001c580, info=0x7fff1d64ec00) at remote/remote_driver.c:2297
#5 0x000000333fe7ed12 in virDomainGetInfo (domain=0x2001c580, info=0x7fff1d64ec00) at libvirt.c:3050
#6 0x00002b594990daac in libvirt_virDomainGetInfo (self=0x0, args=0x1fe4fc50) at libvirt-override.c:1025
#7 0x0000003a8129639a in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#8 0x0000003a81295e46 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#9 0x0000003a81295e46 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#10 0x0000003a812972c5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#11 0x0000003a81295a1f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#12 0x0000003a81295e46 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#13 0x0000003a812972c5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#14 0x0000003a8124c6d7 in ?? () from /usr/lib64/libpython2.4.so.1.0
#15 0x0000003a81236430 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#16 0x0000003a8123c52f in ?? () from /usr/lib64/libpython2.4.so.1.0
#17 0x0000003a81236430 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#18 0x0000003a81290f1d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0
#19 0x00002b594b2025cf in ?? () from /usr/lib64/python2.4/site-packages/gtk-2.0/gobject/_gobject.so
#20 0x0000003a8222d2bb in ?? () from /lib64/libglib-2.0.so.0
#21 0x0000003a8222cdb4 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#22 0x0000003a8222fc0d in ?? () from /lib64/libglib-2.0.so.0
#23 0x0000003a8222ff1a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#24 0x0000003a8df2aa63 in gtk_main () from /usr/lib64/libgtk-x11-2.0.so.0
#25 0x00002b594b81d684 in ?? () from /usr/lib64/python2.4/site-packages/gtk-2.0/gtk/_gtk.so
#26 0x0000003a81296167 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#27 0x0000003a81295e46 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#28 0x0000003a812972c5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#29 0x0000003a81297312 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
#30 0x0000003a812b39f9 in ?? () from /usr/lib64/libpython2.4.so.1.0
#31 0x0000003a812b4ea8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
#32 0x0000003a812bb33d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
#33 0x0000003a8021d994 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000000000400629 in _start ()

Looks like it is waiting for libvirtd.
Thread 26 (Thread 0x50b15940 (LWP 18696)):
#0  0x0000003a80e0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003a80e101b1 in _L_cond_lock_989 () from /lib64/libpthread.so.0
#2  0x0000003a80e1007f in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#3  0x0000003a80e0af84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x000000333fe399c2 in virCondWait (c=0x2aaab407e368, m=0x2aaab407e340) at util/threads-pthread.c:100
#5  0x000000000046ef16 in qemuMonitorSend (mon=0x2aaab407e340, msg=0x50b14b70) at qemu/qemu_monitor.c:728
#6  0x0000000000472e54 in qemuMonitorCommandWithHandler (mon=0x2aaab407e340, cmd=0x4c332e "info balloon", passwordHandler=0, passwordOpaque=0x0, scm_fd=-1, reply=0x50b14c70) at qemu/qemu_monitor_text.c:340
#7  0x0000000000472fcb in qemuMonitorCommandWithFd (mon=0x2aaab407e340, cmd=0x4c332e "info balloon", scm_fd=-1, reply=0x50b14c70) at qemu/qemu_monitor_text.c:374
#8  0x0000000000472ff7 in qemuMonitorCommand (mon=0x2aaab407e340, cmd=0x4c332e "info balloon", reply=0x50b14c70) at qemu/qemu_monitor_text.c:381
#9  0x0000000000473996 in qemuMonitorTextGetBalloonInfo (mon=0x2aaab407e340, currmem=0x50b14d38) at qemu/qemu_monitor_text.c:680
#10 0x000000000046fb4e in qemuMonitorGetBalloonInfo (mon=0x2aaab407e340, currmem=0x50b14d38) at qemu/qemu_monitor.c:1014
#11 0x0000000000440755 in qemudDomainGetInfo (dom=0x1950800, info=0x50b14e20) at qemu/qemu_driver.c:4886
#12 0x000000333fe7ed12 in virDomainGetInfo (domain=0x1950800, info=0x50b14e20) at libvirt.c:3050
#13 0x00000000004216d0 in remoteDispatchDomainGetInfo (server=0x18ea910, client=0x2aaaac001140, conn=0x19aaa80, hdr=0x2aaaaca94c20, rerr=0x50b14f50, args=0x50b14f00, ret=0x50b14ea0) at remote.c:1485
#14 0x000000000042b3c3 in remoteDispatchClientCall (server=0x18ea910, client=0x2aaaac001140, msg=0x2aaaaca54c10) at dispatch.c:508
#15 0x000000000042afe8 in remoteDispatchClientRequest (server=0x18ea910, client=0x2aaaac001140, msg=0x2aaaaca54c10) at dispatch.c:390
#16 0x000000000041a4a8 in qemudWorker (data=0x18f0a18) at libvirtd.c:1574
#17 0x0000003a80e0673d in start_thread () from /lib64/libpthread.so.0
#18 0x0000003a802d40cd in clone () from /lib64/libc.so.6

Perhaps libvirtd is blocking on the guest OS while querying the memory balloon ("info balloon")? Just a guess.
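For context, the blocked frame in that trace is qemuMonitorTextGetBalloonInfo() waiting on qemu to answer "info balloon" over the human monitor; a healthy guest replies with a line like `balloon: actual=512`. A rough Python sketch of the parsing step libvirtd is stuck waiting to perform (the function name and the exact reply format here are illustrative, not libvirt's actual code):

```python
def parse_balloon_reply(reply):
    """Extract the current balloon size (MiB) from an 'info balloon'
    human-monitor reply such as 'balloon: actual=512'.
    Returns None when the balloon driver is not active or the reply
    is unexpected -- in this bug, no reply arrives at all, so even
    this trivial step never runs and the caller blocks forever."""
    prefix = "balloon: actual="
    idx = reply.find(prefix)
    if idx < 0:
        return None
    digits = ""
    for ch in reply[idx + len(prefix):]:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else None

print(parse_balloon_reply("balloon: actual=512"))  # 512
print(parse_balloon_reply("unknown command"))      # None
```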
I let the test run over the weekend. It stopped working Saturday afternoon. The agent process in our software that links to libvirt was blocked in a call to virDomainGetInfo(), which never returned. I let it run until Monday morning, at which point I restarted libvirtd to get things going again. I checked the stack trace before restarting, and I think 4 threads were stuck in "info balloon", just like the stack trace in the previous comment. There are 2 hypervisors in my environment and both stopped processing LSF jobs (and VM requests) because our agent was blocked in virDomainGetInfo(). The good news is that I did not observe a VM frozen after a resume. I will disable the balloon module to see whether that prevents virDomainGetInfo() from hanging, and restart the test.
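One way to take the balloon out of the picture is to blacklist the virtio balloon driver inside the guest, so the qemu balloon device has nothing to talk to. A minimal config fragment (the file name is conventional; any file under /etc/modprobe.d works):

```
# /etc/modprobe.d/blacklist-balloon.conf  (inside the guest)
blacklist virtio_balloon
```

Alternatively, on libvirt versions that support it, `<memballoon model='none'/>` in the domain XML removes the device from the qemu command line entirely; whether the libvirt 0.8.2 build here accepts that element is an assumption worth verifying, so the guest-side blacklist is the safer bet.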
There is another bug that blocks the retest of this one: the libvirt client locks up in virDomainGetInfo(). I'll log a separate issue to track that one.
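Until that client-side lockup is fixed, a timeout guard can at least keep the agent from wedging whenever libvirtd stops answering. A minimal sketch, assuming the real call would be something like `lambda: dom.info()` from the libvirt Python bindings (the helper name is ours, not a libvirt API):

```python
import threading
import time

def get_info_with_timeout(fn, timeout):
    """Run fn() in a worker thread and wait at most `timeout` seconds.
    Returns (True, result) on success, (False, None) on timeout or error.
    Note: if libvirtd is wedged (as in this bug) the worker thread still
    leaks until libvirtd recovers -- this only protects the caller."""
    box = {}
    def worker():
        try:
            box["result"] = fn()
        except Exception as exc:
            box["error"] = exc
    t = threading.Thread(target=worker)
    t.daemon = True          # don't keep the process alive for a stuck call
    t.start()
    t.join(timeout)
    if t.is_alive() or "error" in box:
        return False, None
    return True, box.get("result")

# Demo with a stand-in for a hung dom.info():
ok, _ = get_info_with_timeout(lambda: time.sleep(5), timeout=0.1)
print(ok)  # False: the call did not complete in time
```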
From comment #10, I can confirm this is the "migrate exec:dd" action: repeatedly migrating and loading the VM leads to the VM getting stuck at 100% CPU. According to the comments above, this needs a long-running test. I will write a script to test it and update with the result when it completes.
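The planned loop could be sketched as follows: repeatedly save and restore the domain, bailing out as soon as the guest stops answering ping. The domain name, guest IP, and counts below are placeholder assumptions, and VIRSH/PING are overridable so the control flow can be dry-run without a hypervisor:

```shell
#!/bin/sh
# Hypothetical save/restore stress loop for the scenario in this bug.
DOM=${DOM:-_vm_lsf_dyn__364}          # assumed domain name
GUEST_IP=${GUEST_IP:-192.168.122.10}  # assumed guest address
ITERATIONS=${ITERATIONS:-200}
SETTLE=${SETTLE:-30}                  # seconds to let the guest settle
VIRSH=${VIRSH:-virsh}
PING=${PING:-ping}
STATE=/tmp/${DOM}.sav

run_stress() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        "$VIRSH" save "$DOM" "$STATE"    || return 1
        "$VIRSH" restore "$STATE"        || return 1
        sleep "$SETTLE"
        if ! "$PING" -c 3 -W 2 "$GUEST_IP" >/dev/null 2>&1; then
            echo "guest unresponsive after iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $i save/restore cycles"
}

# Invoke on a real host with e.g.:  DOM=myvm GUEST_IP=10.0.0.5 run_stress
```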
Can you check whether you can reproduce this with only AMD hosts or only Intel hosts? We don't support migration (save/resume uses the same code path) between architectures on RHEL5/6.
Could you check whether the issue only happens when migrating a guest from an Intel host to an AMD host, or from an AMD host to an Intel host? If so, this is a scenario we do not support. Best Regards, Mike
(In reply to comment #18) > Can you check whether you can reproduce this with only AMD hosts or only Intel hosts? We don't support migration (save/resume uses the same code path) between architectures on RHEL5/6. Yes, I am now testing with AMD-only and Intel-only hosts.
Tested migration with RHEL5.6-32 & RHEL5.6-64 guests 200 times; did not hit this bug.

Steps:
1. migrate -d "exec:dd of=/tmp/rhel5.6img.test bs=4096 seek=1"
2. boot vm with -incoming "exec:dd if=/tmp/rhel5.6img.test bs=4096 skip=1"

Job links:
Intel host: https://virtlab.englab.nay.redhat.com/job/41536/details/
AMD host: https://virtlab.englab.nay.redhat.com/job/41537/details/

Host info:
kernel-2.6.18-294.el5
kvm-83-243.el5

cmd: /home/autotest-devel/client/tests/kvm/qemu -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20111116-130535-rKwp',server,nowait -serial unix:'/tmp/serial-20111116-130535-rKwp',server,nowait -drive file='/home/autotest-devel/client/tests/kvm/images/RHEL-Server-5.6-32.raw',index=0,if=ide,media=disk,cache=none,format=raw -net nic,vlan=0,model=rtl8139,macaddr='9a:86:29:6f:18:a3' -net tap,vlan=0,fd=36 -m 1024 -smp 2,cores=1,threads=1,sockets=2 -cpu qemu64,+sse2 -vnc :1 -rtc-td-hack -boot c -no-kvm-pit-reinjection -M rhel5.6.0 -usbdevice tablet -S -incoming "exec:dd if=/tmp/rhel5.6img.test bs=4096 skip=1"
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Hi Michael Closson, Thank you for taking the time to enter a bug report with us. We do appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for getting support, and as such we are not able to make any guarantees as to the timeliness or suitability of a resolution. If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain that it gets the proper attention and prioritization to assure a timely resolution. For information on how to contact the Red Hat production support team, please see: https://www.redhat.com/support/process/production/#howto For now, this bug is not reproducible here, so I am closing it for RHEL5. Thanks, Ronen.