Bug 714811 - Resumed VM consumes 100% cpu. Console frozen, not pingable.
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
5.6
x86_64 Linux
unspecified Severity high
: rc
: ---
Assigned To: Amit Shah
Virtualization Bugs
:
Depends On:
Blocks: Rhel5KvmTier3 807971
Reported: 2011-06-20 16:51 EDT by Michael Closson
Modified: 2013-01-09 19:00 EST
9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-04-09 06:33:14 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---


Attachments: None
Description Michael Closson 2011-06-20 16:51:27 EDT
Description of problem:

Sometimes after resuming a VM, the VM is unusable.
- The qemu-kvm process consumes 100% CPU.
- The VM console is frozen.
- The VM is not pingable.

Version-Release number of selected component (if applicable):

RHEL56.

[root@hb06b07 ~]# rpm -qa |grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-224.el5_6.1
kvm-qemu-img-83-224.el5
kvm-83-224.el5
kmod-kvm-83-224.el5


[root@hb06b07 ~]# rpm -qa|grep libvirt
libvirt-python-0.8.2-15.el5
libvirt-0.8.2-15.el5
libvirt-devel-0.8.2-15.el5
libvirt-debuginfo-0.8.2-15.el5_6.4
libvirt-0.8.2-15.el5
libvirt-debuginfo-0.8.2-15.el5_6.4
libvirt-devel-0.8.2-15.el5




How reproducible:

I have a program that integrates with libvirt and suspends and resumes VMs. In response to some condition, a VM is saved; later it is resumed, possibly on a different host. My current cluster has one AMD machine and one Intel machine.

After the test program has run for a few hours, a VM resumes into this unusable state.
  
Most of the time resume is successful.
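For anyone trying to reproduce this, the save/resume cycle described above can be sketched with the libvirt Python bindings. This is a minimal sketch, not the reporter's actual test program: the helpers `save_and_restore` and `stress` are hypothetical, and it assumes `conn` is a `libvirt.virConnect` whose `lookupByName()`, `virDomain.save()` and `restore()` calls correspond to `virsh save` / `virsh restore`. The helpers do not import `libvirt` themselves, so they can be exercised with a stand-in connection object.

```python
def save_and_restore(conn, domain_name, state_path):
    """One save/resume cycle, roughly `virsh save` followed by `virsh restore`.

    `conn` is assumed to be a libvirt.virConnect (or any object exposing the
    same lookupByName()/restore() methods). Hypothetical helper.
    """
    dom = conn.lookupByName(domain_name)
    # Writes guest memory and device state to state_path; the domain stops running.
    dom.save(state_path)
    # Starts a new qemu-kvm process that loads the saved state.
    conn.restore(state_path)
    # Look the domain up again: the restored domain is a new object.
    return conn.lookupByName(domain_name)


def stress(conn, domain_name, state_path, cycles):
    """Repeat the cycle, since the hang only shows up after many iterations."""
    for _ in range(cycles):
        save_and_restore(conn, domain_name, state_path)
```

With a real connection this would be called as `save_and_restore(libvirt.open("qemu:///system"), "_vm_lsf_dyn__364", state_path)`. Resuming on the other host amounts to calling `restore()` on a connection to that host, with the state file on shared storage (the NFS `suspend` directory in this report).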

Here is the command-line of the problematic VM:


root      6615     1 97 16:24 ?        00:21:29 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__364 -uuid c6de70f9-0449-4877-9005-d60c3739a136 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__364.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364_1.img,if=ide,bus=0,unit=0,boot=on,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:29:15:f1,vlan=0 -net tap,fd=26,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:1 -k en-us -vga cirrus -incoming exec:cat -balloon virtio


And its config file.


[root@hb06b07 ~]# cat /etc/libvirt/qemu/_vm_lsf_dyn__364.xml 
<domain type='kvm'>
  <name>_vm_lsf_dyn__364</name>
  <uuid>c6de70f9-0449-4877-9005-d60c3739a136</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel5.4.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364_1.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'/>
    <interface type='bridge'>
      <mac address='00:16:3e:29:15:f1'/>
      <source bridge='br0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
    </video>
  </devices>
</domain>


top:

top - 16:47:40 up 11 days, 21:26,  2 users,  load average: 1.34, 1.25, 1.24
Tasks: 149 total,   1 running, 148 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us, 28.4%sy,  0.0%ni, 71.2%id,  0.1%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:   4045524k total,  3867892k used,   177632k free,    11288k buffers
Swap:  2097144k total,      124k used,  2097020k free,  2288848k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                            
 6615 root      15   0  692m 539m 2424 S 100.1 13.6  22:59.54 qemu-kvm                                                                                                          
 6648 root      15   0  702m 539m 2448 S  7.3 13.7   0:59.03 qemu-kvm                                                                                                           
 6201 root      15   0  702m 204m 2448 S  6.3  5.2   1:32.45 qemu-kvm                                                                                                           
10641 root      15   0 12764 1136  824 R  0.3  0.0   0:00.01 top                                                                                                                
    1 root      15   0 10372  696  584 S  0.0  0.0   0:01.76 init                                                                                                               


gdb:

[root@hb06b07 ~]# gdb -p 6615
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 6615
Reading symbols from /usr/libexec/qemu-kvm...
warning: the debug information found in "/usr/lib/debug//usr/libexec/qemu-kvm.debug" does not match "/usr/libexec/qemu-kvm" (CRC mismatch).


warning: the debug information found in "/usr/lib/debug/usr/libexec/qemu-kvm.debug" does not match "/usr/libexec/qemu-kvm" (CRC mismatch).

(no debugging symbols found)...done.
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /usr/lib64/libspice.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libspice.so.0
Reading symbols from /usr/lib64/liblog4cpp.so.4...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/liblog4cpp.so.4
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /usr/lib64/libpng12.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libpng12.so.0
Reading symbols from /usr/lib64/libqcairo.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libqcairo.so.2
Reading symbols from /usr/lib64/libcelt051.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libcelt051.so.0
Reading symbols from /usr/lib64/libqavcodec.so.51...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libqavcodec.so.51
Reading symbols from /usr/lib64/libqavutil.so.49...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libqavutil.so.49
Reading symbols from /lib64/libasound.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libasound.so.2
Reading symbols from /lib64/libssl.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libssl.so.6
Reading symbols from /lib64/libcrypto.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypto.so.6
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /usr/lib64/libXrandr.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libXrandr.so.2
Reading symbols from /usr/lib64/libplds4.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libplds4.so
Reading symbols from /usr/lib64/libplc4.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libplc4.so
Reading symbols from /usr/lib64/libnspr4.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libnspr4.so
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x41d34940 (LWP 6708)]
[New Thread 0x40ee7940 (LWP 6617)]
[New Thread 0x427f5940 (LWP 6616)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libgnutls.so.13...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libgnutls.so.13
Reading symbols from /usr/lib64/libgcrypt.so.11...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libgcrypt.so.11
Reading symbols from /usr/lib64/libgpg-error.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libgpg-error.so.0
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /usr/lib64/libSDL-1.2.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libSDL-1.2.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libGL.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libGL.so.1
Reading symbols from /usr/lib64/libGLU.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libGLU.so.1
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /usr/lib64/libqpixman-1.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libqpixman-1.so.0
Reading symbols from /usr/lib64/libfreetype.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libfreetype.so.6
Reading symbols from /usr/lib64/libfontconfig.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libfontconfig.so.1
Reading symbols from /usr/lib64/libXrender.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libXrender.so.1
Reading symbols from /usr/lib64/libX11.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libX11.so.6
Reading symbols from /usr/lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libgssapi_krb5.so.2
Reading symbols from /usr/lib64/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /usr/lib64/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libk5crypto.so.3
Reading symbols from /usr/lib64/libXext.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libXext.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libesd.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libesd.so.0
Reading symbols from /usr/lib64/libaudiofile.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaudiofile.so.0
Reading symbols from /usr/lib64/libXxf86vm.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libXxf86vm.so.1
Reading symbols from /usr/lib64/libdrm.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libdrm.so.2
Reading symbols from /lib64/libexpat.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libexpat.so.0
Reading symbols from /usr/lib64/libXau.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libXau.so.6
Reading symbols from /usr/lib64/libXdmcp.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libXdmcp.so.6
Reading symbols from /usr/lib64/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libsepol.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libsepol.so.1

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff6e986000
0x000000361a4cd372 in select () from /lib64/libc.so.6
(gdb) thread apply all bt full

Thread 4 (Thread 0x427f5940 (LWP 6616)):
#0  0x000000361a431744 in do_sigwaitinfo () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000361a4317fd in sigwaitinfo () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000000041a991 in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#3  0x000000361b00673d in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x000000361a4d40cd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 3 (Thread 0x40ee7940 (LWP 6617)):
#0  0x000000361a4cc9f7 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000052bd8a in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#2  0x0000000000500759 in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#3  0x00000000005009e3 in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#4  0x000000361b00673d in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5  0x000000361a4d40cd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 2 (Thread 0x41d34940 (LWP 6708)):
#0  0x000000361a431744 in do_sigwaitinfo () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000361a4317fd in sigwaitinfo () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000000041a991 in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#3  0x000000361b00673d in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x000000361a4d40cd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x2ba6893aa150 (LWP 6615)):
#0  0x000000361a4cd372 in select () from /lib64/libc.so.6
No symbol table info available.
#1  0x00000000004095e1 in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#2  0x00000000005005fa in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#3  0x000000000040e795 in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#4  0x000000361a41d994 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#5  0x0000000000406da9 in snd_pcm_hw_params_set_channels_near ()
No symbol table info available.
#6  0x00007fff6e8f2018 in ?? ()
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) 

strace shows that qemu-kvm keeps doing the following:



read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
write(5, "\0", 1)                       = 1
read(19, 0x7fff6e8f0650, 128)           = -1 EAGAIN (Resource temporarily unavailable)
read(4, "\0", 512)                      = 1
read(4, 0x7fff6e8f04d0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
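The loop above repeats a fixed five-call pattern. A small parser makes it easy to confirm on a longer capture (for example one written with `strace -p 6615 -o trace.txt`) that nothing else is happening, and to decode the 128-byte records read from fd 19. The decoding step is an assumption: it treats each record as Linux's `struct signalfd_siginfo` (128 bytes, first field a 32-bit `ssi_signo`, little-endian on x86). Under that reading the leading `\16` (octal 16, i.e. 14) would be signal 14, which is `SIGALRM` on x86 Linux. Both helper names are hypothetical.

```python
import codecs
import re
import struct
from collections import Counter

# Matches e.g.: read(19, 0x7fff6e8f0650, 128) = -1 EAGAIN (Resource temporarily unavailable)
STRACE_LINE = re.compile(r'^(\w+)\((\d+),.*\)\s+=\s+(-?\d+)(?:\s+([A-Z]+))?')


def tally_strace(lines):
    """Count (syscall, fd, result) triples; result is the errno name on failure."""
    counts = Counter()
    for line in lines:
        m = STRACE_LINE.match(line.strip())
        if m is None:
            continue  # skip blank lines and anything not shaped like a syscall
        name, fd, ret, err = m.groups()
        counts[(name, int(fd), err or ret)] += 1
    return counts


def signo_from_strace_buffer(escaped):
    """Decode strace's C-style escapes (e.g. '\\16\\0\\0\\0') and read a u32 signal number.

    Assumes the buffer starts like struct signalfd_siginfo (ssi_signo first,
    little-endian on x86).
    """
    raw = codecs.decode(escaped.encode('latin-1'), 'unicode_escape').encode('latin-1')
    return struct.unpack_from('<I', raw)[0]
```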



[root@hb06b07 ~]# ls -l /proc/6615/fd
total 0
lr-x------ 1 root root 64 Jun 20 16:50 0 -> /VMOxen/storage/share/SR/vmonfs1/export/data/vmodev/mclosson2/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/suspend/.nfs0000000000f4ccb800000455
l-wx------ 1 root root 64 Jun 20 16:50 1 -> /var/log/libvirt/qemu/_vm_lsf_dyn__364.log
l-wx------ 1 root root 64 Jun 20 16:50 10 -> pipe:[4636407]
lrwx------ 1 root root 64 Jun 20 16:50 11 -> /VMOxen/storage/share/SR/vmonfs1/export/data/vmodev/mclosson2/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364_1.img
lr-x------ 1 root root 64 Jun 20 16:50 12 -> /VMOxen/storage/share/SR/vmonfs1/export/data/vmodev/mclosson2/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__364.iso
lrwx------ 1 root root 64 Jun 20 16:50 13 -> socket:[4636409]
lrwx------ 1 root root 64 Jun 20 16:50 14 -> socket:[4636410]
lrwx------ 1 root root 64 Jun 20 16:50 15 -> /dev/ptmx
lrwx------ 1 root root 64 Jun 20 16:50 16 -> anon_inode:kvm-vcpu
lrwx------ 1 root root 64 Jun 20 16:50 17 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jun 20 16:50 18 -> anon_inode:[eventfd]
lr-x------ 1 root root 64 Jun 20 16:50 19 -> pipe:[4636554]
l-wx------ 1 root root 64 Jun 20 16:50 2 -> /var/log/libvirt/qemu/_vm_lsf_dyn__364.log
l-wx------ 1 root root 64 Jun 20 16:50 20 -> pipe:[4636554]
lrwx------ 1 root root 64 Jun 20 16:50 21 -> socket:[4636555]
lrwx------ 1 root root 64 Jun 20 16:50 26 -> /dev/net/tun
lrwx------ 1 root root 64 Jun 20 16:50 3 -> /dev/kvm
lr-x------ 1 root root 64 Jun 20 16:50 4 -> pipe:[4636405]
l-wx------ 1 root root 64 Jun 20 16:50 5 -> pipe:[4636405]
lrwx------ 1 root root 64 Jun 20 16:50 6 -> anon_inode:kvm-vm
lrwx------ 1 root root 64 Jun 20 16:50 7 -> /dev/ksm
lrwx------ 1 root root 64 Jun 20 16:50 8 -> anon_inode:ksm-sma
lr-x------ 1 root root 64 Jun 20 16:50 9 -> pipe:[4636407]
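Cross-referencing the strace loop with this listing is easier if the pipe fds are grouped by inode. In this minimal sketch (a hypothetical helper that parses the `ls -l` text shown here), fds 4 and 5 turn out to be the read and write ends of one pipe, and fds 19 and 20 the two ends of another. So the traced thread writes a byte into fd 5 and reads it back from fd 4, while the data arriving on fd 19 presumably comes from another thread, since no write to fd 20 appears in the capture.

```python
import re
from collections import defaultdict

# Tail of an `ls -l /proc/<pid>/fd` line: "<fd> -> <target>"
FD_LINE = re.compile(r'\s(\d+) -> (\S.*)$')
PIPE = re.compile(r'pipe:\[(\d+)\]')


def pipes_by_inode(ls_lines):
    """Map pipe inode -> sorted [(fd, mode)], mode taken from the link permissions."""
    pipes = defaultdict(list)
    for line in ls_lines:
        m = FD_LINE.search(line)
        if m is None:
            continue
        fd, target = int(m.group(1)), m.group(2).strip()
        p = PIPE.fullmatch(target)
        if p is None:
            continue  # not a pipe (file, socket, anon_inode, ...)
        # /proc/<pid>/fd links show the open mode in the owner bits: lr-x = read, l-wx = write.
        mode = 'r' if line.startswith('lr-x') else 'w' if line.startswith('l-wx') else 'rw'
        pipes[int(p.group(1))].append((fd, mode))
    return {inode: sorted(fds) for inode, fds in pipes.items()}
```

On a live host the same grouping could be built from `os.listdir(f"/proc/{pid}/fd")` and `os.readlink()` instead of parsing `ls` output.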
Comment 1 Amit Shah 2011-06-21 21:57:42 EDT
Guest s3/s4 with virtio devices isn't yet supported, and is unlikely to be supported in RHEL5 releases.  From your command line, looks like you only have the balloon virtio device.  If you're not using it, can you disable it and then check if suspend/resume works as expected?
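Which virtio devices a guest actually has, and which `-M` machine type it runs, can be read mechanically from the qemu-kvm command line. This is a minimal sketch that assumes the option spellings used in this report (`-M`, `-balloon virtio`, `-drive ...,if=virtio`, `-net nic,...,model=virtio`). Other qemu versions can attach devices differently (e.g. via `-device`), which it does not handle. The helper name is hypothetical.

```python
import shlex


def virtio_summary(cmdline):
    """Return (machine type, virtio devices in command-line order) for a qemu-kvm command line."""
    args = shlex.split(cmdline)
    machine, devices = None, []
    # Look at each option together with the argument that follows it.
    for opt, val in zip(args, args[1:]):
        if opt == '-M':
            machine = val
        elif opt == '-balloon' and val.startswith('virtio'):
            devices.append('balloon')
        elif opt == '-drive' and 'if=virtio' in val.split(','):
            devices.append('blk')
        elif opt == '-net' and 'model=virtio' in val.split(','):
            devices.append('net')
    return machine, devices
```

For the command line of process 6615 this reports machine `rhel5.4.0` and only the balloon device.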
Comment 2 Michael Closson 2011-06-21 23:43:28 EDT
Amit.  What is Guest s3/s4?

Also, I used virsh edit to remove

    <memballoon model='virtio'/>

But when I power on the VM, it is added back automatically.  I'll find out why.
Comment 4 Michael Closson 2011-06-22 00:14:27 EDT
Just to confirm that the balloon param was removed.

[root@blue07 qemu]# ps -ef | grep kvm
root     25297     1 27 00:12 ?        00:00:07 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__385 -uuid 5c7bb027-acb9-4e41-9ea9-48576466a7f7 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__385.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__385_1.img,if=ide,bus=0,unit=0,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__385.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:55:59:82,vlan=0 -net tap,fd=20,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus
root     25334     1 27 00:12 ?        00:00:07 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__392 -uuid 828606b9-fcc4-4159-873e-bcc132e1869e -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__392.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__392_1.img,if=ide,bus=0,unit=0,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__392.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:ff:dc:ce,vlan=0 -net tap,fd=20,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:1 -k en-us -vga cirrus
root     25374     1 29 00:12 ?        00:00:07 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__394 -uuid cd9c1617-2d8a-4efe-afc3-31d3e56696f2 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__394.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__394_1.img,if=ide,bus=0,unit=0,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__394.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:d8:b9:86,vlan=0 -net tap,fd=22,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:2 -k en-us -vga cirrus
root     25405     1 28 00:12 ?        00:00:07 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 512 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__395 -uuid 5121750a-96d7-4bf3-80f3-da85f428152d -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__395.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__395_1.img,if=ide,bus=0,unit=0,format=raw,cache=none -drive file=/VMOxen/storage/share/SR/1c0b78ea-0580-4b51-8ec9-9b95d0543033/VM/images/_vm_lsf_dyn__395.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:68:88:47,vlan=0 -net tap,fd=22,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:3 -k en-us -vga cirrus
Comment 5 Amit Shah 2011-07-07 05:16:30 EDT
(In reply to comment #2)
> Amit.  What is Guest s3/s4?

It means suspend-to-memory or suspend-to-disk from within a guest.

After removing the virtio-balloon device, does suspend/resume work fine?
Comment 6 Michael Closson 2011-07-07 14:08:49 EDT
It seems that the libvirt that comes with RHEL56 _always_ enables the balloon device.  The output in comment #4 is with libvirt-0.9.1.  In that test environment the problem occurred.

I rolled back the libvirt RPMs and installed the standard libvirt that comes with RHEL56, but I cannot get libvirt to disable the balloon device (unless I make a code change and rebuild the RPMs).

I think the test with libvirt 0.9.2 is still valid. The same kvm RPMs without the balloon device cause the same behaviour.  As before, I had to let the stress test run for an hour before the bug happened.
Comment 7 Amit Shah 2011-07-28 07:16:21 EDT
Just blacklisting the virtio-balloon module in the guest will work as well.
Comment 8 Michael Closson 2011-08-01 18:00:33 EDT
I disabled the virtio_balloon module by renaming the file and then rebooting.

In the VM:

[root@localhost ~]# uptime
 09:08:44 up 1 min,  2 users,  load average: 0.48, 0.22, 0.08

[root@localhost ~]# ls -l /lib/modules/2.6.18-238.el5/kernel/drivers/virtio/
total 192
-rwxr--r-- 1 root root 44608 Dec 19  2010 virtio_balloon.ko.XXX
-rwxr--r-- 1 root root 43024 Dec 19  2010 virtio.ko
-rwxr--r-- 1 root root 45944 Dec 19  2010 virtio_pci.ko
-rwxr--r-- 1 root root 40808 Dec 19  2010 virtio_ring.ko

[root@localhost ~]# lsmod | grep virtio
virtio_net             48193  0 
virtio_blk             41673  3 
virtio_pci             41545  0 
virtio_ring            37953  1 virtio_pci
virtio                 39365  3 virtio_net,virtio_blk,virtio_pci
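The `lsmod | grep virtio` check above confirms that `virtio_balloon` is gone while `virtio_net` and `virtio_blk` are still loaded. When that check has to be repeated across many guests, a small parser of the `lsmod` text makes it scriptable. This is a minimal sketch; the helper name is hypothetical and not part of any tool mentioned in this report.

```python
def loaded_modules(lsmod_output):
    """Parse `lsmod` output into {module: [modules that use it]}."""
    modules = {}
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip the "Module  Size  Used by" header and malformed lines.
        if len(fields) < 3 or fields[0] == 'Module':
            continue
        used_by = [m for m in fields[3].split(',') if m] if len(fields) > 3 else []
        modules[fields[0]] = used_by
    return modules
```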

On the hypervisor:

[root@delamd06 ~]# ps -ef | grep kvm
root     24905     1 25 08:57 ?        00:03:18 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name tmpl -uuid 6dca4b19-e48b-04ae-d3d8-f206e57a75a8 -monitor unix:/var/lib/libvirt/qemu/tmpl.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/template/images/RHEL56_32G_1.img,if=virtio,boot=on,format=qcow2,cache=none -net nic,macaddr=54:52:00:74:78:9f,vlan=0,model=virtio -net tap,fd=18,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio

[root@delamd06 ~]# rpm -qa | grep libvirt
libvirt-0.8.2-15.el5
libvirt-python-0.8.2-15.el5
libvirt-0.8.2-15.el5



I made the change in the VM template.  Then I set up the test case and let it run.

After about 6 hours I haven't seen the bug again.  I'll continue to monitor it.
Comment 9 Amit Shah 2011-08-02 01:33:45 EDT
You need to disable all virtio devices -- net, blk.  No virtio devices can handle hibernate yet.
Comment 10 Michael Closson 2011-08-02 10:16:15 EDT
Amit, I want to make sure we're on the same page here.  I'm suspending the VM by running the command "virsh save <domain id> <state file>".  Not by the save to memory/save to disk features of Linux/Windows that also work on a PM.

Does virtio support this?
Comment 11 Amit Shah 2011-08-02 10:52:39 EDT
(In reply to comment #10)
> Amit, I want to make sure we're on the same page here.  I'm suspending the VM
> by running the command "virsh save <domain id> <state file>".  Not by the save
> to memory/save to disk features of Linux/Windows that also work on a PM.

Aha, I hadn't seen that mentioned anywhere, or I missed it.  That indeed shouldn't cause a problem with virtio.

Do you have the logs in /var/log/libvirt/qemu/ for the qemu process of the guest that gets stuck?  Please upload them here.
Comment 12 Dor Laor 2011-08-16 20:41:43 EDT
Can you retry with -M rhel5.6?
The rhel5.4 machine type had issues with not saving some kvmclock fields.
Comment 13 Michael Closson 2011-08-19 12:35:47 EDT
Dor, I have started testing the case you suggested.  I restored the virtio_balloon module in the template and changed the config from:

  <os>
    <type arch='x86_64' machine='rhel5.4.0'>hvm</type>
    <boot dev='hd'/>
  </os>

to

  <os>
    <type arch='x86_64' machine='rhel5.6.0'>hvm</type>
    <boot dev='hd'/>
  </os>

I'll post the results later.


[root@delamd05 qemu]# ps -ef | grep kvm
root     14012     1 36 12:31 ?        00:01:16 /usr/libexec/qemu-kvm -S -M rhel5.6.0 -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__74 -uuid 5c3872c2-5538-485a-a603-1bda5ca59598 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__74.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__74_1.img,if=virtio,boot=on,format=qcow2,cache=none -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__74.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:f8:66:e6,vlan=0,model=virtio -net tap,fd=54,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:3 -k en-us -vga cirrus -balloon virtio
root     14599     1 11 12:33 ?        00:00:08 /usr/libexec/qemu-kvm -S -M rhel5.6.0 -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__70 -uuid 50e7a6d8-baf5-4c54-99ea-13ac1ee08066 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__70.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__70_1.img,if=virtio,boot=on,format=qcow2,cache=none -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__70.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:c8:6c:d3,vlan=0,model=virtio -net tap,fd=55,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -incoming exec:cat -balloon virtio
root     14799     1 12 12:33 ?        00:00:09 /usr/libexec/qemu-kvm -S -M rhel5.6.0 -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name _vm_lsf_dyn__68 -uuid 3f9a2f77-7601-4424-8302-43070f8943f2 -monitor unix:/var/lib/libvirt/qemu/_vm_lsf_dyn__68.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__68_1.img,if=virtio,boot=on,format=qcow2,cache=none -drive file=/VMOxen/storage/share/SR/755d80a2-d9fb-43cc-ba3e-a83409796669/VM/images/_vm_lsf_dyn__68.iso,if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=00:16:3e:50:a6:6a,vlan=0,model=virtio -net tap,fd=63,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:2 -k en-us -vga cirrus -incoming exec:cat -balloon virtio
root     14989  4726  0 12:35 pts/0    00:00:00 grep kvm
Comment 14 Michael Closson 2011-08-19 14:13:10 EDT
Sometimes virt-manager freezes up.


(gdb) info threads
* 1 Thread 0x2b59462ef170 (LWP 4461)  0x0000003a802cb2e6 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003a802cb2e6 in poll () from /lib64/libc.so.6
#1  0x000000333feb05d2 in remoteIOEventLoop (conn=0x1ffcdb20, priv=0x1ffd1020, in_open=0, thiscall=0x20556db0) at remote/remote_driver.c:9657
#2  0x000000333feb10bd in remoteIO (conn=0x1ffcdb20, priv=0x1ffd1020, flags=0, thiscall=0x20556db0) at remote/remote_driver.c:9901
#3  0x000000333feb178b in call (conn=0x1ffcdb20, priv=0x1ffd1020, flags=0, proc_nr=16, args_filter=0x333feb3d0e <xdr_remote_domain_get_info_args>, 
    args=0x7fff1d64eb50 "@\206\001 ", ret_filter=0x333feb3d44 <xdr_remote_domain_get_info_ret>, ret=0x7fff1d64eb20 "") at remote/remote_driver.c:9990
#4  0x000000333fea06e0 in remoteDomainGetInfo (domain=0x2001c580, info=0x7fff1d64ec00) at remote/remote_driver.c:2297
#5  0x000000333fe7ed12 in virDomainGetInfo (domain=0x2001c580, info=0x7fff1d64ec00) at libvirt.c:3050
#6  0x00002b594990daac in libvirt_virDomainGetInfo (self=0x0, args=0x1fe4fc50) at libvirt-override.c:1025
#7  0x0000003a8129639a in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#8  0x0000003a81295e46 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#9  0x0000003a81295e46 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#10 0x0000003a812972c5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#11 0x0000003a81295a1f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#12 0x0000003a81295e46 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#13 0x0000003a812972c5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#14 0x0000003a8124c6d7 in ?? () from /usr/lib64/libpython2.4.so.1.0
#15 0x0000003a81236430 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#16 0x0000003a8123c52f in ?? () from /usr/lib64/libpython2.4.so.1.0
#17 0x0000003a81236430 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#18 0x0000003a81290f1d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0
#19 0x00002b594b2025cf in ?? () from /usr/lib64/python2.4/site-packages/gtk-2.0/gobject/_gobject.so
#20 0x0000003a8222d2bb in ?? () from /lib64/libglib-2.0.so.0
#21 0x0000003a8222cdb4 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#22 0x0000003a8222fc0d in ?? () from /lib64/libglib-2.0.so.0
#23 0x0000003a8222ff1a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#24 0x0000003a8df2aa63 in gtk_main () from /usr/lib64/libgtk-x11-2.0.so.0
#25 0x00002b594b81d684 in ?? () from /usr/lib64/python2.4/site-packages/gtk-2.0/gtk/_gtk.so
#26 0x0000003a81296167 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#27 0x0000003a81295e46 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#28 0x0000003a812972c5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#29 0x0000003a81297312 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
#30 0x0000003a812b39f9 in ?? () from /usr/lib64/libpython2.4.so.1.0
#31 0x0000003a812b4ea8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
#32 0x0000003a812bb33d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
#33 0x0000003a8021d994 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000000000400629 in _start ()


Looks like it is waiting for libvirtd.


Thread 26 (Thread 0x50b15940 (LWP 18696)):
#0  0x0000003a80e0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003a80e101b1 in _L_cond_lock_989 () from /lib64/libpthread.so.0
#2  0x0000003a80e1007f in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#3  0x0000003a80e0af84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x000000333fe399c2 in virCondWait (c=0x2aaab407e368, m=0x2aaab407e340) at util/threads-pthread.c:100
#5  0x000000000046ef16 in qemuMonitorSend (mon=0x2aaab407e340, msg=0x50b14b70) at qemu/qemu_monitor.c:728
#6  0x0000000000472e54 in qemuMonitorCommandWithHandler (mon=0x2aaab407e340, cmd=0x4c332e "info balloon", passwordHandler=0, passwordOpaque=0x0, scm_fd=-1, reply=0x50b14c70)
    at qemu/qemu_monitor_text.c:340
#7  0x0000000000472fcb in qemuMonitorCommandWithFd (mon=0x2aaab407e340, cmd=0x4c332e "info balloon", scm_fd=-1, reply=0x50b14c70) at qemu/qemu_monitor_text.c:374
#8  0x0000000000472ff7 in qemuMonitorCommand (mon=0x2aaab407e340, cmd=0x4c332e "info balloon", reply=0x50b14c70) at qemu/qemu_monitor_text.c:381
#9  0x0000000000473996 in qemuMonitorTextGetBalloonInfo (mon=0x2aaab407e340, currmem=0x50b14d38) at qemu/qemu_monitor_text.c:680
#10 0x000000000046fb4e in qemuMonitorGetBalloonInfo (mon=0x2aaab407e340, currmem=0x50b14d38) at qemu/qemu_monitor.c:1014
#11 0x0000000000440755 in qemudDomainGetInfo (dom=0x1950800, info=0x50b14e20) at qemu/qemu_driver.c:4886
#12 0x000000333fe7ed12 in virDomainGetInfo (domain=0x1950800, info=0x50b14e20) at libvirt.c:3050
#13 0x00000000004216d0 in remoteDispatchDomainGetInfo (server=0x18ea910, client=0x2aaaac001140, conn=0x19aaa80, hdr=0x2aaaaca94c20, rerr=0x50b14f50, args=0x50b14f00,
    ret=0x50b14ea0) at remote.c:1485
#14 0x000000000042b3c3 in remoteDispatchClientCall (server=0x18ea910, client=0x2aaaac001140, msg=0x2aaaaca54c10) at dispatch.c:508
#15 0x000000000042afe8 in remoteDispatchClientRequest (server=0x18ea910, client=0x2aaaac001140, msg=0x2aaaaca54c10) at dispatch.c:390
#16 0x000000000041a4a8 in qemudWorker (data=0x18f0a18) at libvirtd.c:1574
#17 0x0000003a80e0673d in start_thread () from /lib64/libpthread.so.0
#18 0x0000003a802d40cd in clone () from /lib64/libc.so.6


Perhaps libvirtd is blocked waiting on the guest OS while querying the memory balloon ("info balloon")?  Just a guess.
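The libvirtd backtrace above ends in virCondWait() inside qemuMonitorSend(). The worker thread has sent "info balloon" to the qemu monitor and is waiting on a condition variable for the reply, with no deadline. If qemu never answers, that worker and the client call it is serving (here virDomainGetInfo()) block indefinitely. The Python sketch below mimics that request/reply pattern to show how an unbounded wait differs from a bounded one. MonitorChannel is a hypothetical class, not libvirt code.

```python
import threading


class MonitorChannel:
    """Hypothetical request/reply channel modelled on the backtrace: the caller
    sends a command, then waits on a condition variable until the I/O side
    posts the reply."""

    def __init__(self):
        self._cond = threading.Condition()
        self._reply = None

    def post_reply(self, reply):
        # Called from the I/O thread when the monitor's answer arrives.
        with self._cond:
            self._reply = reply
            self._cond.notify_all()

    def send(self, cmd, timeout=None):
        """Wait for the reply to `cmd`.

        timeout=None waits forever, like the unbounded virCondWait() in the
        backtrace; a number makes it raise TimeoutError after that many seconds.
        """
        with self._cond:
            self._reply = None
            # Writing `cmd` to the monitor socket is left out of this sketch.
            if not self._cond.wait_for(lambda: self._reply is not None, timeout):
                raise TimeoutError("no reply to %r within %s s" % (cmd, timeout))
            return self._reply
```

With a channel whose reply never arrives, `send("info balloon")` hangs, while `send("info balloon", timeout=5)` raises TimeoutError after about five seconds. A timeout only turns a hang into an error; it does not explain why qemu stopped answering the monitor.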
Comment 15 Michael Closson 2011-08-22 10:15:03 EDT
I let the test run over the weekend.  It stopped working Saturday afternoon.  The agent process in our software that links to libvirt was blocked in a call to virDomainGetInfo(), which never returned.  I left it until Monday morning, when I restarted libvirtd to get things going again.  I checked the stack traces before restarting, and I think 4 threads were doing "info balloon", just like the stack trace in the previous comment.

There are 2 hypervisors in my environment, and both stopped processing LSF jobs (and VM requests) because our agent was blocked in virDomainGetInfo().

The good news is that I didn't observe a VM frozen after a resume.

I will disable the balloon module and restart the test to see whether that prevents virDomainGetInfo() from hanging.
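A minimal sketch, assuming the agent is written in Python against libvirt-python, of how it could keep a hung virDomainGetInfo() from blocking its own main loop: run the call in a worker thread and stop waiting after a deadline. call_with_timeout and CallTimedOut are hypothetical names. This does not fix the underlying hang; the stuck call keeps occupying a worker thread and whatever libvirt state it holds.

```python
import concurrent.futures

# Shared worker pool. A call that never returns keeps one worker busy forever,
# so once all 4 workers are stuck, every later call will also time out.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


class CallTimedOut(Exception):
    """Raised when a wrapped call does not return within the deadline."""


def call_with_timeout(fn, seconds, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread and wait at most `seconds`.

    On timeout the caller gets CallTimedOut instead of blocking indefinitely.
    The worker thread is NOT cancelled: a call blocked inside a C library
    keeps running in the background until it returns (if ever).
    """
    future = _POOL.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        name = getattr(fn, "__name__", repr(fn))
        raise CallTimedOut("%s did not return within %.1f s" % (name, seconds))
```

For example, `call_with_timeout(dom.info, 30)`, where `dom` is a libvirt-python virDomain object whose info() method wraps virDomainGetInfo(); the 30-second value is arbitrary. Whether other calls on the same connection also block after a timeout depends on libvirt's internal locking, so the agent may need to treat the connection as suspect once a call has timed out. ThreadPoolExecutor workers are also joined at interpreter exit, so a call that never returns will keep the process from exiting normally.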
Comment 16 Michael Closson 2011-09-01 12:08:54 EDT
Another bug is blocking the retest of this one: the libvirt client locks up in virDomainGetInfo().  I'll file a separate bug to track that.
Comment 17 Golita Yue 2011-11-14 22:17:30 EST
From comment #10, I can confirm this is the "migrate exec:dd" action: repeatedly migrating and then loading the VM leads to the VM getting stuck at 100% CPU.
According to the comments above, this needs a long-running test. I will write a script to test it and update this bug with the results when I am done.
Comment 18 Juan Quintela 2011-11-15 05:23:26 EST
Can you check whether you can reproduce this with only AMD or only Intel hosts?  We don't support migration (save/resume is the same code path) between architectures on RHEL5/6.
Comment 19 Mike Cao 2011-11-16 01:51:32 EST
Could you check whether the issue only happens when migrating the guest from an Intel host to an AMD host, or from an AMD host to an Intel host?

If so, this is a scenario we do not support.

Best Regards,
Mike
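Because the reporter's program may resume a VM on a different host, and the cluster mixes one AMD and one Intel machine, one way to stay inside the supported scenario is to restore an image only on a host whose CPU vendor matches the host that saved it. A minimal sketch, assuming the program records the source host's /proc/cpuinfo text (or its vendor) alongside the saved image; cpu_vendor and same_vendor are hypothetical helpers. A matching vendor is necessary but not sufficient: CPU models and feature flags can still differ between hosts of the same vendor.

```python
def cpu_vendor(cpuinfo_text):
    """Return the vendor_id value (e.g. 'GenuineIntel' or 'AuthenticAMD') from
    the text of an x86 Linux /proc/cpuinfo, or None if no such line exists."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def same_vendor(src_cpuinfo, dst_cpuinfo):
    """True only if both hosts report the same, non-empty CPU vendor."""
    src = cpu_vendor(src_cpuinfo)
    dst = cpu_vendor(dst_cpuinfo)
    return bool(src) and src == dst
```

For the local host, pass the contents of /proc/cpuinfo; how the program obtains the other host's copy is outside this sketch.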
Comment 20 Golita Yue 2011-11-16 02:10:42 EST
(In reply to comment #18)
> Can you look if you can reproduce with only amd/intel hosts?  We don't support
> migration (save/resume is the same code path) between architectures on RHEL5/6?

Yes, I am testing on Intel-only hosts and on AMD-only hosts.
Comment 21 Golita Yue 2011-11-17 01:32:09 EST
Tested migration 200 times with RHEL5.6-32 and RHEL5.6-64 guests and did not hit this bug.

steps:
1. migrate -d "exec:dd of=/tmp/rhel5.6img.test bs=4096 seek=1"
2. boot vm with -incoming "exec:dd if=/tmp/rhel5.6img.test bs=4096 skip=1"
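In these steps, dd seek=1 makes the save write the migration stream starting one 4096-byte block into the file, and skip=1 makes the restore start reading at that same offset; if the two values disagreed, qemu would read the stream from the wrong offset. Reserving the first block presumably mimics a save file that keeps a header ahead of the stream, but that is an assumption. The Python sketch below emulates the two dd invocations on a file object; save_stream and load_stream are hypothetical names.

```python
import io

BS = 4096  # block size used by both dd invocations (bs=4096)


def save_stream(f, stream, seek_blocks=1, bs=BS):
    """Like `dd of=FILE bs=BS seek=N`: write `stream` starting at byte N*BS.

    Without conv=notrunc, dd also drops any old data past the new end,
    which the truncate() call reproduces.
    """
    f.seek(seek_blocks * bs)
    f.write(stream)
    f.truncate()


def load_stream(f, skip_blocks=1, bs=BS):
    """Like `dd if=FILE bs=BS skip=N`: read everything from byte N*BS onwards."""
    f.seek(skip_blocks * bs)
    return f.read()


if __name__ == "__main__":
    buf = io.BytesIO()
    save_stream(buf, b"migration stream bytes")
    print(load_stream(buf))  # b'migration stream bytes'
```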

The job link:
Intel-host: https://virtlab.englab.nay.redhat.com/job/41536/details/
AMD-host: https://virtlab.englab.nay.redhat.com/job/41537/details/

The host info:
kernel-2.6.18-294.el5
kvm-83-243.el5

cmd:
/home/autotest-devel/client/tests/kvm/qemu -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20111116-130535-rKwp',server,nowait -serial unix:'/tmp/serial-20111116-130535-rKwp',server,nowait -drive file='/home/autotest-devel/client/tests/kvm/images/RHEL-Server-5.6-32.raw',index=0,if=ide,media=disk,cache=none,format=raw -net nic,vlan=0,model=rtl8139,macaddr='9a:86:29:6f:18:a3' -net tap,vlan=0,fd=36 -m 1024 -smp 2,cores=1,threads=1,sockets=2 -cpu qemu64,+sse2 -vnc :1 -rtc-td-hack -boot c   -no-kvm-pit-reinjection  -M rhel5.6.0 -usbdevice tablet -S -incoming "exec:dd if=/tmp/rhel5.6img.test bs=4096 skip=1"
Comment 23 RHEL Product and Program Management 2012-04-02 06:53:38 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.
Comment 25 Ronen Hod 2012-04-09 06:33:14 EDT
Hi Michael Closson,

Thank you for taking the time to enter a bug report with us. We do appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for getting support, and as such we are not able to make any guarantees as to the timeliness or suitability of a resolution.
 
If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain that it gets the proper attention and prioritization to assure a timely resolution. 
 
For information on how to contact the Red Hat production support team, please see:
https://www.redhat.com/support/process/production/#howto

For now, this bug is not reproducible here, so I am closing it for RHEL5.

Thanks, Ronen.
