Description of problem:
qemu seems to enter some kind of deadlock whenever I perform off-line migration (suspend, save to disk) on a specific host. When I try to connect to vmid.monitor.socket via nc it doesn't respond (the qemu monitor shell doesn't show). Attaching gdb to the problematic qemu-kvm process id shows the following lock:

(gdb) info threads
  4 Thread 0x4270c940 (LWP 31014)  0x0000003834631744 in do_sigwaitinfo () from /lib64/libc.so.6
  3 Thread 0x4310d940 (LWP 31053)  0x0000003834e0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x41907940 (LWP 31056)  0x0000003834631744 in do_sigwaitinfo () from /lib64/libc.so.6
* 1 Thread 0x2ba1f6b0cfa0 (LWP 31013)  0x0000003834e0d89b in write () from /lib64/libpthread.so.0

thread 3:
(gdb) where
#0  0x0000003834e0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003834e08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003834e08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004fef44 in kvm_main_loop_wait (env=0x643eb90, timeout=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:257
#4  0x00000000004ff5a4 in kvm_main_loop_cpu (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:392
#5  ap_main_loop (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:443
#6  0x0000003834e0673d in start_thread () from /lib64/libpthread.so.0
#7  0x00000038346d3d1d in clone () from /lib64/libc.so.6

thread 4:
(gdb) where
#0  0x0000003834631744 in do_sigwaitinfo () from /lib64/libc.so.6
#1  0x00000038346317fd in sigwaitinfo () from /lib64/libc.so.6
#2  0x000000000041a3c1 in sigwait_compat (opaque=<value optimized out>) at compatfd.c:38
#3  0x0000003834e0673d in start_thread () from /lib64/libpthread.so.0
#4  0x00000038346d3d1d in clone () from /lib64/libc.so.6

I am sure more can be revealed by looking at the problematic host, so please ask for more information.
Reproduction: 100% on a specific host running kvm-83-164.el5.

Steps to Reproduce:
1. Suspend the VM (save to disk, migration).
2. qemu is not responding.

Additional info:
Please note that the same operation succeeds on other hosts running the same version of kvm and kernel 2.6.18-194.

Also - the operation was performed from rhev-m --> vdsm --> kvm.
Backtrace of thread 1:

#0  0x0000003834e0d89b in write () from /lib64/libpthread.so.0
#1  0x0000000000473bdc in file_write (s=<value optimized out>, buf=0x1d2f9058, size=20480) at migration-exec.c:42
#2  0x000000000046afaf in migrate_fd_put_buffer (opaque=0x1d04dd40, data=0x1d2f9058, size=20480) at migration.c:211
#3  0x000000000049bd2d in buffered_put_buffer (opaque=0x1d2d8d40, buf=0x1d2f6058 "\275 \377\377\377\213\301\301\351\002\363\245\213È\341\003\363\244\203M\374\377\213\205 \377\377\377\211\205\020\377\377\377\213\205\024\377\377\377\203\300\003\203\340\374\001\205 \377\377\377\213\205 \377\377\377\211\205", pos=<value optimized out>, size=32768) at buffered_file.c:134
#4  0x0000000000471b38 in qemu_fflush (f=0x1d2f6010) at savevm.c:419
#5  0x0000000000472e95 in qemu_put_buffer (f=0x1d2f6010, buf=0x2b1d8c5fc33a "\003~\034\003^ \213E\374\353\337_^[\311\302\004", size=3270) at savevm.c:482
#6  0x0000000000408af8 in ram_save_block (f=0x1d2f6010) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:3358
#7  0x0000000000408b6c in ram_save_live (f=0x1d2f6010, stage=2, opaque=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:3427
#8  0x0000000000472cea in qemu_savevm_state_iterate (f=0x1d2f6010) at savevm.c:768
#9  0x000000000046b09c in migrate_fd_put_ready (opaque=<value optimized out>) at migration.c:256
#10 0x00000000004071bc in qemu_run_timers (ptimer_head=0xb38e00, current_time=158911986) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:1271
#11 0x0000000000409577 in main_loop_wait (timeout=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4021
#12 0x00000000004ff1ea in kvm_main_loop () at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:596
#13 0x000000000040e425 in main_loop (argc=43, argv=0x7fff8cc0c588, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4040
#14 main (argc=43, argv=0x7fff8cc0c588, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:6476
The process is not really stuck. It seems that qemu_run_timers() always calls migrate_fd_put_ready(), so nothing else gets a chance to run.
Reproduced, and should be fixed in the next KVM release.
(In reply to comment #1)
> please not that same operation succeeds over other hosts running same version
> of kvm and kernel 2.6.18-194.
>
> also - operation was performed from rhev-m --> vdsm --> kvm

Hi, Harm

You mean some specific host can trigger it? Could you supply me the host info so that I can reproduce it?

Thanks,
Mike
(In reply to comment #9)
> (In reply to comment #1)
> > please not that same operation succeeds over other hosts running same version
> > of kvm and kernel 2.6.18-194.
> >
> > also - operation was performed from rhev-m --> vdsm --> kvm
>
> Hi, Harm
>
> you mean some specific host can trigger it .could you supply me the host info
> so that I can reproduce it ?
>
> thanks
> Mike

Mike,

The bug was opened a long time ago, which means I don't have the exact host and information. Please see Juan's comment - he managed to reproduce it; maybe you can ask him.

Other than that, the bug was surely fixed, as we run lots of migration/suspend testing (regression) on rhel5.x, and no one in our group has come across it lately. This is up to you.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0028.html