Red Hat Bugzilla – Bug 678524
Exec based migration randomly fails, particularly under high load
Last modified: 2013-01-09 18:34:18 EST
Description of problem: The use of 'exec' based migration will randomly return 'status: failed' in the monitor. Migration appears to complete successfully but at the last step QEMU sets it to failed. The problem appears to be non-deterministic, but is made more frequent by introducing high machine load (eg run a huge compile job in another shell). Version-Release number of selected component (if applicable): qemu-kvm-0.12.1.2-2.146.el6 How reproducible: Non-deterministic, more frequently under high load Steps to Reproduce: 1. In one shell, checkout libvirt and leaving this running # while /bin/true ; do make clean ; make -j 8 all ; done 2. In another shell # qemu-kvm -m 500 -cdrom F10-i686-Live.iso -vnc :1 -monitor stdio 3. In the monitor console run (qemu) migrate -d "exec:cat >'/tmp/migtest.save'" 4. Still in the monitor console, repeatedly run until migration completes / fails (qemu) info migrate Actual results: (qemu) info migrate Migration status: failed Expected results: (qemu) info migrate Migration status: success Additional info:
Running the above demo steps with a QEMU under strace I see the following sequence: pipe2([16, 17], O_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ffb2b9a3c50) = 6916 close(16) = 0 fcntl(17, F_SETFD, 0) = 0 fcntl(17, F_GETFL) = 0x1 (flags O_WRONLY) fcntl(17, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 write(17, "QEVM\0\0\0\3\1\0\0\0\3\5block\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\1\0\0\0\4\3ram\0\0\0\0\0\0\0\4\0\0\0\0 F\0\4\6pc.ram\0\0\0\0\37@\0\0\7pc.bios\0\0\0\0\0\2\0\0\6pc.rom\0\0\0\0\0\2\0\0\10vga.vram\0\0\0\0\1\0\.... [snip thousands more write(17...) calls] close(17) = 0 --- SIGCHLD (Child exited) @ 0 (0) --- wait4(-1, NULL, WNOHANG, NULL) = 6916 rt_sigreturn(0) = 0 wait4(6916, 0x7fff100e27f0, 0, NULL) = -1 ECHILD (No child processes) write(1, "info migrate\33[K", 15) = 15 write(1, "\r\n", 2) = 2 write(1, "Migration status: failed\r\n", 26) = 26 write(1, "(qemu) ", 7) = 7 The only syscall returning fail is that second wait4() call. Based on this I see in the source code, QEMU has a generic SIGCHLD handler that it uses to reap all processes static void sigchld_handler(int signal) { waitpid(-1, NULL, WNOHANG); } This is clearly responsible for the first (successful) wait4() call seen in strace. The exec based migration uses popen() + qemu_fopen() to start the child process. When migration completes, it calls qemu_fclose(), which calls fclose(). The fclose() impl in glibc for popen() runs a 'waitpid' do { wait_pid = _IO_waitpid (((_IO_proc_file *) fp)->pid, &wstatus, 0); } while (wait_pid == -1 && errno == EINTR); if (wait_pid == -1) return -1; return wstatus; This clearly matches the second wait4() call seen in strace that fails. The return status of fclose() is passed back as return status of qemu_fclose(). If qemu_fclose() fails, then the exec_close() in migration-exec.c returns a error code. This causes migrate_fd_cleanup() to return an error, and thus finally we see why 'status: failed' occurs: if (migrate_fd_cleanup(s) < 0) { if (old_vm_running) { vm_start(); } state = MIG_STATE_ERROR; } Thus exec based migration has a clear race condition wrt the sigchild handler, that will randomly cause failure.
Thanks for the detailed bug. Trying to think how to diferentiate wait exits.
I think you can simply drop the SIGCHLD handler, it is there only for use by SLIRP's Samba support.
Paolo, do you have bandwidth to take over this and free Juan for other migration bugs? RHEV folks reported to it happens pretty frequent for them.
Ok.
By the way, this is not the first time something like this happen. I also dealt with a bug like this in the past, that turned out to be more reasonable fixed by something else - we ended up finding a problem after fork. The bug was https://bugzilla.redhat.com/show_bug.cgi?id=606953, for reference. We do have this patches in RHEL6 (verified), so maybe we need to go back and rethink the use of our sigchld handler in qemu-kvm ?
Created attachment 483271 [details] patch to fix the bug
Reproduced it on qemu-kvm-0.12.1.2-2.135.el6 with following steps 1.Load host by compile kernel in a loop 2.start VM as followings (note memory=2GB can not reproduce) eg:/usr/libexec/qemu-kvm -m 500 -monitor unix:/tmp/tt,rver,nowait -vnc :1 3.migrate to file in a loop cat > tmp <<EOF >migrate "exec:cat > '/tmp/migtest.save'" >EOF for ((;;)); do cat tmp |nc -U /tmp/tt ; echo "info migrate"|nc -U /tmp/tt ; echo "cont" |nc -U /tmp/tt ; done Actual Results: migration failed sometimes Verified on qemu-kvm-0.12.1.2-2.156.el6 with the steps above Actual Results: Tried 100 times ,migrate all passed Based on above ,this issue has been fixed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause: qemu-kvm slirp -smb code doesn't "reap" child processes. Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load. Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit. Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,7 +1,7 @@ -Cause: qemu-kvm slirp -smb code doesn't "reap" child processes. +Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler. -Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load. +Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process. If the SIGCHLD handler comes first, VM save will fail. -Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit. +Fix: added code that allows the SIGCHLD handler to only reap specific child processes. -Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.+Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0534.html