Bug 678524
Summary: | Exec based migration randomly fails, particularly under high load | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Daniel Berrangé <berrange> | ||||
Component: | qemu-kvm | Assignee: | Paolo Bonzini <pbonzini> | ||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.1 | CC: | abaron, bcao, cpelland, danken, ehabkost, gcosta, khong, michen, mkenneth, pbonzini, tburke, virt-maint | ||||
Target Milestone: | rc | Keywords: | Triaged | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | qemu-kvm-0.12.1.2-2.154.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler.
Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process. If the SIGCHLD handler comes first, VM save will fail.
Fix: added code that allows the SIGCHLD handler to only reap specific child processes.
Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-05-19 11:21:49 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 666158, 672725, 681623 | ||||||
Attachments: |
|
Description
Daniel Berrangé
2011-02-18 11:13:40 UTC
Running the above demo steps with a QEMU under strace I see the following sequence: pipe2([16, 17], O_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ffb2b9a3c50) = 6916 close(16) = 0 fcntl(17, F_SETFD, 0) = 0 fcntl(17, F_GETFL) = 0x1 (flags O_WRONLY) fcntl(17, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 write(17, "QEVM\0\0\0\3\1\0\0\0\3\5block\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\1\0\0\0\4\3ram\0\0\0\0\0\0\0\4\0\0\0\0 F\0\4\6pc.ram\0\0\0\0\37@\0\0\7pc.bios\0\0\0\0\0\2\0\0\6pc.rom\0\0\0\0\0\2\0\0\10vga.vram\0\0\0\0\1\0\.... [snip thousands more write(17...) calls] close(17) = 0 --- SIGCHLD (Child exited) @ 0 (0) --- wait4(-1, NULL, WNOHANG, NULL) = 6916 rt_sigreturn(0) = 0 wait4(6916, 0x7fff100e27f0, 0, NULL) = -1 ECHILD (No child processes) write(1, "info migrate\33[K", 15) = 15 write(1, "\r\n", 2) = 2 write(1, "Migration status: failed\r\n", 26) = 26 write(1, "(qemu) ", 7) = 7 The only syscall returning fail is that second wait4() call. Based on this I see in the source code, QEMU has a generic SIGCHLD handler that it uses to reap all processes static void sigchld_handler(int signal) { waitpid(-1, NULL, WNOHANG); } This is clearly responsible for the first (successful) wait4() call seen in strace. The exec based migration uses popen() + qemu_fopen() to start the child process. When migration completes, it calls qemu_fclose(), which calls fclose(). The fclose() impl in glibc for popen() runs a 'waitpid' do { wait_pid = _IO_waitpid (((_IO_proc_file *) fp)->pid, &wstatus, 0); } while (wait_pid == -1 && errno == EINTR); if (wait_pid == -1) return -1; return wstatus; This clearly matches the second wait4() call seen in strace that fails. The return status of fclose() is passed back as return status of qemu_fclose(). If qemu_fclose() fails, then the exec_close() in migration-exec.c returns a error code. This causes migrate_fd_cleanup() to return an error, and thus finally we see why 'status: failed' occurs: if (migrate_fd_cleanup(s) < 0) { if (old_vm_running) { vm_start(); } state = MIG_STATE_ERROR; } Thus exec based migration has a clear race condition wrt the sigchild handler, that will randomly cause failure. Thanks for the detailed bug. Trying to think how to diferentiate wait exits. I think you can simply drop the SIGCHLD handler, it is there only for use by SLIRP's Samba support. Paolo, do you have bandwidth to take over this and free Juan for other migration bugs? RHEV folks reported to it happens pretty frequent for them. Ok. By the way, this is not the first time something like this happen. I also dealt with a bug like this in the past, that turned out to be more reasonable fixed by something else - we ended up finding a problem after fork. The bug was https://bugzilla.redhat.com/show_bug.cgi?id=606953, for reference. We do have this patches in RHEL6 (verified), so maybe we need to go back and rethink the use of our sigchld handler in qemu-kvm ? Created attachment 483271 [details]
patch to fix the bug
Reproduced it on qemu-kvm-0.12.1.2-2.135.el6 with following steps
1.Load host by compile kernel in a loop
2.start VM as followings (note memory=2GB can not reproduce)
eg:/usr/libexec/qemu-kvm -m 500 -monitor unix:/tmp/tt,rver,nowait -vnc :1
3.migrate to file in a loop
cat > tmp <<EOF
>migrate "exec:cat > '/tmp/migtest.save'"
>EOF
for ((;;)); do cat tmp |nc -U /tmp/tt ; echo "info migrate"|nc -U /tmp/tt ; echo "cont" |nc -U /tmp/tt ; done
Actual Results:
migration failed sometimes
Verified on qemu-kvm-0.12.1.2-2.156.el6 with the steps above
Actual Results:
Tried 100 times ,migrate all passed
Based on above ,this issue has been fixed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause: qemu-kvm slirp -smb code doesn't "reap" child processes. Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load. Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit. Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration. Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,7 +1,7 @@ -Cause: qemu-kvm slirp -smb code doesn't "reap" child processes. +Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler. -Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load. +Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process. If the SIGCHLD handler comes first, VM save will fail. -Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit. +Fix: added code that allows the SIGCHLD handler to only reap specific child processes. -Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.+Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0534.html An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0534.html |