Bugzilla will be upgraded to version 5.0. The upgrade date is tentatively scheduled for 2 December 2018, pending final testing and feedback.
Bug 678524 - Exec based migration randomly fails, particularly under high load
Exec based migration randomly fails, particularly under high load
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm (Show other bugs)
6.1
Unspecified Unspecified
unspecified Severity high
: rc
: ---
Assigned To: Paolo Bonzini
Virtualization Bugs
: Triaged
Depends On:
Blocks: 666158 672725 681623
  Show dependency treegraph
 
Reported: 2011-02-18 06:13 EST by Daniel Berrange
Modified: 2013-01-09 18:34 EST (History)
12 users (show)

See Also:
Fixed In Version: qemu-kvm-0.12.1.2-2.154.el6
Doc Type: Bug Fix
Doc Text:
Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler. Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process. If the SIGCHLD handler comes first, VM save will fail. Fix: added code that allows the SIGCHLD handler to only reap specific child processes. Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-19 07:21:49 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
patch to fix the bug (5.72 KB, patch)
2011-03-09 12:40 EST, Paolo Bonzini
no flags Details | Diff


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0534 normal SHIPPED_LIVE Important: qemu-kvm security, bug fix, and enhancement update 2011-05-19 07:20:36 EDT

  None (edit)
Description Daniel Berrange 2011-02-18 06:13:40 EST
Description of problem:
The use of 'exec' based migration will randomly return 'status: failed' in the monitor. Migration appears to complete successfully but at the last step QEMU sets it to failed. The problem appears to be non-deterministic, but is made more frequent by introducing high machine load (eg run a huge compile job in another shell).

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.146.el6

How reproducible:
Non-deterministic, more frequently under high load

Steps to Reproduce:
1. In one shell, checkout libvirt and leaving this running

   # while /bin/true ; do make clean ; make -j 8 all ; done

2. In another shell

   # qemu-kvm  -m 500 -cdrom F10-i686-Live.iso -vnc :1 -monitor stdio

3. In the monitor console run

  (qemu) migrate -d "exec:cat >'/tmp/migtest.save'"
 
4. Still in the monitor console, repeatedly run until migration completes / fails

  (qemu) info migrate

  
Actual results:
(qemu) info migrate
Migration status: failed


Expected results:
(qemu) info migrate
Migration status: success


Additional info:
Comment 2 Daniel Berrange 2011-02-18 06:26:07 EST
Running the above demo steps with a QEMU under strace I see the following sequence:

pipe2([16, 17], O_CLOEXEC)              = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ffb2b9a3c50) = 6916
close(16)                               = 0
fcntl(17, F_SETFD, 0)                   = 0
fcntl(17, F_GETFL)                      = 0x1 (flags O_WRONLY)
fcntl(17, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
write(17, "QEVM\0\0\0\3\1\0\0\0\3\5block\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\1\0\0\0\4\3ram\0\0\0\0\0\0\0\4\0\0\0\0 F\0\4\6pc.ram\0\0\0\0\37@\0\0\7pc.bios\0\0\0\0\0\2\0\0\6pc.rom\0\0\0\0\0\2\0\0\10vga.vram\0\0\0\0\1\0\....

[snip thousands more write(17...) calls]

close(17)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, NULL, WNOHANG, NULL)          = 6916
rt_sigreturn(0)                         = 0
wait4(6916, 0x7fff100e27f0, 0, NULL)    = -1 ECHILD (No child processes)


write(1, "info migrate\33[K", 15)       = 15
write(1, "\r\n", 2)                     = 2
write(1, "Migration status: failed\r\n", 26) = 26
write(1, "(qemu) ", 7)                  = 7


The only syscall returning fail is that second wait4()  call.

Based on this I see in the source code, QEMU has a generic SIGCHLD handler that it uses to reap all processes

static void sigchld_handler(int signal)
{
    waitpid(-1, NULL, WNOHANG);
}

This is clearly responsible for the first (successful)  wait4() call seen in strace.

The exec based migration uses  popen() + qemu_fopen() to start the child process. When migration completes, it calls qemu_fclose(), which calls fclose(). The fclose() impl in glibc for popen() runs a 'waitpid'


  do
    {
      wait_pid = _IO_waitpid (((_IO_proc_file *) fp)->pid, &wstatus, 0);
    }
  while (wait_pid == -1 && errno == EINTR);
  if (wait_pid == -1)
    return -1;
  return wstatus;

This clearly matches the second wait4() call seen in strace that fails.

The return status of fclose() is passed back as return status of qemu_fclose(). If qemu_fclose() fails, then the exec_close() in migration-exec.c returns a error code. This causes migrate_fd_cleanup() to return an error, and thus finally we see why 'status: failed' occurs:

        if (migrate_fd_cleanup(s) < 0) {
            if (old_vm_running) {
                vm_start();
            }
            state = MIG_STATE_ERROR;
        }


Thus exec based migration has a clear race condition wrt the sigchild handler, that will randomly cause failure.
Comment 3 Juan Quintela 2011-02-18 06:35:54 EST
Thanks for the detailed bug.  Trying to think how to diferentiate wait exits.
Comment 4 Paolo Bonzini 2011-03-02 02:48:35 EST
I think you can simply drop the SIGCHLD handler, it is there only for use by SLIRP's Samba support.
Comment 5 Dor Laor 2011-03-02 03:22:30 EST
Paolo, do you have bandwidth to take over this and free Juan for other migration bugs? RHEV folks reported to it happens pretty frequent for them.
Comment 7 Paolo Bonzini 2011-03-03 07:22:53 EST
Ok.
Comment 8 Glauber Costa 2011-03-04 14:34:39 EST
By the way, this is not the first time something like this happen.

I also dealt with a bug like this in the past, that turned out to be more reasonable fixed by something else - we ended up finding a problem after fork. 

The bug was https://bugzilla.redhat.com/show_bug.cgi?id=606953, for reference.

We do have this patches in RHEL6 (verified), so maybe we need to go back and rethink the use of our sigchld handler in qemu-kvm ?
Comment 9 Paolo Bonzini 2011-03-09 12:40:48 EST
Created attachment 483271 [details]
patch to fix the bug
Comment 17 Mike Cao 2011-04-11 01:03:31 EDT
Reproduced it on qemu-kvm-0.12.1.2-2.135.el6 with following steps

1.Load host by compile kernel in a loop
2.start VM as followings (note memory=2GB can not reproduce)
eg:/usr/libexec/qemu-kvm -m 500 -monitor unix:/tmp/tt,rver,nowait -vnc :1
3.migrate to file in a loop
cat > tmp <<EOF
>migrate "exec:cat > '/tmp/migtest.save'"
>EOF
for ((;;)); do cat tmp |nc -U /tmp/tt ; echo "info migrate"|nc -U /tmp/tt ; echo "cont" |nc -U /tmp/tt ; done
Actual Results:
migration failed sometimes

Verified on qemu-kvm-0.12.1.2-2.156.el6 with the steps above
Actual Results:
Tried 100 times ,migrate all passed

Based on above ,this issue has been fixed.
Comment 18 Eduardo Habkost 2011-05-05 08:19:13 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: qemu-kvm slirp -smb code doesn't "reap" child processes.

Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load.

Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit.

Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.
Comment 19 Paolo Bonzini 2011-05-05 09:22:09 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1,7 @@
-Cause: qemu-kvm slirp -smb code doesn't "reap" child processes.
+Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler.
 
-Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load.
+Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process.  If the SIGCHLD handler comes first, VM save will fail.
 
-Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit.
+Fix: added code that allows the SIGCHLD handler to only reap specific child processes.
 
-Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.+Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail.
Comment 21 errata-xmlrpc 2011-05-19 07:21:49 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html
Comment 22 errata-xmlrpc 2011-05-19 09:01:47 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Note You need to log in before you can comment on or make changes to this bug.