Bug 678524

Summary: Exec based migration randomly fails, particularly under high load
Product: Red Hat Enterprise Linux 6 Reporter: Daniel Berrangé <berrange>
Component: qemu-kvmAssignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.1CC: abaron, bcao, cpelland, danken, ehabkost, gcosta, khong, michen, mkenneth, pbonzini, tburke, virt-maint
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.154.el6 Doc Type: Bug Fix
Doc Text:
Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler. Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process. If the SIGCHLD handler comes first, VM save will fail. Fix: added code that allows the SIGCHLD handler to only reap specific child processes. Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 11:21:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 666158, 672725, 681623    
Attachments:
Description Flags
patch to fix the bug none

Description Daniel Berrangé 2011-02-18 11:13:40 UTC
Description of problem:
The use of 'exec' based migration will randomly return 'status: failed' in the monitor. Migration appears to complete successfully but at the last step QEMU sets it to failed. The problem appears to be non-deterministic, but is made more frequent by introducing high machine load (eg run a huge compile job in another shell).

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.146.el6

How reproducible:
Non-deterministic, more frequently under high load

Steps to Reproduce:
1. In one shell, checkout libvirt and leaving this running

   # while /bin/true ; do make clean ; make -j 8 all ; done

2. In another shell

   # qemu-kvm  -m 500 -cdrom F10-i686-Live.iso -vnc :1 -monitor stdio

3. In the monitor console run

  (qemu) migrate -d "exec:cat >'/tmp/migtest.save'"
 
4. Still in the monitor console, repeatedly run until migration completes / fails

  (qemu) info migrate

  
Actual results:
(qemu) info migrate
Migration status: failed


Expected results:
(qemu) info migrate
Migration status: success


Additional info:

Comment 2 Daniel Berrangé 2011-02-18 11:26:07 UTC
Running the above demo steps with a QEMU under strace I see the following sequence:

pipe2([16, 17], O_CLOEXEC)              = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ffb2b9a3c50) = 6916
close(16)                               = 0
fcntl(17, F_SETFD, 0)                   = 0
fcntl(17, F_GETFL)                      = 0x1 (flags O_WRONLY)
fcntl(17, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
write(17, "QEVM\0\0\0\3\1\0\0\0\3\5block\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\1\0\0\0\4\3ram\0\0\0\0\0\0\0\4\0\0\0\0 F\0\4\6pc.ram\0\0\0\0\37@\0\0\7pc.bios\0\0\0\0\0\2\0\0\6pc.rom\0\0\0\0\0\2\0\0\10vga.vram\0\0\0\0\1\0\....

[snip thousands more write(17...) calls]

close(17)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, NULL, WNOHANG, NULL)          = 6916
rt_sigreturn(0)                         = 0
wait4(6916, 0x7fff100e27f0, 0, NULL)    = -1 ECHILD (No child processes)


write(1, "info migrate\33[K", 15)       = 15
write(1, "\r\n", 2)                     = 2
write(1, "Migration status: failed\r\n", 26) = 26
write(1, "(qemu) ", 7)                  = 7


The only syscall returning fail is that second wait4()  call.

Based on this I see in the source code, QEMU has a generic SIGCHLD handler that it uses to reap all processes

static void sigchld_handler(int signal)
{
    waitpid(-1, NULL, WNOHANG);
}

This is clearly responsible for the first (successful)  wait4() call seen in strace.

The exec based migration uses  popen() + qemu_fopen() to start the child process. When migration completes, it calls qemu_fclose(), which calls fclose(). The fclose() impl in glibc for popen() runs a 'waitpid'


  do
    {
      wait_pid = _IO_waitpid (((_IO_proc_file *) fp)->pid, &wstatus, 0);
    }
  while (wait_pid == -1 && errno == EINTR);
  if (wait_pid == -1)
    return -1;
  return wstatus;

This clearly matches the second wait4() call seen in strace that fails.

The return status of fclose() is passed back as return status of qemu_fclose(). If qemu_fclose() fails, then the exec_close() in migration-exec.c returns a error code. This causes migrate_fd_cleanup() to return an error, and thus finally we see why 'status: failed' occurs:

        if (migrate_fd_cleanup(s) < 0) {
            if (old_vm_running) {
                vm_start();
            }
            state = MIG_STATE_ERROR;
        }


Thus exec based migration has a clear race condition wrt the sigchild handler, that will randomly cause failure.

Comment 3 Juan Quintela 2011-02-18 11:35:54 UTC
Thanks for the detailed bug.  Trying to think how to diferentiate wait exits.

Comment 4 Paolo Bonzini 2011-03-02 07:48:35 UTC
I think you can simply drop the SIGCHLD handler, it is there only for use by SLIRP's Samba support.

Comment 5 Dor Laor 2011-03-02 08:22:30 UTC
Paolo, do you have bandwidth to take over this and free Juan for other migration bugs? RHEV folks reported to it happens pretty frequent for them.

Comment 7 Paolo Bonzini 2011-03-03 12:22:53 UTC
Ok.

Comment 8 Glauber Costa 2011-03-04 19:34:39 UTC
By the way, this is not the first time something like this happen.

I also dealt with a bug like this in the past, that turned out to be more reasonable fixed by something else - we ended up finding a problem after fork. 

The bug was https://bugzilla.redhat.com/show_bug.cgi?id=606953, for reference.

We do have this patches in RHEL6 (verified), so maybe we need to go back and rethink the use of our sigchld handler in qemu-kvm ?

Comment 9 Paolo Bonzini 2011-03-09 17:40:48 UTC
Created attachment 483271 [details]
patch to fix the bug

Comment 17 Mike Cao 2011-04-11 05:03:31 UTC
Reproduced it on qemu-kvm-0.12.1.2-2.135.el6 with following steps

1.Load host by compile kernel in a loop
2.start VM as followings (note memory=2GB can not reproduce)
eg:/usr/libexec/qemu-kvm -m 500 -monitor unix:/tmp/tt,rver,nowait -vnc :1
3.migrate to file in a loop
cat > tmp <<EOF
>migrate "exec:cat > '/tmp/migtest.save'"
>EOF
for ((;;)); do cat tmp |nc -U /tmp/tt ; echo "info migrate"|nc -U /tmp/tt ; echo "cont" |nc -U /tmp/tt ; done
Actual Results:
migration failed sometimes

Verified on qemu-kvm-0.12.1.2-2.156.el6 with the steps above
Actual Results:
Tried 100 times ,migrate all passed

Based on above ,this issue has been fixed.

Comment 18 Eduardo Habkost 2011-05-05 12:19:13 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: qemu-kvm slirp -smb code doesn't "reap" child processes.

Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load.

Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit.

Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.

Comment 19 Paolo Bonzini 2011-05-05 13:22:09 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1,7 @@
-Cause: qemu-kvm slirp -smb code doesn't "reap" child processes.
+Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler.
 
-Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load.
+Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process.  If the SIGCHLD handler comes first, VM save will fail.
 
-Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit.
+Fix: added code that allows the SIGCHLD handler to only reap specific child processes.
 
-Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.+Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail.

Comment 21 errata-xmlrpc 2011-05-19 11:21:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Comment 22 errata-xmlrpc 2011-05-19 13:01:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html