678524 – Exec based migration randomly fails, particularly under high load

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 678524 - Exec based migration randomly fails, particularly under high load

Summary: Exec based migration randomly fails, particularly under high load

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	qemu-kvm
Sub Component:
Version:	6.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Paolo Bonzini
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	666158 672725 681623
TreeView+	depends on / blocked

Reported:	2011-02-18 11:13 UTC by Daniel Berrangé
Modified:	2018-11-14 16:29 UTC (History)
CC List:	12 users (show)
Fixed In Version:	qemu-kvm-0.12.1.2-2.154.el6
Doc Type:	Bug Fix
Doc Text:	Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler. Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process. If the SIGCHLD handler comes first, VM save will fail. Fix: added code that allows the SIGCHLD handler to only reap specific child processes. Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail.
Clone Of:
Environment:
Last Closed:	2011-05-19 11:21:49 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
patch to fix the bug (5.72 KB, patch) 2011-03-09 17:40 UTC, Paolo Bonzini	no flags	Details \| Diff
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0534	0	normal	SHIPPED_LIVE	Important: qemu-kvm security, bug fix, and enhancement update	2011-05-19 11:20:36 UTC

Description Daniel Berrangé 2011-02-18 11:13:40 UTC

Description of problem:
The use of 'exec' based migration will randomly return 'status: failed' in the monitor. Migration appears to complete successfully but at the last step QEMU sets it to failed. The problem appears to be non-deterministic, but is made more frequent by introducing high machine load (eg run a huge compile job in another shell).

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.146.el6

How reproducible:
Non-deterministic, more frequently under high load

Steps to Reproduce:
1. In one shell, checkout libvirt and leaving this running

   # while /bin/true ; do make clean ; make -j 8 all ; done

2. In another shell

   # qemu-kvm  -m 500 -cdrom F10-i686-Live.iso -vnc :1 -monitor stdio

3. In the monitor console run

  (qemu) migrate -d "exec:cat >'/tmp/migtest.save'"
 
4. Still in the monitor console, repeatedly run until migration completes / fails

  (qemu) info migrate

  
Actual results:
(qemu) info migrate
Migration status: failed


Expected results:
(qemu) info migrate
Migration status: success


Additional info:

Comment 2 Daniel Berrangé 2011-02-18 11:26:07 UTC

Running the above demo steps with a QEMU under strace I see the following sequence:

pipe2([16, 17], O_CLOEXEC)              = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ffb2b9a3c50) = 6916
close(16)                               = 0
fcntl(17, F_SETFD, 0)                   = 0
fcntl(17, F_GETFL)                      = 0x1 (flags O_WRONLY)
fcntl(17, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
write(17, "QEVM\0\0\0\3\1\0\0\0\3\5block\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\1\0\0\0\4\3ram\0\0\0\0\0\0\0\4\0\0\0\0 F\0\4\6pc.ram\0\0\0\0\37@\0\0\7pc.bios\0\0\0\0\0\2\0\0\6pc.rom\0\0\0\0\0\2\0\0\10vga.vram\0\0\0\0\1\0\....

[snip thousands more write(17...) calls]

close(17)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, NULL, WNOHANG, NULL)          = 6916
rt_sigreturn(0)                         = 0
wait4(6916, 0x7fff100e27f0, 0, NULL)    = -1 ECHILD (No child processes)


write(1, "info migrate\33[K", 15)       = 15
write(1, "\r\n", 2)                     = 2
write(1, "Migration status: failed\r\n", 26) = 26
write(1, "(qemu) ", 7)                  = 7


The only syscall returning fail is that second wait4()  call.

Based on this I see in the source code, QEMU has a generic SIGCHLD handler that it uses to reap all processes

static void sigchld_handler(int signal)
{
    waitpid(-1, NULL, WNOHANG);
}

This is clearly responsible for the first (successful)  wait4() call seen in strace.

The exec based migration uses  popen() + qemu_fopen() to start the child process. When migration completes, it calls qemu_fclose(), which calls fclose(). The fclose() impl in glibc for popen() runs a 'waitpid'


  do
    {
      wait_pid = _IO_waitpid (((_IO_proc_file *) fp)->pid, &wstatus, 0);
    }
  while (wait_pid == -1 && errno == EINTR);
  if (wait_pid == -1)
    return -1;
  return wstatus;

This clearly matches the second wait4() call seen in strace that fails.

The return status of fclose() is passed back as return status of qemu_fclose(). If qemu_fclose() fails, then the exec_close() in migration-exec.c returns a error code. This causes migrate_fd_cleanup() to return an error, and thus finally we see why 'status: failed' occurs:

        if (migrate_fd_cleanup(s) < 0) {
            if (old_vm_running) {
                vm_start();
            }
            state = MIG_STATE_ERROR;
        }


Thus exec based migration has a clear race condition wrt the sigchild handler, that will randomly cause failure.

Comment 3 Juan Quintela 2011-02-18 11:35:54 UTC

Thanks for the detailed bug.  Trying to think how to diferentiate wait exits.

Comment 4 Paolo Bonzini 2011-03-02 07:48:35 UTC

I think you can simply drop the SIGCHLD handler, it is there only for use by SLIRP's Samba support.

Comment 5 Dor Laor 2011-03-02 08:22:30 UTC

Paolo, do you have bandwidth to take over this and free Juan for other migration bugs? RHEV folks reported to it happens pretty frequent for them.

Comment 7 Paolo Bonzini 2011-03-03 12:22:53 UTC

Ok.

Comment 8 Glauber Costa 2011-03-04 19:34:39 UTC

By the way, this is not the first time something like this happen.

I also dealt with a bug like this in the past, that turned out to be more reasonable fixed by something else - we ended up finding a problem after fork. 

The bug was https://bugzilla.redhat.com/show_bug.cgi?id=606953, for reference.

We do have this patches in RHEL6 (verified), so maybe we need to go back and rethink the use of our sigchld handler in qemu-kvm ?

Comment 9 Paolo Bonzini 2011-03-09 17:40:48 UTC

Created attachment 483271 [details]
patch to fix the bug

Comment 17 Mike Cao 2011-04-11 05:03:31 UTC

Reproduced it on qemu-kvm-0.12.1.2-2.135.el6 with following steps

1.Load host by compile kernel in a loop
2.start VM as followings (note memory=2GB can not reproduce)
eg:/usr/libexec/qemu-kvm -m 500 -monitor unix:/tmp/tt,rver,nowait -vnc :1
3.migrate to file in a loop
cat > tmp <<EOF
>migrate "exec:cat > '/tmp/migtest.save'"
>EOF
for ((;;)); do cat tmp |nc -U /tmp/tt ; echo "info migrate"|nc -U /tmp/tt ; echo "cont" |nc -U /tmp/tt ; done
Actual Results:
migration failed sometimes

Verified on qemu-kvm-0.12.1.2-2.156.el6 with the steps above
Actual Results:
Tried 100 times ,migrate all passed

Based on above ,this issue has been fixed.

Comment 18 Eduardo Habkost 2011-05-05 12:19:13 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: qemu-kvm slirp -smb code doesn't "reap" child processes.

Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load.

Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit.

Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.

Comment 19 Paolo Bonzini 2011-05-05 13:22:09 UTC

    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1,7 @@
-Cause: qemu-kvm slirp -smb code doesn't "reap" child processes.
+Cause: all child processes without distinction are "reaped" by qemu-kvm's SIGCHLD handler.
 
-Consequence: unexpected child processes trigger the qemu-kvm SIGCHLD handler, exec based migration randomly fails, particularly under high load.
+Consequence: when a VM is being saved, both the VM save code and qemu-kvm's SIGCHLD handler try to "reap" the process.  If the SIGCHLD handler comes first, VM save will fail.
 
-Fix: added code that allows the slirp code to register the children PIDs and let qemu-kvm reap them when they exit.
+Fix: added code that allows the SIGCHLD handler to only reap specific child processes.
 
-Result: qemu-kvm SIGCHLD handler will handle the child processes created by the slirp code, fixing exec migration.+Result: qemu-kvm SIGCHLD handler will not anymore try to reap processes created from the VM save operations, so that these will not fail.

Comment 21 errata-xmlrpc 2011-05-19 11:21:49 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Comment 22 errata-xmlrpc 2011-05-19 13:01:47 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Note You need to log in before you can comment on or make changes to this bug.