Bug 985334

Summary: query mem info from monitor would cause qemu-kvm hang [RHEL-6.5]
Product: Red Hat Enterprise Linux 6 Reporter: Laszlo Ersek <lersek>
Component: qemu-kvm Assignee: Laszlo Ersek <lersek>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.5 CC: acathrow, amit.shah, bsarathy, hhuang, juzhang, lersek, michen, mkenneth, qzhang, shuang, virt-maint, xwei
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.385.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 970047 Environment:
Last Closed: 2013-11-21 07:04:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 909059    
Bug Blocks:    

Comment 2 Qunfang Zhang 2013-07-18 06:51:33 UTC
Hi, Laszlo

I cannot reproduce this bug on a rhel6.5 host.  According to the original reporter, xwei, he could not reproduce it on a rhel6 host but can reproduce it on a rhel7 host.
Do you mean the problem exists in the rhel6.5 code as well, and that is why you cloned the bug? Could you give us some suggestions on how to verify it in the future?


Thanks,
Qunfang

Comment 3 Laszlo Ersek 2013-07-18 11:52:00 UTC
Honestly, I don't know. Remember that I could not reproduce this problem on RHEL-7 either.

The bug is caused by a huge number of monitor_flush() calls (for example, incurred by the "info tlb" HMP command), most of which fail to flush the monitor data to stdout and are instead forced to queue the data. This depends very strongly on scheduling.

Can you maybe try this:

Step I:

(1) on terminal A: 

  mkfifo test.fifo

(2) on terminal B:

  sleep 1000 < test.fifo

(3) on terminal A:

  qemu [usual command line options, including -monitor stdio] \
  | tee test.fifo

  (monitor) cont
  (monitor) info tlb [repeatedly]

The idea is, "tee" will write the monitor output to both the terminal and the fifo. Now the fifo is open for reading by "sleep", but "sleep" won't actually read any data. Hence, once the FIFO is full (4KB), "tee" should block. After a further 4KB of data (in total, 8KB) the pipe between "qemu" and "tee" should be full as well ("tee" being blocked), and qemu / monitor_flush() should start running into the situation described above.
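
The blocking behaviour this step relies on can be checked without qemu. Below is a minimal sketch, assuming a POSIX shell: the writer counts how many 1 KiB chunks it manages to push into a pipe whose reader ("sleep") never reads. The file name count.txt and the 2-second wait are arbitrary choices for the illustration. The capacity observed may well be larger than 4KB (64KB is common on Linux), which only changes how much monitor output is needed before "tee" blocks.

```shell
#!/bin/sh
# Count how many 1 KiB writes fit into a pipe whose reader never reads.
workdir=$(mktemp -d)
count_file="$workdir/count.txt"
echo 0 > "$count_file"

# Writer: emit 1 KiB per iteration and record each completed write.
# Reader: "sleep" holds the read end open but consumes nothing.
# Once the pipe is full, printf blocks; when "sleep" exits, the writer
# is terminated by SIGPIPE and the pipeline ends.
(
    n=0
    while printf '%1024s' ''; do
        n=$((n + 1))
        echo "$n" > "$count_file"
    done
) 2>/dev/null | sleep 2

chunks=$(cat "$count_file")
echo "writer blocked after $chunks KiB"
rm -rf "$workdir"
```

The number printed is the point at which a process in the position of "tee" would stop making progress.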

Step II:

Unfortunately Step I in itself is still not enough to reproduce the bug. The above suffices to create some extra watches for stdout-readiness notification, but we need not just "some", but so many of them that g_poll() fails in the main loop with -1/EINVAL.

You can confirm that by witnessing the same hang as reported for RHEL-7 (actually, it's not a hang, the IO thread is spinning without progress).

If you manage to do Step I only (ie. the monitor output on the terminal stops, but the guest actually remains responsive via VNC or ssh), then please force qemu-kvm to dump core (*), and hopefully I'll be able to verify the problem by looking at it.

(*) Make sure you have core dumps enabled with ulimit, and send qemu-kvm a SIGABRT with "kill" -- in theory Ctrl-\ (= Ctrl-BackSlash), ie. an interactive SIGQUIT should work too, but maybe qemu catches it, I'm not sure.
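
For reference, the two steps in the footnote could look roughly like this in a POSIX shell. This is only a sketch: it uses a background "sleep" as a stand-in for the qemu-kvm process so that it can be tried safely, and where the core file ends up depends on the host's core dump configuration, which is not covered here.

```shell
#!/bin/sh
# Allow core files of any size for processes started from this shell.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core size limit"

# Stand-in for qemu-kvm; with the real process, use its PID instead.
sleep 60 &
pid=$!

# SIGABRT terminates the process and requests a core dump.
kill -ABRT "$pid"

# Collect the exit status without tripping "set -e" style error handling.
status=0
wait "$pid" || status=$?

# bash and dash report death by signal N as 128+N; SIGABRT is signal 6.
echo "exit status: $status"
```

Note that the ulimit setting must be in effect in the shell that starts qemu-kvm, before it is started.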

Comment 4 Laszlo Ersek 2013-08-06 12:17:17 UTC
I found a way to reproduce this bug in RHEL-6.

(1) In terminal A, issue the following commands:

    mkfifo fifo.in fifo.out
    /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo \
        -mon chardev=fifo,default

(2) In terminal B, issue the following command (same directory):

  cat fifo.out

(3) In terminal C, issue the following command (same directory):

  cat >fifo.in

(4) Still in terminal C, type the following command, and verify that its output appears in terminal B:

  info registers

(5) In terminal B, press ^Z (ie. stop (but do not kill) the "cat" process reading from "fifo.out").

(6) In terminal C, repeat the following command indefinitely (it's simplest to keep pasting it from the clipboard):

  info registers

(7) At one point, the qemu-kvm process in terminal A dies, with the following message:

ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill: assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
Aborted
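
Step (5) can also be done from another shell by sending SIGSTOP to the reader instead of pressing ^Z, which helps when scripting the reproducer. Below is a sketch with a throwaway FIFO and reader; the name demo.out is made up for the example, and with the real setup the signal would go to the "cat fifo.out" process in terminal B.

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkfifo "$workdir/demo.out"

# Reader, playing the role of "cat fifo.out" in terminal B.
cat "$workdir/demo.out" > /dev/null &
reader=$!

# Hold the write end open so that the reader's open() completes.
exec 3> "$workdir/demo.out"

# Rough equivalent of ^Z in terminal B: stop, but do not kill, the reader.
kill -STOP "$reader"
sleep 1
state=$(ps -o stat= -p "$reader")
echo "reader state: $state"

# Clean up: resume the reader, close the FIFO so it sees EOF, and reap it.
kill -CONT "$reader"
exec 3>&-
wait "$reader"
rm -rf "$workdir"
```

A state beginning with "T" means the reader is stopped, so nothing drains the FIFO from that point on.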

Comment 5 Laszlo Ersek 2013-08-06 12:36:12 UTC
(In reply to Laszlo Ersek from comment #4)

> (6) In terminal C, repeat the following command indefinitely (it's simplest
> to keep pasting it from the clipboard):
> 
>   info registers
> 
> (7) At one point, the qemu-kvm process in terminal A dies, with the
> following message:
> 
> ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill:
> assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
> Aborted

In my testing, 128 "info registers" commands issued in step (6) are sufficient to trigger the bug.
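
Pasting the command 128 times by hand is tedious, so step (6) could be scripted. Below is a sketch, assuming the same working directory as in comment 4; the helper name repeat_cmd is made up for this example and is not part of qemu.

```shell
#!/bin/sh
# repeat_cmd COUNT COMMAND: print COMMAND on its own line COUNT times.
repeat_cmd() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '%s\n' "$2"
        i=$((i + 1))
    done
}

# Possible use in terminal C, replacing the plain "cat >fifo.in"; the
# trailing "cat" keeps fifo.in open for further interactive commands:
#   { repeat_cmd 128 "info registers"; cat; } > fifo.in

# Harmless demonstration: print the command three times to stdout.
repeat_cmd 3 "info registers"
```

The count is a parameter because the number of commands needed seems to vary between hosts.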

Comment 11 Qunfang Zhang 2013-08-09 09:07:25 UTC
Reproduced this bug on qemu-kvm-0.12.1.2-2.382.el6 and verified the fix on qemu-kvm-0.12.1.2-2.385.el6.

Steps:

(1) In terminal A, issue the following commands:

    mkfifo fifo.in fifo.out
    /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo \
        -mon chardev=fifo,default

(2) In terminal B, issue the following command (same directory):

  cat fifo.out

(3) In terminal C, issue the following command (same directory):

  cat >fifo.in

(4) Still in terminal C, type the following command, and verify that its output appears in terminal B:

  info registers

(5) In terminal B, press ^Z (ie. stop (but do not kill) the "cat" process reading from "fifo.out").

(6) In terminal C, repeat the following command indefinitely (it's simplest to keep pasting it from the clipboard):

  info registers

======================

Result:

On the old qemu-kvm-0.12.1.2-2.382.el6, the qemu process in terminal A died at the 90th "info registers" attempt and printed:

[root@t2 home]# /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo -mon chardev=fifo,default
VNC server running on `::1:5900'

**
ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill: assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
Aborted (core dumped)
[root@t2 home]# 


On the fixed qemu-kvm-0.12.1.2-2.385.el6, the qemu process does not die after 300 "info registers" inputs.

So, this issue is fixed.

Comment 13 errata-xmlrpc 2013-11-21 07:04:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1553.html