Bug 985334 - query mem info from monitor would cause qemu-kvm hang [RHEL-6.5]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Laszlo Ersek
QA Contact: Virtualization Bugs
Depends On: 909059
Blocks:
Reported: 2013-07-17 06:09 EDT by Laszlo Ersek
Modified: 2013-12-05 05:03 EST
CC List: 12 users

See Also:
Fixed In Version: qemu-kvm-0.12.1.2-2.385.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 970047
Environment:
Last Closed: 2013-11-21 02:04:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Comment 2 Qunfang Zhang 2013-07-18 02:51:33 EDT
Hi, Laszlo

I cannot reproduce this bug on a RHEL-6.5 host. According to the original reporter, xwei, it did not reproduce on a RHEL-6 host but does reproduce on a RHEL-7 host.
Did you clone it because the RHEL-6.5 code has the same problem? Could you give us some suggestions on how to verify it in the future?


Thanks,
Qunfang
Comment 3 Laszlo Ersek 2013-07-18 07:52:00 EDT
Honestly, I don't know. Remember that I could not reproduce this problem on RHEL-7 either.

The bug is caused by a huge number of monitor_flush() calls (incurred, for example, by the "info tlb" HMP command), such that most of these calls fail to flush the monitor data to stdout and are instead forced to queue it. Whether this happens depends very strongly on scheduling.
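To make the queuing mechanism concrete, here is a toy Python model. It assumes (as Step II below suggests with its "extra watches for stdout-readiness notification") that every flush which cannot write all pending data registers one more readiness watch, and that nothing deduplicates those watches. ToyMonitor, writable_bytes and the 100-byte output chunk are hypothetical; this is not qemu-kvm's monitor code.

```python
class ToyMonitor:
    """Hypothetical model: queue unflushed output, add a watch per failed flush."""

    def __init__(self, writable_bytes):
        # Bytes the downstream reader will still accept before blocking
        # (stands in for the free space left in the stdout pipe).
        self.writable_bytes = writable_bytes
        self.queued = b""
        self.watches = 0

    def output(self, data):
        self.queued += data
        self.flush()

    def flush(self):
        n = min(len(self.queued), self.writable_bytes)
        self.writable_bytes -= n
        self.queued = self.queued[n:]
        if self.queued:
            # Could not write everything: keep the rest queued and ask the
            # main loop for a callback when stdout becomes writable again.
            # Assumption: a new watch is added on every such call.
            self.watches += 1


mon = ToyMonitor(writable_bytes=8 * 1024)
for _ in range(500):
    mon.output(b"x" * 100)  # hypothetical chunk of monitor output
print(mon.watches, len(mon.queued))  # prints: 419 41808
```

Once the reader stops draining, every further command adds a watch, so the watch count grows linearly with the number of commands rather than staying at one.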

Can you maybe try this:

Step I:

(1) on terminal A: 

  mkfifo test.fifo

(2) on terminal B:

  sleep 1000 < test.fifo

(3) on terminal A:

  qemu [usual command line options, including -monitor stdio] \
  | tee test.fifo

  (monitor) cont
  (monitor) info tlb [repeatedly]

The idea is, "tee" will write the monitor output to both the terminal and the fifo. The fifo is open for reading by "sleep", but "sleep" won't actually read any data. Hence, once the FIFO is full (64KB with the default Linux pipe capacity), "tee" should block. After a further 64KB of data (128KB in total), the pipe between "qemu" and "tee" should be full as well ("tee" being blocked), and qemu / monitor_flush() should start running into the situation described above.
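The pipe and FIFO sizes in this reasoning can be checked directly. A minimal Python sketch (pipe_capacity is a hypothetical helper): it fills a fresh pipe that nobody reads, with the write end set non-blocking so it gets EAGAIN instead of blocking the way "tee" would. On a stock Linux kernel (2.6.11 or later) it typically reports 65536 bytes; 4096 bytes is PIPE_BUF, the limit for atomic writes, not the buffer capacity.

```python
import os


def pipe_capacity(chunk=4096):
    """Write into a fresh, unread pipe until the kernel refuses more data."""
    r, w = os.pipe()
    try:
        os.set_blocking(w, False)  # EAGAIN instead of blocking like "tee"
        buf = b"\0" * chunk
        total = 0
        while True:
            try:
                total += os.write(w, buf)
            except BlockingIOError:  # pipe buffer is full
                return total
    finally:
        os.close(r)
        os.close(w)


print(pipe_capacity())
```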

Step II:

Unfortunately, Step I by itself is still not enough to reproduce the bug. It suffices to create some extra watches for stdout-readiness notification, but we need not just "some" of them: we need so many that g_poll() fails in the main loop with -1/EINVAL.

You can confirm that by witnessing the same hang as reported for RHEL-7 (actually, it's not a hang; the IO thread is spinning without making progress).
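On Linux, poll(2) fails with EINVAL when nfds exceeds the RLIMIT_NOFILE soft limit, which is presumably how a large enough pile of watches makes g_poll() return -1/EINVAL. The Linux-specific sketch below (poll_with_too_many_fds is a hypothetical helper) shows only that kernel behaviour, by listing more descriptors than a temporarily lowered limit allows; it does not involve glib or qemu.

```python
import errno
import resource
import select


def poll_with_too_many_fds(nfds, soft_limit):
    """Return 0 if poll() succeeds, else the errno it failed with."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    p = select.poll()
    for fd in range(nfds):
        # Descriptors need not be open to be listed; closed ones would
        # just report POLLNVAL if the call got that far.
        p.register(fd, select.POLLOUT)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    try:
        p.poll(0)  # timeout 0: return immediately
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        # Restore the original soft limit for the rest of the process.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


print(errno.errorcode.get(poll_with_too_many_fds(nfds=100, soft_limit=64)))  # on Linux: EINVAL
```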

If you manage to do only Step I (i.e. the monitor output on the terminal stops, but the guest remains responsive via VNC or ssh), then please force qemu-kvm to dump core (*), and hopefully I'll be able to verify the problem by looking at the core.

(*) Make sure you have core dumps enabled with ulimit, and send qemu-kvm a SIGABRT with "kill". In theory Ctrl-\ (= Ctrl-Backslash), i.e. an interactive SIGQUIT, should work too, but qemu might catch it; I'm not sure.
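The footnote's procedure can also be scripted. A minimal Python sketch, with launch and force_core_dump as hypothetical helpers: the child gets RLIMIT_CORE raised to the hard limit (the per-process equivalent of "ulimit -c" in the launching shell), and is then sent SIGABRT, whose default action is to terminate with a core dump. Where the core file lands depends on /proc/sys/kernel/core_pattern.

```python
import resource
import signal
import subprocess


def launch(argv, allow_core=True):
    """Start argv with RLIMIT_CORE raised to the hard limit (or set to 0)."""
    def set_core_limit():
        # Runs in the child just before exec, like "ulimit -c" in its shell.
        _, hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (hard if allow_core else 0, hard))
    return subprocess.Popen(argv, preexec_fn=set_core_limit)


def force_core_dump(proc):
    """Send SIGABRT (default action: terminate and dump core) and reap the child."""
    proc.send_signal(signal.SIGABRT)
    return proc.wait()  # negative signal number, e.g. -6 for SIGABRT on Linux
```

For example, proc = launch(["/usr/libexec/qemu-kvm", "-monitor", "stdio"]) starts the guest, and force_core_dump(proc) later aborts it once the monitor output has stalled.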
Comment 4 Laszlo Ersek 2013-08-06 08:17:17 EDT
I found a way to reproduce this bug in RHEL-6.

(1) In terminal A, issue the following commands:

    mkfifo fifo.in fifo.out
    /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo \
        -mon chardev=fifo,default

(2) In terminal B, issue the following command (same directory):

  cat fifo.out

(3) In terminal C, issue the following command (same directory):

  cat >fifo.in

(4) Still in terminal C, type the following command, and verify that its output appears in terminal B:

  info registers

(5) In terminal B, press ^Z, i.e. stop (but do not kill) the "cat" process reading from "fifo.out".

(6) In terminal C, repeat the following command indefinitely (it's simplest to keep pasting it from the clipboard):

  info registers

(7) At one point, the qemu-kvm process in terminal A dies, with the following message:

ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill: assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
Aborted
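The abort comes from the fixed-size poll_fds array in vl.c:glib_select_fill(): presumably, once the glib main context reports more descriptors to poll than the array holds, the g_assert() fires. A toy Python model of that check follows; the capacity of 64 is a hypothetical placeholder (the real ARRAY_SIZE(poll_fds) is not stated in this report), and select_fill is an illustrative name, not the qemu-kvm function itself.

```python
POLL_FDS_CAPACITY = 64  # hypothetical placeholder for ARRAY_SIZE(poll_fds)


def select_fill(requested_fds, capacity=POLL_FDS_CAPACITY):
    """Copy the descriptors the main loop wants polled into a fixed-size array."""
    poll_fds = [None] * capacity      # statically sized, like the C array
    n_poll_fds = len(requested_fds)   # count reported by the main context
    if not n_poll_fds <= len(poll_fds):
        # g_assert() failure: qemu-kvm prints the message and calls abort()
        raise AssertionError("n_poll_fds <= ARRAY_SIZE(poll_fds)")
    poll_fds[:n_poll_fds] = requested_fds
    return poll_fds[:n_poll_fds]
```

For example, select_fill([1] * 10) succeeds, while select_fill([1] * 65) raises, mirroring the abort once enough watches on the same stdout-like descriptor (fd 1 here) have piled up.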
Comment 5 Laszlo Ersek 2013-08-06 08:36:12 EDT
(In reply to Laszlo Ersek from comment #4)

> (6) In terminal C, repeat the following command indefinitely (it's simplest
> to keep pasting it from the clipboard):
> 
>   info registers
> 
> (7) At one point, the qemu-kvm process in terminal A dies, with the
> following message:
> 
> ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill:
> assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
> Aborted

In my testing, 128 "info registers" commands issued in step (6) are sufficient to trigger the bug.
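Instead of pasting by hand in terminal C, step (6) can be driven by a short Python sketch. paste_commands is a hypothetical helper, and its default count of 128 simply follows the observation above. Call it as paste_commands("fifo.in") from the directory used in step (1), after step (5), with qemu-kvm still running.

```python
def paste_commands(fifo_in, count=128, command="info registers"):
    """Write `command` to the monitor input FIFO `count` times, as in step (6)."""
    # Opening a FIFO for writing blocks until some process has it open for reading.
    with open(fifo_in, "w") as fifo:
        for _ in range(count):
            fifo.write(command + "\n")
            fifo.flush()  # send commands one at a time, like interactive pasting
    return count
```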
Comment 11 Qunfang Zhang 2013-08-09 05:07:25 EDT
Reproduced this bug on qemu-kvm-0.12.1.2-2.382.el6 and verified the fix on qemu-kvm-0.12.1.2-2.385.el6.

Steps:

(1) In terminal A, issue the following commands:

    mkfifo fifo.in fifo.out
    /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo \
        -mon chardev=fifo,default

(2) In terminal B, issue the following command (same directory):

  cat fifo.out

(3) In terminal C, issue the following command (same directory):

  cat >fifo.in

(4) Still in terminal C, type the following command, and verify that its output appears in terminal B:

  info registers

(5) In terminal B, press ^Z, i.e. stop (but do not kill) the "cat" process reading from "fifo.out".

(6) In terminal C, repeat the following command indefinitely (it's simplest to keep pasting it from the clipboard):

  info registers

======================

Result:

On the old qemu-kvm-0.12.1.2-2.382.el6, the qemu process in terminal A died at the 90th "info registers" attempt with the following output:

[root@t2 home]# /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo -mon chardev=fifo,default
VNC server running on `::1:5900'

**
ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill: assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
Aborted (core dumped)
[root@t2 home]# 


On the fixed qemu-kvm-0.12.1.2-2.385.el6, the qemu process does not die even after 300 "info registers" commands.

So, this issue is fixed.
Comment 13 errata-xmlrpc 2013-11-21 02:04:39 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1553.html
