Bug 985334
Summary: | query mem info from monitor would cause qemu-kvm hang [RHEL-6.5] | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Laszlo Ersek <lersek> |
Component: | qemu-kvm | Assignee: | Laszlo Ersek <lersek> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 6.5 | CC: | acathrow, amit.shah, bsarathy, hhuang, juzhang, lersek, michen, mkenneth, qzhang, shuang, virt-maint, xwei |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | qemu-kvm-0.12.1.2-2.385.el6 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | 970047 | Environment: | |
Last Closed: | 2013-11-21 07:04:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 909059 | | |
Bug Blocks: | | | |
Comment 2
Qunfang Zhang
2013-07-18 06:51:33 UTC
Honestly, I don't know. Remember that I could not reproduce this problem on RHEL-7 either.

The bug is caused by a huge number of monitor_flush() calls (for example incurred by the "info tlb" HMP command) so that most of these monitor_flush() calls actually fail to flush the monitor data to stdout, they are instead forced to queue the data. This depends very strongly on scheduling.

Can you maybe try this:

Step I:

(1) on terminal A:

    mkfifo test.fifo

(2) on terminal B:

    sleep 1000 < test.fifo

(3) on terminal A:

    qemu [usual command line options, including -monitor stdio] \
      | tee test.fifo

    (monitor) cont
    (monitor) info tlb
    [repeatedly]

The idea is, "tee" will write the monitor output to both the terminal and to the fifo. Now the fifo is open for reading by "sleep", but "sleep" won't actually read data. Hence, once the FIFO is full (4KB), "tee" should block. After further 4KB of data (in total, 8KB) the pipe between "qemu" and "tee" should be full as well ("tee" being blocked), and qemu / monitor_flush() should start running into the situation described above.

Step II:

Unfortunately Step I. in itself is still not enough to reproduce the bug. The above suffices to create some extra watches for stdout-readiness notification, but we need not just "some", but so many of them, that g_poll() fails in the main loop with -1/EINVAL.

You can confirm that by witnessing the same hang as reported for RHEL-7 (actually, it's not a hang, the IO thread is spinning without progress).

If you manage to do Step I only (ie. the monitor output on the terminal stops, but the guest actually remains responsive via VNC or ssh), then please force qemu-kvm to dump core (*), and hopefully I'll be able to verify the problem by looking at it.

(*) Make sure you have core dumps enabled with ulimit, and send qemu-kvm a SIGABRT with "kill" -- in theory Ctrl-\ (= Ctrl-BackSlash), ie. an interactive SIGQUIT should work too, but maybe qemu catches it, I'm not sure.
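The back-pressure described in Step I can be observed outside qemu with a short Python sketch. It is only an illustration, not qemu's monitor_flush() code: the names `try_flush` and `simulate_stalled_reader` are hypothetical, and the sketch assumes a POSIX system. Pipe capacity differs between kernels and configurations, so the sketch writes 8 MiB and counts what happens instead of assuming a particular buffer size.

```python
import os
from collections import deque


def try_flush(fd, pending):
    """Write queued chunks to fd until the write would block.

    Returns True if the queue was fully drained, False if data remains queued.
    """
    while pending:
        chunk = pending[0]
        try:
            written = os.write(fd, chunk)
        except BlockingIOError:
            return False  # pipe is full: the remaining data stays queued
        if written < len(chunk):
            pending[0] = chunk[written:]  # keep the unwritten tail of this chunk
        else:
            pending.popleft()
    return True


def simulate_stalled_reader(chunks=8192, chunk_size=1024):
    """Produce output into a pipe whose reader never reads (like the stopped peer).

    Returns (failed_flushes, queued_bytes).
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # writer must not block, it queues instead
    pending = deque()
    failed_flushes = 0
    try:
        for _ in range(chunks):
            pending.append(b"x" * chunk_size)  # stand-in for monitor output
            if not try_flush(write_fd, pending):
                failed_flushes += 1
        return failed_flushes, sum(len(c) for c in pending)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Calling `simulate_stalled_reader()` returns how many flush attempts could not drain the queue and how many bytes were left queued. Once the pipe is full, every further attempt fails, which corresponds to the situation in which monitor_flush() has to queue its data.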
I found a way to reproduce this bug in RHEL-6.

(1) In terminal A, issue the following commands:

    mkfifo fifo.in fifo.out
    /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo \
      -mon chardev=fifo,default

(2) In terminal B, issue the following command (same directory):

    cat fifo.out

(3) In terminal C, issue the following command (same directory):

    cat >fifo.in

(4) Still in terminal C, type the following command, and verify that its output appears in terminal B:

    info registers

(5) In terminal B, press ^Z (i.e. stop, but do not kill, the "cat" process reading from "fifo.out").

(6) In terminal C, repeat the following command indefinitely (it's simplest to keep pasting it from the clipboard):

    info registers

(7) At one point, the qemu-kvm process in terminal A dies, with the following message:

    ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill: assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
    Aborted

(In reply to Laszlo Ersek from comment #4)
> (6) In terminal C, repeat the following command indefinitely (it's simplest
> to keep pasting it from the clipboard):
>
> info registers
>
> (7) At one point, the qemu-kvm process in terminal A dies, with the
> following message:
>
> ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill:
> assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
> Aborted

In my testing, 128 "info registers" commands issued in step (6) are sufficient to trigger the bug.

Reproduced this bug on qemu-kvm-0.12.1.2-2.382.el6 and verified the fix on qemu-kvm-0.12.1.2-2.385.el6.
Steps:

(1) In terminal A, issue the following commands:

    mkfifo fifo.in fifo.out
    /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo \
      -mon chardev=fifo,default

(2) In terminal B, issue the following command (same directory):

    cat fifo.out

(3) In terminal C, issue the following command (same directory):

    cat >fifo.in

(4) Still in terminal C, type the following command, and verify that its output appears in terminal B:

    info registers

(5) In terminal B, press ^Z (i.e. stop, but do not kill, the "cat" process reading from "fifo.out").

(6) In terminal C, repeat the following command indefinitely (it's simplest to keep pasting it from the clipboard):

    info registers

======================

Result:

On the old qemu-kvm-0.12.1.2-2.382.el6, the qemu process in terminal A died at the 90th "info registers" attempt and printed:

    [root@t2 home]# /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo -mon chardev=fifo,default
    VNC server running on `::1:5900'
    **
    ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3942:glib_select_fill: assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))
    Aborted (core dumped)
    [root@t2 home]#

On the fixed qemu-kvm-0.12.1.2-2.385.el6, the qemu process does not die after "info registers" has been entered 300 times.

So, this issue is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1553.html
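For anyone re-running the verification, step (6) of the reproducer can be scripted instead of pasted from the clipboard. The following Python helper is a minimal sketch, not part of the original test procedure: the function name `send_monitor_commands` and its parameters are hypothetical, and it assumes it is run in place of the "cat >fifo.in" process in terminal C, from the same directory, after the reader in terminal B has been stopped.

```python
import time


def send_monitor_commands(path, command="info registers", count=300, delay=0.0):
    """Write `command` to `path` `count` times, one command per line.

    Returns the number of commands written. `path` is expected to be the
    monitor input FIFO (for example "fifo.in"); opening a FIFO for writing
    blocks until some process has it open for reading.
    """
    sent = 0
    with open(path, "w") as monitor_in:
        for _ in range(count):
            monitor_in.write(command + "\n")
            monitor_in.flush()  # hand each command to the FIFO immediately
            sent += 1
            if delay:
                time.sleep(delay)  # optional pause between commands
    return sent


# Example (hypothetical invocation, run next to the FIFOs from step (1)):
# send_monitor_commands("fifo.in", count=300)
```

The default of 300 matches the number of inputs used for the fixed build above. On the affected build the reports mention failures at the 90th attempt and within 128 commands, so a smaller count may already suffice there.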