Bug 970047 - query mem info from monitor would cause qemu-kvm hang [RHEL-7]
query mem info from monitor would cause qemu-kvm hang [RHEL-7]
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm (Show other bugs)
Unspecified Unspecified
medium Severity medium
: rc
: ---
Assigned To: Laszlo Ersek
Virtualization Bugs
Depends On:
  Show dependency treegraph
Reported: 2013-06-03 07:31 EDT by Xiaoqing Wei
Modified: 2014-06-17 23:29 EDT (History)
7 users (show)

See Also:
Fixed In Version: qemu-kvm-1.5.2-3.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 985334 (view as bug list)
Last Closed: 2014-06-13 09:29:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
gdb -p `pidof qemu-kvm` thread apply all bt full (31.58 KB, text/plain)
2013-06-03 07:31 EDT, Xiaoqing Wei
no flags Details
proposed patch (2.30 KB, patch)
2013-07-16 11:16 EDT, Laszlo Ersek
no flags Details | Diff
proposed patch (additional) (2.23 KB, patch)
2013-07-16 12:50 EDT, Laszlo Ersek
no flags Details | Diff

  None (edit)
Description Xiaoqing Wei 2013-06-03 07:31:51 EDT
Created attachment 756263 [details]
gdb -p `pidof qemu-kvm`    thread apply all bt full

Description of problem:

query mem info from monitor would cause qemu-kvm hang
Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
qemu-kvm -monitor stdio    -S \
    -name 'vm1' \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20130530-191349-ZMSDl1gm,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20130530-191349-ZMSDl1gm,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -chardev socket,id=seabioslog_id_20130530-191349-ZMSDl1gm,path=/tmp/seabios-20130530-191349-ZMSDl1gm,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20130530-191349-ZMSDl1gm,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 \
    -drive file='/home/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.0-64.qcow2',if=none,id=drive-ide0-0-0,media=disk,cache=none,snapshot=off,format=qcow2,aio=native \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0 \
    -device rtl8139,netdev=idIg594w,mac=9a:3b:3c:3d:3e:3f,bus=pci.0,addr=0x3,id='idCSLDpm' \
    -netdev tap,id=idIg594w \
    -m 4096 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'host' \
    -M pc \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -vga cirrus \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off   \
    -no-kvm-pit-reinjection \


3. repeatly do
info mem
info tlb

Actual results:
qemu-kvm hangs

Expected results:
qemu-kvm work well

Additional info:
processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
stepping        : 7
microcode       : 0x70d
cpu MHz         : 1200.000
cache size      : 15360 KB
physical id     : 1
siblings        : 12
core id         : 5
cpu cores       : 6
apicid          : 43
initial apicid  : 43
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4004.81
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
Comment 2 Laszlo Ersek 2013-06-13 08:39:32 EDT
I don't understand step 2 in comment 0:


Comment 3 Xiaoqing Wei 2013-06-13 23:38:34 EDT
(In reply to Laszlo Ersek from comment #2)
> I don't understand step 2 in comment 0:
> qemu#c
> Thanks.

Hi Laszlo,

the vm was launched w/ a '-S' option, that means paused state,
and a 'c' would push it into running status.
if launch the vm w/o a -S, then the step 2 is not necessary.

Comment 4 Laszlo Ersek 2013-07-02 14:16:47 EDT

another question: what tool generated the qemu-kvm command line? For example, where does "-no-kvm-pit-reinjection" come from, and why is it necessary?

I'll try to reserve a similar machine in Beaker (given '-cpu host').

Comment 5 Xiaoqing Wei 2013-07-03 02:52:44 EDT
(In reply to Laszlo Ersek from comment #4)
> Hi,
> another question: what tool generated the qemu-kvm command line? For
> example, where does "-no-kvm-pit-reinjection" come from, and why is it
> necessary?
> I'll try to reserve a similar machine in Beaker (given '-cpu host').
> Thanks.

Hi Laszlo,

the cmd is generated from autotest, to prevent some guest timedrift.

Comment 10 Laszlo Ersek 2013-07-16 09:56:10 EDT
The IO thread is spinning as follows:


At (1), just before entering os_host_main_loop_wait(), this is the contents
of the gpollfds array (length 3):

  (gdb) print *(GPollFD (*)[3])gpollfds->data
  $24 = {{fd = 21, events = 25, revents = 0},
         {fd = 11, events = 25, revents = 0},
         {fd = 3, events = 25, revents = 0}}

Inside glib_pollfds_fill(), the value of "glib_n_poll_fds" is huge:

  (gdb) print glib_n_poll_fds
  $26 = 2897

The new size of the array is 3 + 2897 == 2900.

After glib_pollfds_fill() returns, ie. at (2), just before
os_host_main_loop_wait() calls g_poll(), the contents of the array is
(length 2900):

  (gdb) print *(GPollFD (*)[2900])gpollfds->data
  $34 = {{fd = 21, events = 25, revents = 0},
         {fd = 11, events = 25, revents = 0},
         {fd = 3, events = 25, revents = 0},
         {fd = 5, events = 1, revents = 0},
         {fd = 6, events = 1, revents = 0},
         {fd = 7, events = 1, revents = 0},
         {fd = 8, events = 1, revents = 0},
         {fd = 16, events = 1, revents = 0},
         {fd = 14, events = 1, revents = 0},
         {fd = 13, events = 1, revents = 0},
         {fd = 4, events = 1, revents = 0},
         {fd = 1, events = 4, revents = 0} <repeats 2888 times>,
         {fd = 0, events = 1, revents = 0}}

The standard output is placed 2888 times in the array. events = 4 is
most likely.)

When we invoke g_poll() (which is only a thin wrapper around poll(); GPollFD
being "struct pollfd" in fact), it returns -1, and sets errno to EINVAL
(errno 22).

The poll(2) manual says,

  EINVAL The nfds value exceeds the RLIMIT_NOFILE value.

and do_sys_poll() in "fs/select.c" seems to confirm that that's the only
EINVAL case.

This instance of qemu-kvm was started with "-monitor stdio". Apparently,
each one of the huge number of lines printed to the stdout monitor by "info
tlb" leaks a GPollFD.

Printing to the monitor is implemented by:

                  io_channel_send() (*)
                  fd_chr_add_watch() (*)


  (gdb) print default_mon->chr->chr_write
  $44 = (int (*)(struct CharDriverState *, const uint8_t *, int))
        0x7f6010ab29f0 <fd_chr_write>

  (gdb) print default_mon->chr->chr_add_watch
  $45 = (GSource *(*)(struct CharDriverState *, GIOCondition))
        0x7f6010ab2870 <fd_chr_add_watch>

monitor_flush() intends to reinstall the watch (by calling
qemu_chr_fe_add_watch()) only if the write in qemu_chr_fe_write() is
partial, but I think that logic doesn't work for some reason.

Something's wrong between monitor_flush() and io_channel_send(). The former
doesn't handle negative return values (errors), it reinstalls the watch
all the same.

The latter (even in upstream!) fails to increment "offset" with
"bytes_written" in some cases, possibly reporting a partial write to
monitor_flush() rather than a full write. This would be consistent with the

The GLib documentation clearly states that "bytes_written" can be set to
nonzero independently of g_io_channel_write_chars()'s retval.

I'll write a patch for QE to test (apparently this test case can easily
reproduce the problem).
Comment 11 Laszlo Ersek 2013-07-16 11:16:32 EDT
Created attachment 774372 [details]
proposed patch
Comment 13 Laszlo Ersek 2013-07-16 11:59:22 EDT
Hmm, no. Even though I think io_channel_send() could be improved, that function doesn't fully explain how so many watches / GPollFDs accrue. Extraneously reinstalling a watch in monitor_flush() would not *increase* the number of watches, just keep it unchanged in the longer term, because monitor_unblocked(), the callback for the watch, returns FALSE, which removes the watch automatically.

Perhaps the other call chain to monitor_flush() is the culprit. tlb_info() (indirectly) invokes monitor_flush() very many times in quick succession. (Each print_pte() call ends with a newline character.)

mon->outbuf may keep growing and growing, hence the chance to a complete write decreases. After a point probably all writes are incomplete, and from that point onward each print_pte() -- ie. monitor_flush() -- call will add a brand new watch.

On the non-callback-initiated call path, no watch should be added if there's already one watch in place.
Comment 14 Laszlo Ersek 2013-07-16 12:12:36 EDT
Indeed. See usbredir_write() and usbredir_write_unblocked(): they track the one watch in dev->watch. (Upstream commit c874ea97.)

Exact same pattern in flush_buf() [hw/char/virtio-console.c] and chr_write_unblocked(), same file: they track vcon->watch. (Upstream commit c3d6b96e, again by Hans.)
Comment 15 Laszlo Ersek 2013-07-16 12:50:50 EDT
Created attachment 774410 [details]
proposed patch (additional)
Comment 18 Laszlo Ersek 2013-07-16 14:18:31 EDT
Posted upstream series: http://thread.gmane.org/gmane.comp.emulators.qemu/222525
Comment 21 Laszlo Ersek 2013-07-19 08:46:43 EDT
upstream commits:

     1  ac8c26f char: io_channel_send: don't lose written bytes
     2  293d2a0 monitor: maintain at most one G_IO_OUT watch
Comment 23 Miroslav Rezanina 2013-08-09 07:44:21 EDT
Fix included in qemu-kvm-1.5.2-3.el7
Comment 24 Qunfang Zhang 2013-12-30 22:53:23 EST
Verified the bug on qemu-kvm-1.5.3-31.el7.x86_64.


(1) In terminal A, issue the following commands:

    mkfifo fifo.in fifo.out
    /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo \
        -mon chardev=fifo,default

(2) In terminal B, issue the following command (same directory):

  cat fifo.out

(3) In terminal C, issue the following command (same directory):

  cat >fifo.in

(4) Still in terminal C, type the following command, and verify that its output appears in terminal B:

  info registers

(5) In terminal B, press ^Z (ie. stop (but do not kill) the "cat" process reading from "fifo.out").

(6) In terminal C, repeat the following command indefinitely (it's simples to keep pasting it from the clipboard):

  info registers

Qemu process does not die after 300 times "info registers" input.

So, this issue is fixed.
Comment 26 Ludek Smid 2014-06-13 09:29:23 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Note You need to log in before you can comment on or make changes to this bug.