970047 – query mem info from monitor would cause qemu-kvm hang [RHEL-7]

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 970047 - query mem info from monitor would cause qemu-kvm hang [RHEL-7]

Summary: query mem info from monitor would cause qemu-kvm hang [RHEL-7]

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm
Sub Component:
Version:	7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Laszlo Ersek
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-06-03 11:31 UTC by Xiaoqing Wei
Modified:	2014-06-18 03:29 UTC (History)
CC List:	7 users (show)
Fixed In Version:	qemu-kvm-1.5.2-3.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	985334 (view as bug list)
Environment:
Last Closed:	2014-06-13 13:29:23 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
gdb -p `pidof qemu-kvm` thread apply all bt full (31.58 KB, text/plain) 2013-06-03 11:31 UTC, Xiaoqing Wei	no flags	Details
proposed patch (2.30 KB, patch) 2013-07-16 15:16 UTC, Laszlo Ersek	no flags	Details \| Diff
proposed patch (additional) (2.23 KB, patch) 2013-07-16 16:50 UTC, Laszlo Ersek	no flags	Details \| Diff
View All

Description Xiaoqing Wei 2013-06-03 11:31:51 UTC

Created attachment 756263 [details]
gdb -p `pidof qemu-kvm`    thread apply all bt full

Description of problem:

query mem info from monitor would cause qemu-kvm hang
Version-Release number of selected component (if applicable):
qemu-kvm-1.5.0-2.el7.x86_64
kernel-3.9.0-0.55.el7.x86_64


How reproducible:
5/5

Steps to Reproduce:
1.
qemu-kvm -monitor stdio    -S \
    -name 'vm1' \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20130530-191349-ZMSDl1gm,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20130530-191349-ZMSDl1gm,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -chardev socket,id=seabioslog_id_20130530-191349-ZMSDl1gm,path=/tmp/seabios-20130530-191349-ZMSDl1gm,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20130530-191349-ZMSDl1gm,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 \
    -drive file='/home/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.0-64.qcow2',if=none,id=drive-ide0-0-0,media=disk,cache=none,snapshot=off,format=qcow2,aio=native \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0 \
    -device rtl8139,netdev=idIg594w,mac=9a:3b:3c:3d:3e:3f,bus=pci.0,addr=0x3,id='idCSLDpm' \
    -netdev tap,id=idIg594w \
    -m 4096 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'host' \
    -M pc \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -vga cirrus \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off   \
    -no-kvm-pit-reinjection \
    -enable-kvm

2.qemu#c

3. repeatly do
info mem
info tlb

Actual results:
qemu-kvm hangs

Expected results:
qemu-kvm work well

Additional info:
processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
stepping        : 7
microcode       : 0x70d
cpu MHz         : 1200.000
cache size      : 15360 KB
physical id     : 1
siblings        : 12
core id         : 5
cpu cores       : 6
apicid          : 43
initial apicid  : 43
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4004.81
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Comment 2 Laszlo Ersek 2013-06-13 12:39:32 UTC

I don't understand step 2 in comment 0:

qemu#c

Thanks.

Comment 3 Xiaoqing Wei 2013-06-14 03:38:34 UTC

(In reply to Laszlo Ersek from comment #2)
> I don't understand step 2 in comment 0:
> 
> qemu#c
> 
> Thanks.

Hi Laszlo,

the vm was launched w/ a '-S' option, that means paused state,
and a 'c' would push it into running status.
if launch the vm w/o a -S, then the step 2 is not necessary.


Regards,
Xiaoqing.

Comment 4 Laszlo Ersek 2013-07-02 18:16:47 UTC

Hi,

another question: what tool generated the qemu-kvm command line? For example, where does "-no-kvm-pit-reinjection" come from, and why is it necessary?

I'll try to reserve a similar machine in Beaker (given '-cpu host').

Thanks.

Comment 5 Xiaoqing Wei 2013-07-03 06:52:44 UTC

(In reply to Laszlo Ersek from comment #4)
> Hi,
> 
> another question: what tool generated the qemu-kvm command line? For
> example, where does "-no-kvm-pit-reinjection" come from, and why is it
> necessary?
> 
> I'll try to reserve a similar machine in Beaker (given '-cpu host').
> 
> Thanks.

Hi Laszlo,

the cmd is generated from autotest, to prevent some guest timedrift.

Regards.

Comment 10 Laszlo Ersek 2013-07-16 13:56:10 UTC

The IO thread is spinning as follows:

    main_loop_wait()
      slirp_pollfds_fill()
      qemu_iohandler_fill()
(1)
      os_host_main_loop_wait()
        glib_pollfds_fill()
          g_main_context_query()
(2)
        g_poll()
        glib_pollfds_poll()
      qemu_iohandler_poll()
      slirp_pollfds_poll()

At (1), just before entering os_host_main_loop_wait(), this is the contents
of the gpollfds array (length 3):

  (gdb) print *(GPollFD (*)[3])gpollfds->data
  $24 = {{fd = 21, events = 25, revents = 0},
         {fd = 11, events = 25, revents = 0},
         {fd = 3, events = 25, revents = 0}}

Inside glib_pollfds_fill(), the value of "glib_n_poll_fds" is huge:

  (gdb) print glib_n_poll_fds
  $26 = 2897

The new size of the array is 3 + 2897 == 2900.

After glib_pollfds_fill() returns, ie. at (2), just before
os_host_main_loop_wait() calls g_poll(), the contents of the array is
(length 2900):

  (gdb) print *(GPollFD (*)[2900])gpollfds->data
  $34 = {{fd = 21, events = 25, revents = 0},
         {fd = 11, events = 25, revents = 0},
         {fd = 3, events = 25, revents = 0},
         {fd = 5, events = 1, revents = 0},
         {fd = 6, events = 1, revents = 0},
         {fd = 7, events = 1, revents = 0},
         {fd = 8, events = 1, revents = 0},
         {fd = 16, events = 1, revents = 0},
         {fd = 14, events = 1, revents = 0},
         {fd = 13, events = 1, revents = 0},
         {fd = 4, events = 1, revents = 0},
         {fd = 1, events = 4, revents = 0} <repeats 2888 times>,
         {fd = 0, events = 1, revents = 0}}

The standard output is placed 2888 times in the array. events = 4 is
POLLOUT. (G_IO_OUT / GLIB_SYSDEF_POLLOUT are identity mapped to POLLOUT,
most likely.)

When we invoke g_poll() (which is only a thin wrapper around poll(); GPollFD
being "struct pollfd" in fact), it returns -1, and sets errno to EINVAL
(errno 22).

The poll(2) manual says,

  EINVAL The nfds value exceeds the RLIMIT_NOFILE value.

and do_sys_poll() in "fs/select.c" seems to confirm that that's the only
EINVAL case.


This instance of qemu-kvm was started with "-monitor stdio". Apparently,
each one of the huge number of lines printed to the stdout monitor by "info
tlb" leaks a GPollFD.

Printing to the monitor is implemented by:

  tlb_info()
    ...
      print_pte()
        monitor_printf()
          monitor_vprintf()
            monitor_puts()
              monitor_flush()
                qemu_chr_fe_write()
                  io_channel_send() (*)
                    g_io_channel_write_chars()
                qemu_chr_fe_add_watch()
                  fd_chr_add_watch() (*)
                    g_io_create_watch()
                  g_source_set_callback()
                  g_source_attach()

(*)

  (gdb) print default_mon->chr->chr_write
  $44 = (int (*)(struct CharDriverState *, const uint8_t *, int))
        0x7f6010ab29f0 <fd_chr_write>

  (gdb) print default_mon->chr->chr_add_watch
  $45 = (GSource *(*)(struct CharDriverState *, GIOCondition))
        0x7f6010ab2870 <fd_chr_add_watch>

monitor_flush() intends to reinstall the watch (by calling
qemu_chr_fe_add_watch()) only if the write in qemu_chr_fe_write() is
partial, but I think that logic doesn't work for some reason.

Something's wrong between monitor_flush() and io_channel_send(). The former
doesn't handle negative return values (errors), it reinstalls the watch
all the same.

The latter (even in upstream!) fails to increment "offset" with
"bytes_written" in some cases, possibly reporting a partial write to
monitor_flush() rather than a full write. This would be consistent with the
symptoms.

The GLib documentation clearly states that "bytes_written" can be set to
nonzero independently of g_io_channel_write_chars()'s retval.


I'll write a patch for QE to test (apparently this test case can easily
reproduce the problem).

Comment 11 Laszlo Ersek 2013-07-16 15:16:32 UTC

Created attachment 774372 [details]
proposed patch

Comment 13 Laszlo Ersek 2013-07-16 15:59:22 UTC

Hmm, no. Even though I think io_channel_send() could be improved, that function doesn't fully explain how so many watches / GPollFDs accrue. Extraneously reinstalling a watch in monitor_flush() would not *increase* the number of watches, just keep it unchanged in the longer term, because monitor_unblocked(), the callback for the watch, returns FALSE, which removes the watch automatically.

Perhaps the other call chain to monitor_flush() is the culprit. tlb_info() (indirectly) invokes monitor_flush() very many times in quick succession. (Each print_pte() call ends with a newline character.)

mon->outbuf may keep growing and growing, hence the chance to a complete write decreases. After a point probably all writes are incomplete, and from that point onward each print_pte() -- ie. monitor_flush() -- call will add a brand new watch.

On the non-callback-initiated call path, no watch should be added if there's already one watch in place.

Comment 14 Laszlo Ersek 2013-07-16 16:12:36 UTC

Indeed. See usbredir_write() and usbredir_write_unblocked(): they track the one watch in dev->watch. (Upstream commit c874ea97.)

Exact same pattern in flush_buf() [hw/char/virtio-console.c] and chr_write_unblocked(), same file: they track vcon->watch. (Upstream commit c3d6b96e, again by Hans.)

Comment 15 Laszlo Ersek 2013-07-16 16:50:50 UTC

Created attachment 774410 [details]
proposed patch (additional)

Comment 18 Laszlo Ersek 2013-07-16 18:18:31 UTC

Posted upstream series: http://thread.gmane.org/gmane.comp.emulators.qemu/222525

Comment 21 Laszlo Ersek 2013-07-19 12:46:43 UTC

upstream commits:

     1  ac8c26f char: io_channel_send: don't lose written bytes
     2  293d2a0 monitor: maintain at most one G_IO_OUT watch

Comment 23 Miroslav Rezanina 2013-08-09 11:44:21 UTC

Fix included in qemu-kvm-1.5.2-3.el7

Comment 24 Qunfang Zhang 2013-12-31 03:53:23 UTC

Verified the bug on qemu-kvm-1.5.3-31.el7.x86_64.

Steps:

(1) In terminal A, issue the following commands:

    mkfifo fifo.in fifo.out
    /usr/libexec/qemu-kvm -chardev pipe,id=fifo,path=fifo \
        -mon chardev=fifo,default

(2) In terminal B, issue the following command (same directory):

  cat fifo.out

(3) In terminal C, issue the following command (same directory):

  cat >fifo.in

(4) Still in terminal C, type the following command, and verify that its output appears in terminal B:

  info registers

(5) In terminal B, press ^Z (ie. stop (but do not kill) the "cat" process reading from "fifo.out").

(6) In terminal C, repeat the following command indefinitely (it's simples to keep pasting it from the clipboard):

  info registers

Result:
Qemu process does not die after 300 times "info registers" input.

So, this issue is fixed.

Comment 26 Ludek Smid 2014-06-13 13:29:23 UTC

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Note You need to log in before you can comment on or make changes to this bug.