Bug 808295 - qemu-kvm segfaults under heavy QMP I/O
Summary: qemu-kvm segfaults under heavy QMP I/O
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Amit Shah
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-30 05:09 UTC by juzhang
Modified: 2013-09-10 08:20 UTC
CC: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-10 08:20:53 UTC
Target Upstream Version:
Embargoed:


Attachments
Simple QMP test script (654 bytes, application/x-shellscript): 2012-03-30 13:32 UTC, Jeff Cody
detailed tracing patch and log (4.70 KB, application/x-gzip): 2012-04-04 21:14 UTC, Luiz Capitulino


Links
Red Hat Bugzilla 909059 (medium, CLOSED): Switch to upstream solution for chardev flow control (last updated 2021-02-22 00:41:40 UTC)

Internal Links: 909059

Description juzhang 2012-03-30 05:09:14 UTC
Description of problem:
Boot a guest with two disks, then run a script that repeatedly creates snapshots. Before the script finishes, kill the script process with Ctrl+C.

Version-Release number of selected component (if applicable):
qemu-267.rle6ev

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest:
/usr/libexec/qemu-kvm -M rhel6.3.0 -enable-kvm -m 2G -smp 4,maxcpus=8 -cpu SandyBridge,-kvmclock -name rhel6.3 -rtc base=localtime,clock=host,driftfix=slew -no-shutdown -drive file=/root/rhel6.3-64-virtio.qcow2,if=none,id=virtio0,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=virtio0,id=virtio0-device,bootindex=0 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=dd:54:00:6a:c7:d8,bus=pci.0,addr=0x3,bootindex=1 -usb -vnc :10 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -boot menu=on -monitor stdio -qmp tcp:localhost:4444,server -drive file=/root/zhang1.qcow2,if=none,id=virtio1,format=qcow2,cache=none -device virtio-blk-pci,drive=virtio1,id=virtio1-device
2. Connect to /dev/tcp/localhost/4444 so the script can use the QMP socket, e.g.:
cat /dev/tcp/localhost/4444
telnet localhost 4444

3. Run the script (see Additional info below).
4. Kill the script process with Ctrl+C.
  
Actual results:
qemu-kvm core dump
(gdb) bt
#0  0x00007ffff775148d in write () from /lib64/libpthread.so.0
#1  0x00007ffff7e66949 in do_send (chr=0x7ffff86d8c60, fd=26, _buf=<value optimized out>, len1=151) at qemu-char.c:572
#2  send_all (chr=0x7ffff86d8c60, fd=26, _buf=<value optimized out>, len1=151) at qemu-char.c:627
#3  0x00007ffff7e66ab4 in tcp_chr_write (chr=0x7ffff86d8c60, buf=<value optimized out>, len=151) at qemu-char.c:2051
#4  0x00007ffff7df254c in monitor_flush (mon=0x7ffff8a51480) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:255
#5  0x00007ffff7df25aa in monitor_puts (mon=0x7ffff8a51480, str=0x7fff0bc1e2e6 "") at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:274
#6  0x00007ffff7df2619 in monitor_json_emitter (mon=0x7ffff8a51480, data=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:369
#7  0x00007ffff7df278f in monitor_protocol_emitter (mon=0x7ffff8a51480, data=0x0) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:403
#8  0x00007ffff7df2958 in monitor_call_handler (mon=0x7ffff8a51480, cmd=0x7ffff82be5a8, params=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4191
#9  0x00007ffff7df3544 in handle_qmp_command (parser=<value optimized out>, tokens=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4789
#10 0x00007ffff7e43234 in json_message_process_token (lexer=0x7ffff8742390, token=0x7ffeed3332a0, type=JSON_OPERATOR, x=107, y=9726) at json-streamer.c:87
#11 0x00007ffff7e42ed0 in json_lexer_feed_char (lexer=0x7ffff8742390, ch=125 '}', flush=false) at json-lexer.c:303
#12 0x00007ffff7e43019 in json_lexer_feed (lexer=0x7ffff8742390, buffer=0x7fffffffbb20 "}", size=1) at json-lexer.c:355
#13 0x00007ffff7df237e in monitor_control_read (opaque=<value optimized out>, buf=<value optimized out>, size=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4810
#14 0x00007ffff7e655aa in qemu_chr_read (opaque=0x7ffff86d8c60) at qemu-char.c:180
#15 tcp_chr_read (opaque=0x7ffff86d8c60) at qemu-char.c:2217
#16 0x000

Expected results:
guest works well

Additional info:
The script
#!/bin/bash
# some simple group snapshot stress testing

let i=0
exec 3<>/dev/tcp/localhost/4444
echo -e "{ 'execute': 'qmp_capabilities' }" >&3
read response <&3
echo $response
while [ $i -lt 10000 ]
do
    echo -e "{ 'execute': 'transaction', 'arguments':
  {'actions': [
    { 'type': 'blockdev-snapshot-sync', 'data' :
      { 'device': 'virtio0', 'snapshot-file': '/tmp/test/f16-snap$i.qcow2', 'format': 'qcow2' } },
    { 'type': 'blockdev-snapshot-sync', 'data' :
      { 'device': 'virtio1', 'snapshot-file': '/tmp/test/space-snap$i.qcow2', 'format': 'qcow2' } } ] } }" >&3
    read response <&3
    echo "$i: $response"
    let i=$i+1
done
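
A note on running the script above (an assumption inferred from the snapshot-file paths; the report doesn't state it): the target directory must exist first, or every blockdev-snapshot-sync call will fail to create its qcow2 file. For example:

mkdir -p /tmp/test
bash snapshot-stress.sh   # hypothetical filename for the script above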

Comment 1 juzhang 2012-03-30 05:11:47 UTC
Tried again and found another backtrace:
(gdb) bt
#0  0x00007ffff775148d in write () from /lib64/libpthread.so.0
#1  0x00007ffff7e66949 in do_send (chr=0x7ffff86d8be0, fd=8, _buf=<value optimized out>, len1=16) at qemu-char.c:572
#2  send_all (chr=0x7ffff86d8be0, fd=8, _buf=<value optimized out>, len1=16) at qemu-char.c:627
#3  0x00007ffff7e66ab4 in tcp_chr_write (chr=0x7ffff86d8be0, buf=<value optimized out>, len=16) at qemu-char.c:2051
#4  0x00007ffff7df254c in monitor_flush (mon=0x7ffff8a51480) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:255
#5  0x00007ffff7df25aa in monitor_puts (mon=0x7ffff8a51480, str=0x7ffff8cbdecf "") at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:274
#6  0x00007ffff7df2619 in monitor_json_emitter (mon=0x7ffff8a51480, data=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:369
#7  0x00007ffff7df278f in monitor_protocol_emitter (mon=0x7ffff8a51480, data=0x0) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:403
#8  0x00007ffff7df2958 in monitor_call_handler (mon=0x7ffff8a51480, cmd=0x7ffff82be5a8, params=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4191
#9  0x00007ffff7df3544 in handle_qmp_command (parser=<value optimized out>, tokens=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4789
#10 0x00007ffff7e43234 in json_message_process_token (lexer=0x7ffff8742390, token=0x7fffe4ebee90, type=JSON_OPERATOR, x=105, y=348) at json-streamer.c:87
#11 0x00007ffff7e42ed0 in json_lexer_feed_char (lexer=0x7ffff8742390, ch=125 '}', flush=false) at json-lexer.c:303
#12 0x00007ffff7e43019 in json_lexer_feed (lexer=0x7ffff8742390, buffer=0x7fffffffbb30 "}\212m\370\377\177", size=1) at json-lexer.c:355
#13 0x00007ffff7df237e in monitor_control_read (opaque=<value optimized out>, buf=<value optimized out>, size=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4810
#14 0x00007ffff7e655aa in qemu_chr_read (opaque=0x7ffff86d8be0) at qemu-char.c:180
#15 tcp_chr_read (opaque=0x7ffff86d8be0) at qemu-char.c:2217
#16 0x00007ffff7deb19f in main_loop_wait (timeout=1000) a

Comment 2 Luiz Capitulino 2012-03-30 13:21:07 UTC
Looks like a chardev bug, will investigate soon.

Comment 3 Jeff Cody 2012-03-30 13:32:05 UTC
Created attachment 573982 [details]
Simple QMP test script

Luiz,

I've attached a script with which I have been able to reproduce this bug 100% of the time on the downstream 6.3 build. Upstream does not seem to have the issue. This new script does nothing with live snapshots; it just issues a 'query-block' command as quickly as possible, and the sub-process is killed at random intervals to sever the QMP connection. The script will run until you hit ^C.

Here is the QEMU commandline that I used:
qemu-system-x86_64 -enable-kvm -boot c -drive file=/home/Jeff.Cody/virt/images/f16-1.qcow2,if=virtio,boot=on -m 768 -qmp tcp:localhost:4444,server
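
The attached script itself is not inlined in this report. As a rough sketch of its shape (an assumption based on the description above, not the actual attachment; the filename, timings, and loop structure are illustrative):

#!/bin/bash
# qmp-stress.sh (hypothetical reconstruction): issue query-block as fast as
# possible, killing the sub-process at random intervals to sever the QMP
# connection, as comment 3 describes.
while true; do
    (
        exec 3<>/dev/tcp/localhost/4444 || exit 1
        echo "{ 'execute': 'qmp_capabilities' }" >&3
        read -r _ <&3
        while true; do
            echo "{ 'execute': 'query-block' }" >&3
            read -r _ <&3
        done
    ) &
    writer=$!
    sleep $(( RANDOM % 3 + 1 ))    # let the sub-process run for 1-3 seconds
    kill -9 "$writer" 2>/dev/null  # then sever the connection abruptly
    wait "$writer" 2>/dev/null
done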

Comment 4 Luiz Capitulino 2012-03-30 20:50:22 UTC
Managed to reproduce this with help from Jeff. The call to query-block is not needed to trigger the bug.

Will work on this on Monday.

Comment 5 Luiz Capitulino 2012-04-02 20:34:39 UTC
Debugged this today and found the cause. The bug was introduced by commit ba75551d24d1b0577118854ee166f8fee84c0969. Indeed, if I comment out the tcp_closed() call added by this commit, the segfault is gone.

However, I'm not sure of the best way to fix this. Just sent an email to the involved folks; will follow up here.

Comment 6 Luiz Capitulino 2012-04-04 01:15:07 UTC
Let me explain what I think is happening here.

First, the problem manifests itself differently for me than originally reported. Either I get:

*** glibc detected *** ./qemu-rhel6: corrupted double-linked list: 0x00000000025d1510 ***
*** glibc detected *** ./qemu-rhel6: corrupted double-linked list: 0x00000000025d1510 ***

Or a segfault, with the following backtrace:

#0  0x00007fa2638ea692 in _int_malloc () from /lib64/libc.so.6
#1  0x00007fa2638ebaf1 in malloc () from /lib64/libc.so.6
#2  0x000000000043ac85 in qemu_malloc (size=<value optimized out>) at /home/lcapitulino/src/qemu-kvm-rhel6/qemu-malloc.c:57
#3  0x000000000045fb01 in qstring_from_substr (str=0x5c7fef "", start=0, end=<value optimized out>)
    at /home/lcapitulino/src/qemu-kvm-rhel6/qstring.c:42
#4  0x0000000000461285 in json_lexer_feed_char (lexer=0x1b0dc30, ch=125 '}', flush=false)
    at /home/lcapitulino/src/qemu-kvm-rhel6/json-lexer.c:306
#5  0x0000000000461399 in json_lexer_feed (lexer=0x1b0dc30, buffer=0x7fff26a72d70 "}", size=1)
    at /home/lcapitulino/src/qemu-kvm-rhel6/json-lexer.c:355
#6  0x0000000000412cf2 in monitor_control_read (opaque=<value optimized out>, buf=<value optimized out>, size=<value optimized out>)
    at /home/lcapitulino/src/qemu-kvm-rhel6/monitor.c:4810
#7  0x0000000000481d5a in qemu_chr_read (opaque=0x1363790) at /home/lcapitulino/src/qemu-kvm-rhel6/qemu-char.c:180
#8  tcp_chr_read (opaque=0x1363790) at /home/lcapitulino/src/qemu-kvm-rhel6/qemu-char.c:2217
#9  0x000000000040c45f in main_loop_wait (timeout=1000) at /home/lcapitulino/src/qemu-kvm-rhel6/vl.c:3990
#10 0x000000000042c1aa in kvm_main_loop () at /home/lcapitulino/src/qemu-kvm-rhel6/qemu-kvm.c:2244
#11 0x000000000040f3a3 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
    at /home/lcapitulino/src/qemu-kvm-rhel6/vl.c:4202
#12 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
    at /home/lcapitulino/src/qemu-kvm-rhel6/vl.c:6427

This is clearly a memory corruption bug to me.

One important piece of information is that I can't reproduce this upstream, and if I revert ba75551d the bug is gone too.

Now, what does ba75551d do? It changes the chardev tcp backend to automatically close the listening socket on error _and_ send the CHR_EVENT_CLOSED event.

The CHR_EVENT_CLOSED event will cause the QMP server to free some of its resources (in monitor_control_event()); however, writing seems to continue, which could make the QMP server write to already-freed memory locations.

So, my *impression* is that the chardev layer keeps feeding the QMP server, even after it has closed the listening socket itself; or the chardev layer calls chardev methods after having emitted the CHR_EVENT_CLOSED event.

Amit, I'll re-assign this to you because this behavior was introduced by commit ba75551d and I don't have much info about the bug it's supposed to fix.

Comment 7 Luiz Capitulino 2012-04-04 01:17:56 UTC
By the way, the following patch seems to work around the issue:

diff --git a/json-lexer.c b/json-lexer.c
index c21338f..8235516 100644
--- a/json-lexer.c
+++ b/json-lexer.c
@@ -369,4 +369,5 @@ int json_lexer_flush(JSONLexer *lexer)
 void json_lexer_destroy(JSONLexer *lexer)
 {
     QDECREF(lexer->token);
+    lexer->token = NULL;
 }
diff --git a/json-streamer.c b/json-streamer.c
index c255c78..5ee2ffe 100644
--- a/json-streamer.c
+++ b/json-streamer.c
@@ -119,4 +119,5 @@ void json_message_parser_destroy(JSONMessageParser *parser)
 {
     json_lexer_destroy(&parser->lexer);
     QDECREF(parser->tokens);
+    parser->tokens = NULL;
 }

This is probably making it impossible for QMP to write to already-freed memory. We could apply it, but not fixing the real bug could make it come back later...

Comment 8 Luiz Capitulino 2012-04-04 21:12:56 UTC
Amit asked me to add some tracing to the code to know which chardev callbacks are called after the backend is closed (see comment 6).

It turns out that qemu_chr_read() and qemu_chr_write() keep going normally after the EPIPE error, until it gets a *second* EPIPE (which is when qemu really explodes).

Here's the tracing. After thousands of qemu_chr_read()/qemu_chr_write() calls, I see:

-> calling chr_read
-> calling chr_write
-> got EPIPE, calling tcp_closed()  <--- this happens in tcp_chr_write()
-> monitor freed its resources      <--- in monitor_control_event()
<- tcp_closed() exited
<- exiting chr_write
<- exiting chr_read

The read/write processing just keeps going after that, with hundreds of calls to qemu_chr_read()/qemu_chr_write()... Then:

-> calling chr_read
-> calling chr_write
-> got EPIPE, calling tcp_closed() <--- being called for the second time
-> monitor freed its resources
<- tcp_closed() exited
<- exiting chr_write
----> qemu explodes

Now, one could argue that QMP should check the return value of qemu_chr_write() and bail out. Not sure this is a valid argument, because:

1. What's QMP supposed to do on a write() error? The chardev layer doesn't support dropping a connection (it only supports destroying the chardev)

2. Only one or two chardev users actually check for write errors, which means that all the other users might suffer from the same issue

3. Most importantly: the chardev layer is not supposed to keep working normally after having closed the backend

I honestly think that the semantic change introduced by ba75551d is not supported and should be reworked.

Comment 9 Luiz Capitulino 2012-04-04 21:14:09 UTC
Created attachment 575226 [details]
detailed tracing patch and log

Comment 10 Luiz Capitulino 2012-04-16 17:19:52 UTC
I was reading some related code recently and found out that my debugging done in comment 8 is embarrassingly flawed: I was looking for the behavior I wanted to see.

I've fixed my tracing; the info I have after a quick run is:

1. I'm very likely wrong about the cause of the bug being that the chardev layer
keeps calling its write and read methods after it got -EPIPE. This doesn't seem
to happen at all

2. The last contact I get from qemu (in my traces) before it explodes happens in json_message_parser_init() being called by monitor_control_event(), but that could be because it's the first function to allocate memory from the corrupted malloc heap

Comment 11 Markus Armbruster 2012-04-19 15:59:22 UTC
I used qmp-stress.sh from comment #3 with

$ valgrind rhel6-qemu-kvm -enable-kvm -nodefaults -S -qmp tcp:localhost:4444,server,nowait
[Unrelated valgrind gripes (false positives?) snipped...]
VNC server running on `127.0.0.1:5900'
QEMU 0.12.1 monitor - type 'help' for more information
(qemu) ==19922== Thread 1:
==19922== Invalid read of size 8
==19922==    at 0x459726: json_message_process_token (qobject.h:96)
==19922==    by 0x4592A4: json_lexer_feed_char (json-lexer.c:303)
==19922==    by 0x4594AD: json_lexer_feed (json-lexer.c:355)
==19922==    by 0x4597F2: json_message_parser_feed (json-streamer.c:110)
==19922==    by 0x410A13: monitor_control_read (monitor.c:4810)
==19922==    by 0x4789CC: qemu_chr_read (qemu-char.c:180)
==19922==    by 0x478F51: tcp_chr_read (qemu-char.c:2217)
==19922==    by 0x40B6D2: main_loop_wait (vl.c:3993)
==19922==    by 0x42A31C: kvm_main_loop (qemu-kvm.c:2244)
==19922==    by 0x40EE13: main (vl.c:4205)
==19922==  Address 0x556d638 is 8 bytes inside a block of size 32 free'd
==19922==    at 0x4A055FE: free (vg_replace_malloc.c:366)
==19922==    by 0x43507F: qemu_free (qemu-malloc.c:39)
==19922==    by 0x458A27: qlist_destroy_obj (qlist.c:155)
==19922==    by 0x459875: json_message_parser_destroy (qobject.h:99)
==19922==    by 0x4122D9: monitor_control_event (monitor.c:4883)
==19922==    by 0x4777CE: qemu_chr_event (qemu-char.c:137)
==19922==    by 0x477A2D: tcp_closed (qemu-char.c:2041)
==19922==    by 0x479381: tcp_chr_write (qemu-char.c:2053)
==19922==    by 0x478940: qemu_chr_write (qemu-char.c:161)
==19922==    by 0x411131: monitor_flush (monitor.c:255)
==19922==    by 0x4111A2: monitor_puts (monitor.c:274)
==19922==    by 0x4111FD: monitor_json_emitter (monitor.c:369)
==19922== 
==19922== Invalid write of size 8
==19922==    at 0x45972E: json_message_process_token (qobject.h:96)
==19922==    by 0x4592A4: json_lexer_feed_char (json-lexer.c:303)
==19922==    by 0x4594AD: json_lexer_feed (json-lexer.c:355)
==19922==    by 0x4597F2: json_message_parser_feed (json-streamer.c:110)
==19922==    by 0x410A13: monitor_control_read (monitor.c:4810)
==19922==    by 0x4789CC: qemu_chr_read (qemu-char.c:180)
==19922==    by 0x478F51: tcp_chr_read (qemu-char.c:2217)
==19922==    by 0x40B6D2: main_loop_wait (vl.c:3993)
==19922==    by 0x42A31C: kvm_main_loop (qemu-kvm.c:2244)
==19922==    by 0x40EE13: main (vl.c:4205)
==19922==  Address 0x556d638 is 8 bytes inside a block of size 32 free'd
==19922==    at 0x4A055FE: free (vg_replace_malloc.c:366)
==19922==    by 0x43507F: qemu_free (qemu-malloc.c:39)
==19922==    by 0x458A27: qlist_destroy_obj (qlist.c:155)
==19922==    by 0x459875: json_message_parser_destroy (qobject.h:99)
==19922==    by 0x4122D9: monitor_control_event (monitor.c:4883)
==19922==    by 0x4777CE: qemu_chr_event (qemu-char.c:137)
==19922==    by 0x477A2D: tcp_closed (qemu-char.c:2041)
==19922==    by 0x479381: tcp_chr_write (qemu-char.c:2053)
==19922==    by 0x478940: qemu_chr_write (qemu-char.c:161)
==19922==    by 0x411131: monitor_flush (monitor.c:255)
==19922==    by 0x4111A2: monitor_puts (monitor.c:274)
==19922==    by 0x4111FD: monitor_json_emitter (monitor.c:369)
==19922== 
==19922== Invalid read of size 8
==19922==    at 0x4592AE: json_lexer_feed_char (qobject.h:96)
==19922==    by 0x4594AD: json_lexer_feed (json-lexer.c:355)
==19922==    by 0x4597F2: json_message_parser_feed (json-streamer.c:110)
==19922==    by 0x410A13: monitor_control_read (monitor.c:4810)
==19922==    by 0x4789CC: qemu_chr_read (qemu-char.c:180)
==19922==    by 0x478F51: tcp_chr_read (qemu-char.c:2217)
==19922==    by 0x40B6D2: main_loop_wait (vl.c:3993)
==19922==    by 0x42A31C: kvm_main_loop (qemu-kvm.c:2244)
==19922==    by 0x40EE13: main (vl.c:4205)
==19922==  Address 0xf3716c8 is 8 bytes inside a block of size 40 free'd
==19922==    at 0x4A055FE: free (vg_replace_malloc.c:366)
==19922==    by 0x43507F: qemu_free (qemu-malloc.c:39)
==19922==    by 0x457F6C: qstring_destroy_obj (qstring.c:139)
==19922==    by 0x458121: qentry_destroy (qobject.h:99)
==19922==    by 0x4581F7: qdict_destroy_obj (qdict.c:471)
==19922==    by 0x4589FD: qlist_destroy_obj (qobject.h:99)
==19922==    by 0x459875: json_message_parser_destroy (qobject.h:99)
==19922==    by 0x4122D9: monitor_control_event (monitor.c:4883)
==19922==    by 0x4777CE: qemu_chr_event (qemu-char.c:137)
==19922==    by 0x477A2D: tcp_closed (qemu-char.c:2041)
==19922==    by 0x479381: tcp_chr_write (qemu-char.c:2053)
==19922==    by 0x478940: qemu_chr_write (qemu-char.c:161)
==19922== 
==19922== Invalid write of size 8
==19922==    at 0x4592B6: json_lexer_feed_char (qobject.h:96)
==19922==    by 0x4594AD: json_lexer_feed (json-lexer.c:355)
==19922==    by 0x4597F2: json_message_parser_feed (json-streamer.c:110)
==19922==    by 0x410A13: monitor_control_read (monitor.c:4810)
==19922==    by 0x4789CC: qemu_chr_read (qemu-char.c:180)
==19922==    by 0x478F51: tcp_chr_read (qemu-char.c:2217)
==19922==    by 0x40B6D2: main_loop_wait (vl.c:3993)
==19922==    by 0x42A31C: kvm_main_loop (qemu-kvm.c:2244)
==19922==    by 0x40EE13: main (vl.c:4205)
==19922==  Address 0xf3716c8 is 8 bytes inside a block of size 40 free'd
==19922==    at 0x4A055FE: free (vg_replace_malloc.c:366)
==19922==    by 0x43507F: qemu_free (qemu-malloc.c:39)
==19922==    by 0x457F6C: qstring_destroy_obj (qstring.c:139)
==19922==    by 0x458121: qentry_destroy (qobject.h:99)
==19922==    by 0x4581F7: qdict_destroy_obj (qdict.c:471)
==19922==    by 0x4589FD: qlist_destroy_obj (qobject.h:99)
==19922==    by 0x459875: json_message_parser_destroy (qobject.h:99)
==19922==    by 0x4122D9: monitor_control_event (monitor.c:4883)
==19922==    by 0x4777CE: qemu_chr_event (qemu-char.c:137)
==19922==    by 0x477A2D: tcp_closed (qemu-char.c:2041)
==19922==    by 0x479381: tcp_chr_write (qemu-char.c:2053)
==19922==    by 0x478940: qemu_chr_write (qemu-char.c:161)
==19922==

Comment 12 Markus Armbruster 2012-04-19 16:03:30 UTC
I believe our non-upstream commit ba75551d is flawed, we'll have to revert it, and fix its bug 621484 differently.

Comment 13 Luiz Capitulino 2012-04-19 17:41:43 UTC
(In reply to comment #12)
> I believe our non-upstream commit ba75551d is flawed, we'll have to revert it,
> and fix its bug 621484 differently.

Yes, that's what I think too. But I also think that we can't revert the commit before fixing bug 621484 (just in case you implied this).

Comment 16 Ademar Reis 2012-07-02 20:48:43 UTC
It's very unlikely to be triggered in supported scenarios and upstream/RHEL7 is OK, so I'm closing it as DEFERRED.

Comment 17 Amit Shah 2012-07-03 05:54:48 UTC
(In reply to comment #16)
> It's very unlikely to be triggered in supported scenarios and upstream/RHEL7
> is OK, so I'm closing it as DEFERRED.

There was another bug with a similar backtrace; I will try to find it.

Note, however, that RHEL7 will also have the same patches that we have in RHEL6, so this bug will most likely reproduce there as well.

Comment 18 Amit Shah 2012-07-03 06:00:47 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > It's very unlikely to be triggered in supported scenarios and upstream/RHEL7
> > is OK, so I'm closing it as DEFERRED.
> 
> There was another bug with a similar backtrace; I will try to find it.

Bug 822386

Comment 19 Ademar Reis 2012-07-03 13:14:21 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > It's very unlikely to be triggered in supported scenarios and upstream/RHEL7
> > is OK, so I'm closing it as DEFERRED.
> 
> 
> Note, however, that RHEL7 will also have the same patches that we have in
> RHEL6, so this bug will most likely reproduce there as well.

Hmmm, so you don't think the culprit is "non-upstream commit ba75551d" as pointed out by Markus and Luiz? Anyway, until that is clear, I'm reopening it.

Comment 20 Amit Shah 2012-07-03 13:33:36 UTC
(In reply to comment #19)
> (In reply to comment #17)
> > (In reply to comment #16)
> > > It's very unlikely to be triggered in supported scenarios and upstream/RHEL7
> > > is OK, so I'm closing it as DEFERRED.
> > 
> > 
> > Note, however, that RHEL7 will also have the same patches that we have in
> > RHEL6, so this bug will most likely reproduce there as well.
> 
> Hmmm, so you don't think the culprit is "non-upstream commit ba75551d" as
> pointed out by Markus and Luiz? Anyway, until that is clear, I'm reopening it.

It is non-upstream (in qemu), but the commit is included in Fedora. This is part of the set of chardev fixes we carry so that spice and other chardev users work properly, but we've failed to push the fixes upstream for lack of consensus on how to solve the problem properly. It's a subject that comes up every so often upstream, but we aren't sure when exactly we'll have a resolution in qemu.

Comment 21 Luiz Capitulino 2012-07-03 19:11:42 UTC
I think that closing this as WONTFIX for 6.x and cloning it for 7.0 is reasonable, although it might not get fixed in 7.0 due to the lack of consensus Amit mentioned.

This seems to be one of those chardev issues that make people want to re-write it...

Comment 25 Markus Armbruster 2012-11-16 12:54:07 UTC
Looks like upstream is finally coming around to address the flow control problem we fixed in RHEL-6 with non-upstream patches (bug 621484).  The plan is to have that upstream work in RHEL-7 instead of forward porting the flawed non-upstream patches from RHEL-6.  Should take care of this bug.

Sneak preview at
https://github.com/aliguori/qemu/tree/char-flow.1

Comment 26 Luiz Capitulino 2012-12-05 19:04:46 UTC
Bug 882078 seems to be related or even a duplicate of this issue. It's triggered through libvirt.

Comment 28 Qunfang Zhang 2013-04-27 07:16:28 UTC
Hi, Amit
I tested this bug on a RHEL6 host with both the official qemu-kvm-361 build and your private v9 build mentioned in bug 909059. With your v9 build, qemu-kvm still aborts, but the bt log is different from before.

==============================
On qemu-kvm-rhev-0.12.1.2-2.361.el6:


Program received signal SIGPIPE, Broken pipe.
0x00007ffff773c4ed in write () from /lib64/libpthread.so.0

(gdb) bt
#0  0x00007ffff773c4ed in write () from /lib64/libpthread.so.0
#1  0x00007ffff7e5f1e9 in do_send (chr=0x7ffff86df410, fd=38, _buf=<value optimized out>, len1=127)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:572
#2  send_all (chr=0x7ffff86df410, fd=38, _buf=<value optimized out>, len1=127)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:627
#3  0x00007ffff7e5f354 in tcp_chr_write (chr=0x7ffff86df410, buf=<value optimized out>, len=127)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:2045
#4  0x00007ffff7de2d7c in monitor_flush (mon=0x7ffff8912010) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:270
#5  0x00007ffff7de2dda in monitor_puts (mon=0x7ffff8912010, str=0x7ffff9dde9be "")
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:289
#6  0x00007ffff7de2e49 in monitor_json_emitter (mon=0x7ffff8912010, data=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:389
#7  0x00007ffff7de2fd4 in monitor_protocol_emitter (mon=0x7ffff8912010, data=0x0)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:425
#8  0x00007ffff7de31a8 in monitor_call_handler (mon=0x7ffff8912010, cmd=0x7ffff82bfc28, params=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4342
#9  0x00007ffff7de3da4 in handle_qmp_command (parser=<value optimized out>, tokens=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4952
#10 0x00007ffff7e3acd4 in json_message_process_token (lexer=0x7ffff890de80, token=0x7ffffe323320, 
    type=JSON_OPERATOR, x=97, y=5084) at /usr/src/debug/qemu-kvm-0.12.1.2/json-streamer.c:87
#11 0x00007ffff7e3a970 in json_lexer_feed_char (lexer=0x7ffff890de80, ch=125 '}', flush=false)
    at /usr/src/debug/qemu-kvm-0.12.1.2/json-lexer.c:303
#12 0x00007ffff7e3aab9 in json_lexer_feed (lexer=0x7ffff890de80, buffer=0x7fffffffb780 "}\352m\370\377\177", size=1)
    at /usr/src/debug/qemu-kvm-0.12.1.2/json-lexer.c:355
#13 0x00007ffff7de2b0e in monitor_control_read (opaque=<value optimized out>, buf=<value optimized out>, 
    size=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4973
#14 0x00007ffff7e5de4a in qemu_chr_read (opaque=0x7ffff86df410) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:180
#15 tcp_chr_read (opaque=0x7ffff86df410) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:2211
#16 0x00007ffff7ddb79f in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3975
#17 0x00007ffff7dfdf9a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#18 0x00007ffff7dde538 in main_loop (argc=72, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187
#19 main (argc=72, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6530
(gdb) q


=====================================
On your private qemu-kvm-0.12.1.2-2.361.el6_4.bz909059.v9.x86_64:

(1) first time:
(gdb) 
(gdb) bt
#0  0x00007ffff773e4ed in write () from /lib64/libpthread.so.0
#1  0x00007ffff74b9661 in ?? () from /lib64/libglib-2.0.so.0
#2  0x00007ffff74799b7 in g_io_channel_write_chars () from /lib64/libglib-2.0.so.0
#3  0x00007ffff7e5f41a in io_channel_send (fd=0x7ffff89a8f60, buf=0x7ffffa909e20, len=101731)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:726
#4  0x00007ffff7de524d in monitor_flush (mon=0x7ffff8917930) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:283
#5  0x00007ffff7de9b5c in monitor_unblocked (chan=<value optimized out>, cond=<value optimized out>, 
    opaque=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:269
#6  0x00007ffff7483f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#7  0x00007ffff7dddd0a in glib_select_poll (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3960
#8  main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4033
#9  0x00007ffff7e0048a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#10 0x00007ffff7de0a18 in main_loop (argc=72, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4227
#11 main (argc=72, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6570
(gdb) 


(2) the second and third time:

(qemu) **
ERROR:/builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:3910:glib_select_fill: assertion failed: (n_poll_fds <= ARRAY_SIZE(poll_fds))

Program received signal SIGABRT, Aborted.
0x00007ffff57408a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff57408a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff5742085 in abort () from /lib64/libc.so.6
#2  0x00007ffff74a9a0f in g_assertion_message () from /lib64/libglib-2.0.so.0
#3  0x00007ffff74a9fb0 in g_assertion_message_expr () from /lib64/libglib-2.0.so.0
#4  0x00007ffff7ddd97e in glib_select_fill (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3910
#5  main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4004
#6  0x00007ffff7e0048a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#7  0x00007ffff7de0a18 in main_loop (argc=72, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4227
#8  main (argc=72, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6570

Comment 29 Amit Shah 2013-04-29 07:36:57 UTC
Is the 1st time error also a SIGPIPE with the v9 build?  If so, it's something that should also happen upstream.  The second bug too should be reproducible upstream.  Can you test upstream qemu.git with this same testcase?

Comment 30 Qunfang Zhang 2013-05-03 08:34:50 UTC
(In reply to comment #29)
> Is the 1st time error also a SIGPIPE with the v9 build?  If so, it's
> something that should also happen upstream.  The second bug too should be
> reproducible upstream.  Can you test upstream qemu.git with this same
> testcase?

OK, will do it and update here with the results. Sorry for the late reply; I just came back to the office today.

Comment 31 Qunfang Zhang 2013-05-06 08:41:49 UTC
(In reply to comment #29)
> Is the 1st time error also a SIGPIPE with the v9 build? 
Sorry, I cannot remember for sure; the 1st-time error is very hard to reproduce, and I almost always hit the 2nd error instead. I will update here after I hit the first one.

> If so, it's
> something that should also happen upstream.  The second bug too should be
> reproducible upstream.  Can you test upstream qemu.git with this same
> testcase?

Just tested upstream qemu with the same test case, but qemu-kvm hangs instead of hitting SIGPIPE or quitting with some other error. I cannot input any command in the qemu monitor, and the qemu process consumes 100% CPU.

Comment 32 Amit Shah 2013-05-07 14:10:04 UTC
(In reply to comment #31)
> Just tested upstream qemu with the same test case, but qemu-kvm hangs
> instead of hitting SIGPIPE or quitting with some other error. I cannot input
> any command in the qemu monitor, and the qemu process consumes 100% CPU.

How easily reproducible is this bug on upstream as well as on v9?  What are the steps that you are using to reproduce?

Comment 33 Qunfang Zhang 2013-05-08 08:57:46 UTC
(In reply to comment #32)
> (In reply to comment #31)
> > Just tested upstream qemu with the same test case, but qemu-kvm hangs
> > instead of hitting SIGPIPE or quitting with some other error. I cannot input
> > any command in the qemu monitor, and the qemu process consumes 100% CPU.
> 
> How easily reproducible is this bug on upstream as well as on v9?  What are
> the steps that you are using to reproduce?

Hi, Amit
It's very easy to trigger the problem (nearly 100% in my testing), but the result differs a little between v9 and upstream qemu.
With v9, I get the SIGPIPE shown in comment 28; with upstream qemu, qemu hangs rather than hitting SIGPIPE or aborting.

Steps:
1. Boot a guest with qmp, I used the same command line in comment 0.

2. Keep sending QMP commands in a loop. The bug description used the live snapshot command, but since the live snapshot feature is not enabled in the v9 build, I tested other commands instead, as below:

[root@localhost home]# cat script.sh 
#!/bin/bash
# simple QMP hotplug stress testing (adapted from the snapshot script)

let i=0
exec 3<>/dev/tcp/localhost/4444
echo -e "{ 'execute': 'qmp_capabilities' }" >&3
read response <&3
echo $response
while [ $i -lt 100000000000 ]
do
    echo -e " {'execute':'drive_add', 'arguments': {'file':'/home/disk.qcow2','format':'qcow2','id':'test30'}} " >&3
    echo -e "  {'execute':'device_add','arguments':{'driver':'virtio-blk-pci','drive':'test30','id':'test30'}} " >&3
#    sleep 0.1
    echo -e " {'execute':'device_del','arguments':{'id':'test30'}} " >&3
#    sleep 0.1
    read response <&3
    echo "$i: $response"
    let i=$i+1
done
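
A usage note (an assumption from the file path in the script; the report does not say how the image was prepared): drive_add needs the backing image to exist, so create it first, e.g.:

qemu-img create -f qcow2 /home/disk.qcow2 1G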

Comment 34 Amit Shah 2013-05-09 04:19:06 UTC
Thanks.  One last question: does this reproduce with the current RHEL6 packages?  This bug is marked for RHEL7.  If it reproduces with RHEL6, we will need to clone for RHEL6 as well.

Comment 35 Qunfang Zhang 2013-05-09 05:13:44 UTC
(In reply to comment #34)
> Thanks.  One last question: does this reproduce with the current RHEL6
> packages?  This bug is marked for RHEL7.  If it reproduces with RHEL6, we
> will need to clone for RHEL6 as well.

Hi, Amit
This issue can also be reproduced on the latest qemu-kvm-366. This bug itself was originally filed against RHEL6 and later moved to the RHEL7 component. So do we plan to fix this on RHEL6? If yes, we could clone this to RHEL6.5 and I will paste the bt log.

Thanks,
Qunfang

Comment 36 Luiz Capitulino 2013-05-09 12:45:19 UTC
Amit, this bug was originally opened against RHEL6 (see comment 0 to comment 21). This was moved to RHEL7.0 because we did not expect to backport the chardev fixes to RHEL6. Also, IMHO, the severity of this bug is low to medium (see comment 14).

Comment 37 Amit Shah 2013-05-09 14:02:47 UTC
(In reply to comment #36)
> Amit, this bug was originally opened against RHEL6 (see comment 0 to comment
> 21). This was moved to RHEL7.0 because we did not expect to backport the
> chardev fixes to RHEL6.

Of course, I'm aware of it. It was dismissed then since it was dependent on a RHEL-only commit. Now that upstream behaves similarly, we will have to resolve this problem upstream and for RHEL7. And if the fix isn't intrusive, getting it into RHEL6 should be explored.

Comment 38 Luiz Capitulino 2013-05-09 14:38:58 UTC
I agree. But just to state the obvious again, the upstream bug could be a different issue that behaves similarly to the RHEL6 bug. I remember quite well that this bug would go away when commit ba75551d was reverted.

But yes, we have to track down the upstream issue first.

Comment 39 Qunfang Zhang 2013-05-29 08:17:28 UTC
Re-tested on qemu-kvm-rhev-0.12.1.2-2.371.el6.x86_64; pasting the log here.

(I tested the RHEL6.5 version to compare it with bug 882078 and see whether the issues are the same.)

Program received signal SIGPIPE, Broken pipe.
0x00007ffff77384ed in write () from /lib64/libpthread.so.0

(gdb) bt
#0  0x00007ffff77384ed in write () from /lib64/libpthread.so.0
#1  0x00007ffff74b3661 in ?? () from /lib64/libglib-2.0.so.0
#2  0x00007ffff74739b7 in g_io_channel_write_chars () from /lib64/libglib-2.0.so.0
#3  0x00007ffff7e5c6fa in io_channel_send (fd=0x7ffff8a70b50, buf=0x7fff5da3a3d0, len=16) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:726
#4  0x00007ffff7ddfb6d in monitor_flush (mon=0x7ffff877bd80) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:283
#5  0x00007ffff7ddfcd4 in monitor_puts (mon=0x7ffff877bd80, str=0x7fff5da3a2ef "") at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:314
#6  0x00007ffff7ddfd19 in monitor_json_emitter (mon=0x7ffff877bd80, data=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:416
#7  0x00007ffff7ddfe88 in monitor_protocol_emitter (mon=0x7ffff877bd80, data=0x0) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:452
#8  0x00007ffff7de0050 in monitor_call_handler (mon=0x7ffff877bd80, cmd=0x7ffff82bf0d8, params=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4372
#9  0x00007ffff7de0c34 in handle_qmp_command (parser=<value optimized out>, tokens=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4982
#10 0x00007ffff7e396f4 in json_message_process_token (lexer=0x7ffff878be20, token=0x7ffffbd73100, type=JSON_OPERATOR, x=105, y=108)
    at /usr/src/debug/qemu-kvm-0.12.1.2/json-streamer.c:87
#11 0x00007ffff7e39390 in json_lexer_feed_char (lexer=0x7ffff878be20, ch=125 '}', flush=false) at /usr/src/debug/qemu-kvm-0.12.1.2/json-lexer.c:303
#12 0x00007ffff7e394d9 in json_lexer_feed (lexer=0x7ffff878be20, buffer=0x7fffffffbb30 "}\273\377\377\377\177", size=1)
    at /usr/src/debug/qemu-kvm-0.12.1.2/json-lexer.c:355
#13 0x00007ffff7ddf8db in monitor_control_read (opaque=<value optimized out>, buf=<value optimized out>, size=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:5003
#14 0x00007ffff7e5cdea in qemu_chr_be_write (chan=<value optimized out>, cond=<value optimized out>, opaque=0x7ffff86e00d0)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:191
#15 tcp_chr_read (chan=<value optimized out>, cond=<value optimized out>, opaque=0x7ffff86e00d0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:2349
#16 0x00007ffff747df0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#17 0x00007ffff7dd858a in glib_select_poll (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3967
#18 main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4040
#19 0x00007ffff7dfadca in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#20 0x00007ffff7ddb2c8 in main_loop (argc=44, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4234
#21 main (argc=44, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6590
(gdb)

Comment 40 Qunfang Zhang 2013-05-29 08:53:45 UTC
Re-tested with the latest upstream; pasting the log here:

(qemu) 
Program received signal SIGPIPE, Broken pipe.
0x00007ffff67cb4ed in write () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff67cb4ed in write () from /lib64/libpthread.so.0
#1  0x00007ffff736d661 in ?? () from /lib64/libglib-2.0.so.0
#2  0x00007ffff732d9b7 in g_io_channel_write_chars () from /lib64/libglib-2.0.so.0
#3  0x00007ffff7dcc251 in io_channel_send (fd=0x7ffff8ba2920, buf=0x7ffff9100c50, len=95) at qemu-char.c:744
#4  0x00007ffff7e72449 in monitor_flush (mon=0x7ffff8c1d520) at /home/qemu/monitor.c:284
#5  0x00007ffff7e725e4 in monitor_puts (mon=0x7ffff8c1d520, str=0x7ffff9100c1e "") at /home/qemu/monitor.c:315
#6  0x00007ffff7e72643 in monitor_json_emitter (mon=0x7ffff8c1d520, data=<value optimized out>) at /home/qemu/monitor.c:407
#7  0x00007ffff7e72bdf in monitor_protocol_emitter (mon=0x7ffff8c1d520, data=<value optimized out>) at /home/qemu/monitor.c:451
#8  0x00007ffff7e72efc in handle_qmp_command (parser=<value optimized out>, tokens=<value optimized out>) at /home/qemu/monitor.c:4571
#9  0x00007ffff7f2019c in json_message_process_token (lexer=0x7ffff8bfd1b0, token=0x7ffff90eb560, type=JSON_OPERATOR, x=106, y=3504)
    at qobject/json-streamer.c:87
#10 0x00007ffff7f32828 in json_lexer_feed_char (lexer=0x7ffff8bfd1b0, ch=125 '}', flush=false) at qobject/json-lexer.c:303
#11 0x00007ffff7f329d9 in json_lexer_feed (lexer=0x7ffff8bfd1b0, buffer=0x7fffffffccf0 "}1^\367\377\177", size=1) at qobject/json-lexer.c:356
#12 0x00007ffff7e713ab in monitor_control_read (opaque=<value optimized out>, buf=<value optimized out>, size=<value optimized out>)
    at /home/qemu/monitor.c:4586
#13 0x00007ffff7dcc5cb in tcp_chr_read (chan=<value optimized out>, cond=<value optimized out>, opaque=0x7ffff8ba25d0) at qemu-char.c:2551
#14 0x00007ffff7337f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#15 0x00007ffff7da2ce9 in glib_pollfds_poll (nonblocking=<value optimized out>) at main-loop.c:187
#16 os_host_main_loop_wait (nonblocking=<value optimized out>) at main-loop.c:232
#17 main_loop_wait (nonblocking=<value optimized out>) at main-loop.c:464
#18 0x00007ffff7e0df75 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at vl.c:2029
#19 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at vl.c:4421
(gdb)

Comment 41 Paolo Bonzini 2013-05-29 09:00:27 UTC
If we exit due to a SIGPIPE, the simplest fix is just to change the SIGPIPE action to SIG_IGN. Handling the resulting EPIPE should be easier; QEMU might even be doing the right thing already.
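
As a quick shell illustration of that mechanism (a generic sketch, not QEMU code): by default a writer is killed by SIGPIPE when its reader goes away, while with SIGPIPE ignored the write() fails with EPIPE and the writer can handle the error itself:

# Default disposition: 'yes' dies of SIGPIPE once 'head' closes the pipe.
yes | head -n1 >/dev/null
echo "writer exit status: ${PIPESTATUS[0]}"   # 141 = 128 + SIGPIPE (13)

# With SIGPIPE ignored (children inherit the ignored disposition), the
# write fails with EPIPE and 'yes' exits with an ordinary error status.
( trap '' PIPE; yes 2>/dev/null | head -n1 >/dev/null; echo "writer exit status: ${PIPESTATUS[0]}" )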

Comment 42 Amit Shah 2013-05-29 09:03:37 UTC
Thank you, the bt shows upstream and RHEL behaviours are the same, and that this bug is different from bug 882078.

Comment 43 Luiz Capitulino 2013-05-29 15:25:11 UTC
I'm not sure it's worth noting this (and maybe it doesn't even matter), but perhaps the new flow control code did fix the original bug but introduced another one altogether.

Comment 44 Paolo Bonzini 2013-09-03 15:18:47 UTC
Actually the SIGPIPE is already ignored.  The kernel sends it, so gdb shows it, but it doesn't affect the program.

So, when running with gdb, make sure you use "handle SIGPIPE nostop".  If you see

   Program received signal SIGPIPE, Broken pipe.

that's fine.  Only if it says

   Program received signal SIGPIPE, Broken pipe.
   [Thread 0x7fffe22c0700 (LWP 8834) exited]
   Program terminated with signal SIGPIPE, Broken pipe.
   The program no longer exists.

Then you have a bug.
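
For example, a minimal invocation (a sketch reusing the QMP-only setup from comment 11; adjust the binary path to your build):

gdb -ex 'handle SIGPIPE nostop' -ex run --args ./rhel6-qemu-kvm -enable-kvm -nodefaults -S -qmp tcp:localhost:4444,server,nowait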

Comment 45 juzhang 2013-09-04 06:27:55 UTC
Tried on a RHEL6.5 host and a RHEL7.0 host, both without gdb, using the steps from comment 0.


For RHEL6.5
qemu version
qemu-kvm-rhev-0.12.1.2-2.398.el6.x86_64

Results:
Qemu-kvm hangs; top shows 190.2 %CPU for the qemu-kvm process on the host.

For RHEL7.0
qemu version
qemu-kvm-1.5.3-2.el7.x86_64

Results:
Qemu-kvm works well.

Comment 46 Amit Shah 2013-09-10 08:20:53 UTC
Thanks.

Closing this one.  Please open another one for rhel6 hosts freezing / using too much cpu.

