Bug 909059 - Switch to upstream solution for chardev flow control
Summary: Switch to upstream solution for chardev flow control
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Amit Shah
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 953551
Blocks: 896690 870447 949182 949183 949312 964195 966306 967899 985334
Reported: 2013-02-08 06:33 UTC by Amit Shah
Modified: 2013-11-21 06:34 UTC (History)

Fixed In Version: qemu-kvm-0.12.1.2-2.368.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 949182 949183 (view as bug list)
Environment:
Last Closed: 2013-11-21 06:34:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
test steps of two issues (3.95 KB, application/octet-stream)
2013-03-27 07:35 UTC, FuXiangChun


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 588916 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 621484 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 702271 0 medium CLOSED Guest terminal returns directly in even number times w/o sending data out via virtio-serial-port 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 702611 0 medium CLOSED Some files were not integrated after transferring four files from host to guest 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 720535 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 729923 0 medium CLOSED [virtio-serial] First message is not delivered after connected to virtio-port 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 745758 0 high CLOSED Segmentation fault occurs after hot unplug virtio-serial-pci while virtio-serial-port in use 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 797854 0 low CLOSED host can't receive characters if disconnect to TCP socket then re-connect again 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 808295 0 medium CLOSED qemu-kvm segfaults under heavy QMP I/O 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 822386 0 high CLOSED qemu-kvm core dumps after virtio-blk hotplug-in/removed then stop/cont 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 839156 0 urgent CLOSED Fedora 16 and 17 guests hang during boot 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 880139 0 high CLOSED guest abort when transfer file with virtio-serial from guest to host 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 881522 0 medium CLOSED can not use virtio-serial after hotplug bus and port 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 882078 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 911571 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Product Errata RHSA-2013:1553 0 normal SHIPPED_LIVE Important: qemu-kvm security, bug fix, and enhancement update 2013-11-20 21:40:29 UTC


Description Amit Shah 2013-02-08 06:33:05 UTC
Description of problem:

For RHEL 6.0+, we are using a non-upstream version of chardev flow control patches.  Upstream may soon get a different set of patches that solve the problem using glib.  The intention is to replace the RHEL-only patches with the patches that get applied upstream.

The plan is to revert all the previous non-upstream patches in the qemu-char layer, and add upstream patches.  The flow control patches for virtio-serial-bus remain the same, and won't be reverted.

Testing:
  Functionality-wise there should be no difference.  The original patches were added as part of bug 588916 (flow control) and bug 621484 (-EPIPE reporting).  Those should remain fixed after patches for this bug are accepted.

A simple test to reproduce the issue (triggerable only upstream) is to start a guest with one virtio-serial port and a chardev, like:

./x86_64-softmmu/qemu-system-x86_64 -m 512 \
 /guests/f14-nolvm.qcow2 -snapshot -enable-kvm \
 -chardev socket,path=/tmp/foo,server,nowait,id=foo \
 -device virtio-serial -device virtserialport,chardev=foo,nr=2

Then, on the host, open the chardev but don't read from it.  e.g.

$ python
Python 2.7.3 (default, Aug  9 2012, 17:23:57) 
[GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> 
>>> sock = socket.socket(socket.AF_UNIX)
>>> sock.connect("/tmp/foo")
>>> 

and in the guest, write to the virtio-serial port, e.g.

# dd if=/dev/zero of=/dev/vport0p2

This will cause the guest to freeze using current upstream qemu.  Fedora and RHEL qemu have the (old) flow control patches, so the guest won't freeze.  After applying the new flow control patches too, the guest shouldn't freeze.
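The freeze described above comes down to a writer blocking once the unread socket buffer fills up.  The following is a minimal host-only sketch of that mechanism, not the qemu implementation: it uses a local socket pair in place of the chardev, and the helper names fill_until_blocked and is_writable are made up for illustration.  A flow-controlled writer does a non-blocking write, stops when the kernel reports the buffer is full, and resumes only when poll() says the socket is writable again.

```python
import select
import socket


def fill_until_blocked(sock, chunk=b"\0" * 4096):
    """Write to a non-blocking socket until the kernel buffer is full.

    Returns the number of bytes queued before the write would have blocked.
    (Hypothetical helper, for illustration only.)
    """
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # A blocking writer would be stuck at this point until the
            # peer reads; this is the situation that freezes the guest.
            return total


def is_writable(sock, timeout_ms=0):
    """Return True if poll() reports the socket as writable."""
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    return bool(poller.poll(timeout_ms))


if __name__ == "__main__":
    writer, reader = socket.socketpair()  # stands in for guest side / host side
    queued = fill_until_blocked(writer)
    print("queued bytes before blocking:", queued)
    print("writable while peer is not reading:", is_writable(writer))

    # Drain everything the writer queued, as a host-side reader would.
    drained = 0
    while drained < queued:
        drained += len(reader.recv(65536))
    print("writable after peer drained the data:", is_writable(writer))
```

With flow control in place, the writer side waits for the "writable" notification instead of blocking inside the write call, which is why the guest keeps running even when the host does not read.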

I can prepare two builds for QE to test this: one with all the old flow control patches reverted, so the problem can be reproduced as mentioned above.  Second with the new flow control patches applied, so the behaviour is back to normal.

QE should also test multiple chardev backends, like unix, udp, tcp, pty and fd.

Risks:
  Reverting a big patchset and pushing new patches comes with a set of risks, esp. in a stable series.  However, there are two factors which lower the risk a lot: 1) we are ripping out non-upstream code, and replacing with code that will be upstream.  This means the code that we will now push in will get much wider testing by the qemu community.  2) The new patches utilize glib library code, which is already widely deployed and tested.

The benefit of switching to glib is it uses poll() instead of select(), which is more flexible, and will give us better error-handling in the future.
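As a rough illustration of the error-handling point (a sketch using Python's select module on a local socket pair, not qemu or glib code): select() only sorts descriptors into readable, writable and exceptional sets, while poll() returns per-descriptor event flags, so a hang-up can be told apart from ordinary readiness.  The helper names poll_once and describe are hypothetical, and the POLLHUP result assumes Linux behaviour for UNIX stream sockets.

```python
import select
import socket

# Human-readable names for the poll() event bits of interest.
_EVENT_NAMES = {
    select.POLLIN: "POLLIN",
    select.POLLOUT: "POLLOUT",
    select.POLLERR: "POLLERR",
    select.POLLHUP: "POLLHUP",
}


def poll_once(sock, timeout_ms=0):
    """Return the poll() event mask currently reported for sock (0 if none)."""
    poller = select.poll()
    poller.register(sock, select.POLLIN | select.POLLOUT)
    events = poller.poll(timeout_ms)
    return events[0][1] if events else 0


def describe(mask):
    """Translate an event mask into a list of flag names."""
    return [name for bit, name in _EVENT_NAMES.items() if mask & bit]


if __name__ == "__main__":
    a, b = socket.socketpair()
    print("peer connected:", describe(poll_once(a)))
    b.close()
    print("peer closed:   ", describe(poll_once(a)))
    a.close()
```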

Merging this series as soon as the new patches are committed upstream will also give us more testing time on RHEL.  Almost every invocation of qemu has chardevs.  They're a crucial subsystem, and any regression there will be caught by other tests as well.

Other bugs:
  There are a few other chardev-related RHEL-only bugs which could have been caused due to the existing non-upstream patches.  Some of them could get resolved by this switch.  In particular, see bug 729923, bug 702271, bug 808295, bug 822386, bug 822078.

Comment 1 Amit Shah 2013-02-08 06:38:46 UTC
(In reply to comment #0)
> In particular, see bug 729923, bug 702271, bug 808295, bug 822386, bug 822078.

Last should've been bug 882078.

Comment 3 Markus Armbruster 2013-02-08 08:37:42 UTC
I believe the non-upstream flow control patches we currently have in RHEL-6 are flawed, and cause at least some of the "other bugs" Amit listed.  Fixing them looks difficult.  Amit is right in that replacing our flow control patches by a backport of upstream's patches carries risk.  The alternative is attempting to fix the flaws in our non-upstream patches, which is a different risk, and one I like a whole lot less.

Let's attempt to switch to upstream flow control, and see how we do in testing, and how much it helps with the "other bugs".

Comment 4 Amit Shah 2013-02-12 06:31:11 UTC
Upstream status (along with current known bugs) is tracked in http://wiki.qemu.org/Features/ChardevFlowControl

Comment 6 Amit Shah 2013-03-12 16:44:16 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > In particular, see bug 729923, bug 702271, bug 808295, bug 822386, bug 822078.
> 
> Last should've been bug 882078.

I've put all these in the 'see also' field, so that it's easier to manage related bugs.

Comment 7 Amit Shah 2013-03-12 16:46:36 UTC
The new patches have now been merged upstream.  I'll let them sit there for a while, so that they receive testing and any initial bugs get shaken out.  I will then propose a backport to RHEL6.

Comment 11 FuXiangChun 2013-03-27 07:35:25 UTC
Created attachment 716914 [details]
test steps of two issues

For the third issue, the official build (qemu-kvm-0.12.1.2-2.358.el6.x86_64) also hits the same issue, so I opened a new bug (928207) to track it.

Comment 13 Amit Shah 2013-03-28 11:39:03 UTC
(In reply to comment #10)
> > Found three new issues.
> > 1.hot unplug virtio-serial bus cause guest kernel panic during transferring
> > data from guest to host.

This sounds like a guest bug, should not be due to this build.  Please file a new bug.

> > 2. transfer data from guest to host,then host will get wrong md5sum values
> >    (BTW, if transfer data from host to guest,then md5sum values is correct)

Please file a new bug for this, and also note that you got this with the scratch build I gave in the bug report (make that bug depend on this one).  There are some older bug reports about similar issues as well, so it looks like it's not new in this build.

> > 3. transfer data with two ports at the same time from guest to host,guest
> > hang and call trace.
> > use the following script to transfer data
> > while true;do echo abc >/dev/vport0p1;done
> > while true;do echo edf >/dev/vport0p2;done

This is also a guest bug, details in bug 928207.

Comment 16 Amit Shah 2013-04-06 21:04:33 UTC
Couple more test cases.  Please add these to the test plan as well.

1. Migrate guest while transferring data over spice.  Ensure everything is fine.

2. Induce throttling by opening host-side chardev but not reading from it.  This can be achieved by the simple steps mentioned in comment 0 using python.  Send data from guest, but don't read on host.  Now, migrate the guest.  On the destination, use these two scenarios:

2.a) Do not connect host-side chardev on destination before migration completes.  Ensure qemu works fine after migration.  Then connect chardev and read data from port.  Data sent by guest before migration should be obtained fine.

2.b) Connect host-side chardev on destination before migration is started.  Ensure data sent from guest is read fine on host when migration completes.
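For the "connect chardev and read data from port" steps, a host-side reader along the following lines could be used to check that the data arrives intact.  This is only a sketch under assumptions: the chardev is a UNIX socket as in comment 0, the number of bytes the guest sent is known in advance, and the function name read_and_hash is made up for illustration.  The digest can then be compared with md5sum output for the same data on the guest.

```python
import hashlib
import socket


def read_and_hash(path, expected_bytes):
    """Connect to a UNIX-socket chardev and read up to expected_bytes.

    Returns (md5 hex digest of the data read, number of bytes read).
    (Hypothetical helper, for illustration only.)
    """
    digest = hashlib.md5()
    remaining = expected_bytes
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        while remaining > 0:
            chunk = sock.recv(min(65536, remaining))
            if not chunk:
                break  # peer closed before all expected data arrived
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest(), expected_bytes - remaining


if __name__ == "__main__":
    import sys

    # Usage: script.py <socket path> <byte count>, e.g. /tmp/foo 1048576
    md5, count = read_and_hash(sys.argv[1], int(sys.argv[2]))
    print(count, md5)
```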

Comment 18 juzhang 2013-04-07 02:21:17 UTC
(In reply to comment #16)
> Couple more test cases.  Please add these to the test plan as well.
> 
> 1. Migrate guest while transferring data over spice.  Ensure everything is
> fine.
> 
> 2. Induce throttling by opening host-side chardev but not reading from it. 
> This can be achieved by the simple steps mentioned in comment 0 using
> python.  Send data from guest, but don't read on host.  Now, migrate the
> guest.  On the destination, use these two scenarios:
> 
> 2.a) Do not connect host-side chardev on destination before migration
> completes.  Ensure qemu works fine after migration.  Then connect chardev
> and read data from port.  Data sent by guest before migration should be
> obtained fine.
> 
> 2.b) Connect host-side chardev on destination before migration is started. 
> Ensure data sent from guest is read fine on host when migration completes.

Thanks a lot for your suggestions. You mean the above test scenarios should be added to both the rhel6.5 and rhel7 test plans, right?

Best Regards,
Junyi

Comment 19 Amit Shah 2013-04-07 06:41:20 UTC
(In reply to comment #18)
> Thanks a lot for your suggestions. You mean the above test scenarios
> should be added to both the rhel6.5 and rhel7 test plans, right?

Right.

Comment 21 Amit Shah 2013-04-08 16:47:45 UTC
One more test case to be added, similar to the previous one:

Induce throttling by opening host-side chardev but not reading from it.  This can be achieved by the simple steps mentioned in comment 0 using python.  Send data from guest, but don't read on host.  Now, hot-unplug the port.  Also attempt migration after unplug.

Comment 25 Qunfang Zhang 2013-04-11 09:09:37 UTC
Hi, Amit
I found an abort issue in your v4 build.

Boot the guest with the following command line and run "info qtree" in the qemu monitor; qemu aborts. It may not reproduce the first time, but if "info qtree" is entered again, qemu aborts on the second attempt.

CLI:
(gdb)  r -cpu SandyBridge -M rhel6.4.0 -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/home/RHEL-Server-6.4-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=port1,bus=virtio-serial0.0,id=port1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=port2,bus=virtio-serial0.0,id=port2 -device usb-tablet,id=input0 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6



qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/monitor.c:4334: handler_audit: Assertion `!monitor_has_error(mon)' failed.

Program received signal SIGABRT, Aborted.


(gdb) 
#0  0x00007ffff57418a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff5743085 in abort () from /lib64/libc.so.6
#2  0x00007ffff573aa1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff573aae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff7de65d5 in handler_audit (mon=0x7ffff88fe010, cmd=0x7ffff82bf730, params=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4334
#5  monitor_call_handler (mon=0x7ffff88fe010, cmd=0x7ffff82bf730, params=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4349
#6  0x00007ffff7deb98f in handle_user_command (mon=0x7ffff88fe010, cmdline=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4385
#7  0x00007ffff7debaca in monitor_command_cb (mon=0x7ffff88fe010, cmdline=<value optimized out>, opaque=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:5020
#8  0x00007ffff7e4987d in readline_handle_byte (rs=0x7ffff9d35a20, ch=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/readline.c:369
#9  0x00007ffff7debcf0 in monitor_read (opaque=<value optimized out>, buf=0x7fffffffb6c0 "\r", size=1) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:5006
#10 0x00007ffff7e5fce6 in qemu_chr_be_write (chan=<value optimized out>, cond=<value optimized out>, opaque=0x7ffff86e17d0)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:164
#11 fd_chr_read (chan=<value optimized out>, cond=<value optimized out>, opaque=0x7ffff86e17d0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:747
#12 0x00007ffff7484f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#13 0x00007ffff7ddeb6a in glib_select_poll (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3960
#14 main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4033
#15 0x00007ffff7e0121a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#16 0x00007ffff7de1848 in main_loop (argc=70, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4227
#17 main (argc=70, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6565
(gdb) 

======================
Hi, Amit
I re-tested with the official qemu-kvm-360 build and cannot reproduce this issue. I also tried the following scenarios, and they show no problem either.

(1) Remove the following chardev and serial port, only leave 1 serial port in the command line. ==> Can not reproduce.

-chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=port2,bus=virtio-serial0.0,id=port2

(2) Remove the sound device. ==> Can not reproduce.

Please help take a look, thanks!

Regards,
Qunfang

Comment 27 Paolo Bonzini 2013-04-18 12:39:51 UTC
The current upstream patches cause a deadlock.

https://bugzilla.gnome.org/show_bug.cgi?id=626702

Comment 6 of the latter bug is exactly this scenario.

Comment 29 Amit Shah 2013-04-23 13:30:17 UTC
One more testcase to be added:

If a VM is started with the rhel6.0.0 machine type, flow control code should be disabled.  So if the host-side chardev is not reading while the guest continues to send data, the guest should freeze.  The guest will unfreeze only when the host-side chardev is read from.  This should be the desired behaviour with the older machine type.

Comment 33 Amit Shah 2013-04-24 10:10:41 UTC
One more testcase from upstream.  From Jan Kiszka's message:

"It's trivial to reproduce: qemu-system-x86_64 -serial stdio -S, and then
hit a key twice on that console."

This problem was introduced by some patches in this rework, and was resolved by some patches in v7+ of the builds.  Please add this testcase to our tests so we don't regress.

Comment 34 Qunfang Zhang 2013-04-27 07:24:40 UTC
Hi, Amit
I re-tested the bugs in 'see also' list and summarize the results here:

(1)For closed bugs regression check:
Bug 588916  - qemu char fixes for nonblocking writes, virtio-console flow control 
    (passed)
Bug 621484 - Broken pipe when working with unix socket chardev
    (reproduced again with comment 25; need to file a new bug once the official build comes)
Bug 745758  - Segmentation fault occurs after hot unplug virtio-serial-pci while virtio-serial-port in use
    (passed the scenario in the bug, but the guest hangs after interrupting the write operation to the serial port; created new bug 956637)
Bug 839156  - Fedora 16 and 17 guests hang during boot
    (passed)

(2) Open issues:
Bug 797854  - host can't receive characters if disconnect to TCP socket then re-connect again 
  (can still reproduce; the bug is moved to rhel7 now)

Bug 882078  - Restart libvirt during snapshot-create causes VM fails to resume
 (60%~80% reproduced in v9)

Bug 729923  - [virtio-serial] First message is not delivered after connected to virtio-port 
(Should be the same issue as 702271; needinfo the reporter to retest and confirm the scenario passes)

Bug 702271  - Guest terminal returns directly in even number times w/o sending data out via virtio-serial-port
(now it has the same issue as 621484)

Bug 808295  - qemu-kvm segfaults under heavy QMP I/O 
 (still fails, with a different backtrace in v9; details added to bug 808295)

Bug 822386  - qemu-kvm core dumps after virtio-blk hotplug-in/removed then stop/cont
(fails; reproduced on both official qemu-kvm-361 and v9)

Bug 911571 - [Hitachi 6.5 FEAT] virtio-trace: Named-pipe non-blocking 	
(Verified pass)

Bug 720535 (virtio serial) Guest aborted when transferring data from guest to host
(using bug 880139 to track)

Bug 880139  - guest abort when transfer file with virtio-serial from guest to host
(Reproduced on both official build and v9)


Thanks,
Qunfang

Comment 35 Qunfang Zhang 2013-04-27 07:27:24 UTC
(In reply to comment #34)
> Hi, Amit
> I re-tested the bugs in 'see also' list and summarize the results here:
> 
Re-tested them with v9 build in comment 30.

Comment 56 errata-xmlrpc 2013-11-21 06:34:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1553.html

