Bug 643751 - writing to a virtio serial port while no one is listening on the host side hangs the guest
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: 6.1
Assigned To: Amit Shah
QA Contact: Red Hat Kernel QE team
Keywords: ZStream
Depends On:
Blocks: 580954 644735 678562
Reported: 2010-10-17 16:06 EDT by Hans de Goede
Modified: 2013-01-10 22:24 EST
CC List: 7 users

See Also:
Fixed In Version: kernel-2.6.32-91.el6
Doc Type: Bug Fix
Doc Text:
If a host was slow in reading data or did not read data at all, blocking write() calls not only blocked the program that called the write() call but also the entire guest. This was caused by the write() calls waiting until an acknowledgment that the data consumed was received from the host. With this update, write() calls no longer wait for such acknowledgment: control is immediately returned to the user space application. This ensures that even if the host is busy processing other data or is not consuming data at all, the guest is not blocked.
Clones: 644735
Last Closed: 2011-05-23 16:26:31 EDT


Attachments
serial-console.py (195 bytes, text/plain), 2011-04-18 23:27 EDT, dawu


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2011:0542
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 07:58:07 EDT

Description Hans de Goede 2010-10-17 16:06:43 EDT
The problem is this "beauty" in virtio_console.c: send_buf()

        /*
         * Wait till the host acknowledges it pushed out the data we
         * sent.  This is done for ports in blocking mode or for data
         * from the hvc_console; the tty operations are performed with
         * spinlocks held so we can't sleep here.
         */
        while (!virtqueue_get_buf(out_vq, &len))
                cpu_relax();

I see a number of possible (partial) solutions here:

1) The comment says the code uses cpu_relax() rather than sleeping because the tty
functions are called with a spinlock held. But fops_write is not called with
any spinlock held, I believe. How about adding a parameter to send_buf() called
"may_sleep", and sleeping rather than calling cpu_relax() when may_sleep is true?

2) The comment says the waiting "is done for ports in blocking mode or for
data from the hvc_console". I wonder why the waiting is done in blocking mode
too. I guess it is some sort of workaround for the missing waitqueue wakeups
(see bug 643750). With those waitqueue wakeups added, I would expect that
waiting for the host acknowledgment is only needed for tty usage, and that we
could skip the wait entirely (making 1 moot) when called from fops_write?

I know that work is being done on a more permanent solution, but if the two options above are possible, they would be a nice way to reduce the number of cases where this problem happens, and they would also help when running in VMs without the new, more permanent fix.
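
Option 1 can be illustrated with a small user-space analogy. This is only a sketch in Python, not kernel code: the function name wait_for_ack, the threading.Event standing in for the host acknowledgment, and the timeout argument (added so the example always terminates) are assumptions made for illustration and do not come from virtio_console.c.

```python
import threading
import time


def wait_for_ack(ack: threading.Event, may_sleep: bool, timeout: float) -> bool:
    """Wait until `ack` is set; return False if `timeout` seconds pass first.

    may_sleep=False mimics the cpu_relax() loop: keep polling, never block.
    may_sleep=True blocks the caller until it is woken up.
    """
    if may_sleep:
        # Sleeping wait: the thread is descheduled until the event fires.
        return ack.wait(timeout)
    deadline = time.monotonic() + timeout
    while not ack.is_set():
        # Busy wait: burns CPU for as long as the simulated host stays silent.
        if time.monotonic() >= deadline:
            return False
    return True
```

With an acknowledgment that never arrives, both variants return False after the timeout, but only the polling variant keeps a CPU busy the whole time. The loop quoted above has no timeout at all, which is why a silent host stalls the guest.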

Note: I believe this bug should be assigned to Amit Shah (but I'm not sure whether
it is OK to do this myself, given the kernel team's procedures).
Comment 2 Hans de Goede 2010-10-19 03:58:10 EDT
Amit has posted a patch for this, assigning to Amit.
Comment 3 Amit Shah 2010-10-20 02:07:51 EDT
To test:

- open guest virtio-console port
- open host virtio-console port

Without reading from the host side, keep writing to the guest port.  After a
few writes, the guest will freeze.

After applying the patch that fixes this, the guest will not freeze, but the
application writing data to the guest port will wait till the host side data is
read off.

This is the test_blocking_write() test in test-virtserial.git:

http://fedorapeople.org/gitweb?p=amitshah/public_git/test-virtserial.git;a=commitdiff;h=e5cbe2be47ca7cf5fce86da694869fc8e922d41c
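
For a quick manual check of the guest side, something like the following could be used. It is a minimal sketch and not the code from test-virtserial.git: the helper name timed_writes is hypothetical, and it simply records how long each blocking write takes.

```python
import os
import time


def timed_writes(fd: int, chunk: bytes, count: int) -> list:
    """Write `chunk` to `fd` `count` times; return each write's duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.monotonic()
        offset = 0
        # os.write() may accept only part of the buffer, so loop until done.
        while offset < len(chunk):
            offset += os.write(fd, chunk[offset:])
        durations.append(time.monotonic() - start)
    return durations
```

Inside the guest it could be called as timed_writes(os.open("/dev/vport0p1", os.O_WRONLY), b"x" * 4096, 1000), using the port path from the verification steps in this bug. A call that never returns while the rest of the guest stays responsive would match the patched behaviour described above; a frozen guest would match the unpatched one.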
Comment 4 RHEL Product and Program Management 2010-10-20 08:10:05 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 5 Amit Shah 2010-10-21 03:45:56 EDT
Additional testing note: the effect of this patch will be visible with host-side qemu modifications which aren't yet in RHEL6.  When testing, ask me for a brew build.
Comment 6 Aristeu Rozanski 2010-12-15 11:05:30 EST
Patch(es) available on kernel-2.6.32-91.el6
Comment 11 juzhang 2011-03-23 01:48:11 EDT
(In reply to comment #3)
> To test:
> 
> - open guest virtio-console port
> - open host virtio-console port
> 
> Without reading from the host side, keep writing to the guest port.  After a
> few writes, the guest will freeze.
Reproduced on kernel-2.6.32-90.el6.
About 30 seconds after step 3, the guest hangs.

Verified on kernel-2.6.32-118.el6 with qemu-kvm-0.12.1.2-2.151.el6.x86_64

Steps:
1. Boot the guest.
#/usr/libexec/qemu-kvm -m 2G -smp 4 -drive file=/root/zhangjunyi/rhel6.1-ide.qcow2,if=none,id=test,cache=none,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=test -cpu qemu64,+sse2,+x2apic -boot c -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:11:22:45:66:94 -vnc :10  -device virtio-serial-pci,id=virtio-serial0,max_ports=31 -chardev socket,id=channel0,path=/var/zhangjunyi0,server,nowait -device virtserialport,bus=virtio-serial0.0,chardev=channel0,name=org.port.0,id=port1 -serial stdio -qmp tcp:0:4444,server,nowait

2. In the guest, write a big file:
cat partaa > /dev/vport0p1

3. On the host, just open the virtio-console port without reading from it.

Results:
After 10 minutes, the guest is still running fine.
Comment 12 juzhang 2011-03-23 01:49:02 EDT
Per comment 11, setting this issue to verified.
Comment 13 Martin Prpič 2011-04-12 08:42:10 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
If a host was slow in reading data or did not read data at all, blocking write() calls not only blocked the program that called the write() call but also the entire guest. This was caused by the write() calls waiting until an acknowledgment that the data consumed was received from the host. With this update, write() calls no longer wait for such acknowledgment: control is immediately returned to the user space application. This ensures that even if the host is busy processing other data or is not consuming data at all, the guest is not blocked.
Comment 14 dawu 2011-04-18 23:25:47 EDT
Verified on kernel-2.6.32-133.el6 with qemu-kvm-0.12.1.2-2.158.el6.x86_64.
The issue does not reproduce: after 10 minutes the guest is still running fine. Details follow:

Steps:
1. Boot the guest.
/usr/libexec/qemu-kvm -m 2G -smp 4 -drive file=RHEL-Server-6.1-64-virtio.qcow2,if=none,id=test,cache=none,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=test -cpu qemu64,+sse2,+x2apic -boot c -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:11:22:45:66:94 -vnc :1  -device virtio-serial-pci,id=virtio-serial0,max_ports=31 -chardev socket,id=channel0,path=/var/zhangjunyi0,server,nowait -device virtserialport,bus=virtio-serial0.0,chardev=channel0,name=org.port.0,id=port1-serial -qmp tcp:0:4444,server,nowait

2. In the guest, write a big file:
cat partaa > /dev/vport0p1

3. On the host, just open the virtio-console port without reading from it (see the attached Python script "serial-console.py"):
# python serial-console.py /var/zhangjunyi0

Results:
After 10 minutes, the guest is still running fine.
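
The contents of the attached serial-console.py are not shown in this report. As a rough idea of what step 3 needs, a host-side helper might look like the sketch below. It is a hypothetical equivalent, not the attachment itself: the name hold_open is made up, and the code assumes QEMU is listening on the UNIX socket because of the "server,nowait" chardev options in step 1.

```python
import socket
import time


def hold_open(path: str, seconds: float) -> None:
    """Connect to the UNIX socket at `path` and keep it open without reading."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        # Deliberately never call recv(): the host end is open but idle,
        # so data written by the guest is never consumed.
        time.sleep(seconds)
```

Calling hold_open("/var/zhangjunyi0", 600) would keep the port open for the 10 minutes used in this verification.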
Comment 15 dawu 2011-04-18 23:27:28 EDT
Created attachment 493062
serial-console.py
Comment 16 errata-xmlrpc 2011-05-23 16:26:31 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
