Bug 673459

Summary: virtio_console driver never returns from selecting for write when the queue is full
Product: Red Hat Enterprise Linux 5
Reporter: Amit Shah <amit.shah>
Component: kernel
Assignee: Amit Shah <amit.shah>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: high
Version: 5.7
CC: bcao, dhoward, hdegoede, mjenner, qcai, virt-maint
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When an application used a virtio serial port in non-blocking mode, filled it until write() returned -EAGAIN, and then called select() for writing, select() never returned. In blocking mode, write() waited until the host indicated that it had used up the buffers. The cause was that the poll operation waited on port->waitqueue, but nothing woke the waitqueue when room became available in the queue again. With this update, the waitqueue is woken via host notifications so that buffers consumed by the host can be reclaimed, the queue freed, and the application's write operations can proceed again.
Story Points: ---
Clone Of: 643750
Environment:
Last Closed: 2011-07-21 10:03:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 643750
Bug Blocks: 580948, 673983

Description Amit Shah 2011-01-28 09:59:24 UTC
Clone of RHEL6 bug
+++ This bug was initially created as a clone of Bug #643750 +++

When using a virtio serial port from an application and putting it in non-blocking mode, then filling it until write returns -EAGAIN and then doing a select
for write, the select never returns.
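
The sequence described above can be sketched from user space roughly as follows. This is only an illustration, not the reproducer used for this bug: the helper names (fill_until_eagain, writable_within, demo_on_pipe) are made up here, and an ordinary pipe stands in for the virtio port so the sketch runs anywhere.

```python
import os
import select


def fill_until_eagain(fd, chunk=b"x" * 4096):
    """Write to a non-blocking fd until write() fails with EAGAIN.

    Returns the number of bytes accepted before the queue filled up.
    """
    total = 0
    while True:
        try:
            total += os.write(fd, chunk)
        except BlockingIOError:  # raised for EAGAIN / EWOULDBLOCK
            return total


def writable_within(fd, timeout):
    """Return True if select() reports fd writable before the timeout."""
    _, writable, _ = select.select([], [fd], [], timeout)
    return fd in writable


def demo_on_pipe():
    """Run the fill-then-select sequence on a pipe (stand-in for the port)."""
    r, w = os.pipe()
    os.set_blocking(w, False)
    try:
        written = fill_until_eagain(w)
        before = writable_within(w, 0.2)  # queue still full
        os.read(r, written)               # the consumer drains the queue
        after = writable_within(w, 0.2)   # room again, select should return
        return written, before, after
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(demo_on_pipe())
```

To try the same sequence against a real port, the fd could be opened with os.open("/dev/vport0p1", os.O_WRONLY | os.O_NONBLOCK) in the guest. With the behaviour described in this report, the second select would be expected to time out even after the host has consumed data.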

The reason for this is that poll waits for port->waitqueue, but nothing
wakes the waitqueue when there is room again in the queue, quoting from virtio_console.c: init_vqs():

        io_callbacks[j] = in_intr;
        io_callbacks[j + 1] = NULL;

The fix is simply to define a callback for the j + 1 case and make it wake the waitqueue; all the other needed bits are already present.
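
As a rough mental model of why the missing callback matters, here is a toy simulation in Python. It is not kernel code: ToyPort, host_consumes_buffer and run_scenario are hypothetical names, a threading.Condition stands in for port->waitqueue, and the out_callback_enabled flag stands in for whether io_callbacks[j + 1] is NULL or a real function.

```python
import threading


class ToyPort:
    """Toy stand-in for a port with a full output queue (not the kernel struct)."""

    def __init__(self, out_callback_enabled):
        self.waitqueue = threading.Condition()
        self.outvq_full = True
        # False models io_callbacks[j + 1] = NULL; True models the proposed fix.
        self.out_callback_enabled = out_callback_enabled

    def host_consumes_buffer(self):
        """The host uses up a buffer, so there is room in the queue again."""
        with self.waitqueue:
            self.outvq_full = False
            if self.out_callback_enabled:
                # The output-queue callback wakes anyone sleeping in poll.
                self.waitqueue.notify_all()


def run_scenario(out_callback_enabled, timeout=0.5):
    """Sleep on the waitqueue like select-for-write; return True if woken."""
    port = ToyPort(out_callback_enabled)
    host = threading.Timer(0.05, port.host_consumes_buffer)
    with port.waitqueue:
        # The lock is held here, so the host thread cannot free the queue
        # until wait() releases it and the poller is really asleep.
        host.start()
        woken = port.waitqueue.wait(timeout)
        writable = woken and not port.outvq_full
    host.join()
    return writable


if __name__ == "__main__":
    print("no callback:", run_scenario(False))
    print("with callback:", run_scenario(True))
```

In the first scenario the queue does get room, but the sleeper is never told, so the wait only ends because the toy uses a timeout; a select() without a timeout would sleep forever. In the second scenario the wakeup arrives as soon as the host consumes a buffer.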

--- Additional comment from hdegoede on 2010-10-19 03:57:30 EDT ---

Some notes: my original description of this problem comes from reading the code, not from hitting this in practice. Amit Shah has run some tests and cannot re-create the problem that one would expect upon reading the code.

We've discussed this and decided to keep this bug open for further investigation later to see if the waitqueue in question is somehow actually woken when room becomes available in the out_vq, or if things currently happen to work because of some side-effect somewhere else.

--- Additional comment from amit.shah on 2011-01-28 04:56:09 EST ---

This is needed once qemu can do non-blocking IO and flow control. Those patches are scheduled for 6.1. This patch needs to get to 5.7 and 5.6.z.

Comment 2 Amit Shah 2011-01-28 14:13:51 UTC
Testing notes:

With qemu-kvm with the fix for bug 588916, start a guest with:

-chardev socket,path=/tmp/foo,server,nowait,id=c0 -device virtio-serial -device virtserialport,chardev=c0

Then redirect the port to a file:

nc -U /tmp/foo > /tmp/guest-file

In the guest, transfer a big file (anything > 1G) to the virtio port:

cat /tmp/bigfile > /dev/vport0p1


In some cases, the guest command will never finish and the size of the host file will stop increasing at some point.

With a kernel that contains the fix for this bug, the 'cat' command in the guest will finish and the size of the file on the host will match the size of the file in the guest.
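
One possible way to check the result is to print a size and a checksum for the file on each side and compare them by hand. The script below is only a sketch: file_fingerprint is a made-up helper name, and the SHA-256 check goes beyond the size comparison described in the notes above.

```python
import hashlib
import sys


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
            size += len(block)
    return size, digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run once in the guest on /tmp/bigfile and once on the host on
    # /tmp/guest-file, then compare the two printed lines.
    size, checksum = file_fingerprint(sys.argv[1])
    print(size, checksum)
```

If the transfer stalled as described, the host-side size would be smaller than the guest-side size.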

Comment 6 Jarod Wilson 2011-02-02 05:37:00 UTC
in kernel-2.6.18-242.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 8 Mike Cao 2011-02-16 08:27:53 UTC
Verified on qemu-kvm-0.12.1.2-2.144.el6.
Guest kernel: kernel-2.6.18-243.el5

Steps:
1. Start the VM with a virtio-serial port, without the -M parameter.
2. Open the socket file on the host and do not read from it, e.g.:
# cat open-socket
#!/usr/bin/python
import os
import sys
import socket
import time

#fd = os.open(sys.argv[1], os.O_RDONLY)

s = socket.socket(socket.AF_UNIX)
s.connect(sys.argv[1])

# Keep the connection open without ever reading from it.
while True:
        time.sleep(1)

# python open-socket /tmp/vport0
3. Transfer a file larger than 2G via virtio-serial, e.g.:
# cat /tt > /dev/vport0p1

Actual Results:
qemu-kvm process does not freeze.

Based on the above, this issue has been fixed.
Changing status to VERIFIED.

Comment 10 Martin Prpič 2011-07-13 20:22:20 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When an application used a virtio serial port in non-blocking mode, filled it until write() returned -EAGAIN, and then called select() for writing, select() never returned. In blocking mode, write() waited until the host indicated that it had used up the buffers. The cause was that the poll operation waited on port->waitqueue, but nothing woke the waitqueue when room became available in the queue again. With this update, the waitqueue is woken via host notifications so that buffers consumed by the host can be reclaimed, the queue freed, and the application's write operations can proceed again.

Comment 11 errata-xmlrpc 2011-07-21 10:03:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html