Bug 1352977 - virtio-serial devices in qemu-kvm break when guests write on channel with host side closed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 25
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-05 16:19 UTC by Nat Meo
Modified: 2017-04-19 09:24 UTC

Fixed In Version: qemu-2.7.1-6.fc25
Clone Of:
Environment:
Last Closed: 2017-04-19 09:24:31 UTC
Type: Bug
Embargoed:



Description Nat Meo 2016-07-05 16:19:20 UTC
Description of problem:
When a guest writes to a virtio-serial channel in qemu-kvm while the host side of the channel is closed, the host can no longer read any data from that channel, even after the host side has been reopened.

Version-Release number of selected component (if applicable):
qemu-kvm-2.6.0-4.fc24.x86_64
qemu-common-2.6.0-4.fc24.x86_64
qemu-system-x86-2.6.0-4.fc24.x86_64
libvirt-1.3.3.1-4.fc24.x86_64
libvirt-daemon-1.3.3.1-4.fc24.x86_64
libvirt-daemon-driver-qemu-1.3.3.1-4.fc24.x86_64
libvirt-daemon-kvm-1.3.3.1-4.fc24.x86_64
virt-manager-1.4.0-3.fc24.noarch
virt-manager-common-1.4.0-3.fc24.noarch

How reproducible:
100%

Steps to Reproduce:
1. Using virt-manager or some other means, install your favorite Linux distribution that has virtio-serial support in the kernel.
2. Add the following to the libvirt XML for the guest that you created:

    <channel type='pty'>
      <target type='virtio' name='test'/>
      <address type='virtio-serial' controller='0' bus='0' port='3'/>
    </channel>

3. Start the guest and, while it is booting, run virsh dumpxml to determine the /dev/pts/? device attached to the host side of the channel.
4. In a terminal on the host, execute the command "cat /dev/pts/?" to capture output.
5. In a terminal on the guest, execute the command "echo test1 > /dev/virtio-ports/test".
6. Observe that the message "test1" is displayed on the host side.
7. Press CTRL-C in the terminal on the host to kill the "cat" command.
8. In the terminal on the guest, execute the command "echo test2 > /dev/virtio-ports/test".
9. In a terminal on the host, execute the command "cat /dev/pts/?" to capture output again.
10. Observe that "test2" is not displayed in the output on the host side.
11. In the terminal on the guest, execute the command "echo test3 > /dev/virtio-ports/test".
12. Observe that "test3" is also not displayed in the output on the host side.
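
Step 3 can be scripted. The following is a minimal sketch, not a libvirt tool: it assumes the domain XML has one element per line and that <source> comes before <target> inside each <channel>, which is how virsh usually prints it. The function name channel_path and the domain name GUEST are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the host-side source path of a named virtio channel.
# Reads libvirt domain XML on stdin. Assumes one element per line and
# <source> before <target> inside each <channel> element.
channel_path() {
    awk -F"[\"']" -v want="$1" '
        /<channel /   { inchan = 1; path = "" }   # entering a channel
        /<\/channel>/ { inchan = 0 }              # leaving it
        inchan && /<source / {
            # With quotes as field separators, the value follows "path=".
            for (i = 1; i < NF; i++)
                if ($i ~ /path=$/) path = $(i + 1)
        }
        inchan && /<target / {
            for (i = 1; i < NF; i++)
                if ($i ~ /(^| )name=$/ && $(i + 1) == want && path != "")
                    print path
        }
    '
}

# Usage on the host (GUEST is a placeholder domain name):
#   virsh dumpxml GUEST | channel_path test
```

The printed path is what steps 4 and 9 pass to "cat". A real XML parser would be more robust than awk; the line-oriented approach is only a convenience for a quick reproduction.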

Actual results:
"test2" and "test3" are not displayed on the host side.

Expected results:
At least "test3" should be displayed on the host side. Seeing "test2" would also be good since it would mean messages would not be dropped when the host side is not being read from.

Additional info:
If you skip the step where "test2" is sent while the host side is closed, "test3" is displayed successfully. It seems that any write from the guest side while the host side is closed breaks the virtio-serial channel so that nothing can ever be read from it again. The only way to get it working again is to completely shut down the guest and restart it. The operating system running inside the guest does not appear to matter, as this happens with Ubuntu, Windows, CentOS, etc., so this seems to be a problem with qemu-kvm.

Comment 1 Nat Meo 2016-07-05 16:25:06 UTC
This also appears to be a problem on Fedora 22 and 23.

Comment 2 Cole Robinson 2016-07-13 14:37:04 UTC
Amit, any ideas?

Comment 3 Amit Shah 2016-07-26 06:30:51 UTC
Thanks for the detailed report.

It's been known for a while that host-side disconnects aren't registered immediately, and qemu can lose some data.  If you try writing to the port from the guest in a loop, the writes will resume at some time, when qemu realises the host side was closed, and then it does reopen the port after that.  If writing several bytes from the guest doesn't change this, it's likely a new bug.

Comment 4 Nat Meo 2016-07-26 13:11:37 UTC
Based on this I ran a simple test. I did a fresh install of Fedora 24 on a host and, on that host, a fresh install of a CentOS 7 guest. I then configured a virtio-serial device as described above, ran "cat" on it from the host side, and left it running. Inside the guest I created the following shell script:

#!/bin/sh
while :
do
    echo Testing > /dev/virtio-ports/test
    sleep 1
done

I then ran this script on the guest. I let it run for 10 seconds and saw that there were 10 "Testing" messages displayed on the host side. I then killed the "cat" command and waited another 10 seconds. I then executed "cat" again on the same pts device and saw no output. I waited five minutes and still nothing showed up on the host side despite the script still running in a loop on the guest side.

Unless some large amount of data or a long timeout is required for the host side to realize it has been closed, it appears that the virtio-serial device has permanently stopped receiving data. If I am doing anything incorrectly, or some special configuration is required that I am not aware of, please let me know.
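
A variant of the loop above that numbers each message makes it easier to tell which writes reached the host. This is only a sketch: the function name send_numbered and the "msg N" format are arbitrary choices, and fractional sleep intervals are an assumption that holds for GNU sleep but is not guaranteed by POSIX.

```shell
#!/bin/sh
# Sketch: write COUNT numbered messages to PORT, pausing INTERVAL seconds
# between writes (default 1). The "msg N" format is arbitrary.
send_numbered() {
    port=$1
    count=$2
    interval=${3:-1}
    n=1
    while [ "$n" -le "$count" ]; do
        # >> rather than > so the function can also be tried against a
        # regular file away from a guest without truncating it each time.
        echo "msg $n" >> "$port"
        n=$((n + 1))
        sleep "$interval"
    done
}

# Usage inside the guest:
#   send_numbered /dev/virtio-ports/test 600 1
```

With numbered messages, a capture on the host shows directly whether output stopped for good or only skipped the messages written while the reader was away.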

Comment 5 Amit Shah 2016-07-27 06:03:25 UTC
Thanks.  Can you try a different backend on the host - tcp or unix?  Then use netcat or socat to see output from the guest - this will help in narrowing down the problem - whether it's in the pts backend, or all the backends.

Thanks,

Comment 6 Nat Meo 2016-07-27 11:48:48 UTC
I added the following to my libvirt XML:

    <channel type='unix'>
      <source mode='bind' path='/tmp/virtio-bind'/>
      <target type='virtio' name='testing' state='connected'/>
    </channel>

I then executed "socat /tmp/virtio-bind STDOUT" on the host. On the guest I ran an echo command and the message was displayed on the host side. I then killed the socat command and did another echo on the guest. After that echo had run, I started socat on the host again and it did not display anything. I then ran a third echo on the guest and saw the third message on the host side. So the unix channel type keeps working after the host side reconnects, whereas the pty type does not. This bug seems to be isolated to pty devices.
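
To see exactly which messages are lost across a reader restart (here, the second echo), the host-side capture can be checked for gaps. The sketch below assumes the guest sends lines that end in an increasing sequence number, such as "msg 1", "msg 2"; that format, the function name missing_seq, and the file name capture.log are all hypothetical.

```shell
#!/bin/sh
# Sketch: read lines whose last field is a sequence number and report
# every number skipped between consecutive lines.
missing_seq() {
    awk '
        $NF ~ /^[0-9]+$/ {
            n = $NF + 0
            if (seen && n > last + 1)
                for (i = last + 1; i < n; i++)
                    print "missing " i
            last = n
            seen = 1
        }
    '
}

# Usage on the host, on output captured across reader restarts
# (capture.log is a placeholder file name):
#   missing_seq < capture.log
```

A backend that recovers would be expected to show a bounded gap followed by further messages, while a backend hit by this bug would show no messages at all after the restart.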

Comment 7 Cole Robinson 2017-03-20 15:43:55 UTC
I reproduced this (using a sleep .1 loop makes it more reliable). qemu.git master works, though, and it looks like the fix is:

commit 1c64fdbc8177058802df205f5d7cd65edafa59a8
Author: Ed Swierk <eswierk>
Date:   Tue Jan 31 05:45:29 2017 -0800

    char: drop data written to a disconnected pty
    
    When a serial port writes data to a pty that's disconnected, drop the
    data and return the length dropped. This avoids triggering pointless
    retries in callers like the 16550A serial_xmit(), and causes
    qemu_chr_fe_write() to write all data to the log file, rather than
    logging only while a pty client like virsh console happens to be
    connected.
    
    Signed-off-by: Ed Swierk <eswierk>
    Message-Id: <1485870329-79428-1-git-send-email-eswierk>
    Signed-off-by: Paolo Bonzini <pbonzini>


I'll let that get some testing with qemu 2.9 in f26, and after that I'll backport to f25.
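
The behaviour the commit message describes can be illustrated with a toy model. This is shell, not QEMU's C code, and it models only what the message states: data written while disconnected is dropped and its full length is reported back. The function pty_write, its connected flag, and the sink file are invented for illustration.

```shell
#!/bin/sh
# Toy model of "drop data written to a disconnected pty"; not QEMU code.
# Arguments: connected (1 if a reader is attached, 0 otherwise), the data
# to write, and a sink file standing in for the pty.
# Prints the length it reports as written.
pty_write() {
    connected=$1
    data=$2
    sink=$3
    if [ "$connected" -eq 1 ]; then
        printf '%s' "$data" >> "$sink"   # deliver to the reader
    fi
    # Connected or not, report the full length (characters, which equals
    # bytes for ASCII), so the caller never sees a short write and has no
    # reason to retry. Data written while disconnected is simply dropped.
    printf '%s\n' "${#data}"
}
```

In this model, a write made while disconnected never reaches the sink, yet the caller is told that everything was written.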

Comment 8 Fedora Update System 2017-04-15 18:23:26 UTC
qemu-2.7.1-6.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-01925dba3c

Comment 9 Fedora Update System 2017-04-16 21:23:17 UTC
qemu-2.7.1-6.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-01925dba3c

Comment 10 Fedora Update System 2017-04-19 09:24:31 UTC
qemu-2.7.1-6.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

