Red Hat Bugzilla – Bug 863753
virtio_serialport data loss when hot-unplugging and re-plugging the port (guest->host and host->guest)
Last modified: 2013-07-03 18:32:41 EDT
Description of problem:
I'm developing an interrupted loopback test. I managed to work around the problem of hot-plugged ports being incorrectly left uninitialized ( https://bugzilla.redhat.com/show_bug.cgi?id=796048 ), but some data is lost between port replugs even though the send/recv calls reported success (I resend data whenever a send fails).
I created simple reproducers in Python; this one is for guest->host (data loss).
I send data from the guest to the host, then unplug the port, replug it and continue sending. Some of the data that was reported as successfully sent is missing on the other side.
Version-Release number of selected component (if applicable):
How reproducible:
10-20% (see the log for details)
Steps to Reproduce:
1) Start sending data from the guest (run sender.py on the guest).
2) Receive the data on the host (run listener.py on the host).
3) Unplug and replug the port repeatedly, e.g.:
   MON=/tmp/monitor-hmp1-20121004-115412-sLj47KEF ; while :; do echo device_del vs1 | sudo socat $MON - ; sleep 5 ; echo 'device_add virtserialport,id=vs1,chardev=devvs1,nr=1,name=com.redhat.spice.0' | sudo socat $MON - ; sleep 5 ; done
4) See the error messages.
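The monitor loop in step 3 can also be driven from Python, without socat. This is a minimal sketch, assuming the HMP monitor socket path used above and a user allowed to open that socket (the original loop used sudo); the helper names `replug_commands`, `hmp` and `replug_forever` are hypothetical and not part of the attached reproducers.

```python
import socket
import time

# Monitor socket path from the reproducer; adjust for your own run.
MONITOR = "/tmp/monitor-hmp1-20121004-115412-sLj47KEF"


def replug_commands(port_id="vs1", chardev="devvs1", nr=1, name="com.redhat.spice.0"):
    """Return the (unplug, plug) HMP command strings used in step 3."""
    unplug = f"device_del {port_id}"
    plug = f"device_add virtserialport,id={port_id},chardev={chardev},nr={nr},name={name}"
    return unplug, plug


def hmp(path, command, timeout=1.0):
    """Send one HMP command over the monitor's UNIX socket; return its output."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(path)
        s.sendall(command.encode() + b"\n")
        chunks = []
        try:
            # The monitor is assumed to keep the connection open, so read until idle.
            while data := s.recv(4096):
                chunks.append(data)
        except socket.timeout:
            pass
        return b"".join(chunks).decode(errors="replace")


def replug_forever(path=MONITOR, delay=5.0):
    """Unplug and replug the port every `delay` seconds, like the shell loop."""
    unplug, plug = replug_commands()
    while True:
        hmp(path, unplug)
        time.sleep(delay)
        hmp(path, plug)
        time.sleep(delay)
```

Calling `replug_forever(delay=1.0)` corresponds to the 1 s variant of the test; the per-command read timeout adds roughly another second per iteration.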
Actual results:
Error messages reporting data that the guest sent successfully but the host never received.
Expected results:
send/recv should report a failure for all data that was not transferred, so the simple reproducer should show no data loss.
Created attachment 622915
guest sender script (sends A, B, C, D, ...; resends when a send fails)
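The attached sender is not reproduced here; the following is a hypothetical sketch of the same idea, assuming the port shows up in the guest as `/dev/virtio-ports/com.redhat.spice.0` and that a failed write raises `OSError`. The names `send_pattern` and `make_port_writer` are invented for illustration. Note that this retry logic only covers writes that fail visibly; the report is about writes that return successfully and are still lost.

```python
import itertools
import os
import time

# Characters cycle A..Z then 0..9, matching the sequence seen in the log.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
# Assumed guest-side path of the port named com.redhat.spice.0.
PORT_PATH = "/dev/virtio-ports/com.redhat.spice.0"


def send_pattern(write, count, retry_delay=0.1):
    """Send `count` pattern characters, one per write() call.

    `write` takes bytes and raises OSError on failure; the same character
    is retried until a write succeeds, so each character counts as sent once.
    """
    sent = []
    for ch in itertools.islice(itertools.cycle(ALPHABET), count):
        while True:
            try:
                write(ch.encode())
                break
            except OSError:
                time.sleep(retry_delay)  # port probably unplugged; wait and retry
        sent.append(ch)
    return "".join(sent)


def make_port_writer(path=PORT_PATH):
    """Return a write() that opens the port lazily and reopens it after errors."""
    fd = None

    def write(data):
        nonlocal fd
        try:
            if fd is None:
                fd = os.open(path, os.O_WRONLY)
            os.write(fd, data)
        except OSError:
            if fd is not None:
                try:
                    os.close(fd)
                except OSError:
                    pass
                fd = None
            raise

    return write
```

In the guest this would be run as `send_pattern(make_port_writer(), 100000)`; writing one character per call mirrors the 1-character buffers used for the log.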
Created attachment 622916
host receiver script (reads the port and verifies that A, B, C, D, ... arrive in order; reopens the port when a read fails)
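A hypothetical sketch of the receiver's check, assuming the host connects to the `devvs1` chardev socket from the qemu command line and that the character in parentheses in the log is the one that actually arrived. `gap`, `verify` and `read_chars` are invented names, and unlike the attached script this version does not reconnect after a failed read. Characters outside A..Z/0..9 would raise `ValueError`.

```python
import socket

# Same cyclic sequence the sender uses: A..Z followed by 0..9.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
# Host end of the vs1 chardev, taken from the qemu command line in this report.
CHARDEV_PATH = "/tmp/virtio_port-vs1-20121004-115412-sLj47KEF"


def gap(expected, received):
    """Characters from `expected` up to (not including) `received`, wrapping around."""
    i, j = ALPHABET.index(expected), ALPHABET.index(received)
    n = (j - i) % len(ALPHABET)
    return "".join(ALPHABET[(i + k) % len(ALPHABET)] for k in range(n))


def verify(chars, expected="A"):
    """Yield (skipped, received) for each character that breaks the sequence."""
    for ch in chars:
        if ch != expected:
            yield gap(expected, ch), ch
        # Resynchronise on whatever actually arrived.
        expected = ALPHABET[(ALPHABET.index(ch) + 1) % len(ALPHABET)]


def read_chars(path=CHARDEV_PATH):
    """Yield characters from the chardev socket until qemu closes it."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        while data := s.recv(4096):
            yield from data.decode("ascii")
```

Printing `f"skipped: {skipped} (waiting for {got})"` for each pair from `verify(read_chars())` produces lines in the same shape as the log; for example, `gap("3", "L")` gives `3456789ABCDEFGHIJK`.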
[5s between replug]
skipped: 3456789ABCDEFGHIJK (waiting for L)
skipped: EFGHIJKLMNOPQRSTUV (waiting for W)
skipped: STUVWXYZ0123456789 (waiting for A)
[1s between replug]
skipped: LMNOPQRSTUVWXYZ0123 (waiting for 4)
skipped: UVWXYZ0123456789ABCDEFGHIJKLMNOPQR (waiting for S)
skipped: YZ0123456789ABCDEFG (waiting for H)
skipped: CDEFGHIJKLMNOPQRSTUV (waiting for W)
skipped: OPQRSTUVWX (waiting for Y)
skipped: JKLMNOPQRSTUVWXY (waiting for Z)
skipped: STUVWXYZ01234567 (waiting for 8)
skipped: UVWXYZ0123456789 (waiting for A)
skipped: WXYZ0123456789ABCDEF (waiting for G)
skipped: 3456789ABCDEFGHIJKLMNOPQRSTUVW (waiting for X)
skipped: QRSTUVWXYZ01234 (waiting for 5)
skipped: NOPQRSTUVWXYZ012 (waiting for 3)
skipped: YZ0123456789ABCD (waiting for E)
skipped: GHIJKLMNOPQRSTUVWXYZ0123456789A (waiting for B)
skipped: HIJKLMNOPQRSTUVWXYZ (waiting for 0)
skipped: 789ABCDEFGHIJKLMN (waiting for O)
skipped: KLMNOPQRSTUVWX (waiting for Y)
skipped: JKLMNOPQRSTUVWXYZ (waiting for 0)
Not every reconnect fails:
5s sleep - 1 out of 10 failed
2s sleep - 3 out of 10 failed
1s sleep - 5 out of 10 failed
This output was generated using 1-character buffers. With longer buffers, fewer of them are lost; with buffers longer than 10 characters, only a single lost packet was observed (and not on every run).
Sorry, I forgot to add the qemu command line. It was generated by autotest:
/usr/bin/qemu-kvm -S -name 'vm1' -nodefaults -chardev socket,id=hmp_id_hmp1,path=/tmp/monitor-hmp1-20121004-115412-sLj47KEF,server,nowait -mon chardev=hmp_id_hmp1,mode=readline -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20121004-115412-sLj47KEF,server,nowait -device isa-serial,chardev=serial_id_serial1 -device virtio-serial-pci,id=virtio_serial_pci0 -chardev socket,id=devvs1,path=/tmp/virtio_port-vs1-20121004-115412-sLj47KEF,server,nowait -device virtserialport,chardev=devvs1,name=com.redhat.spice.0,id=vs1,bus=virtio_serial_pci0.0 -chardev socket,id=devvs2,path=/tmp/virtio_port-vs2-20121004-115412-sLj47KEF,server,nowait -device virtserialport,chardev=devvs2,name=com.redhat.spice.1,id=vs2,bus=virtio_serial_pci0.0 -chardev socket,id=devvs3,path=/tmp/virtio_port-vs3-20121004-115412-sLj47KEF,server,nowait -device virtserialport,chardev=devvs3,name=com.redhat.spice.2,id=vs3,bus=virtio_serial_pci0.0 -chardev socket,id=devvs4,path=/tmp/virtio_port-vs4-20121004-115412-sLj47KEF,server,nowait -device virtserialport,chardev=devvs4,name=com.redhat.spice.3,id=vs4,bus=virtio_serial_pci0.0 -chardev socket,id=seabioslog_id_20121004-115412-sLj47KEF,path=/tmp/seabios-20121004-115412-sLj47KEF,server,nowait -device isa-debugcon,chardev=seabioslog_id_20121004-115412-sLj47KEF,iobase=0x402 -device ich9-usb-uhci1,id=usb1 -drive file='/tmp/kvm_autotest_root/images/f17-64.qcow2',index=0,if=ide,cache=none,snapshot=on -device virtio-net-pci,netdev=idbjSa34,mac='9a:13:14:15:16:17',id='idCjnNs4' -netdev tap,id=idbjSa34,fd=21 -m 512 -smp 1,cores=1,threads=1,sockets=1 -cpu 'Penryn' -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :0 -vga std -rtc base=utc,clock=host,driftfix=none -boot order=cdn,once=c,menu=off -enable-kvm
A real serial port also loses data when you unplug it.
Gal, please see Lukas' questions in Comment #6. Basically: is this expected behavior or a bug?
*** Bug 863754 has been marked as a duplicate of this bug. ***
Bug #863754, which I've marked as a duplicate of this one, describes similar issues for host->guest communication; see that bug for more info.
Summarizing a private comment: this is expected behavior for a serial channel. If you need reliability, you should implement it at the application level.
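One common way to get that application-level reliability is sequence numbers with cumulative acknowledgements, retransmitting everything unacknowledged after a replug (a go-back-N scheme). This is a hypothetical sketch, not something proposed in this bug: the class names are invented, acks are assumed to travel back over the same bidirectional port, and framing on the byte stream (including discarding a partially received frame after a replug) is left out.

```python
class ReliableSender:
    """Keeps every payload until the peer acknowledges it (go-back-N style)."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}  # seq -> payload, in send order

    def frame(self, payload):
        """Assign the next sequence number and remember the payload."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return seq, payload

    def on_ack(self, ack_seq):
        """Cumulative ack: the peer holds every frame up to and including ack_seq."""
        for seq in [s for s in self.unacked if s <= ack_seq]:
            del self.unacked[seq]

    def pending(self):
        """Frames to retransmit, e.g. after the port has been replugged."""
        return list(self.unacked.items())


class ReliableReceiver:
    """Delivers frames strictly in order and reports what it has."""

    def __init__(self):
        self.expected = 0
        self.delivered = []

    def on_frame(self, seq, payload):
        """Accept only the next expected frame; return the cumulative ack."""
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
        # Duplicates and frames after a gap are dropped; the sender resends them.
        return self.expected - 1
```

Because acks are cumulative, a lost ack is harmless: the next one covers it. The cost is that a single lost frame causes everything after it to be resent.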
Closing as NOTABUG, but anyone feel free to reopen if I'm mistaken.
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '17'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 17's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 reaches end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.