Bug 584412 - transmission stops when tap does not consume
transmission stops when tap does not consume
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.5.z
All Linux
high Severity medium
: rc
: 5.3.z
Assigned To: Michael S. Tsirkin
Virtualization Bugs
: ZStream
: 586829 (view as bug list)
Depends On:
Blocks: Rhel5KvmTier1 584428 591842 643348 665293 665295 666367 672619
  Show dependency treegraph
 
Reported: 2010-04-21 10:28 EDT by Michael S. Tsirkin
Modified: 2013-01-09 17:29 EST (History)
10 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 584428 665293 672619 (view as bug list)
Environment:
Last Closed: 2011-01-13 16:28:36 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Michael S. Tsirkin 2010-04-21 10:28:50 EDT
Description of problem:

During MS WHQL tests we are hitting assertion from the test in form of blue
screen. The reason for the assertion is that the packets submitted by network
layer are not returned (and under the hood the driver add packets to the ring,
but we never get interrupt from QEMU to indicate that those packets were
transmitted. At the moment of blue screen transmit ring is full).


I also observed that when this happens, the qemu process
is unkillable.

The explanation for this is as follows:
tap1 sends packets, tap2 does not consume them, as a result
tap1 gets blocked forever, in particular it can not be closed.
We get messages:
unregister_netdevice: waiting for tap1 to become free
in the log.
This happens because tun/tap devices can hang on to skbs undefinitely.



Version-Release number of selected component (if applicable):
2.6.18-194

How reproducible:
always

Steps to Reproduce:
The problems is easiest to reproduce with 2 linux
guests:

1. run 2 VMs on same host
2. ifdown on the one side, ping -b -s 1472 on the other, 
3. you will lock out the second VM.

  
Actual results:

all traffic from second VM is blocked
on host, kill -9 for pid of the second VM,
   process does not die. 
dmesg log shows:
  unregister_netdevice: waiting for tap1 to become free

Expected results:

traffic to other destinations should continue even if one
destination is stuck.
kill -9 on host should kill qemu and guest

dmesg should be clean

Additional info:
yan, pls attach additional info as appropriate.
Comment 1 Michael S. Tsirkin 2010-04-21 10:35:57 EDT
brew  build with fix
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2376934
bug is reported fixed on this build
Comment 2 Yan Vugenfirer 2010-04-21 10:42:13 EDT
Brew build was tested by QE team with DTM 1.5 (the tool for running WHQL tests) on Windows 7, Windows 2008 and Windows 2008 R2. 

Blue screens as a result of the hanged transfer were not experienced during those tests.
Comment 5 Jarod Wilson 2010-05-25 17:12:36 EDT
in kernel-2.6.18-200.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Comment 6 Quan Wenli 2010-05-27 01:06:26 EDT
hi, Michael S. Tsirkin

I try to reproduce this bug as following steps,but failed, could you help to check if there is somewhere I misunderstanding ?

1.Host: 2.6.18-194.el5
2.Host:
ps -ef |grep qemu
root      7681  4933 17 12:38 pts/7    00:02:05 /usr/libexec/qemu-kvm -M pc -m 2048 -smp 2 -name guest1 -no-kvm-pit-reinjection -rtc-td-hack -startdate now -drive file=/mnt/rhel5.5-32-virtio.qcow2,if=virtio,boot=on,cache=none -net nic,macaddr=00:00:12:31:4A:01,vlan=0 -net tap,scprit=/etc/ifup,vlan=0 -usb -vnc :1 -monitor stdio
root      7968  5006 13 12:45 pts/8    00:00:38 /usr/libexec/qemu-kvm -M pc -m 2048 -smp 2 -name guest2 -no-kvm-pit-reinjection -rtc-td-hack -startdate now -drive file=/mnt/rhel5.5-64-virtio.qcow2,if=virtio,boot=on,cache=none -net nic,macaddr=00:00:12:31:4A:02,vlan=0,model=virtio -net tap,scprit=/etc/ifup,vlan=0 -usb -vnc :2 -monitor stdio
3.ifdown nic on the guest1
4.ping  -b -s 1472 guest1_ip on the guest2
5.Host: kill -9 7968 (guest2) process die.
Comment 7 Michael S. Tsirkin 2010-07-06 11:22:04 EDT
*** Bug 586829 has been marked as a duplicate of this bug. ***
Comment 9 Quan Wenli 2010-09-13 06:42:24 EDT
Reproduce it with in kernel-2.6.18-194 according the steps from bug 584428#c11.

Steps:

1. force arp in guest A to match guest B
arp -i eth0 -s <ip for guest B> <mac for guest B>
2. ping guest B, we should get back packets
e.g. with -c 1
3. ifdown guest B
4. ping guest B_ip -i 0.01 
keep ping operator about 4 hours or more till finding guest A could receive packages from guest B.
5. kill -9 13498 (process of guest A,process does not die)
ps -ef |grep qemu-kvm
root     13498  4152  0 Sep10 pts/1    00:02:59 [qemu-kvm] <defunct>


dmesg log shows:
breth0: port 2(tap0) entering disabled state
unregister_netdevice: waiting for tap0 to become free. Usage count = 1
unregister_netdevice: waiting for tap0 to become free. Usage count = 1

And it PASSED in kernel-2.6.18-209.
Thanks~~
Comment 24 errata-xmlrpc 2011-01-13 16:28:36 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

Note You need to log in before you can comment on or make changes to this bug.