Bug 584412 - transmission stops when tap does not consume

Summary: transmission stops when tap does not consume

Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Version:	5.5.z
Hardware:	All
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	rc
Target Release:	5.3.z
Assignee:	Michael S. Tsirkin
QA Contact:	Virtualization Bugs
Duplicates (1):	586829
Blocks:	Rhel5KvmTier1 584428 591842 643348 665293 665295 666367 672619

Reported:	2010-04-21 14:28 UTC by Michael S. Tsirkin
Modified:	2018-11-14 16:24 UTC
CC List:	10 users
Doc Type:	Bug Fix
Clones:	584428 665293 672619
Last Closed:	2011-01-13 21:28:36 UTC

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0017	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update	2011-01-13 10:37:42 UTC

Description Michael S. Tsirkin 2010-04-21 14:28:50 UTC

Description of problem:

During MS WHQL tests we hit an assertion from the test in the form of a blue
screen. The assertion fires because the packets submitted by the network
layer are never returned: under the hood, the driver adds packets to the ring,
but we never get an interrupt from QEMU indicating that those packets were
transmitted. At the moment of the blue screen, the transmit ring is full.


I also observed that when this happens, the qemu process
is unkillable.

The explanation is as follows:
tap1 sends packets and tap2 does not consume them. As a result,
tap1 is blocked forever; in particular, it cannot be closed.
We get messages like the following in the log:
unregister_netdevice: waiting for tap1 to become free
This happens because tun/tap devices can hold on to skbs indefinitely.
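The stall can be illustrated with a toy model. This is a hypothetical Python simulation, not kernel code: the class and function names, the byte budget (loosely analogous to a socket send buffer) and the queue limit are all assumptions. Each queued packet stays charged to its sender until the reader consumes it, so a reader that never consumes pins the sender's budget and blocks its traffic to every destination. The `orphan` flag shows one possible mitigation, releasing the charge when the packet is queued (analogous in spirit to the kernel's `skb_orphan()`); this report does not say which approach the fix took.

```python
from collections import deque

class Sender:
    """Hypothetical send-side budget of one tap device (loosely like a socket send buffer)."""
    def __init__(self, sndbuf):
        self.sndbuf = sndbuf   # total bytes this sender may have charged at once
        self.charged = 0       # bytes of queued packets still charged to this sender

class Receiver:
    """Hypothetical receive queue of another tap device, bounded by a packet count."""
    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()

    def consume_all(self):
        # The reader drains the queue; each packet releases its owner's charge.
        while self.queue:
            size, owner = self.queue.popleft()
            if owner is not None:
                owner.charged -= size

def send(sender, receiver, size, orphan):
    """Try to queue one packet; return 'blocked', 'dropped' or 'queued'."""
    if sender.charged + size > sender.sndbuf:
        return "blocked"                        # budget exhausted: sender must wait
    if len(receiver.queue) >= receiver.limit:
        return "dropped"                        # queue full: packet dropped, nothing charged
    if orphan:
        receiver.queue.append((size, None))     # charge released at enqueue time
    else:
        sender.charged += size
        receiver.queue.append((size, sender))   # queued packet pins the sender's budget
    return "queued"

def run(orphan):
    tap1 = Sender(sndbuf=10_000)
    stuck = Receiver(limit=500)   # like tap2: nobody ever reads this queue
    live = Receiver(limit=500)    # a different destination whose reader is active
    for _ in range(20):
        send(tap1, stuck, 1500, orphan)
    result = send(tap1, live, 1500, orphan)
    live.consume_all()
    # tap1.charged > 0 means tap1 is still referenced and could not be torn down.
    return result, tap1.charged

print(run(orphan=False))  # ('blocked', 9000)
print(run(orphan=True))   # ('queued', 0)
```

Without orphaning, six 1500-byte packets stuck in the unread queue exhaust the 10 000-byte budget, so even the packet for the healthy destination is blocked and the sender's charge never drops to zero. With orphaning, the sender keeps working, and the unread queue's own limit (drops) becomes what bounds memory use.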



Version-Release number of selected component (if applicable):
2.6.18-194

How reproducible:
always

Steps to Reproduce:
The problem is easiest to reproduce with two Linux
guests:

1. Run 2 VMs on the same host.
2. Run ifdown in one VM and ping -b -s 1472 in the other.
3. The second VM gets locked out.

  
Actual results:

All traffic from the second VM is blocked.
On the host, kill -9 on the PID of the second VM does not kill the process.
The dmesg log shows:
  unregister_netdevice: waiting for tap1 to become free

Expected results:

Traffic to other destinations should continue even if one
destination is stuck.
kill -9 on the host should kill qemu and the guest.

dmesg should be clean.

Additional info:
Yan, please attach additional info as appropriate.

Comment 1 Michael S. Tsirkin 2010-04-21 14:35:57 UTC

Brew build with the fix:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2376934
The bug is reported fixed on this build.

Comment 2 Yvugenfi@redhat.com 2010-04-21 14:42:13 UTC

The Brew build was tested by the QE team with DTM 1.5 (the tool for running WHQL tests) on Windows 7, Windows 2008, and Windows 2008 R2.

No blue screens caused by the hung transmission occurred during those tests.

Comment 5 Jarod Wilson 2010-05-25 21:12:36 UTC

Included in kernel-2.6.18-200.el5.
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcome.

Comment 6 Quan Wenli 2010-05-27 05:06:26 UTC

Hi Michael S. Tsirkin,

I tried to reproduce this bug with the following steps but failed. Could you check whether I have misunderstood something?

1. Host kernel: 2.6.18-194.el5
2. On the host:
ps -ef |grep qemu
root      7681  4933 17 12:38 pts/7    00:02:05 /usr/libexec/qemu-kvm -M pc -m 2048 -smp 2 -name guest1 -no-kvm-pit-reinjection -rtc-td-hack -startdate now -drive file=/mnt/rhel5.5-32-virtio.qcow2,if=virtio,boot=on,cache=none -net nic,macaddr=00:00:12:31:4A:01,vlan=0 -net tap,scprit=/etc/ifup,vlan=0 -usb -vnc :1 -monitor stdio
root      7968  5006 13 12:45 pts/8    00:00:38 /usr/libexec/qemu-kvm -M pc -m 2048 -smp 2 -name guest2 -no-kvm-pit-reinjection -rtc-td-hack -startdate now -drive file=/mnt/rhel5.5-64-virtio.qcow2,if=virtio,boot=on,cache=none -net nic,macaddr=00:00:12:31:4A:02,vlan=0,model=virtio -net tap,scprit=/etc/ifup,vlan=0 -usb -vnc :2 -monitor stdio
3. ifdown the NIC on guest1.
4. On guest2: ping -b -s 1472 guest1_ip
5. On the host: kill -9 7968 (guest2); the process dies.

Comment 7 Michael S. Tsirkin 2010-07-06 15:22:04 UTC

*** Bug 586829 has been marked as a duplicate of this bug. ***

Comment 9 Quan Wenli 2010-09-13 10:42:24 UTC

Reproduced it with kernel-2.6.18-194 following the steps from bug 584428#c11.

Steps:

1. Force the ARP entry in guest A to match guest B:
arp -i eth0 -s <ip for guest B> <mac for guest B>
2. Ping guest B (e.g. with -c 1); we should get replies back.
3. Run ifdown in guest B.
4. ping guest B_ip -i 0.01
Keep the ping running for about 4 hours or more, until guest A could receive packets from guest B.
5. kill -9 13498 (the process of guest A); the process does not die:
ps -ef |grep qemu-kvm
root     13498  4152  0 Sep10 pts/1    00:02:59 [qemu-kvm] <defunct>
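When checking for the "process does not die" symptom on many hosts, the `ps` output above can be scanned automatically. A minimal sketch: `defunct_qemu_pids` is a hypothetical helper, and it assumes the default `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD) with the command name in the eighth column.

```python
def defunct_qemu_pids(ps_output):
    """Return PIDs of qemu-kvm processes that `ps -ef` lists as <defunct>.

    Assumes the default `ps -ef` layout: UID PID PPID C STIME TTY TIME CMD...
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue                      # too short to hold CMD plus a <defunct> marker
        cmd = fields[7:]                  # CMD and its arguments
        if "qemu-kvm" in cmd[0] and "<defunct>" in cmd:
            pids.append(int(fields[1]))   # PID column
    return pids

sample = "root     13498  4152  0 Sep10 pts/1    00:02:59 [qemu-kvm] <defunct>"
print(defunct_qemu_pids(sample))  # [13498]
```

Matching on the CMD column rather than the whole line keeps the `grep qemu-kvm` entry itself out of the result.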


dmesg log shows:
breth0: port 2(tap0) entering disabled state
unregister_netdevice: waiting for tap0 to become free. Usage count = 1
unregister_netdevice: waiting for tap0 to become free. Usage count = 1
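When verifying a kernel, the dmesg symptom above can be checked mechanically. A minimal sketch, assuming only the message text quoted in this report; the function name `stuck_devices` is hypothetical, and the usage-count suffix is treated as optional because the message quoted in the original description does not include it.

```python
import re

# Matches the message seen in this report, e.g.
#   "unregister_netdevice: waiting for tap0 to become free. Usage count = 1"
# The ". Usage count = N" suffix is optional (not every quoted message has it).
WAIT_RE = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free"
    r"(?:\. Usage count = (\d+))?"
)

def stuck_devices(dmesg_text):
    """Map each device still waiting to be freed to its last reported usage count."""
    found = {}
    for line in dmesg_text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            count = m.group(2)
            found[m.group(1)] = int(count) if count is not None else None
    return found

log = """breth0: port 2(tap0) entering disabled state
unregister_netdevice: waiting for tap0 to become free. Usage count = 1
unregister_netdevice: waiting for tap0 to become free. Usage count = 1"""
print(stuck_devices(log))  # {'tap0': 1}
```

An empty result corresponds to the "dmesg should be clean" expectation in the description.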

It PASSED with kernel-2.6.18-209.
Thanks~~

Comment 24 errata-xmlrpc 2011-01-13 21:28:36 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
