Description of problem:
During MS WHQL tests we hit an assertion from the test in the form of a blue screen. The reason for the assertion is that the packets submitted by the network layer are not returned. Under the hood, the driver adds packets to the ring, but we never get an interrupt from QEMU to indicate that those packets were transmitted. At the moment of the blue screen the transmit ring is full.

I also observed that when this happens, the qemu process is unkillable. The explanation is as follows: tap1 sends packets, tap2 does not consume them, and as a result tap1 gets blocked forever; in particular, it cannot be closed. We get this message in the log:
unregister_netdevice: waiting for tap1 to become free
This happens because tun/tap devices can hang on to skbs indefinitely.

Version-Release number of selected component (if applicable):
2.6.18-194

How reproducible:
always

Steps to Reproduce:
The problem is easiest to reproduce with 2 Linux guests:
1. Run 2 VMs on the same host.
2. Run ifdown on one side and ping -b -s 1472 on the other.
3. You will lock out the second VM.

Actual results:
All traffic from the second VM is blocked on the host. After kill -9 for the pid of the second VM, the process does not die. The dmesg log shows:
unregister_netdevice: waiting for tap1 to become free

Expected results:
Traffic to other destinations should continue even if one destination is stuck. kill -9 on the host should kill qemu, and the guest dmesg should be clean.

Additional info:
yan, pls attach additional info as appropriate.
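The blocking described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the kernel's tun/tap code: the `ToyTap` class, its `sndbuf_packets` budget, and the idea of counting in-flight packets per sender are all hypothetical names introduced here to show why a sender stalls when its peer never consumes what was queued.

```python
from collections import deque


class ToyTap:
    """Hypothetical stand-in for a tap device; NOT the kernel implementation."""

    def __init__(self, name, sndbuf_packets):
        self.name = name
        self.sndbuf_packets = sndbuf_packets  # assumed send budget, in packets
        self.in_flight = 0                    # packets sent but not yet freed
        self.rx_queue = deque()               # packets waiting for this tap's reader

    def can_send(self):
        # The sender may transmit only while it has budget left.
        return self.in_flight < self.sndbuf_packets

    def send(self, peer, payload):
        if not self.can_send():
            return False  # a real sender would block at this point
        self.in_flight += 1
        # The queued packet keeps a reference back to its sender.
        peer.rx_queue.append((self, payload))
        return True

    def consume_one(self):
        # Consuming a packet is the only thing that returns budget to the sender.
        if not self.rx_queue:
            return None
        sender, payload = self.rx_queue.popleft()
        sender.in_flight -= 1
        return payload


def demo():
    tap1 = ToyTap("tap1", sndbuf_packets=4)
    tap2 = ToyTap("tap2", sndbuf_packets=4)
    # tap2 never reads, so only the first 4 of 10 sends succeed.
    sent = sum(tap1.send(tap2, b"ping") for _ in range(10))
    blocked = not tap1.can_send()
    # Once tap2 consumes a single packet, tap1 can send again.
    tap2.consume_one()
    return sent, blocked, tap1.can_send()


if __name__ == "__main__":
    print(demo())
```

In this model the sender stays blocked for as long as the peer holds the queued packets, which mirrors the report's observation that tap1 cannot make progress or be closed while tap2 does not consume.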
Brew build with the fix: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2376934
The bug is reported fixed on this build.
The Brew build was tested by the QE team with DTM 1.5 (the tool for running WHQL tests) on Windows 7, Windows 2008 and Windows 2008 R2. No blue screens caused by the hung transfer occurred during those tests.
in kernel-2.6.18-200.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Hi Michael S. Tsirkin,

I tried to reproduce this bug with the following steps but failed. Could you help check whether I am misunderstanding something?

1. Host: 2.6.18-194.el5
2. Host: ps -ef |grep qemu
root 7681 4933 17 12:38 pts/7 00:02:05 /usr/libexec/qemu-kvm -M pc -m 2048 -smp 2 -name guest1 -no-kvm-pit-reinjection -rtc-td-hack -startdate now -drive file=/mnt/rhel5.5-32-virtio.qcow2,if=virtio,boot=on,cache=none -net nic,macaddr=00:00:12:31:4A:01,vlan=0 -net tap,scprit=/etc/ifup,vlan=0 -usb -vnc :1 -monitor stdio
root 7968 5006 13 12:45 pts/8 00:00:38 /usr/libexec/qemu-kvm -M pc -m 2048 -smp 2 -name guest2 -no-kvm-pit-reinjection -rtc-td-hack -startdate now -drive file=/mnt/rhel5.5-64-virtio.qcow2,if=virtio,boot=on,cache=none -net nic,macaddr=00:00:12:31:4A:02,vlan=0,model=virtio -net tap,scprit=/etc/ifup,vlan=0 -usb -vnc :2 -monitor stdio
3. ifdown the NIC on guest1.
4. ping -b -s 1472 guest1_ip on guest2.
5. Host: kill -9 7968 (guest2); the process dies.
*** Bug 586829 has been marked as a duplicate of this bug. ***
Reproduced with kernel-2.6.18-194 according to the steps from bug 584428#c11.

Steps:
1. Force arp in guest A to match guest B:
arp -i eth0 -s <ip for guest B> <mac for guest B>
2. Ping guest B; we should get packets back, e.g. with -c 1.
3. ifdown guest B.
4. ping <guest B_ip> -i 0.01
Keep the ping running for about 4 hours or more, until guest A is found to receive packets from guest B.
5. kill -9 13498 (the process of guest A; the process does not die)

ps -ef |grep qemu-kvm
root 13498 4152 0 Sep10 pts/1 00:02:59 [qemu-kvm] <defunct>

The dmesg log shows:
breth0: port 2(tap0) entering disabled state
unregister_netdevice: waiting for tap0 to become free. Usage count = 1
unregister_netdevice: waiting for tap0 to become free. Usage count = 1

And it PASSED with kernel-2.6.18-209.

Thanks~~
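When checking a host after a long reproduction run, it can help to scan a saved dmesg dump for the message quoted in this report. The following is a minimal sketch, assuming the log text has already been captured into a string; the function name `find_stuck_devices` is a hypothetical helper, and the pattern is built only from the two message forms that appear in this bug (with and without the "Usage count" suffix).

```python
import re

# Matches the kernel message quoted in this report. The "Usage count" suffix
# is optional because the original description shows the message without it.
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free"
    r"(?:\. Usage count = (?P<count>\d+))?"
)


def find_stuck_devices(log_text):
    """Return {device name: number of matching messages} for a dmesg dump."""
    hits = {}
    for match in PATTERN.finditer(log_text):
        dev = match.group("dev")
        hits[dev] = hits.get(dev, 0) + 1
    return hits


if __name__ == "__main__":
    sample = (
        "breth0: port 2(tap0) entering disabled state\n"
        "unregister_netdevice: waiting for tap0 to become free. Usage count = 1\n"
        "unregister_netdevice: waiting for tap0 to become free. Usage count = 1\n"
    )
    print(find_stuck_devices(sample))
```

An empty result on a kernel with the fix, after the same steps, would be consistent with the PASSED outcome reported above, although it does not by itself prove that the process was killable.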
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html