Created attachment 457036 [details] Dom3_eth1.cap

Description of problem:
The Ethernet interface of a domU suddenly stopped sending packets. It can still receive packets. The customer has not attempted to run 'service network restart' on the domU since the issue occurred.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-194.el5

How reproducible:
This happened only once, and the customer has not been able to reproduce it.

Steps to Reproduce:
1.
2.
3.

Actual results:
The Ethernet interface of the domU can receive packets but cannot send them.

Expected results:
The Ethernet interface of the domU should be able to send and receive packets.

Additional info:
dom0 has 3 domUs running on it. Each of the domUs has 2 Ethernet interfaces, eth0 and eth1. eth0 of all the domUs is connected to xenbr0, and eth1 of all the domUs is connected to xenbr1.

domU
----------
1) This is the config file of the domU in question:

vif = [ "mac=00:xx:yy:zz:xy:yz,bridge=xenbr0,script=vif-bridge,vifname=vif3.0",
        "mac=00:xx:yy:zz:yz:xy,bridge=xenbr1,script=vif-bridge" ]

2) #ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:16:36:65:C0:71
          inet addr:192.168.50.3  Bcast:192.168.50.255  Mask:255.255.255.0
          inet6 addr: fe80::216:36ff:fe65:c071/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:35431599 errors:0 dropped:0 overruns:0 frame:0
          TX packets:27169681 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:400819490 (382.2 MiB)  TX bytes:23027 (22.4 KiB)

This shows no errors, and it also shows that eth1 is up.

3) The domU has no firewall enabled.
4) #ethtool -i eth1
Cannot get driver information: Operation not supported

dom0
---------
1) #ethtool -i peth0
driver: bnx2
version: 2.0.2
firmware-version: 4.0.3 ipms 1.6.0
bus-info: 0000:04:00.0

#ethtool -i peth1
driver: bnx2
version: 2.0.2
firmware-version: 4.0.3 ipms 1.6.0
bus-info: 0000:06:00.0

Attached is tcpdump output captured while pinging an external address (not dom0) from dom3:
1) tcpdump on the Ethernet interface on dom0 (peth1) that is connected to xenbr1
2) tcpdump on the virtual interface (vif) in dom0 that is hooked to dom3's eth1
3) tcpdump on xenbr1
4) tcpdump on dom3's eth1
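The ifconfig check in item 2 of the domU section (interface up, no error counters) can be scripted. This is a minimal Python 3 sketch, assuming the classic net-tools ifconfig output format shown there; parse_ifconfig and looks_clean are hypothetical helpers, not part of any tool. As this report itself shows, an interface can pass this check and still fail to transmit, so it only rules out the obvious cases.

```python
import re

# Classic net-tools ifconfig output for dom3's eth1, copied from this report.
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 00:16:36:65:C0:71
          inet addr:192.168.50.3  Bcast:192.168.50.255  Mask:255.255.255.0
          inet6 addr: fe80::216:36ff:fe65:c071/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:35431599 errors:0 dropped:0 overruns:0 frame:0
          TX packets:27169681 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:400819490 (382.2 MiB)  TX bytes:23027 (22.4 KiB)
"""

_PAIR = re.compile(r"(\w+):(\d+)")            # e.g. "errors:0"
_BYTES = re.compile(r"\b(RX|TX) bytes:(\d+)")  # e.g. "TX bytes:23027"


def parse_ifconfig(text):
    """Return the flags and RX/TX counters from one ifconfig stanza (hypothetical helper)."""
    info = {"flags": set(), "RX": {}, "TX": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if "MTU:" in line:
            # Flags are the words before "MTU:", e.g. UP BROADCAST RUNNING MULTICAST.
            info["flags"] = set(line.split("MTU:")[0].split())
        elif line.startswith(("RX packets:", "TX packets:")):
            # Counter lines: collect every "name:number" pair on the line.
            info[line[:2]].update((k, int(v)) for k, v in _PAIR.findall(line))
        # The byte counters for both directions share one line.
        for direction, value in _BYTES.findall(line):
            info[direction]["bytes"] = int(value)
    return info


def looks_clean(info):
    """True if the interface is UP and RUNNING and every error-type counter is zero."""
    error_keys = ("errors", "dropped", "overruns", "frame", "carrier")
    zero_errors = all(info[d].get(k, 0) == 0 for d in ("RX", "TX") for k in error_keys)
    return {"UP", "RUNNING"} <= info["flags"] and zero_errors
```

For the output in this report, looks_clean(parse_ifconfig(SAMPLE)) is True even though eth1 was not transmitting.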
Created attachment 457037 [details] tcpdump output from dom0
Changing the first paragraph in 'Additional info' to:
dom0 has 3 domUs running on it: dom1, dom2, and dom3. Each of the domUs has 2 Ethernet interfaces, eth0 and eth1. eth0 of all the domUs is connected to xenbr0, and eth1 of all the domUs is connected to xenbr1. This issue is seen only on eth1 of dom3. The other Ethernet interface on dom3, eth0, works fine. All the NICs on the other domUs work fine.
We would like to resolve all the bug reports we get, and certainly the highest-priority, customer-reported ones. However, to set expectations here, it doesn't sound like we'll be able to resolve this one. The customer only saw it once, and it's not reproducing. Furthermore, the device that failed is really no different from the other devices that didn't fail. Possibly there was a deadlock somewhere in the guest kernel's send path? It's hard to say; it's even hard to say whether it's a virt-related problem, a general kernel problem, or some other issue. Please pass these expectations back to the customer and ask that, if it starts recurring regularly, they get a core dump the next time it happens. In the meantime I'll close this bug as INSUFFICIENT_DATA. It can be reopened if we get more data.

Thanks,
Drew
Created attachment 459808 [details] sosreport from dom0
Created attachment 459810 [details] sosreport from domU
Discussed further with Miroslav Rezanina (mrezanin), who said he'd like to look at the dmesg output from both dom0 and the domU. Attached sosreports from dom0 and the domU, and reopening the BZ.
I can't see anything interesting in the logs: nothing in dmesg, nothing in the network stats. We can still try looking at a core of the guest to see if a kernel thread is stuck. So here is what the customer should do:

1) Assuming the NIC is still in this state (i.e. it can't transmit, only receive), run 'xm dump-core' on the guest from the host and upload the core to a dropbox.
2) Run "ifdown eth1 ; ifup eth1" and see if the problem goes away.

If the problem does go away, then see if it comes back eventually. Maybe write a script to test it periodically in order to find out how long it takes for the problem to come back.

If the problem doesn't go away, then (a) that would also be interesting information, and (b) try rebooting the guest and see if the problem goes away. If it does, write the script to log if/when the problem comes back. If it doesn't work at all, even after reboots, that would again be interesting information.

Of course, the customer should double-check their networking in general, i.e. make sure there isn't a duplicate IP or something similar, since they're using static IPs.
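The periodic test suggested above could be as simple as pinging the guest's eth1 address from another machine on the xenbr1 network: the guest has to transmit ARP and echo replies for the ping to succeed, so a failing ping is consistent with the TX problem (though it does not prove it). Running it outside the guest also sidesteps the RHEL 5 guest's stock Python 2.4, which this code does not target. This is a minimal Python 3 sketch, assuming iputils ping on a Linux host; ping_once, monitor, and the 60-second interval below are hypothetical choices. It logs only state changes, so the time between an "OK" line and the next "FAILING" line shows how long the interface stayed healthy.

```python
import subprocess
import time


def ping_once(target, timeout_s=2):
    """Send one ICMP echo with iputils ping; return True if a reply came back."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", str(timeout_s), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ) == 0


def monitor(probe, interval_s, max_checks=None, sleep=time.sleep, clock=time.time):
    """Call probe() every interval_s seconds and log each change between OK and FAILING.

    Returns a list of (timestamp, ok) tuples, one per state change.
    max_checks=None runs until interrupted; sleep and clock are injectable for testing.
    """
    transitions = []
    last = None
    checks = 0
    while max_checks is None or checks < max_checks:
        ok = probe()
        if ok != last:  # the first result always counts as a change
            stamp = clock()
            transitions.append((stamp, ok))
            when = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(stamp))
            print("%s %s" % (when, "OK" if ok else "FAILING"), flush=True)
            last = ok
        checks += 1
        sleep(interval_s)
    return transitions
```

For example, monitor(lambda: ping_once("192.168.50.3"), interval_s=60), where 192.168.50.3 is dom3's eth1 address from the ifconfig output in the original report.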
*** Bug 468725 has been marked as a duplicate of this bug. ***
*** Bug 454285 has been marked as a duplicate of this bug. ***
*** Bug 213430 has been marked as a duplicate of this bug. ***
(In reply to comment #11)
> The guest syslog says "netfront: device eth1 has flipping receive path"
> multiple times (comment 10).

I'm closing this as WONTFIX; flipping is broken for good. Last week we collected 12 bugs that are all caused by flipping; this is one of them. Please use copying instead.
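To check whether other guests are exposed to the same cause, a guest's syslog can be scanned for the netfront message quoted above. This is a minimal Python 3 sketch: the regular expression is built from the one message quoted in this bug, and the assumption that the copying case is logged with the same wording ("has copying receive path") is unverified. rx_path_modes and flipping_devices are hypothetical helper names.

```python
import re

# Built from the message quoted in this bug:
#   "netfront: device eth1 has flipping receive path"
# The "copying" alternative is an assumption about the other mode's wording.
_RX_PATH = re.compile(r"netfront: device (\S+) has (flipping|copying) receive path")


def rx_path_modes(lines):
    """Map each netfront device to the last receive-path mode logged for it."""
    modes = {}
    for line in lines:
        match = _RX_PATH.search(line)
        if match:
            modes[match.group(1)] = match.group(2)
    return modes


def flipping_devices(lines):
    """Sorted names of devices whose most recent logged mode is flipping."""
    return sorted(dev for dev, mode in rx_path_modes(lines).items() if mode == "flipping")
```

It can be run against a guest's /var/log/messages, for instance the copy included in a sosreport; any device it reports is a candidate for switching to copying.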