Bug 648763 - Ethernet interface of domU suddenly stopped sending packets
Summary: Ethernet interface of domU suddenly stopped sending packets
Keywords:
Status: CLOSED DUPLICATE of bug 653501
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: 5.6
Assignee: Laszlo Ersek
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 213430 454285 468725
Depends On:
Blocks: 514489
 
Reported: 2010-11-02 03:11 UTC by Nandini Chandra
Modified: 2018-11-14 18:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-31 09:51:57 UTC
Target Upstream Version:
Embargoed:


Attachments
Dom3_eth1.cap (24 bytes, application/octet-stream)
2010-11-02 03:11 UTC, Nandini Chandra
tcpdump output from dom0 (646 bytes, application/zip)
2010-11-02 03:22 UTC, Nandini Chandra
sosreport from dom0 (1.80 MB, application/x-bzip2)
2010-11-11 17:48 UTC, Nandini Chandra
sosreport from domU (1.44 MB, application/octet-stream)
2010-11-11 17:58 UTC, Nandini Chandra

Description Nandini Chandra 2010-11-02 03:11:02 UTC
Created attachment 457036 [details]
Dom3_eth1.cap

Description of problem:

The Ethernet interface of a domU suddenly stopped sending packets. It can still receive packets. The customer has not attempted to run 'service network restart' on the domU since the issue occurred.


Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-194.el5 


How reproducible:
This happened just once and the customer hasn't been able to reproduce it.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Ethernet interface of domU can receive but not send packets.

Expected results:
Ethernet interface of domU should be able to send and receive packets.

Additional info:
dom0 has 3 domUs running on it. Each of the domUs has 2 Ethernet interfaces, eth0 and eth1. eth0 of each domU is connected to xenbr0
and eth1 of each domU is connected to xenbr1.

domU
----------
1) This is the config file of the domU in question:
 vif = [ "mac=00:xx:yy:zz:xy:yz,bridge=xenbr0,script=vif-bridge,vifname=vif3.0", "mac=00:xx:yy:zz:yz:xy,bridge=xenbr1,script=vif-bridge" ]

2) # ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:16:36:65:C0:71
          inet addr:192.168.50.3  Bcast:192.168.50.255  Mask:255.255.255.0
          inet6 addr: fe80::216:36ff:fe65:c071/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:35431599 errors:0 dropped:0 overruns:0 frame:0
          TX packets:27169681 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:400819490 (382.2 MiB)  TX bytes:23027 (22.4 KiB)


This shows no errors and it also shows that 'eth1' is up.
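
The "no errors" reading above can also be checked mechanically. The following is a minimal sketch, assuming the legacy net-tools `ifconfig` layout shown in this report; the function names `parse_ifconfig_counters` and `is_clean` are hypothetical and only serve the illustration.

```python
import re


def parse_ifconfig_counters(text):
    """Extract RX/TX packet, error, drop and overrun counters from
    net-tools style ifconfig output (the layout quoted in this report)."""
    counters = {}
    for direction in ("RX", "TX"):
        match = re.search(
            direction + r" packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)",
            text,
        )
        if match:
            packets, errors, dropped, overruns = map(int, match.groups())
            counters[direction] = {
                "packets": packets,
                "errors": errors,
                "dropped": dropped,
                "overruns": overruns,
            }
    return counters


def is_clean(counters):
    """True if counters were found and none reports errors, drops or overruns."""
    return bool(counters) and all(
        c["errors"] == 0 and c["dropped"] == 0 and c["overruns"] == 0
        for c in counters.values()
    )


# Counter lines copied from the eth1 output in this report.
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 00:16:36:65:C0:71
          RX packets:35431599 errors:0 dropped:0 overruns:0 frame:0
          TX packets:27169681 errors:0 dropped:0 overruns:0 carrier:0
"""
```

Clean counters do not prove that the transmit path works; they only rule out failures that the driver itself counts, which matches what is reported here.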

3) domU has no firewall enabled on it.

4) # ethtool -i eth1
Cannot get driver information: Operation not supported

dom0
---------
1) # ethtool -i peth0
driver: bnx2
version: 2.0.2
firmware-version: 4.0.3 ipms 1.6.0
bus-info: 0000:04:00.0

# ethtool -i peth1
driver: bnx2
version: 2.0.2
firmware-version: 4.0.3 ipms 1.6.0
bus-info: 0000:06:00.0

Attached tcpdump output captured after pinging an external address (not dom0) from dom3:
1) tcpdump on the Ethernet interface (peth1) on dom0 connected to xenbr1
2) tcpdump on the veth interface that is hooked to eth1 corresponding to dom3
3) tcpdump on xenbr1
4) tcpdump on dom3's eth1

Comment 1 Nandini Chandra 2010-11-02 03:22:26 UTC
Created attachment 457037 [details]
tcpdump output from dom0

Comment 2 Nandini Chandra 2010-11-02 15:44:32 UTC
Changing the first paragraph in 'Additional info' to:
dom0 has 3 domUs running on it: dom1, dom2 and dom3. Each of the domUs has 2 Ethernet
interfaces, eth0 and eth1. eth0 of each domU is connected to xenbr0
and eth1 of each domU is connected to xenbr1. This issue is seen only on eth1 of dom3. The other Ethernet interface on dom3, eth0, works fine. All the NICs on the other domUs work fine.

Comment 3 Andrew Jones 2010-11-03 09:59:43 UTC
We would like to resolve all bug reports we get, and customer-reported bugs certainly get the highest priority. However, to set expectations here, it doesn't sound like we'll be able to resolve this one. The customer only saw it once, and it's not reproducing. Furthermore, the device that failed is really no different from the other devices that didn't fail. Possibly there was a deadlock in the guest's kernel somewhere in the send path? It's hard to say; it's even hard to say whether it's a virt-related problem, a general kernel-related problem, or some other issue. Please pass these expectations back to the customer and ask that, if it starts recurring regularly, they get a core dump the next time it happens. In the meantime I'll close this bug as INSUFFICIENT_DATA. It can be reopened in the event we get more.

Thanks,
Drew

Comment 4 Nandini Chandra 2010-11-11 17:48:05 UTC
Created attachment 459808 [details]
sosreport from dom0

Comment 5 Nandini Chandra 2010-11-11 17:58:33 UTC
Created attachment 459810 [details]
sosreport from domU

Comment 6 Nandini Chandra 2010-11-11 18:02:57 UTC
Discussed further with Miroslav Rezanina (mrezanin), who said he'd like to look at the dmesg output from both dom0 and domU. Attached sosreports from dom0 and domU and reopening the BZ.

Comment 7 Andrew Jones 2010-11-12 10:21:24 UTC
I can't see anything interesting in the logs. Nothing interesting in dmesg, nothing interesting in the network stats. We can still try looking at a core of the guest to see if a kernel thread is stuck. So here is what the customer should do:

1) assuming the nic is still in this state (i.e. can't transmit, can only receive), then from the host do 'xm dump-core' on the guest and send it to a dropbox.

2) "ifdown eth1 ; ifup eth1" and see if the problem goes away

    If the problem does go away, then see if it comes back eventually. Maybe write a script to test it periodically in order to find out how long before the problem comes back.

    If the problem doesn't go away, then (a) that would also be interesting information, and (b) try rebooting the guest and see if the problem goes away. If so, write the script to log if/when the problem comes back; if it just doesn't work at all, even after reboots, that would again be interesting information.

Of course the customer should double check their networking in general, i.e. make sure there isn't a duplicate IP or something, since they're using static IPs.
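
The periodic test suggested in step 2 could be sketched as follows. This is only an illustration, not a script used in this bug: it assumes Python 3 and the iputils `ping` command are available in the guest (RHEL 5's stock interpreter is older, so it would need adapting), and the names `ping_once` and `monitor` are hypothetical. Because a ping needs both directions to work, and receive is known to work in this case, a failed probe is taken to mean the transmit path has stopped.

```python
import subprocess
import time
from datetime import datetime, timezone


def ping_once(target, interface="eth1", timeout_s=2):
    """Return True if a single ICMP echo sent through `interface` is answered.

    Uses the iputils ping options -c (count), -W (reply timeout in seconds)
    and -I (source interface)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", interface, target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(probe, interval_s=60, max_checks=None, sleep=time.sleep, log=print):
    """Call probe() every interval_s seconds and log each change of state.

    Returns the list of (UTC timestamp, ok) transitions, so the time between
    an "OK" entry and the next "FAILED" entry shows how long the interface
    kept working. Runs forever unless max_checks is given."""
    transitions = []
    last = None
    checks = 0
    while max_checks is None or checks < max_checks:
        ok = probe()
        if ok != last:
            stamp = datetime.now(timezone.utc).isoformat()
            transitions.append((stamp, ok))
            log("%s TX path %s" % (stamp, "OK" if ok else "FAILED"))
            last = ok
        checks += 1
        if max_checks is None or checks < max_checks:
            sleep(interval_s)
    return transitions
```

It would be started with something like `monitor(lambda: ping_once(GATEWAY))`, where `GATEWAY` is a placeholder for an address reachable over xenbr1. Logging only state changes keeps the log short over long runs.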

Comment 12 Paolo Bonzini 2011-01-18 11:34:51 UTC
*** Bug 468725 has been marked as a duplicate of this bug. ***

Comment 13 Paolo Bonzini 2011-01-25 13:55:05 UTC
*** Bug 454285 has been marked as a duplicate of this bug. ***

Comment 14 Paolo Bonzini 2011-01-25 13:57:16 UTC
*** Bug 213430 has been marked as a duplicate of this bug. ***

Comment 16 Laszlo Ersek 2011-01-31 09:51:57 UTC
(In reply to comment #11)

> The guest syslog says "netfront: device eth1 has flipping receive path"
> multiple times (comment 10).

I'm closing this as WONTFIX; flipping is broken for good. Last week we collected 12 bugs that are all caused by flipping; this is one of them. Please use copying instead.
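
To find which guest interfaces are affected, the syslog message quoted above can be searched for. The sketch below is a hypothetical helper, not part of any Red Hat tooling: the "flipping" wording is taken from the message quoted in this comment, while the matching "copying" wording is an assumption about how the driver reports the other mode.

```python
import re

# "flipping" comes from the message quoted in this bug; "copying" is assumed
# to be the wording netfront uses for the other receive mode.
RX_PATH_RE = re.compile(
    r"netfront: device (\S+) has (flipping|copying) receive path"
)


def receive_paths(log_lines):
    """Map each netfront device to the last receive-path mode it logged."""
    modes = {}
    for line in log_lines:
        match = RX_PATH_RE.search(line)
        if match:
            modes[match.group(1)] = match.group(2)
    return modes


def flipping_devices(log_lines):
    """Return the sorted names of devices whose latest logged mode is flipping."""
    return sorted(
        dev for dev, mode in receive_paths(log_lines).items() if mode == "flipping"
    )
```

Passing the lines of the guest's /var/log/messages to `flipping_devices` would list the interfaces that should be moved to the copying receive path.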

