Bug 218733

Summary: [FORCEDETH]: eth0 stops working after a while in Xen
Product: [Fedora] Fedora
Reporter: Herbert Xu <herbert.xu>
Component: kernel-xen
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6
CC: bstein, mquinn, xen-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-02-26 23:47:07 UTC
Attachments:
Requested dmesg of bare metal kernel (flags: none)

Description Herbert Xu 2006-12-07 02:42:21 UTC
+++ This bug was initially created as a clone of Bug #214717 +++

-- Additional comment from mquinn on 2006-12-06 19:28 EST --
Created an attachment (id=143006)
Dmesg


-- Additional comment from mquinn on 2006-12-06 19:31 EST --
I've attached my dmesg to this bug report because I believe I am experiencing
similar issues.

1.  After booting into the FC6 Xen kernel, the network functions for 60 seconds
to 30 minutes, then hangs.
2.  Pinging from outside the box fails; pinging from the box to the outside fails.
3.  Restarting the network does not fix the issue.
4.  rmmod'ing the forcedeth NIC module and modprobe'ing it does not fix the issue.
5.  Interrupts appear to stop incrementing when the network hangs.
6.  Nothing but a reboot appears to bring the network back.
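
One way to confirm point 5 is to sample the NIC's line in /proc/interrupts twice and compare the totals. The following is a minimal sketch, not part of the original report; the helper name irq_count is hypothetical, and it assumes the device is listed as eth0 in the last column of /proc/interrupts, which may differ on a given system.

```sh
# Hypothetical helper: sum the per-CPU interrupt counts for every line of
# /proc/interrupts that mentions the given name.
#   $1 = name as it appears in /proc/interrupts (assumed here: eth0)
#   $2 = file to read (defaults to /proc/interrupts)
irq_count() {
    awk -v dev="$1" '
        index($0, dev) {
            # Field 1 is the IRQ number; the following numeric fields are
            # the per-CPU counters.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        }
        END { print sum + 0 }
    ' "${2:-/proc/interrupts}"
}
```

Running irq_count eth0, waiting a few seconds while generating traffic, and running it again should give a larger number on a healthy interface; identical values would be consistent with interrupts no longer arriving.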

I'll attach some additional information that may be useful.

-- Additional comment from mquinn on 2006-12-06 19:32 EST --
Created an attachment (id=143007)
Output of lspci -vvv


-- Additional comment from mquinn on 2006-12-06 19:34 EST --
Created an attachment (id=143008)
Output of ifconfig


-- Additional comment from mquinn on 2006-12-06 19:35 EST --
Created an attachment (id=143009)
Output of lsmod


-- Additional comment from mquinn on 2006-12-06 19:35 EST --
Created an attachment (id=143010)
xend.log

Comment 1 Herbert Xu 2006-12-07 02:49:50 UTC
Are there any messages in dmesg (or xm dmesg) when this happens? Could you
please try disabling TX checksums on eth0 (with ethtool -K peth0 tx off) to see
if the problem still occurs? Thanks.

Comment 2 Herbert Xu 2006-12-08 02:31:37 UTC
Thanks for the xm dmesg output.  I presume the problem occurs even when you
don't run any domUs at all? If so please try disabling xend by making it not
start.  This will tell us whether it's to do with the changes made by xend
(e.g., setting a bogus MAC) or whether it's something lower down.

BTW, what does

ethtool -i peth0
ethtool -k peth0

show?


Comment 3 Micah Quinn 2006-12-08 21:39:53 UTC
Hello Herbert,

The output from "ethtool -i peth0":

driver: forcedeth
version: 0.56
firmware-version:
bus-info: 0000:00:05.0

and "ethtool -k peth0":

Offload parameters for peth0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off

I booted without xend enabled this time and got roughly the same results.  The
network worked fine for approximately 46 minutes this time.  It was fast and
responsive and then it just stopped.  

Restarting the network did not help.  I manually set the IP address on the eth0
interface and then started examining the arp table.  There was one entry for my
DHCP Server/Default Gateway machine, but it did not have a HW address
(incomplete) specified.  I modified the arp table to have the correct ethernet
hardware address for my DHCP server/Default Gateway, but that did not seem to
help the situation.  Still no ping response.

What would you like me to try next?


Comment 4 Herbert Xu 2006-12-12 02:42:27 UTC
I presume this problem does not appear when you use a baremetal FC6 kernel? If
so please attach the dmesg from that kernel.  Thanks.

Comment 5 Micah Quinn 2006-12-12 05:57:08 UTC
Hello Herbert,

No, the problem does not manifest itself on a bare metal kernel of the same version.

Comment 6 Herbert Xu 2006-12-13 04:21:52 UTC
Any chance of that dmesg from the working baremetal kernel? Thanks.

Comment 7 Micah Quinn 2006-12-15 04:35:12 UTC
Created attachment 143733 [details]
Requested dmesg of bare metal kernel

No problem.  Here is a dmesg from a stable, baremetal kernel (same version)
running on the same machine.

What would you like me to try next?

Comment 8 Herbert Xu 2006-12-15 06:09:21 UTC
If you have a spare partition I'd like you to check if this happens with a
32-bit (i.e., i686) HV+kernel.  Thanks.

Comment 9 Herbert Xu 2006-12-15 06:13:32 UTC
Another thing to try would be to load nvnet instead of forcedeth to see if it's
a driver-specific issue.

Comment 10 Herbert Xu 2006-12-15 06:21:41 UTC
If you can get the source code for nvnet :) So scratch that idea.

Comment 11 Herbert Xu 2006-12-15 06:34:59 UTC
Can you show me the ifconfig output *after* it locks up? When you ping from
inside the box to the outside after the lock-up, does the TX counter in the
ifconfig output increase? Thanks.
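
As an alternative to reading the counter out of the ifconfig output by eye, the same check can be sketched against the sysfs statistics files. This is only an illustration under assumptions: the helper names tx_packets and tx_moved are hypothetical, the interface is assumed to be peth0, and the ping target must be filled in by the reader.

```sh
# Hypothetical helper: print the transmitted-packet counter for an interface.
#   $1 = interface name (assumed here: peth0)
#   $2 = sysfs network class directory (defaults to /sys/class/net)
tx_packets() {
    cat "${2:-/sys/class/net}/$1/statistics/tx_packets"
}

# Hypothetical helper: report whether the counter moved between two samples.
#   $1 = count before, $2 = count after
tx_moved() {
    if [ "$2" -gt "$1" ]; then
        echo "TX counter increased by $(( $2 - $1 ))"
    else
        echo "TX counter did not increase"
    fi
}

# Example use after the lock-up (the gateway address is left to the reader):
#   before=$(tx_packets peth0)
#   ping -c 5 <gateway address>
#   after=$(tx_packets peth0)
#   tx_moved "$before" "$after"
```

If the counter rises while pings still fail, packets are at least being queued for transmission by the stack; if it stays flat, the problem is more likely above the driver.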

Comment 12 Red Hat Bugzilla 2007-07-25 01:36:51 UTC
change QA contact

Comment 13 Chris Lalancette 2008-02-26 23:47:07 UTC
This report targets FC6, which is now end-of-life.

Please re-test against Fedora 7 or later, and if the issue persists, open a new bug.

Thanks