Bug 218733 - [FORCEDETH]: eth0 stops working after a while in Xen
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel-xen
Version: 6
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Herbert Xu
QA Contact: Virtualization Bugs
Reported: 2006-12-06 21:42 EST by Herbert Xu
Modified: 2009-12-14 15:38 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-02-26 18:47:07 EST

Attachments:
Requested dmesg of bare metal kernel (17.92 KB, text/plain), 2006-12-14 23:35 EST, Micah Quinn

Description Herbert Xu 2006-12-06 21:42:21 EST
+++ This bug was initially created as a clone of Bug #214717 +++

-- Additional comment from mquinn@quinnteam.com on 2006-12-06 19:28 EST --
Created an attachment (id=143006)
Dmesg


-- Additional comment from mquinn@quinnteam.com on 2006-12-06 19:31 EST --
I've attached my dmesg to this bug report because I believe I am experiencing
similar issues.

1.  After booting into FC6 Xen kernel, the network will function for 60 seconds
to 30 minutes, then hangs.
2.  Pinging from outside the box fails; pinging from the box to the outside fails.
3.  Restarting the network does not fix the issue.
4.  rmmod'ing the forcedeth NIC module and modprobe'ing it does not fix the issue.
5.  Interrupts appear to stop incrementing when the network hangs.
6.  Nothing but a reboot appears to bring the network back.

I'll attach some additional information that may be useful.
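
One way to confirm symptom 5 is to compare two snapshots of /proc/interrupts taken a few seconds apart. The following is a minimal sketch, not something from the original report: the helper names irq_total and irq_stalled are hypothetical, and it assumes the NIC's interrupt line is labelled with the interface name (for example eth0) in the last column.

```python
def irq_total(interrupts_text, device):
    """Sum the per-CPU counts on every /proc/interrupts line that names `device`.

    Hypothetical helper: assumes the device label (e.g. "eth0") appears among
    the trailing fields of the line, possibly in a comma-separated list.
    """
    total = 0
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Skip the CPU header line and anything that is not "<irq>: ..."
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        names = [name.strip(",") for name in fields[1:]]
        if device not in names:
            continue
        # The per-CPU counters are the leading run of numeric fields.
        for field in fields[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def irq_stalled(before_text, after_text, device):
    """Return True if the device's interrupt count did not change between snapshots."""
    return irq_total(before_text, device) == irq_total(after_text, device)
```

To use it, read /proc/interrupts into a string, wait a few seconds while generating traffic (for example with ping), read it again, and pass both strings to irq_stalled. A True result while traffic is being attempted would be consistent with the interrupts no longer incrementing.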

-- Additional comment from mquinn@quinnteam.com on 2006-12-06 19:32 EST --
Created an attachment (id=143007)
Output of lspci -vvv


-- Additional comment from mquinn@quinnteam.com on 2006-12-06 19:34 EST --
Created an attachment (id=143008)
Output of ifconfig


-- Additional comment from mquinn@quinnteam.com on 2006-12-06 19:35 EST --
Created an attachment (id=143009)
Output of lsmod


-- Additional comment from mquinn@quinnteam.com on 2006-12-06 19:35 EST --
Created an attachment (id=143010)
xend.log
Comment 1 Herbert Xu 2006-12-06 21:49:50 EST
Are there any messages in dmesg (or xm dmesg) when this happens? Could you
please try disabling TX checksums on eth0 (with ethtool -K peth0 tx off) to see
if the problem still occurs? Thanks.
Comment 2 Herbert Xu 2006-12-07 21:31:37 EST
Thanks for the xm dmesg output.  I presume the problem occurs even when you
don't run any domUs at all? If so please try disabling xend by making it not
start.  This will tell us whether it's to do with the changes made by xend
(e.g., setting a bogus MAC) or whether it's something lower down.

BTW, what does

ethtool -i peth0
ethtool -k peth0

show?
Comment 3 Micah Quinn 2006-12-08 16:39:53 EST
Hello Herbert,

The output from "ethtool -i peth0":

driver: forcedeth
version: 0.56
firmware-version:
bus-info: 0000:00:05.0

and "ethtool -k peth0":

Offload parameters for peth0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
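
When comparing offload settings across several boots, it can help to turn the "ethtool -k" text into a dictionary. This is a minimal sketch under the assumption that each setting is printed as "name: on" or "name: off", as in the output above; the function name parse_offloads is hypothetical.

```python
def parse_offloads(ethtool_k_output):
    """Map each offload name in `ethtool -k` output to True (on) or False (off).

    Hypothetical helper: lines without an on/off value, such as the
    "Offload parameters for peth0:" heading, are ignored.
    """
    settings = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Only the first word is checked, so trailing annotations are tolerated.
        if sep and words and words[0] in ("on", "off"):
            settings[name.strip()] = (words[0] == "on")
    return settings
```

Applied to the output above, every entry maps to False, so no offload is enabled on peth0 at this point.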

I booted without xend enabled this time and got roughly the same results.  The
network worked fine for approximately 46 minutes this time.  It was fast and
responsive and then it just stopped.  

Restarting the network did not help.  I manually set the IP address on the eth0
interface and then started examining the arp table.  There was one entry for my
DHCP Server/Default Gateway machine, but it did not have a HW address
(incomplete) specified.  I modified the arp table to have the correct ethernet
hardware address for my DHCP server/Default Gateway, but that did not seem to
help the situation.  Still no ping response.
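
The ARP check described above can also be done by reading /proc/net/arp. This is a minimal sketch, assuming the usual six-column layout of that file (IP address, HW type, Flags, HW address, Mask, Device) and that an entry is incomplete when the ATF_COM bit (0x02) is not set in Flags; the function name incomplete_arp_entries is hypothetical.

```python
ATF_COM = 0x02  # "completed entry" flag from linux/if_arp.h


def incomplete_arp_entries(arp_text):
    """Return (ip, device) pairs for ARP entries whose hardware address is unresolved.

    Hypothetical helper: expects the text of /proc/net/arp, header line included.
    """
    entries = []
    for line in arp_text.splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, _hw_addr, _mask, device = fields[:6]
        if int(flags, 16) & ATF_COM == 0:
            entries.append((ip, device))
    return entries
```

An entry for the default gateway that keeps showing up in this list after a ping attempt would match the "(incomplete)" state described above.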

What would you like me to try next?
Comment 4 Herbert Xu 2006-12-11 21:42:27 EST
I presume this problem does not appear when you use a baremetal FC6 kernel? If
so please attach the dmesg from that kernel.  Thanks.
Comment 5 Micah Quinn 2006-12-12 00:57:08 EST
Hello Herbert,

No, the problem does not manifest itself on a bare metal kernel of the same version.
Comment 6 Herbert Xu 2006-12-12 23:21:52 EST
Any chance of that dmesg from the working baremetal kernel? Thanks.
Comment 7 Micah Quinn 2006-12-14 23:35:12 EST
Created attachment 143733 [details]
Requested dmesg of bare metal kernel

No problem.  Here is a dmesg from a stable, baremetal kernel (same version)
running on the same machine.

What would you like me to try next?
Comment 8 Herbert Xu 2006-12-15 01:09:21 EST
If you have a spare partition I'd like you to check if this happens with a
32-bit (i.e., i686) HV+kernel.  Thanks.
Comment 9 Herbert Xu 2006-12-15 01:13:32 EST
Another thing to try would be to load nvnet instead of forcedeth to see if it's
a driver-specific issue.
Comment 10 Herbert Xu 2006-12-15 01:21:41 EST
If you can get the source code for nvnet :) So scratch that idea.
Comment 11 Herbert Xu 2006-12-15 01:34:59 EST
Can you show me the ifconfig output *after* it locks up? When you ping from
inside the box to the outside after the lock-up, does the TX counter in the
ifconfig output increase? Thanks.
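
The TX counter that ifconfig prints can also be read from /proc/net/dev, which makes a before/after comparison easy to script. This is a minimal sketch, assuming the standard layout of that file, where the tenth number after the interface name is the transmitted packet count; the function name tx_packets is hypothetical.

```python
def tx_packets(net_dev_text, iface):
    """Return the transmitted-packet counter for `iface` from /proc/net/dev text.

    Hypothetical helper: after the "iface:" prefix there are 8 receive
    columns followed by 8 transmit columns, so TX packets is at index 9.
    """
    for line in net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            return int(rest.split()[9])
    raise KeyError(iface)
```

Reading /proc/net/dev once before and once after a short ping run, and comparing tx_packets for eth0 (or peth0), answers the question of whether the TX counter still increases after the lock-up.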
Comment 12 Red Hat Bugzilla 2007-07-24 21:36:51 EDT
change QA contact
Comment 13 Chris Lalancette 2008-02-26 18:47:07 EST
This report targets FC6, which is now end-of-life.

Please re-test against Fedora 7 or later, and if the issue persists, open a new bug.

Thanks
