Bug 124648 - tg3 not recovering from disconnects
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: David Miller
QA Contact: Brian Brock
Reported: 2004-05-28 03:04 EDT by Richard Duran
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2007-10-19 15:25:15 EDT

Description Richard Duran 2004-05-28 03:04:34 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6)
Gecko/20040206 Firefox/0.8

Description of problem:
IBM xSeries 445 with two NetXtreme gigabit fiber cards using the "tg3"
driver (with and without the "bonding" driver). Networking becomes
unstable when connectivity is restored after an interruption. When the
NICs are bonded, the switch goes "amber" on one of them (which one seems
to alternate). We can get a decent network connection if we change the
config from 2 bonded NICs to 1 NIC (sometimes eth2, other times eth3,
almost always alternating, which coincides with the "amber" light).

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-15.EL

How reproducible:
Always

Steps to Reproduce:
1. Bond 2 NICs
2. Disconnect cable (switch goes amber)
3. Reconnect cable (switch stays amber)
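
To see what the bonding driver itself reports while going through these steps, a small helper like the one below could be left running on the console. This is a minimal sketch and not part of the original report: the function name `bond_slave_status` is hypothetical, and it assumes the driver exposes its state in /proc/net/bonding/bond0 using "Slave Interface:" and "MII Status:" lines. Older bonding versions may use a different path (for example /proc/net/bond0/info), so the path is a parameter.

```shell
#!/bin/sh
# Hypothetical helper: print "<slave> <mii-status>" for each slave listed
# in a bonding status file.
# Assumption: the file uses "Slave Interface: ethN" followed by
# "MII Status: up|down" lines; the default path may differ on older drivers.
bond_slave_status() {
    awk '
        /^Slave Interface:/            { slave = $3 }
        /^MII Status:/ && slave != ""  { print slave, $3; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Running it in a loop while unplugging and replugging the cable (for example `while true; do date; bond_slave_status; sleep 1; done`) would show whether the driver ever sees the slave come back up, independent of the switch light.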
    

Actual Results:  SSH connection becomes choppy

Expected Results:  Degradation should have been minimal/unnoticeable

Additional info:
Comment 1 Richard Duran 2004-05-28 21:38:56 EDT
I've just finished looking over bonding.txt and I think I might see
the problem. I don't believe our switches support modes 0, 2, 3, or 4,
and I must be using the default (mode 0). I'd like to use "balance-alb"
(presumably that would be the best for an Oracle database server),
but I don't know whether the tg3 driver supports this mode on our NIC
(IBM NetXtreme 1000 SX Fiber, a.k.a. BCM5703). How can I find out which
modes are supported, and what is a recommended value of miimon for
each mode?

-richard
Comment 2 Richard Duran 2004-05-30 14:23:21 EDT
Okay, I've tried the following in /etc/modules.conf:

  options bonding miimon=100 mode=balance-tlb
  options bonding miimon=100 mode=balance-alb
  options bonding miimon=100 mode=active-backup

and all have produced the same error condition. FYI, these cards are 
linked to a Nortel Passport 8010 which, according to our network guy, 
has no support for link aggregation (which presumably prevents me 
from using any mode but these three).

Out of curiosity, I rebooted the computer without bonding, so that I
could have both switch lights stay green. I then changed to a bonding
config:

/etc/sysconfig/network:GATEWAYDEV=bond0
/etc/sysconfig/network-scripts/ifcfg-bond0:ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-eth2:MASTER=bond0,ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-eth3:MASTER=bond0,ONBOOT=yes

I then restarted networking, and both switch lights stayed green (one
went amber for an instant, then back to green).
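
The config lines above are abbreviated, so it is not clear whether the slave files also set SLAVE=yes, which the Red Hat examples in bonding.txt include. The following is a minimal sketch of a sanity check. The function name `check_slave_cfg` is hypothetical, and the set of required keys is an assumption based on those examples rather than anything confirmed in this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if an ifcfg file declares the keys a
# bonding slave is assumed to need (MASTER set, SLAVE=yes, ONBOOT=yes).
# The file is sourced in a subshell so it cannot alter the caller's variables.
check_slave_cfg() {
    (
        MASTER= SLAVE= ONBOOT=      # ignore any inherited values
        . "$1" || exit 2            # ifcfg files are plain shell assignments
        [ -n "$MASTER" ] && [ "$SLAVE" = "yes" ] && [ "$ONBOOT" = "yes" ]
    )
}

# Example (paths taken from the listing above):
#   check_slave_cfg /etc/sysconfig/network-scripts/ifcfg-eth2 || echo "eth2: incomplete"
```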

I've had this condition before, but invariably a network
interruption would cause something to flake and my ssh connections
would be choppy again. This is the first time I've had it while
trying "mode=balance-tlb". However, when I disconnect the cable and
the switch light goes amber, it still stays amber when I reconnect. I
ran an SSH session which started out fine but suddenly disconnected.
When I tried re-establishing it, I couldn't even connect, and a ping
to the switch that I had started on the server before disconnecting
has suddenly begun losing packets.
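
To put a number on the packet loss before and after reconnecting the cable, something like the following could be used. It is a rough sketch, not taken from the report: the function names `parse_ping_loss` and `ping_loss` are hypothetical, and it assumes ping prints a summary line containing "N% packet loss", as the iputils version does.

```shell
#!/bin/sh
# Hypothetical helper: extract the packet-loss percentage from ping output
# read on stdin. Assumes a summary line of the form "... N% packet loss ...".
parse_ping_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Send a fixed number of pings (default 10) to a host and print the loss.
# $1 = host (for example the switch address), $2 = optional packet count.
ping_loss() {
    ping -c "${2:-10}" "$1" | parse_ping_loss
}
```

Recording `ping_loss <switch-address>` once with both lights green and again after the disconnect/reconnect would make the degradation easier to compare across bonding modes.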

Help!
-richard
Comment 3 RHEL Product and Program Management 2007-10-19 15:25:15 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
