Bug 216338 - broadcom b44 network interface flaps and restarts with moderate packet flow.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: i686 Linux
Priority: medium Severity: medium
Assigned To: Neil Horman
QA Contact: Brian Brock
Blocks: 221460
Reported: 2006-11-19 09:31 EST by Frank DiPrete
Modified: 2008-02-29 14:24 EST

Fixed In Version: RC
Doc Type: Bug Fix
Last Closed: 2007-02-07 19:37:09 EST

Attachments
bug fix (4.37 KB, patch)
2006-12-17 13:33 EST, Michael Chan

Description Frank DiPrete 2006-11-19 09:31:15 EST
Description of problem:
broadcom b44 network interface flaps and restarts during moderately large file
transfers or packet flow.


Version-Release number of selected component (if applicable):
Fedora Core 5
2.6.18-1.2239.fc5
filename:       /lib/modules/2.6.18-1.2239.fc5/kernel/drivers/net/b44.ko
author:         Florian Schirmer, Pekka Pietikainen, David S. Miller
description:    Broadcom 4400 10/100 PCI ethernet driver
license:        GPL
version:        1.01
version:        1.01
vermagic:       2.6.18-1.2239.fc5 mod_unload 686 REGPARM 4KSTACKS gcc-4.1
depends:        mii
alias:          pci:v000014E4d00004401sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004402sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000170Csv*sd*bc*sc*i*
srcversion:     BCFE8FBC66D7F7FB46C83A9
parm:           b44_debug:B44 bitmapped debugging message enable value (int)


How reproducible:
always

Steps to Reproduce:
1. load b44
2. start file/stream download
3.
  
Actual results:

interface shuts itself down, resets, starts up again, and continues to flap

b44: eth0: Link is down.
b44: eth0: Link is up at 100 Mbps, full duplex.
b44: eth0: Flow control is off for TX and off for RX.
b44: eth0: Link is down.
b44: eth0: Link is up at 100 Mbps, full duplex.
b44: eth0: Flow control is off for TX and off for RX.
b44: eth0: Link is down.
b44: eth0: Link is up at 100 Mbps, full duplex.
b44: eth0: Flow control is off for TX and off for RX.


Expected results:
none of the above ;)

Additional info:
tried acpi=off, noacpi - no effect

Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000003ff (1023)
        Link detected: yes
Comment 1 Frank DiPrete 2006-11-20 07:38:17 EST
more info:

the mechanism of failure is that using anything else on the pci bus at the same
time causes the problem, i.e. video, sound, scsi card.

the problem can be recreated by playing video with mpg over nfs or copying a file
from lan to scsi attached storage with a pci adaptec card installed. mythtv fails
the same way (frontend stream over lan)
Comment 2 John W. Linville 2006-11-21 10:30:13 EST
Please attach the contents of /proc/interrupts.  I suspect you are overloading 
an interrupt line.
Comment 3 Neil Horman 2006-11-21 11:48:34 EST
an entire sysreport would actually be helpful.  thanks!
Comment 4 Frank DiPrete 2006-11-23 11:42:11 EST
emailed sysreport with b44 enabled
Comment 5 Frank DiPrete 2006-11-23 11:43:49 EST
           CPU0       
  0:    1068838    IO-APIC-edge  timer
  1:        151    IO-APIC-edge  i8042
  6:          3    IO-APIC-edge  floppy
  7:          0    IO-APIC-edge  parport0
  8:          1    IO-APIC-edge  rtc
  9:          1   IO-APIC-level  acpi
 12:       5626    IO-APIC-edge  i8042
 14:     281787    IO-APIC-edge  ide0
 15:      18293    IO-APIC-edge  ide1
169:      85127   IO-APIC-level  uhci_hcd:usb3, fglrx
177:          0   IO-APIC-level  uhci_hcd:usb1
185:          0   IO-APIC-level  uhci_hcd:usb2
193:          0   IO-APIC-level  ehci_hcd:usb4
201:       3490   IO-APIC-level  Intel 82801DB-ICH4, eth0
NMI:          0 
LOC:    1068769 
ERR:          0
MIS:          0
Comment 6 Frank DiPrete 2006-11-23 11:45:10 EST
interrupts using the e1000 instead of b44

           CPU0
  0:     287643    IO-APIC-edge  timer
  1:        244    IO-APIC-edge  i8042
  6:          3    IO-APIC-edge  floppy
  7:          0    IO-APIC-edge  parport0
  8:          1    IO-APIC-edge  rtc
  9:          1   IO-APIC-level  acpi
 12:      12417    IO-APIC-edge  i8042
 14:      13690    IO-APIC-edge  ide0
 15:       4307    IO-APIC-edge  ide1
169:      19116   IO-APIC-level  uhci_hcd:usb3, fglrx
177:       6931   IO-APIC-level  aic7xxx, uhci_hcd:usb1
185:          0   IO-APIC-level  uhci_hcd:usb2
193:       1962   IO-APIC-level  Intel 82801DB-ICH4, eth0
201:          0   IO-APIC-level  ehci_hcd:usb4
NMI:          0
LOC:     287536
ERR:          0
MIS:          0
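
A minimal sketch, assuming Python is available on the box, of how the
/proc/interrupts listings above could be checked for devices sharing an
interrupt line with the NIC. The helper names parse_interrupts and shared_with
are made up for illustration and are not part of any tool mentioned in this
report. The sample is a subset of the b44 listing from comment 5.

```python
# Hypothetical helpers for reading /proc/interrupts text (names are illustrative).
SAMPLE = """\
           CPU0
  0:    1068838    IO-APIC-edge  timer
169:      85127   IO-APIC-level  uhci_hcd:usb3, fglrx
201:       3490   IO-APIC-level  Intel 82801DB-ICH4, eth0
NMI:          0
"""


def parse_interrupts(text):
    """Return {irq: (total_count, [device names])} for numbered IRQ rows."""
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip the CPU header and the NMI/LOC/ERR/MIS rows
        tokens = rest.split()
        # Leading numeric tokens are the per-CPU counts (one column here).
        n = 0
        while n < len(tokens) and tokens[n].isdigit():
            n += 1
        total = sum(int(t) for t in tokens[:n])
        # tokens[n] is the controller type (e.g. IO-APIC-level);
        # everything after it is a comma-separated device list.
        names = " ".join(tokens[n + 1:]).split(",")
        devices = [d.strip() for d in names if d.strip()]
        table[int(label)] = (total, devices)
    return table


def shared_with(table, name):
    """Return (irq, count, other devices on that line) for `name`, or None."""
    for irq, (count, devices) in table.items():
        if name in devices:
            return irq, count, [d for d in devices if d != name]
    return None


print(shared_with(parse_interrupts(SAMPLE), "eth0"))
```

On the sample this reports that eth0 sits on IRQ 201 together with the
Intel 82801DB-ICH4 audio device. A shared line alone does not prove a problem;
it only shows which devices to look at.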
Comment 7 Neil Horman 2006-11-28 11:23:38 EST
setting back to needinfo until sysreport is sent in.
Comment 8 Frank DiPrete 2006-11-29 20:25:44 EST
sent sysreport via email
Comment 9 Neil Horman 2006-11-30 11:53:29 EST
Sysreport looks pretty good, no other warnings, etc. to indicate a problem, and
your interrupt counts seem low, so I don't think it's an interrupt flood.

What kind of switch are you connected to?  can you check the link settings both
on the switch and on the b44 interface?  In my reading I've seen some reports
indicating that despite trying to set autonegotiation on via ethtool, the b44
driver continues to indicate that autonegotiation is off.  If that results in a
speed/duplex mismatch between the nic and the switch, that could account for the
flapping.  If you could post the output of /sbin/ethtool <iface>, and confirm
that the Duplex/Speed and autonegotiation settings on the NIC match those on
your switch, we should be able to tell if this is the problem we're looking for.
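
A rough sketch of the comparison described above, assuming the ethtool output
has been saved as text and the switch-side settings are known. The function
names parse_ethtool and link_mismatches are hypothetical, and the sample text
only mirrors the layout of ethtool output seen elsewhere in this report.

```python
# Hypothetical helpers; the sample mirrors the ethtool layout in this report.
SAMPLE = """\
Settings for eth2:
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Auto-negotiation: on
        Link detected: yes
"""


def parse_ethtool(text):
    """Turn 'Key: value' lines into a dict, folding continuation lines in."""
    settings = {}
    key = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("Settings for"):
            continue
        if ":" in stripped:
            key, _, value = stripped.partition(":")
            key = key.strip()
            settings[key] = value.strip()
        elif key is not None:
            # A line without a colon continues the previous value
            # (e.g. the second row of link modes).
            settings[key] += " " + stripped
    return settings


def link_mismatches(settings, speed, duplex, autoneg):
    """Return {field: (nic_value, switch_value)} for fields that differ."""
    expected = {"Speed": speed, "Duplex": duplex, "Auto-negotiation": autoneg}
    return {
        field: (settings.get(field), want)
        for field, want in expected.items()
        if settings.get(field) != want
    }


nic = parse_ethtool(SAMPLE)
print(link_mismatches(nic, "100Mb/s", "Full", "on"))
```

An empty result means the NIC side agrees with the values passed in for the
switch; the switch values still have to be read off the switch by hand.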
Comment 10 Frank DiPrete 2006-11-30 20:41:48 EST
connected to a linksys 100m switch
autoneg is on and up at 100m full duplex.
(as expected)
Comment 11 Neil Horman 2006-12-01 08:52:59 EST
and the output of ethtool <interface> on the linux box?
Comment 12 Frank DiPrete 2006-12-02 09:47:01 EST
[root@thurston ~]# ethtool eth2
Settings for eth2:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes
Comment 13 Frank DiPrete 2006-12-02 09:51:26 EST
and with the b44 plugged into a cisco 2924 instead of the linksys:

[root@thurston ~]# ethtool eth2
Settings for eth2:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes

still fails the same way

b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
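
To put a number on the flapping, a small sketch that counts down/up cycles per
interface in log text like the above. It assumes the b44 message wording shown
in this report; count_flaps is an illustrative name, not an existing tool.

```python
import re

# Matches the b44 link messages as they appear in this report.
LINK_EVENT = re.compile(r"^b44: (\w+): Link is (down|up)")

SAMPLE = """\
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
"""


def count_flaps(log_text):
    """Count down-then-up transitions per interface."""
    flaps = {}
    last_state = {}
    for line in log_text.splitlines():
        match = LINK_EVENT.match(line.strip())
        if not match:
            continue  # e.g. the flow control lines
        iface, state = match.groups()
        if state == "up" and last_state.get(iface) == "down":
            flaps[iface] = flaps.get(iface, 0) + 1
        last_state[iface] = state
    return flaps


print(count_flaps(SAMPLE))
```

A link that comes up once at boot without a preceding "down" message is not
counted, so a healthy interface should report nothing.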
Comment 14 Neil Horman 2006-12-04 16:15:38 EST
I think I may have found a patch that addresses this.  I've placed a test kernel
on my people page at:
http://people.redhat.com/nhorman
Please test it and report results back to me.  Thanks!
Comment 15 Frank DiPrete 2006-12-06 07:25:11 EST
can't install the rhel test kernel - running fc5

[root@thurston kernel-rpms]# uname -a
Linux thurston 2.6.18-1.2239.fc5 #1 Fri Nov 10 13:04:06 EST 2006 i686 i686 i386
GNU/Linux


[root@thurston kernel-rpms]# rpm -i kernel-smp-2.6.9-42.28.EL.bz216338.i686.rpm
error: Failed dependencies:
        kernel < 2.6.15 conflicts with hal-0.5.7.1-2.fc5.i386
        kernel < 2.6.12 conflicts with initscripts-8.31.6-1.i386
        kernel < 2.6.13 conflicts with kudzu-1.2.34.5-1.i386
Comment 16 Neil Horman 2006-12-06 09:11:14 EST
my bad, I wasn't thinking.  The patch I'm referring to is commit:
47b9c3b1e6afa3c40e3ac1822cd13946567b5955
From the upstream kernel tree, if you want to build your own kernel. I'll kick
off a kernel build here shortly for FC-5.  Sorry for the mistake.
Comment 17 Neil Horman 2006-12-06 09:47:46 EST
scratch that last remark.  It turns out the latest kernel for FC-5 already has
this patch.  although interestingly, this patch was added after the FC-5
release.  The earliest FC-5 kernel 2.6.15-1.2054.FC5 should not have this patch
included.  I wonder if this patch caused a regression of some sort. Could you
try booting with kernel 2.6.15-1.2054 to see if this problem was seen there as
well? The kernel is available here:
http://download.fedora.redhat.com/pub/fedora/linux/core/5/i386/os/Fedora/RPMS/
Thanks.
Comment 18 Frank DiPrete 2006-12-09 10:28:39 EST
tested using 2.6.15-1.2054

this was a bit difficult due to the bug that prevents loading proprietary
modules, as the system has an ati fglrx card. (see error "print_tainted")

tested using the xorg radeon driver with same results.

[root@thurston ~]# uname -a
Linux thurston 2.6.15-1.2054_FC5 #1 Tue Mar 14 15:48:33 EST 2006 i686 i686 i386
GNU/Linux

[root@thurston ~]# modinfo b44
filename:       /lib/modules/2.6.15-1.2054_FC5/kernel/drivers/net/b44.ko
author:         Florian Schirmer, Pekka Pietikainen, David S. Miller
description:    Broadcom 4400 10/100 PCI ethernet driver
license:        GPL
version:        0.97
version:        0.97
vermagic:       2.6.15-1.2054_FC5 686 REGPARM 4KSTACKS gcc-4.1
depends:        mii
alias:          pci:v000014E4d00004401sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004402sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000170Csv*sd*bc*sc*i*
srcversion:     45D308158F364EC0253ADEE
parm:           b44_debug:B44 bitmapped debugging message enable value (int)

[root@thurston ~]# cat /proc/interrupts
           CPU0
  0:     201603    IO-APIC-edge  timer
  1:       3007    IO-APIC-edge  i8042
  7:          0    IO-APIC-edge  parport0
  8:      51602    IO-APIC-edge  rtc
  9:          1   IO-APIC-level  acpi
 12:      13154    IO-APIC-edge  i8042
 14:      15386    IO-APIC-edge  ide0
 15:      14141    IO-APIC-edge  ide1
 16:      22247   IO-APIC-level  uhci_hcd:usb3, radeon@pci:0000:01:06.0
 17:      21883   IO-APIC-level  aic7xxx, uhci_hcd:usb1
 18:          0   IO-APIC-level  uhci_hcd:usb2
 19:          0   IO-APIC-level  ehci_hcd:usb4
 20:      51267   IO-APIC-level  Intel 82801DB-ICH4, eth0, eth2
NMI:          0
LOC:     201564
ERR:          0
MIS:          0

dmesg output:

b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is on for TX and on for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is on for TX and on for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is on for TX and on for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is on for TX and on for RX.
b44: eth2: Link is down.
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is on for TX and on for RX.

ethtool output:

[root@thurston ~]# ethtool eth2
Settings for eth2:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: No
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Current message level: 0x000000ff (255)
        Link detected: yes

Interesting note here - using the xorg vesa driver (ick) the problem does not
happen. There seems to be a correlation between utilizing the PCI bus and the
b44 kernel module flapping.
Comment 19 Neil Horman 2006-12-12 15:09:00 EST
Alright, I've asked broadcom to send me some appropriate hardware to
test/reproduce this with.  In the interim, this isn't possibly a broken hardware
issue, is it?  Cards driven by b44 are common enough that if this was a driver
problem for all b44 based hardware across the board, I would expect that I
would see more reports.  Is it possible for you to swap your NIC out for another
b44 card to see if the problem persists?
Comment 20 Frank DiPrete 2006-12-12 19:18:31 EST
I detest having to say something like "it works fine on <other OS>"
but ...
It works fine on <other OS>.

other reports from a quick google search:
http://lkml.org/lkml/2003/12/29/90
http://www.uwsg.iu.edu/hypermail/linux/net/0401.2/0032.html
http://lists.shmoo.com/pipermail/hostap/2006-April/013118.html

I think it's been out there for a while.

I don't have another b44 card, nor would I go buy one at this point ;)
Comment 21 Neil Horman 2006-12-12 19:55:52 EST
Ok, :)
Can you tell me the specific card you are using?  I'd like to match your
hardware as closely as possible when I try to recreate.  I'm currently assuming
that it's a bcm4401, like the other notes indicate, but I'd like to be sure.
Comment 22 Frank DiPrete 2006-12-13 06:43:44 EST
it's a chip on an intel motherboard shipped with a dell dimension 2400
Comment 23 Neil Horman 2006-12-13 09:39:52 EST
Ok, I'll try to confirm that, but for now I'm going to continue assuming it's a 4401.
Comment 24 Neil Horman 2006-12-13 09:41:34 EST
actually, run an lspci -n and send in that output, I can use that to confirm.
Comment 25 Frank DiPrete 2006-12-16 10:11:19 EST
01:09.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)

01:09.0 0200: 14e4:4401 (rev 01)
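
A sketch of how the numeric lspci -n line above can be matched against the
alias patterns that modinfo b44 prints. It assumes the kernel's PCI modalias
layout (pci:v...d...sv...sd...bc...sc...i...); the subsystem IDs and the
programming interface are not in this lspci line, so zeros are used as
placeholders, which is harmless here because the b44 aliases wildcard those
fields. The function names are illustrative only.

```python
import re
from fnmatch import fnmatchcase

# Matches "slot class: vendor:device", e.g. "01:09.0 0200: 14e4:4401 (rev 01)".
LSPCI_N = re.compile(
    r"^(\S+)\s+([0-9a-fA-F]{4}):\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})"
)

# Alias patterns copied from the modinfo b44 output in this report.
B44_ALIASES = [
    "pci:v000014E4d00004401sv*sd*bc*sc*i*",
    "pci:v000014E4d00004402sv*sd*bc*sc*i*",
    "pci:v000014E4d0000170Csv*sd*bc*sc*i*",
]


def modalias_from_lspci(line):
    """Build a modalias-style string from one `lspci -n` line."""
    match = LSPCI_N.match(line.strip())
    if match is None:
        raise ValueError("not an lspci -n line: %r" % line)
    _slot, pci_class, vendor, device = match.groups()
    base_class = int(pci_class[:2], 16)
    sub_class = int(pci_class[2:], 16)
    # Subsystem vendor/device and interface are unknown here: placeholder zeros.
    return "pci:v%08Xd%08Xsv%08Xsd%08Xbc%02Xsc%02Xi%02X" % (
        int(vendor, 16), int(device, 16), 0, 0, base_class, sub_class, 0
    )


def matching_aliases(line, aliases):
    """Return the alias patterns that match the device on this lspci line."""
    modalias = modalias_from_lspci(line)
    return [pattern for pattern in aliases if fnmatchcase(modalias, pattern)]


print(matching_aliases("01:09.0 0200: 14e4:4401 (rev 01)", B44_ALIASES))
```

Only the 4401 pattern matches, which agrees with the BCM4401 name in the
non-numeric lspci line.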
Comment 26 Michael Chan 2006-12-17 13:33:30 EST
Created attachment 143874
bug fix

The same bug was reported on the kernel bugzilla:

http://bugzilla.kernel.org/show_bug.cgi?id=7696

I've attached a compile-only tested patch that should fix the problem.  Please
give it a try.  Thanks.
Comment 27 Neil Horman 2006-12-18 09:27:25 EST
Frank, can you give Michael's patch a try?  I've got the card here, but I'm using
it to reproduce another problem at the moment.  If you could test the patch out,
it would be a big help.  Thanks!
Comment 28 Frank DiPrete 2006-12-31 10:38:29 EST
Looks good!

I extracted the kernel source and re-compiled with a patched b44.c file:

[root@thurston ~]# modinfo b44
filename:       /lib/modules/2.6.18-1.2239.fc5/kernel/drivers/net/b44.ko
author:         Florian Schirmer, Pekka Pietikainen, David S. Miller
description:    Broadcom 4400 10/100 PCI ethernet driver
license:        GPL
version:        1.01
version:        1.01
vermagic:       2.6.18-prep mod_unload 686 REGPARM 4KSTACKS gcc-4.1
depends:        mii
alias:          pci:v000014E4d00004401sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004402sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000170Csv*sd*bc*sc*i*
srcversion:     0A98DE945E57B0D08C7B8D4
parm:           b44_debug:B44 bitmapped debugging message enable value (int)

The vermagic confirms the patched module is loaded (I didn't change the extra
version string in the kernel Makefile on purpose)
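
A small sketch of that check, comparing two modinfo outputs field by field.
The helper names are hypothetical; the values are copied from the stock and
patched modinfo b44 outputs in this report.

```python
# Hypothetical helpers; field values are taken from this report.
STOCK = """\
version:        1.01
vermagic:       2.6.18-1.2239.fc5 mod_unload 686 REGPARM 4KSTACKS gcc-4.1
srcversion:     BCFE8FBC66D7F7FB46C83A9
"""

PATCHED = """\
version:        1.01
vermagic:       2.6.18-prep mod_unload 686 REGPARM 4KSTACKS gcc-4.1
srcversion:     0A98DE945E57B0D08C7B8D4
"""


def parse_modinfo(text):
    """Return {field: [values]}; fields such as alias can repeat."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def changed_fields(before, after, keys=("version", "vermagic", "srcversion")):
    """Return {field: (before, after)} for the listed fields that differ."""
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


diff = changed_fields(parse_modinfo(STOCK), parse_modinfo(PATCHED))
print(sorted(diff))
```

Here vermagic and srcversion differ while the driver version string stays at
1.01, so the version field alone would not have shown that a rebuilt module
was in use.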

With the new module:

DMESG:
ADDRCONF(NETDEV_UP): eth2: link is not ready
b44: eth2: Link is up at 100 Mbps, full duplex.
b44: eth2: Flow control is off for TX and off for RX.
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready

ETHTOOL:

Settings for eth2:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes

The patch fixed the problem. Very nice.
Comment 29 Neil Horman 2007-01-01 14:08:59 EST
Ok, this will get picked up in the FC kernels as it makes its way upstream then.
I'm going to change the release on this bug to be RHEL5 to make sure we get it
in place there.  Thanks!
Comment 31 RHEL Product and Program Management 2007-01-02 10:40:49 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 33 Jay Turner 2007-01-02 11:15:38 EST
QE ack for RHEL5.
Comment 36 Jay Turner 2007-01-10 10:50:53 EST
Built into 2.6.18-1.3002.el5.
Comment 37 RHEL Product and Program Management 2007-02-07 19:37:10 EST
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.
