Bug 154512 - b44 driver constantly restarting when using the network
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: John W. Linville
QA Contact: Brian Brock
Reported: 2005-04-12 08:45 EDT by Mary Ellen Foster
Modified: 2007-11-30 17:11 EST
Doc Type: Bug Fix
Last Closed: 2005-06-07 15:07:40 EDT


Attachments
Result of running "sysreport" on my computer (324.21 KB, application/x-bzip)
2005-06-07 14:39 EDT, Mary Ellen Foster

Description Mary Ellen Foster 2005-04-12 08:45:06 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
[ NB: this bug may be related to https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151898 . Certainly it's also about the b44 driver on a Dell laptop. ]

Using my broadband router at home (which assigns an address through DHCP, etc), networking works fine. However, if I plug my machine into the network at work (also using DHCP, but I guess something must be different), everything still comes up fine, but as soon as I try to do anything on the network (e.g., "yum check-update") I get an unending sequence of the following in /var/log/messages:

Apr 12 13:25:50 pcmef kernel: b44: eth0: Link is down.
Apr 12 13:25:53 pcmef kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 13:25:53 pcmef kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 13:25:58 pcmef kernel: b44: eth0: Link is down.
Apr 12 13:26:01 pcmef kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 13:26:01 pcmef kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 13:26:07 pcmef kernel: b44: eth0: Link is down.
Apr 12 13:26:09 pcmef kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 13:26:09 pcmef kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 13:26:15 pcmef kernel: b44: eth0: Link is down.
Apr 12 13:26:18 pcmef kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 13:26:18 pcmef kernel: b44: eth0: Flow control is off for TX and off for RX.

As soon as I stop trying to use the network (e.g., kill the "yum" process I started), the log messages stop. This makes networking totally unusable for me, though.
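
For anyone comparing logs from other machines, the flapping pattern above can be measured rather than eyeballed. The following is only a minimal Python sketch, not part of the original report: the function name `link_outages` and the default `year` argument are assumptions made for illustration (syslog timestamps carry no year), and the sample lines are copied from the log excerpt above.

```python
from datetime import datetime

# Link up/down lines copied from the log excerpt in this report.
SAMPLE = """\
Apr 12 13:25:50 pcmef kernel: b44: eth0: Link is down.
Apr 12 13:25:53 pcmef kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 13:25:58 pcmef kernel: b44: eth0: Link is down.
Apr 12 13:26:01 pcmef kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 13:26:07 pcmef kernel: b44: eth0: Link is down.
Apr 12 13:26:09 pcmef kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 13:26:15 pcmef kernel: b44: eth0: Link is down.
Apr 12 13:26:18 pcmef kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
"""


def link_outages(lines, year=2005):
    """Return the length in seconds of each down -> up cycle in `lines`.

    `year` is an assumption: syslog timestamps do not include one.
    """
    outages = []
    down_at = None
    for line in lines:
        # Ignore everything except the b44 link state messages.
        if "b44: eth0: Link is" not in line:
            continue
        # The first 15 characters hold the "Mon DD HH:MM:SS" timestamp.
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if "Link is down" in line:
            down_at = stamp
        elif "Link is up" in line and down_at is not None:
            outages.append(int((stamp - down_at).total_seconds()))
            down_at = None
    return outages


if __name__ == "__main__":
    print(link_outages(SAMPLE.splitlines()))
```

Feeding it the whole of /var/log/messages instead of the sample should work the same way, since lines without a b44 link message are skipped.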

With Test 1, I seem to remember being able to resolve this by booting with "acpi=off", but I didn't run much at work so I can't be sure. With Test 2, though, it seems to happen regardless of that option. This is with the 2.6.11-1.1234_FC4smp kernel, but it also happens with the non-SMP kernel.

This is on a Dell Inspiron 5160; here's the output of lspci:
00:00.0 Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:00.1 System peripheral: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:00.3 System peripheral: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV34M [GeForce FX Go 5200] (rev a1)
02:01.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
02:02.0 Network controller: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller (rev 03)
02:04.0 CardBus bridge: Texas Instruments PCI4510 PC card Cardbus Controller (rev 02)
02:04.1 FireWire (IEEE 1394): Texas Instruments PCI4510 IEEE-1394 Controller

Version-Release number of selected component (if applicable):
kernel-2.6.11-1.1234_FC4

How reproducible:
Always

Steps to Reproduce:
1. Boot the computer, make sure eth0 is started
2. tail -f /var/log/messages
3. yum check-update
  

Additional info:
Comment 1 Mary Ellen Foster 2005-04-12 09:08:47 EDT
Possibly relevant facts: the log messages when starting at home (which works)
and at school (which doesn't) are different. I suspect the "NETDEV WATCHDOG"
part of the school log is a symptom of the problem. At home, the only
b44-related messages I can see in /var/log/messages are the following:

Apr 11 21:53:43 floopy kernel: b44.c:v0.95 (Aug 3, 2004)
Apr 11 21:53:43 floopy kernel: ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 17 (level, low) -> IRQ 177
Apr 11 21:53:43 floopy kernel: eth0: Broadcom 4400 10/100BaseT Ethernet 00:11:43:67:8a:09
[...]
Apr 11 21:53:43 floopy kernel: b44: eth0: Link is down.
Apr 11 21:53:43 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 11 21:53:43 floopy kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 11 21:53:44 floopy kernel: i2c /dev entries driver
[... and then everything works fine.]


At school, the messages look like this:
Apr 12 11:25:29 floopy kernel: b44.c:v0.95 (Aug 3, 2004)
Apr 12 11:25:29 floopy kernel: ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 17 (level, low) -> IRQ 177
Apr 12 11:25:29 floopy kernel: eth0: Broadcom 4400 10/100BaseT Ethernet 00:11:43:67:8a:09
[...]
Apr 12 11:25:30 floopy kernel: b44: eth0: Link is down.
Apr 12 11:25:30 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 11:25:30 floopy kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 11:25:30 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 11:25:30 floopy kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 11:25:31 floopy kernel: i2c /dev entries driver
Apr 12 11:25:35 floopy kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 12 11:25:35 floopy kernel: b44: eth0: transmit timed out, resetting
Apr 12 11:25:35 floopy kernel: b44: eth0: Link is down.
Apr 12 11:25:38 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 11:25:38 floopy kernel: b44: eth0: Flow control is off for TX and off for RX.
[...]
Apr 12 11:27:44 floopy kernel: b44: eth0: Link is down.
Apr 12 11:27:47 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 11:27:47 floopy kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 11:27:51 floopy kernel: b44: eth0: Link is down.
Apr 12 11:27:54 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 11:27:54 floopy kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 11:27:58 floopy kernel: b44: eth0: Link is down.
Apr 12 11:28:01 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 11:28:01 floopy kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 11:28:05 floopy kernel: b44: eth0: Link is down.
Apr 12 11:28:08 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Apr 12 11:28:08 floopy kernel: b44: eth0: Flow control is off for TX and off for RX.
Apr 12 11:28:13 floopy kernel: b44: eth0: Link is down.
[ ... and so on. ]
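
The visible difference between the two logs is the watchdog timeout, so a quick check for it may help when triaging similar reports. This is a rough Python sketch under stated assumptions: `watchdog_resets` is a hypothetical helper name, and the two sample lists hold only a few lines copied from the home and school excerpts above.

```python
# A few lines copied from the "home" excerpt (no watchdog message).
HOME = [
    "Apr 11 21:53:43 floopy kernel: b44: eth0: Link is down.",
    "Apr 11 21:53:43 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.",
]

# A few lines copied from the "school" excerpt (watchdog fires once).
SCHOOL = [
    "Apr 12 11:25:30 floopy kernel: b44: eth0: Link is up at 100 Mbps, full duplex.",
    "Apr 12 11:25:35 floopy kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "Apr 12 11:25:35 floopy kernel: b44: eth0: transmit timed out, resetting",
    "Apr 12 11:25:35 floopy kernel: b44: eth0: Link is down.",
]


def watchdog_resets(lines, iface="eth0"):
    """Count NETDEV WATCHDOG transmit timeouts logged for `iface`."""
    needle = f"NETDEV WATCHDOG: {iface}: transmit timed out"
    return sum(1 for line in lines if needle in line)


if __name__ == "__main__":
    print("home:", watchdog_resets(HOME))
    print("school:", watchdog_resets(SCHOOL))
```

A non-zero count only says the transmit queue stalled; it does not by itself say why.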
Comment 2 Mary Ellen Foster 2005-04-22 11:00:44 EDT
Is there any further information I can give to help debug this? It continues to
happen with the 1253 kernel, and it's REALLY annoying ...

Is there anywhere I can still get the default FC4T1 kernel from, so I can
confirm my recollection that booting with "acpi=off" eliminated this issue with
that kernel? If that's true, the changelog from there to now might point at
where the problem is coming from ...
Comment 3 Sergey V. Udaltsov 2005-04-27 21:59:05 EDT
I get the same error messages when I connect to my broadband router at home.
This is with the latest FC3 kernel, though, not FC4T.
Comment 4 John W. Linville 2005-04-28 09:18:30 EDT
I have test kernels w/ a minor update to the b44 driver here:

   http://people.redhat.com/linville/kernels/fc3/

Please give them a try and post your results here.  Thanks!
Comment 5 Mary Ellen Foster 2005-04-28 16:14:53 EDT
Those kernels seem to require a "kernel-utils" package -- where should that come
from?
Comment 6 Mary Ellen Foster 2005-04-28 16:30:54 EDT
Okay, "kernel-utils" is an FC3 package and I'm running FC4. I think I've managed
to get the same effect via:
yum install smartmontools microcode_ctl cpuspeed readahead \
    longrun irqbalance x86info rng-utils

Had to install the kernel with "--nodeps --oldpackage" too, of course; I'll test
it tomorrow.
Comment 7 John W. Linville 2005-04-28 16:34:07 EDT
Hmmm...maybe I need to start building FC4 test kernels too... :-)

Thanks for your efforts.  Let me know if you can't get that FC3 kernel to work
(other than the previous b44 problems), and I'll do an FC4 kernel.
Comment 8 Dave Jones 2005-04-28 20:57:23 EDT
The kernel-utils dependency got changed to the hardlink package.
As long as you have that installed, you should be safe to install it with --nodeps.
Comment 9 Mary Ellen Foster 2005-04-29 12:02:25 EDT
Okay, I've been doing some experimentation with kernels and Grub command-line
parameters. I removed all my third-party kernel modules (nVidia, ndiswrapper,
ntfs), just in case, although I doubt that would have had any effect.

Here are the results; note that my machine has a "hyperthreaded" processor, so I
tested both the UP and SMP version of each kernel (and saw no difference between
them in any case).

2.6.11-1.1177_FC4 (initial FC4T1 kernel; found on planetmirror.com)
- Bug present when booted normally
- Bug *ABSENT* when booted with "acpi=off" appended to Grub cmd line

2.6.11-1.1275_FC4 (current Rawhide kernel)
- Bug present when booted normally
- Bug present when booted with "acpi=off"

2.6.11-1.19_FC3.jwltest.7 (John Linville's test FC3 kernel with b44 patch)
- Bug present when booted normally
- Bug *ABSENT* when booted with "acpi=off"

Hopefully this info helps in tracking down what the problem is. It would be nice
if it weren't necessary to use "acpi=off" in the first place, of course. :)
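
To keep track of results like these as more kernels get tested, the matrix can be written down as data. The sketch below is only an illustration in Python: the `RESULTS` layout and the `kernels_helped_by` helper are assumptions, while the entries themselves are the three kernels and outcomes listed in this comment.

```python
# (kernel, boot option) -> True if the bug was present, per this comment.
RESULTS = {
    ("2.6.11-1.1177_FC4", "default"): True,
    ("2.6.11-1.1177_FC4", "acpi=off"): False,
    ("2.6.11-1.1275_FC4", "default"): True,
    ("2.6.11-1.1275_FC4", "acpi=off"): True,
    ("2.6.11-1.19_FC3.jwltest.7", "default"): True,
    ("2.6.11-1.19_FC3.jwltest.7", "acpi=off"): False,
}


def kernels_helped_by(option, results):
    """List kernels that show the bug by default but not with `option`."""
    helped = []
    for (kernel, opt), bug_present in results.items():
        if opt == option and not bug_present and results.get((kernel, "default")):
            helped.append(kernel)
    return sorted(helped)


if __name__ == "__main__":
    print(kernels_helped_by("acpi=off", RESULTS))
```

New rows can be added for other boot options without changing the helper.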
Comment 10 Mary Ellen Foster 2005-05-13 05:43:04 EDT
This bug still happens with 1286_FC4 (the FC4T3 kernel). Does the fact that
"acpi=off" makes it work with 1177 but nothing higher help to track down where
the problem is likely to be?
Comment 11 John W. Linville 2005-05-13 10:35:08 EDT
It may be useful, but so far it hasn't shed any light on this for me... :-( 
 
Have you tried testing with "noapic", either by itself or together with
"acpi=off"? Those two seem to commonly go together, and success with one or
both of them usually indicates a flaky BIOS, which raises the question: have
you looked for a BIOS update for your motherboard?
 
There are only very minor differences between the b44 driver in my FC3 test
kernels (which still work w/ "acpi=off") and the b44 driver in the current
rawhide (i.e. FC4testX):
 
--- jwltest-fc3-9/kernel/kernel-2.6.11/linux-2.6.11/drivers/net/b44.c   2005-05-05 16:28:28.000000000 -0400
+++ kernel-rawhide-today/kernel/kernel-2.6.11/linux-2.6.11/drivers/net/b44.c   2005-05-13 10:30:44.977495537 -0400
@@ -1910,7 +1910,7 @@ static void __devexit b44_remove_one(str 
        } 
 } 
 
-static int b44_suspend(struct pci_dev *pdev, u32 state) 
+static int b44_suspend(struct pci_dev *pdev, pm_message_t state) 
 { 
        struct net_device *dev = pci_get_drvdata(pdev); 
        struct b44 *bp = netdev_priv(dev); 
 
So, this doesn't look like it is related to the b44 driver per se. 
 
Please do tests w/ "noapic" and post the results here.  Please also 
investigate the possibility of a BIOS upgrade for your motherboard.  Thanks! 
Comment 12 Mary Ellen Foster 2005-05-13 11:49:04 EDT
Did some more testing, as requested ... it's kind of confusing, but there seem
to be configurations that work and ones that don't, so I can live with that. I
checked, and according to the Dell site I'm already running the newest BIOS for
my machine.

With hyperthreading disabled in the BIOS and running kernel 2.6.11-1.1290 (non-SMP):
- Works only if I add "acpi=off" to the grub line
- "noapic" doesn't seem to make a difference either way

With hyperthreading enabled in the BIOS:
- The 2.6.11-1.1290 UP kernel works with the default grub line (?!?!)
- The 2.6.11-1.1290 SMP kernel doesn't work, even with "acpi=off" and/or "noapic"

And, of course, everything works fine when I plug into my router at home; this
is all only with the network at work.

Very, very weird.
Comment 13 John W. Linville 2005-05-13 14:04:39 EDT
Mary, I got a couple of notes from someone that seemed to be having similar
issues to what you are seeing.  He advises using "acpi=noirq" rather than
"acpi=off".  Would you mind giving that a try as well?  Thanks!
Comment 14 Mary Ellen Foster 2005-05-13 17:24:50 EDT
"acpi=noirq" doesn't change any of my above results; sorry. (I'm now testing
with the 1303 kernel, but the results are the same.)
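
With this many boot options in play, it is easy to lose track of which ones a given boot actually used. The following is a small Python sketch, offered as an assumption-laden illustration rather than anything from the thread: the helper names are hypothetical and the demo command line is made up. The three options checked are the ones discussed in this bug, and /proc/cmdline is where a running Linux kernel exposes its boot command line.

```python
# Boot options discussed in this bug report.
WANTED = {"acpi=off", "acpi=noirq", "noapic"}


def acpi_related_options(cmdline):
    """Return the ACPI/APIC options from this bug found in `cmdline`, in order."""
    return [token for token in cmdline.split() if token in WANTED]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's boot command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    demo = "ro root=LABEL=/ rhgb quiet acpi=off noapic"
    print(acpi_related_options(demo))
```

On the machine under test, `acpi_related_options(read_cmdline())` would report the options for the current boot, which can then be noted next to each test result.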
Comment 15 John W. Linville 2005-06-06 14:33:03 EDT
Could you attach the output of running "sysreport"?  Thanks! 
Comment 16 Mary Ellen Foster 2005-06-07 14:37:17 EDT
I'm now testing with kernel 1369 -- I haven't actually had the laptop downtown
since I last updated this bug (13 May), so I haven't been able to check this
recently. But as far as I can tell, networking is now happy even with the SMP
kernel and hyperthreading, without any need to add any command-line arguments.
Unfortunately, I don't think I'll be able to track the changes backwards to see
when it got fixed ...

I'll attach the result of running "sysreport" regardless (I ctrl-C'd the RPM
query because it was taking forever).

Should I set this bug WORKSFORME now?
Comment 17 Mary Ellen Foster 2005-06-07 14:39:16 EDT
Created attachment 115196 [details]
Result of running "sysreport" on my computer
Comment 18 John W. Linville 2005-06-07 15:07:40 EDT
Is testing only in your current location sufficient to pronounce the problem 
solved?  If you are comfortable closing, that's fine by me... 
 
I'll go ahead and close it as CURRENTRELEASE.  Feel free to reopen if the 
problem returns.  Thanks! 
Comment 19 Mary Ellen Foster 2005-06-08 09:42:22 EDT
p.s. -- I suspect that this may have been different symptoms of the same
underlying problem as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156261 -- certainly, it's
since the TPM driver (whatever that is?) was disabled in the kernel that my
issue has also gone away.
Comment 20 John W. Linville 2005-06-08 10:12:52 EDT
I suspected the same thing...maybe you should send me your resume... :-)  
  
Seriously, if you'd like to test w/ the kernels from bug 156261 comment 7 and 
let me know the results, that would be great...thanks! 
