Bug 184440 - kernel / bcm43xx lockups
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Assigned To: John W. Linville
QA Contact: Brian Brock
Whiteboard: NeedsRetesting
Keywords: Reopened
Depends On:
Blocks:
Reported: 2006-03-08 15:15 EST by Bernard Johnson
Modified: 2007-11-30 17:11 EST
CC List: 4 users

Doc Type: Bug Fix
Last Closed: 2007-08-07 14:44:10 EDT


Attachments
15 second wireshark capture showing the duplicate packets (587 bytes, application/octet-stream)
2006-07-17 13:21 EDT, Bernard Johnson

Description Bernard Johnson 2006-03-08 15:15:33 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060306 Fedora/1.5.0.1-7 Firefox/1.5.0.1

Description of problem:
I saw this pretty regularly when the bcm43xx code was first introduced, but had not seen it for weeks.  There are no relevant log messages.  The only things I notice are the weird ping results that occur at or about the time of the lockup:

[root@localhost ~]# ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=12177 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=12591 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=14069 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=14078 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=14097 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=13110 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=13385 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=12622 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=11627 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=12473 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=13416 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=14376 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=14101 ms
64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=14186 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=13296 ms
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=14233 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=15701 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=16581 ms (DUP!)

I do not know that this is specifically related to the bcm43xx driver, but I'm guessing that it is.
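The duplicate replies above can be tallied per sequence number to show how often each echo request came back more than once. This is a minimal Python sketch, not something from the original report; the function name `count_duplicate_replies` is hypothetical, and it assumes the `ping` output format shown above, where ping marks a duplicate reply line with a trailing `(DUP!)`.

```python
import re
from collections import Counter


def count_duplicate_replies(ping_output):
    """Hypothetical helper: map each icmp_seq to the number of (DUP!) replies it received."""
    dups = Counter()
    for line in ping_output.splitlines():
        match = re.search(r"icmp_seq=(\d+)", line)
        # Only count reply lines that ping itself flagged as duplicates.
        if match and line.rstrip().endswith("(DUP!)"):
            dups[int(match.group(1))] += 1
    return dict(dups)


# Reply lines copied from the kernel-2.6.15-1.2025_FC5 ping run above.
SAMPLE = """\
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=12177 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=12591 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=14069 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=14078 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=14097 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=13110 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=13385 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=12622 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=11627 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=12473 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=13416 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=14376 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=14101 ms
64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=14186 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=13296 ms
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=14233 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=15701 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=16581 ms (DUP!)
"""

if __name__ == "__main__":
    counts = count_duplicate_replies(SAMPLE)
    print(counts)  # {1: 4, 2: 1, 4: 3, 5: 1, 6: 3}
    print("total duplicates:", sum(counts.values()))  # total duplicates: 12
```

On this capture, five of the six sequence numbers received at least one duplicate, for 12 duplicate replies in total against 6 requests.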


Version-Release number of selected component (if applicable):
kernel-2.6.15-1.2025_FC5

How reproducible:
Sometimes

Steps to Reproduce:
1. Bring up network using bcm43xx
2. ping gateway

Actual Results:  Many duplicate packets received - eventually (usually within a minute or two) the system locks up.

Expected Results:  No lockups or duplicate packets.

Additional info:
Comment 1 Bernard Johnson 2006-03-08 15:36:16 EST
I see that in kernel-2.6.15-1.2032_FC5, the bcm43xx has been disabled from
automatically coming up
(http://cvs.fedora.redhat.com/viewcvs/rpms/kernel/devel/linux-2.6-bcm43xx-neuter.patch?rev=1.1&view=auto).

I just loaded kernel-2.6.15-1.2032_FC5 and I do see some duplicates:
64 bytes from 192.168.1.1: icmp_seq=8 ttl=64 time=1.73 ms
64 bytes from 192.168.1.1: icmp_seq=8 ttl=64 time=3.36 ms (DUP!)
64 bytes from 192.168.1.1: icmp_seq=9 ttl=64 time=1.68 ms
64 bytes from 192.168.1.1: icmp_seq=10 ttl=64 time=1.74 ms
64 bytes from 192.168.1.1: icmp_seq=11 ttl=64 time=1.78 ms
64 bytes from 192.168.1.1: icmp_seq=11 ttl=64 time=3.01 ms (DUP!)

but I've not yet (in 5 minutes of testing) seen a lockup.  When I booted 2025 this
morning, I managed to get three lockups in about ten minutes.
Comment 2 Bernard Johnson 2006-03-09 04:24:01 EST
I was able to lock up kernel-2.6.15-1.2032_FC5 today as well.  I ran 'ping -f
gateway' and let it sit for a while.  Unfortunately, the screensaver kicked in, so
I couldn't see how long it took to lock up.

It looks to me like the bcm43xx driver will not be supported in FC5 since the
PCI ID was dropped.  Should I just wait until further upstream bcm43xx patches
arrive?  Or maybe try http://people.redhat.com/linville/kernels/fedora-netdev/ ?
Comment 3 Bernard Johnson 2006-03-10 15:57:39 EST
When I had some time today, I installed ndiswrapper and brought up the
networking using ndiswrapper and a Windows NDIS driver for my Broadcom chip. 
It's the same driver that I used fwcutter on to get native bcm43xx support.

Under ndiswrapper, I get no lockups or dupe packets whatsoever.  It seems that
the problem is buried in either the SoftMAC code or the bcm43xx code.
Comment 4 John W. Linville 2006-07-11 10:27:29 EDT
There have been a lot of bcm43xx/softmac updates in the last 4 months.  Can 
you verify that this is still a problem with current FC5 or rawhide kernels?
Comment 5 Bernard Johnson 2006-07-12 01:58:46 EDT
Although I did have one lockup yesterday during a network scan, I can't
guarantee that it's related to this bug.  It's also the first lockup I've had in
months.  Also, the weird symptoms originally reported (DUP packets) are no
longer reproducible with a fully updated rawhide system, so I believe this bug
to be dead.  Closing.
Comment 6 Bernard Johnson 2006-07-16 21:11:24 EDT
I was able to reproduce this today, so I'm reopening this bug.
Comment 7 Bernard Johnson 2006-07-17 13:21:21 EDT
Created attachment 132559
15 second wireshark capture showing the duplicate packets

This 15 second wireshark capture was performed while I was pinging the gateway
at my current location.
Comment 8 Andrig Miller 2006-08-04 17:18:30 EDT
I have also had lockups trying to use the bcm43xx driver.  I am on an x86_64
system, though, and I have 1.2GB of RAM.  I read elsewhere that there was a bug
in the driver when the system had more than 1GB of RAM; I don't know if that is
related.  Also, I have three different drivers for my Broadcom card.  My
Broadcom card is reported by the bcm43xx driver as being a 0x4306 rev 0x3.  I
have a Windows XP x64 driver version 3.70.17.5 and a 3.100.64.0.  If I load
the firmware for 3.70.17.5 after booting (putting the files from fwcutter
into /lib/firmware), the wireless interface comes up and associates with the
access point, although wpa_supplicant doesn't work (a different problem).  If
I reboot with that firmware still in /lib/firmware, the OS hangs every time on
boot as soon as ifplugd touches that interface.  If I upload the 3.100.64.0
version of the firmware, the wireless light comes on, and the OS immediately
hangs.  I have an even newer driver, but fwcutter doesn't support it, so I
cannot try it.  In both cases, at boot time, the OS always hangs as soon as it
touches the wireless interface.  The only way I can boot my system is to boot
off the rescue CD and remove the firmware files from /lib/firmware.

I have the latest kernel: 2.6.17-1.2157_FC5, and all patches applied.
Comment 9 Ivo Sarak 2006-09-18 04:39:07 EDT
I get the same symptoms (duplicate pings) and behaviour (system lockup) on FC6test3.
Comment 10 P Wolf 2006-10-07 20:15:31 EDT
The system froze (did not respond to Ctrl-Alt-F1 or Ctrl-Alt-Backspace) 1-10 minutes
after booting as long as I kept the modprobe.bcm43xx file in /etc/modprobe.d.
It didn't matter whether I even ifup'd the wireless interface.

[root]# grep MemTotal /proc/meminfo
MemTotal:      1035368 kB

[root]# uname -srvmpi
Linux 2.6.17-1.2187_FC5 #1 Mon Sep 11 01:17:06 EDT 2006 i686 i686 i386

[root]# lspci | grep Broadcom
02:03.0 Network controller: Broadcom Corporation BCM4303 802.11b Wireless LAN Controller (rev 01)

[root]# lspci -n | grep `lspci | grep Broadcom | cut -d \  -f 1`
02:03.0 0280: 14e4:4301 (rev 01)

[root]# grep "model name" /proc/cpuinfo
model name      : Mobile Intel(R) Pentium(R) 4 - M CPU 2.00GHz
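The `lspci | grep Broadcom` / `lspci -n` pipeline above finds the bus slot of the Broadcom controller and then looks up its numeric vendor:device ID. A minimal Python sketch of the same lookup, working on already-captured text output rather than calling `lspci`; `broadcom_pci_ids` is a hypothetical name, and the regex assumes the `slot class: vendor:device` layout shown in the `lspci -n` line above.

```python
import re

# Matches an `lspci -n` line such as "02:03.0 0280: 14e4:4301 (rev 01)".
LSPCI_N_LINE = re.compile(
    r"(\S+)\s+[0-9a-fA-F]{4}:\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})"
)


def broadcom_pci_ids(lspci_text, lspci_n_text):
    """Hypothetical helper: return {slot: (vendor_id, device_id)} for Broadcom devices."""
    # Same idea as `lspci | grep Broadcom | cut -d ' ' -f 1`: collect the bus slots.
    slots = {line.split()[0] for line in lspci_text.splitlines() if "Broadcom" in line}
    ids = {}
    for line in lspci_n_text.splitlines():
        match = LSPCI_N_LINE.match(line)
        if match and match.group(1) in slots:
            ids[match.group(1)] = (match.group(2).lower(), match.group(3).lower())
    return ids


if __name__ == "__main__":
    lspci = ("02:03.0 Network controller: Broadcom Corporation BCM4303 "
             "802.11b Wireless LAN Controller (rev 01)")
    lspci_n = "02:03.0 0280: 14e4:4301 (rev 01)"
    print(broadcom_pci_ids(lspci, lspci_n))  # {'02:03.0': ('14e4', '4301')}
```

Here 14e4 is Broadcom's PCI vendor ID, and 4301 is the device ID this card reports.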
Comment 11 P Wolf 2006-10-08 12:40:33 EDT
My comment #10 was premature.  The unscheduled freezes resumed and continued
until I changed the video mode.  Probably the networking was just a red herring.
Comment 12 John W. Linville 2006-10-10 08:12:16 EDT
2.6.18-based FC5 test kernels are available here:

   http://people.redhat.com/linville/kernels/fc5/

These include a post-2.6.18 patch intended to eliminate lock-ups, as well as
all the bcm43xx fixes from 2.6.18.  Please give them a try.
Comment 13 Orion Poplawski 2006-10-12 16:45:28 EDT
Well, just a single reboot so far, but 2.6.18-1.2195.2.1.fc5.jwltest.17 seems to
be working.
Comment 14 Orion Poplawski 2006-10-13 15:47:54 EDT
System hung after being up for 20 hours with the test kernel.  No idea what
caused the hang, though, so it may not be the same issue.  The wireless interface was
not configured.
Comment 15 Dave Jones 2006-10-16 15:14:04 EDT
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.
Comment 16 Bernard Johnson 2006-11-02 13:02:04 EST
Still getting occasional DUP packets as of kernel-2.6.18-1.2798.fc6.  I have not
seen a recent lockup on my system, but they were rare to start with.
Comment 17 Bernard Johnson 2006-11-02 13:03:09 EST
Removing NEEDINFO
Comment 18 clifford snow 2006-11-14 12:32:36 EST
I've been having the same problem.  The system always locks up after the last
message below: bcm43xx: Controller RESET (TX timeout) ...

This is using kernel 2.6.18-1.2831.2.1.fc6.jwltest.12.

Removing the bcm43xx module eliminates the lockups.

Is this problem related to the BADNESS limit?  I've read elsewhere that changing
the BADNESS limit to 20 fixes the problem.  I tried changing the value but was
unable to compile the kernel, for reasons not related to this.

Nov 12 17:44:17 dell kernel: SoftMAC: Start scanning with channel: 1
Nov 12 17:44:17 dell kernel: SoftMAC: Scanning 14 channels
Nov 12 17:44:18 dell kernel: SoftMAC: Scanning finished
Nov 12 17:46:18 dell kernel: SoftMAC: Start scanning with channel: 1
Nov 12 17:46:18 dell kernel: SoftMAC: Scanning 14 channels
Nov 12 17:46:18 dell kernel: NETDEV WATCHDOG: eth1: transmit timed out
Nov 12 17:46:18 dell kernel: bcm43xx: Controller RESET (TX timeout) ...
Nov 12 17:46:18 dell kernel: bcm43xx: select_wireless_core: cleanup
Nov 12 17:46:28 dell kernel: NETDEV WATCHDOG: eth1: transmit timed out
Nov 12 17:46:28 dell kernel: bcm43xx: Controller RESET (TX timeout) ...
Comment 19 John W. Linville 2006-11-14 13:03:01 EST
Clifford, have you tried the fc6.netdev kernels?

   http://people.redhat.com/linville/kernels/fedora-netdev/

Do they work any better for you?
Comment 20 clifford snow 2006-11-14 21:50:31 EST
I've been a regular user of the netdev kernels.  I was using your last kernel
but decided to try jwltest.12 to see if it had any impact on the bcm43xx
lockup problem.

Clifford
Comment 21 John W. Linville 2006-12-12 09:52:45 EST
Current FC6.netdev kernels include the new d80211 stack.  Please give them a 
try, and be sure to change /etc/modprobe.conf to refer to bcm43xx-d80211 
instead of bcm43xx.  You will probably have to use v4 firmware as well.

Does this work any better for you?
Comment 22 John W. Linville 2007-02-13 10:36:34 EST
Closed due to lack of response.  Please reopen when the requested information 
becomes available...thanks!
Comment 23 Bernard Johnson 2007-02-14 05:41:34 EST
John-

Sorry for the lack of response.  I haven't been using this particular
machine very much, so it has been difficult to report any findings.  I did,
however, travel with it this week and found that, at least in the FC6-current
kernels, the DUP packet problem still persists, but I haven't seen any recent
lockups (though I haven't used it as heavily as in the past).

How should I proceed?
Comment 24 Bernard Johnson 2007-02-17 14:30:02 EST
Since the DUP problem remains, I'm reopening for now.
Comment 25 John W. Linville 2007-03-21 14:46:35 EDT
Could you try the bcm43xx-mac80211 driver in the test kernels here?

   http://people.redhat.com/linville/kernels/fc6/

Do they show the DUP behaviour?
Comment 26 Bernard Johnson 2007-03-21 16:15:44 EDT
I know it seems that I'm being somewhat unhelpful in gathering information, but
let me explain a bit.

I do not see any bad behavior on my work network.  Everything works smoothly. 
(Although originally, I did have the DUP problem on my network).

Now, I usually see the behavior when I travel, on foreign networks.  I know of
three networks that cause this problem now.  Unfortunately it's impossible to
test them on a regular basis because they are > 1000 miles from where I live.

Also, I updated my laptop to rawhide a few days ago, although I suspect I'll see
it there as well.

I'll leave this NEEDINFO for now to remind me to keep an eye out for the DUP
packets as I roam on foreign networks.
Comment 27 John W. Linville 2007-04-10 16:49:25 EDT
Setting state back to NEEDINFO pending availability of the information from 
the previous comment...thanks!
Comment 28 John W. Linville 2007-08-07 14:44:10 EDT
Closed due to lack of response...please reopen if the problem persists on 
recent fedora kernels.
