Bug 324191 - Broadcom 43xx driver very frequently fails to associate
Summary: Broadcom 43xx driver very frequently fails to associate
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-10-09 03:59 UTC by David Campbell
Modified: 2007-11-30 22:12 UTC

Fixed In Version: 2.6.23.1-5.fc7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-18 17:51:51 UTC
Type: ---
Embargoed:


Attachments
proposed patch to ieee80211_sta.c (7.44 KB, text/plain)
2007-10-10 23:23 UTC, David Campbell
parse-elems-trunk-junk.patch (1.31 KB, patch)
2007-10-11 21:18 UTC, John W. Linville

Description David Campbell 2007-10-09 03:59:36 UTC
Description of problem:

The Broadcom 43xx driver always finds wireless networks for me, but it rarely
manages to associate with them.  When I boot the same machine into another
operating system, the WLAN works every time.

Under Linux, association succeeds only occasionally.  Once associated, the
wireless networking seems stable, but it takes many attempts to get there.

My notebook has a Broadcom 4306 WLAN adapter.  For this report, the notebook was
about 2 metres from the wireless AP with good signal strength, and it was
connecting to a WEP-protected wireless network whose credentials had already
been provided to Linux.

Here's the device detail:

b43-phy0: Broadcom 4306 WLAN found
b43-phy0 debug: Found PHY: Analog 2, Type 2, Revision 2
b43-phy0 debug: Found Radio: Manuf 0x17F, Version 0x2050, Revision 2

b43-phy0 debug: Adding Interface type 2
b43-phy0 debug: Loading firmware version 351.126 (2006-07-29 05:54:02)
b43-phy0 debug: Chip initialized
b43-phy0 debug: 30-bit DMA initialized
b43-phy0 debug: Wireless interface started

I've cut the firmware from
http://downloads.openwrt.org/sources/broadcom-wl-4.80.53.0.tar.bz2.

Here's what dmesg usually shows after an attempt to associate:

b43-phy0 debug: Using hardware based encryption for keyidx: 0, mac: ff:ff:ff:ff:ff:ff
wlan0: Initial auth_alg=1
wlan0: authenticate with AP 00:11:09:bf:b8:45
wlan0: RX authentication from 00:11:09:bf:b8:45 (alg=1 transaction=2 status=0)
wlan0: replying to auth challenge
wlan0: RX authentication from 00:11:09:bf:b8:45 (alg=1 transaction=4 status=0)
wlan0: authenticated
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: failed to parse AssocResp
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: failed to parse AssocResp
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: failed to parse AssocResp
wlan0: association with AP 00:11:09:bf:b8:45 timed out
b43-phy0 debug: Disabling hardware based encryption for keyidx: 0, mac: ff:ff:ff:ff:ff:ff

When it does connect and the wireless network comes up, the log shows that it
still hits the same parse error at least once before succeeding:

b43-phy0 debug: Using hardware based encryption for keyidx: 0, mac: ff:ff:ff:ff:ff:ff
wlan0: Initial auth_alg=1
wlan0: authenticate with AP 00:11:09:bf:b8:45
wlan0: RX authentication from 00:11:09:bf:b8:45 (alg=1 transaction=2 status=0)
wlan0: replying to auth challenge
wlan0: RX authentication from 00:11:09:bf:b8:45 (alg=1 transaction=4 status=0)
wlan0: authenticated
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: failed to parse AssocResp
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: associated
wlan0: CTS protection enabled (BSSID=00:11:09:bf:b8:45)
wlan0: switched to short barker preamble (BSSID=00:11:09:bf:b8:45)
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

Version-Release number of selected component (if applicable):

2.6.22.9-91.fc7

How reproducible:

Almost always

Expected results:

It should work all the time.

Comment 1 John W. Linville 2007-10-09 12:52:03 UTC
Are you using NetworkManager?  Does it behave differently if you stop 
NetworkManager and use the ifconfig/iwconfig commands by hand?

   service NetworkManager stop
   ifconfig wlan0 up
   iwconfig wlan0 key <wep key>
   iwconfig wlan0 essid <essid for wlan>
   dhclient wlan0

Comment 2 David Campbell 2007-10-09 13:59:46 UTC
Yes, I am using NetworkManager, but it makes no difference if I use the commands
above.  I get all the same errors, notably the "failed to parse AssocResp" message.

If you provide me with a precompiled driver (2.6.22.9-91.fc7) or something I can
use to compile up the driver with changes that might indicate the issue, I'm
happy to install and test and report dmesg content.


Comment 3 David Campbell 2007-10-09 14:28:15 UTC
Hmmm, it looks like the "failed to parse" error comes from
/lib/modules/2.6.22.9-91.fc7/kernel/net/mac80211/mac80211.ko, in the function
ieee802_11_parse_elems, where it seems to be caused by receiving an unknown element.

Code I found on the net appears to have once logged a warning when an unknown
element was received, but it now returns complete failure.  Downloading the
official source now.


Comment 4 Jarod Wilson 2007-10-09 19:11:54 UTC
I've got similar hardware (bcm4306 in a powerbook g4), but a bit different
behavior, using the latest rawhide kernels. I had problems associating with F7
kernels up to 2.6.22.<something> and have since switched over to rawhide. No
more association problems with the latest rawhide kernel, though I've had the
connection go completely belly-up on me, killing off NetworkManager and my
wireless base station (happened once, haven't yet tried to reproduce).

Comment 5 David Campbell 2007-10-10 06:39:04 UTC
I've managed to compile up my own mac80211.ko with debug in the
ieee802_11_parse_elems function of ieee80211_sta.c that shows the content of the
association response.

It turns out that my router is sending the required association response
elements, but most of the time it follows them with invalid element ids.  Linux
is strict about these and fails the association because of them, even though
other operating systems do not.

However, the Linux code already shows a certain fault tolerance for Apple
hardware, as indicated by this comment:
	/* Do not trigger error if left == 1 as Apple Airport base stations
	 * send AssocResps that are one spurious byte too long. */

I have emailed the support contact of the router vendor about this, but a
practical solution to this issue is to make linux tolerant of the router bug by
modifying its handling of invalid element ids to instead consider the parsing of
the association response to be finished when it hits an invalid id.  This is
easily accomplished in the code.


Comment 6 John W. Linville 2007-10-10 20:42:28 UTC
Well, I appreciate the "shoe leather"...but I think your analysis is flawed.

All the places calling ieee802_11_parse_elems specifically check the return 
code for ParseFailed.  If ieee802_11_parse_elems encounters unknown IE types, 
it will return ParseUnknown.  So, this does not explain the error you are 
experiencing.

That function only returns ParseFailed if the next element runs off the end of 
the frame.  So if you are getting that message, it would seem that your AP is 
generating bad association responses.

Might your AP be generating fragmented association responses?  I don't know if 
we can handle that, or if it is actually compliant w/ the spec...
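
The return-code behaviour described in this comment can be sketched roughly as
follows.  This is a Python illustration, not the kernel's C code: the names
ParseResult, KNOWN_IDS, parse_elems and association_accepted, and the small set
of "known" element ids, are assumptions made for the example.

```python
from enum import Enum


class ParseResult(Enum):
    OK = "ParseOK"
    UNKNOWN = "ParseUnknown"
    FAILED = "ParseFailed"


# Illustrative subset of element ids the parser recognises (assumption):
# 0 = SSID, 1 = supported rates, 50 = extended supported rates.
KNOWN_IDS = {0, 1, 50}


def parse_elems(body: bytes) -> ParseResult:
    """Walk id/length/value elements and classify the frame body."""
    pos = 0
    left = len(body)
    unknown = False
    while left >= 2:
        eid, elen = body[pos], body[pos + 1]
        pos += 2
        left -= 2
        if elen > left:
            # The element claims more bytes than the frame has left.
            return ParseResult.FAILED
        if eid not in KNOWN_IDS:
            unknown = True
        pos += elen
        left -= elen
    # A single leftover byte is tolerated here, mirroring the Apple
    # Airport comment quoted earlier in this report.
    return ParseResult.UNKNOWN if unknown else ParseResult.OK


def association_accepted(body: bytes) -> bool:
    """Callers reject only FAILED; an UNKNOWN result is still usable."""
    return parse_elems(body) is not ParseResult.FAILED
```

Under this sketch, an unknown element id alone never rejects the frame; only an
element whose declared length exceeds the remaining bytes does.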

Comment 7 David Campbell 2007-10-10 21:42:41 UTC
Sorry for the lack of clarity.  ParseFailed is being returned, because the code
checks for running off the end of the frame before it checks the id.  Both the
id and the elen that my router sends after the other, valid ids and elens are
wrong - it seems there's rubbish at the end of the frame or that the frame
length is wrong.

parse_elems len=18 (total frame len)
WLAN_EID_SUPP_RATES id=1 elen=8 left=16)  -> OK
WLAN_EID_EXT_SUPP_RATES id=50 elen=4 left=6) -> OK
IEEE 802.11 element parse failed (id=67 elen=207 left=0) -> BAD

So it is hitting an id in this instance of 67 and elen of 207, both bad.

Judging by the valid elements, the frame length should really be (2+8) + (2+4),
which is 16, but the router is returning 18.  The Apple Airport bug that the
driver already works around returns 17 instead of 16.
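
The arithmetic in the trace above can be replayed with a short sketch.  This is
a Python illustration only; trace_elems is a hypothetical helper, and the
payload bytes are zero placeholders because the real frame contents were not
posted.  Only the ids and lengths (1/8, 50/4, then 67/207) are taken from the
debug output.

```python
def trace_elems(body: bytes):
    """Yield (id, elen, left, fits) per element header; stop at the first overflow."""
    pos, left = 0, len(body)
    while left >= 2:
        eid, elen = body[pos], body[pos + 1]
        pos += 2
        left -= 2  # "left" is reported after consuming the 2-byte header
        fits = elen <= left
        yield eid, elen, left, fits
        if not fits:
            return
        pos += elen
        left -= elen


# Placeholder payloads (zeros); only the ids and lengths come from the trace.
frame = bytes([1, 8]) + bytes(8) + bytes([50, 4]) + bytes(4) + bytes([67, 207])

print(f"parse_elems len={len(frame)}")
for eid, elen, left, fits in trace_elems(frame):
    print(f"id={eid} elen={elen} left={left} -> {'OK' if fits else 'BAD'}")
```

The two valid elements account for 16 bytes, so the last two bytes are read as
an element header with nothing left behind it, which is what triggers the
overflow check.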


Comment 8 David Campbell 2007-10-10 23:23:00 UTC
Created attachment 223561
proposed patch to ieee80211_sta.c

I propose a patch, attached...basically it moves the frame overflow detection
in the ieee802_11_parse_elems function into the switch, but in the default case
of bad id, permits the return of ParseUnknown in the case of a frame overflow. 
All the other cases still return ParseFailed.
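
As a rough illustration of the behaviour proposed here (not the attached patch
itself, and not the change that was eventually accepted upstream), the idea can
be sketched in Python.  The function name parse_elems_tolerant, the string
results and the set of known ids are assumptions made for the example.

```python
# Illustrative subset of recognised element ids (assumption):
# 0 = SSID, 1 = supported rates, 50 = extended supported rates.
KNOWN_IDS = {0, 1, 50}


def parse_elems_tolerant(body: bytes) -> str:
    """Treat an overflowing element with an unknown id as trailing junk."""
    pos, left = 0, len(body)
    unknown = False
    while left >= 2:
        eid, elen = body[pos], body[pos + 1]
        pos += 2
        left -= 2
        overflow = elen > left
        if eid in KNOWN_IDS:
            if overflow:
                # A recognised element that runs off the frame is still an error.
                return "ParseFailed"
        else:
            unknown = True
            if overflow:
                # Unknown id and bad length: assume junk and stop parsing.
                return "ParseUnknown"
        pos += elen
        left -= elen
    return "ParseUnknown" if unknown else "ParseOK"
```

With the 18-byte layout from comment 7 (ids 1 and 50 followed by the bytes 67
and 207), this sketch returns "ParseUnknown" instead of "ParseFailed", so the
association would no longer be rejected.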

Comment 9 John W. Linville 2007-10-11 21:14:45 UTC
Regarding patches, please use "diff -u" to generate them and please follow 
kernel style guidelines (esp. using tabs for indentation) if at all 
possible...thanks!

If I understand your patch, you are trying to simply ignore any unknown 
elements at the end of a frame, even if they run off the end.  This presumes 
they are junk data.  This gave me some initial heartburn and I'm still not 
entirely sure it is the right way to go.  But, I'm prepared to consider the 
approach.

Comment 10 John W. Linville 2007-10-11 21:18:03 UTC
Created attachment 224811
parse-elems-trunk-junk.patch

I like it better coded this way.. :-)

Comment 11 John W. Linville 2007-10-11 21:34:58 UTC
Would you mind testing the above patch in your environment?  Thanks!

BTW, what is the make/model of your access point?

Comment 12 David Campbell 2007-10-11 23:52:48 UTC
Yes, your patch works fine for me.  Thanks!

The wireless-router-voip-printserver is a
http://www.draytek.com.au/products/Vigor2900.php which has one of the richest
feature sets around, supports lpr printing and hence linux, and has been good to
me until now.


Comment 13 John W. Linville 2007-10-18 17:51:51 UTC
A somewhat different patch was accepted upstream.  The 2.6.23.1-5.fc7 kernels 
have it:

http://koji.fedoraproject.org/koji/buildinfo?buildID=21517

