Description of problem:

Even though the Broadcom 43xx driver always finds wireless networks for me, associating with them is another story. When I boot into another operating system, the WLAN works every time, but under Linux association succeeds only occasionally. Once associated, the wireless connection seems stable, but it takes many attempts to get there. My notebook has a Broadcom 4306 WLAN. For this report, the notebook is about 2 metres from the wireless AP with good signal strength, and it is connecting to a WEP-protected wireless network whose credentials have already been provided to Linux.

Here's the device detail:

b43-phy0: Broadcom 4306 WLAN found
b43-phy0 debug: Found PHY: Analog 2, Type 2, Revision 2
b43-phy0 debug: Found Radio: Manuf 0x17F, Version 0x2050, Revision 2
b43-phy0 debug: Adding Interface type 2
b43-phy0 debug: Loading firmware version 351.126 (2006-07-29 05:54:02)
b43-phy0 debug: Chip initialized
b43-phy0 debug: 30-bit DMA initialized
b43-phy0 debug: Wireless interface started

I cut the firmware from http://downloads.openwrt.org/sources/broadcom-wl-4.80.53.0.tar.bz2.
Here's what dmesg usually shows after an attempt to associate:

b43-phy0 debug: Using hardware based encryption for keyidx: 0, mac: ff:ff:ff:ff:ff:ff
wlan0: Initial auth_alg=1
wlan0: authenticate with AP 00:11:09:bf:b8:45
wlan0: RX authentication from 00:11:09:bf:b8:45 (alg=1 transaction=2 status=0)
wlan0: replying to auth challenge
wlan0: RX authentication from 00:11:09:bf:b8:45 (alg=1 transaction=4 status=0)
wlan0: authenticated
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: failed to parse AssocResp
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: failed to parse AssocResp
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: failed to parse AssocResp
wlan0: association with AP 00:11:09:bf:b8:45 timed out
b43-phy0 debug: Disabling hardware based encryption for keyidx: 0, mac: ff:ff:ff:ff:ff:ff

When it actually does connect and the wireless network comes up, it obviously gets some errors in the process and then succeeds:

b43-phy0 debug: Using hardware based encryption for keyidx: 0, mac: ff:ff:ff:ff:ff:ff
wlan0: Initial auth_alg=1
wlan0: authenticate with AP 00:11:09:bf:b8:45
wlan0: RX authentication from 00:11:09:bf:b8:45 (alg=1 transaction=2 status=0)
wlan0: replying to auth challenge
wlan0: RX authentication from 00:11:09:bf:b8:45 (alg=1 transaction=4 status=0)
wlan0: authenticated
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: failed to parse AssocResp
wlan0: associate with AP 00:11:09:bf:b8:45
wlan0: RX AssocResp from 00:11:09:bf:b8:45 (capab=0x431 status=0 aid=2)
wlan0: associated
wlan0: CTS protection enabled (BSSID=00:11:09:bf:b8:45)
wlan0: switched to short barker preamble (BSSID=00:11:09:bf:b8:45)
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

Version-Release number of selected component (if applicable):
2.6.22.9-91.fc7

How reproducible:
Almost always

Expected results:
It should work all the time.
Are you using NetworkManager? Does it behave differently if you stop NetworkManager and use the ifconfig/iwconfig commands by hand?

service NetworkManager stop
ifconfig wlan0 up
iwconfig wlan0 key <wep key>
iwconfig wlan0 essid <essid for wlan>
dhclient wlan0
Yes, I am using NetworkManager, but it makes no difference if I use the commands above: I get all the same errors, notably the "failed to parse AssocResp" message. If you provide me with a precompiled driver (2.6.22.9-91.fc7), or something I can use to compile the driver with changes that might indicate the issue, I'm happy to install it, test, and report the dmesg content.
Hmmm, looks like the "failed to parse" error comes from /lib/modules/2.6.22.9-91.fc7/kernel/net/mac80211/mac80211.ko, in the function ieee802_11_parse_elems, where it seems to be caused by an unknown element being received. Code I found on the net seems to have previously logged a warning when an unknown element was received, but now returns complete failure. Downloading the official source now.
I've got similar hardware (bcm4306 in a powerbook g4), but a bit different behavior, using the latest rawhide kernels. I had problems associating with F7 kernels up to 2.6.22.<something> and have since switched over to rawhide. No more association problems with the latest rawhide kernel, though I've had the connection go completely belly-up on me, killing off NetworkManager and my wireless base station (happened once, haven't yet tried to reproduce).
I've managed to compile my own mac80211.ko with debug output in the ieee802_11_parse_elems function of ieee80211_sta.c that shows the content of the association response. It turns out that my router is sending the required association response elements, but most of the time it follows them with invalid element ids. Linux is strict about them and fails the association, even though other operating systems don't. However, the Linux code already shows a certain fault tolerance for Apple hardware, as indicated by this comment:

/* Do not trigger error if left == 1 as Apple Airport base stations
 * send AssocResps that are one spurious byte too long. */

I have emailed the support contact of the router vendor about this, but a practical solution is to make Linux tolerant of the router bug: when the parser hits an invalid id, it should consider the parsing of the association response finished instead of failing. This is easily accomplished in the code.
Well, I appreciate the "shoe leather"...but I think your analysis is flawed. All the places calling ieee802_11_parse_elems specifically check the return code for ParseFailed. If ieee802_11_parse_elems encounters unknown IE types, it will return ParseUnknown. So, this does not explain the error you are experiencing. That function only returns ParseFailed if the next element runs off the end of the frame. So if you are getting that message, it would seem that your AP is generating bad association responses. Might your AP be generating fragmented association responses? I don't know if we can handle that, or if it is actually compliant w/ the spec...
Sorry for the lack of clarity. ParseFailed is being returned, because the code checks for running off the end of the frame before it checks the id. Both the id and the elen that my router returns after the other valid ids and elens are wrong; it seems there's rubbish at the end of the frame, or the frame length is wrong.

parse_elems len=18 (total frame len)
WLAN_EID_SUPP_RATES (id=1 elen=8 left=16) -> OK
WLAN_EID_EXT_SUPP_RATES (id=50 elen=4 left=6) -> OK
IEEE 802.11 element parse failed (id=67 elen=207 left=0) -> BAD

So in this instance it hits an id of 67 and an elen of 207, both bad. Judging by the valid frames, the frame length should really be (2+8) + (2+4) = 16, and the router is returning 18. The Apple Airport bug that the driver already works around returns 17 instead of 16.
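The length arithmetic in the trace above can be reproduced with a small standalone sketch. It is not the kernel's ieee802_11_parse_elems; the function name parse_elems_strict, the enum, and the payload bytes in sample_elems are assumptions made for illustration. Only the id and length values (1/8, 50/4, 67/207) and the total length of 18 come from the trace.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Result codes named after those mentioned in the thread; this enum is a
 * stand-in, not the kernel's definition. */
enum parse_res { PARSE_OK, PARSE_UNKNOWN, PARSE_FAILED };

/* Hypothetical element bytes: only the id/length pairs come from the
 * trace; the payload bytes are made up for illustration. */
const uint8_t sample_elems[18] = {
	1, 8, 0x82, 0x84, 0x8b, 0x96, 0x24, 0x30, 0x48, 0x6c, /* id=1, 8 bytes */
	50, 4, 0x0c, 0x12, 0x18, 0x60,                         /* id=50, 4 bytes */
	67, 207                                                /* two trailing bytes */
};

/* Strict walk: the overflow check runs before the id is examined, so an
 * element that claims more bytes than are left fails the whole parse. */
enum parse_res parse_elems_strict(const uint8_t *pos, size_t len)
{
	size_t left = len;
	enum parse_res res = PARSE_OK;

	while (left >= 2) {
		uint8_t id = *pos++;
		uint8_t elen = *pos++;
		left -= 2;

		if (elen > left) {
			printf("parse failed (id=%u elen=%u left=%zu)\n",
			       (unsigned)id, (unsigned)elen, left);
			return PARSE_FAILED;
		}

		switch (id) {
		case 1:  /* supported rates */
		case 50: /* extended supported rates */
			printf("id=%u elen=%u left=%zu -> OK\n",
			       (unsigned)id, (unsigned)elen, left);
			break;
		default:
			res = PARSE_UNKNOWN; /* unknown id that fits: not fatal */
			break;
		}

		left -= elen;
		pos += elen;
	}

	/* A single leftover byte (left == 1) ends the loop without error,
	 * which is the Airport case; two leftover bytes get read as a
	 * header and hit the overflow check above. */
	return res;
}
```

Calling parse_elems_strict(sample_elems, 18) walks the two valid elements and then fails on the last two bytes, while a length of 16 or 17 parses cleanly. That matches the observation that one spurious byte is tolerated but two are not.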
Created attachment 223561
proposed patch to ieee80211_sta.c

I propose a patch, attached. Basically, it moves the frame overflow detection in the ieee802_11_parse_elems function into the switch. In the default case (bad id), a frame overflow now returns ParseUnknown; all the other cases still return ParseFailed on overflow.
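A minimal sketch of the idea described in this comment, assuming a simplified parser that recognises only two element ids. It is neither the attached patch nor the fix that eventually went upstream; parse_elems_tolerant, the enum, and both sample buffers are hypothetical names and data used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in result codes, named after those mentioned in the thread. */
enum parse_res { PARSE_OK, PARSE_UNKNOWN, PARSE_FAILED };

/* Hypothetical data: two valid elements followed by two junk bytes. */
const uint8_t junk_tail[18] = {
	1, 8, 0x82, 0x84, 0x8b, 0x96, 0x24, 0x30, 0x48, 0x6c,
	50, 4, 0x0c, 0x12, 0x18, 0x60,
	67, 207
};

/* Hypothetical data: a known element that claims 8 bytes but has only 2. */
const uint8_t truncated_rates[4] = { 1, 8, 0x82, 0x84 };

/* Tolerant walk: the overflow check is made per case inside the switch.
 * A known id that overflows is still a hard failure; an unknown id that
 * overflows is treated as trailing junk and ends the parse early. */
enum parse_res parse_elems_tolerant(const uint8_t *pos, size_t len)
{
	size_t left = len;
	enum parse_res res = PARSE_OK;

	while (left >= 2) {
		uint8_t id = *pos++;
		uint8_t elen = *pos++;
		left -= 2;

		switch (id) {
		case 1:  /* supported rates */
		case 50: /* extended supported rates */
			if (elen > left)
				return PARSE_FAILED;
			break;
		default:
			if (elen > left)
				return PARSE_UNKNOWN; /* junk at end of frame */
			res = PARSE_UNKNOWN;
			break;
		}

		left -= elen;
		pos += elen;
	}

	return res;
}
```

With this variant, parse_elems_tolerant(junk_tail, 18) returns PARSE_UNKNOWN rather than PARSE_FAILED, so a caller that only rejects on the failure code would accept the response, while parse_elems_tolerant(truncated_rates, 4) still fails. The trade-off is that a genuinely truncated frame ending in an unrecognised element is also accepted.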
Regarding patches, please use "diff -u" to generate them and please follow kernel style guidelines (esp. using tabs for indentation) if at all possible...thanks! If I understand your patch, you are trying to simply ignore any unknown elements at the end of a frame, even if they run off the end. This presumes they are junk data. This gave me some initial heartburn and I'm still not entirely sure it is the right way to go. But, I'm prepared to consider the approach.
Created attachment 224811
parse-elems-trunk-junk.patch

I like it better coded this way.. :-)
Would you mind testing the above patch in your environment? Thanks! BTW, what is the make/model of your access point?
Yes, your patch works fine for me. Thanks! The wireless-router-voip-printserver is a http://www.draytek.com.au/products/Vigor2900.php which has one of the richest feature sets around, supports lpr printing and hence linux, and has been good to me until now.
A somewhat different patch was accepted upstream. The 2.6.23.1-5.fc7 kernels have it: http://koji.fedoraproject.org/koji/buildinfo?buildID=21517