Bug 913341 - Kernel 3.8 breaks b43 wireless
Summary: Kernel 3.8 breaks b43 wireless
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-02-21 02:26 UTC by William Brown
Modified: 2013-05-31 05:16 UTC
8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-05-31 05:16:37 UTC
Type: Bug
Embargoed:


Attachments
Not working scan from 3.8.0 (6.50 KB, application/octet-stream)
2013-02-21 02:26 UTC, William Brown
b43 scan on 3.7.4 working correctly (32.65 KB, text/plain)
2013-02-21 02:27 UTC, William Brown

Description William Brown 2013-02-21 02:26:49 UTC
Created attachment 700309
Not working scan from 3.8.0

Description of problem:
03:00.0 Network controller: Broadcom Corporation BCM4331 802.11a/b/g/n (rev 02)
	Subsystem: Apple Inc. AirPort Extreme
	Flags: bus master, fast devsel, latency 0, IRQ 17
	Memory at b0600000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [58] Vendor Specific Information: Len=78 <?>
	Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [d0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [13c] Virtual Channel
	Capabilities: [160] Device Serial Number 33-88-6d-ff-ff-4f-68-a8
	Capabilities: [16c] Power Budgeting <?>
	Kernel driver in use: bcma-pci-bridge

On 3.7.4 this card works and detects the 21 APs in my vicinity.

However, on 3.8.0 the Cisco APs on various channels are not detected by a scan. Additionally, most of the time running 'iwlist scanning' reports "no ssids found". Only about 1 in 10 runs produces any output.


Version-Release number of selected component (if applicable):
Kernel 3.7.4 compared to 3.8.0

How reproducible:
Always

Additional info:

Comment 1 William Brown 2013-02-21 02:27:42 UTC
Created attachment 700310
b43 scan on 3.7.4 working correctly

Comment 2 Larry Finger 2013-02-21 16:55:34 UTC
What device do you have as indicated by 'lspci -nn'?

Comment 3 Larry Finger 2013-02-21 17:49:21 UTC
I do not run Fedora, but I downloaded kernel-3.8.0-2.fc19.x86_64.rpm and forced it to install on my openSUSE 12.3-RC1 installation.

After rebooting into that kernel, my BCM4312 (14e4:4315) scanned the network correctly every time. I did about 40 tries.

Because of incompatibilities between the F19 kernel and my user space, NetworkManager did not work correctly. As I did not want to disturb my networking setup, I was unable to test the device with an actual connection; however, everything seemed normal.

Comment 4 William Brown 2013-02-21 22:03:24 UTC
03:00.0 Network controller [0280]: Broadcom Corporation BCM4331 802.11a/b/g/n [14e4:4331] (rev 02)


At home it seems to be okay; only at work, with the Cisco "Enterprise" access points, does it seem to be an issue. Could it be a RADIUS issue, or roaming magic?

Comment 5 Larry Finger 2013-02-21 22:57:21 UTC
Neither RADIUS nor roaming is handled by b43.

There should not be any difference between scanning on an AP with PERSONAL or ENTERPRISE authentication. If all you changed was the kernel, then the problem could be someplace in the mac80211 layer.

Have you checked the logs? Use 'dmesg' to see anything the driver logs, and check /var/log/NetworkManager if that is appropriate.

I can think of only two other ways to diagnose this problem. The first would be to bisect between 3.7 and 3.8 to identify which commit caused the problem. The other would be to use wireshark on a separate computer to capture the packets on the air.
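Bisection, as suggested above, is a binary search over the commits between a known-good and a known-bad kernel. In practice it is driven by 'git bisect start', 'git bisect good v3.7' and 'git bisect bad v3.8', and every step means building and booting a kernel and rerunning the scan test. The minimal Python sketch below models only the search itself, assuming a linear history in which every commit after the first bad one is also bad; git's real algorithm works on the commit graph, and the helper name first_bad is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, List[str]]:
    """Return the first bad commit and the list of commits that were tested.

    Assumes (hypothetically) a linear history ordered oldest to newest, with
    commits[0] known good, commits[-1] known bad, and no good commit after a bad one.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    tested: List[str] = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested.append(commits[mid])  # in real life: build, boot, run 'iwlist scanning'
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


if __name__ == "__main__":
    history = [f"commit{i:03d}" for i in range(1000)]
    culprit, tried = first_bad(history, lambda c: int(c[6:]) >= 613)
    print(culprit, len(tried))
```

Because each test halves the remaining range, 1000 candidate commits need at most 10 build-and-boot cycles, which is why bisection is practical even across a whole release.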

Comment 6 Andreas John 2013-03-11 16:24:36 UTC
Hi,
Same observation here. I run an Apple rMBP with a 4331 chipset and can't get a good connection on my customer's enterprise Wi-Fi infrastructure, which is built with Cisco equipment. I have no problems at home with an OpenWrt router (Buffalo hardware).

I tried to find a workaround by changing the b43 kernel module options, but that doesn't change the behavior.

It takes several attempts to associate at all, and if I am lucky I get about 4% packet loss in the best case.

I tried to set verbose=3 in b43, but that does not show much.

rgds,
j

Comment 7 Andreas John 2013-03-11 16:41:56 UTC
I should add: I also use a USB US54E 11G dongle, which works flawlessly at that site. I see the same problems with 3.7.10 and 3.8.2. The chipset is also a [14e4:4331].

Comment 8 Larry Finger 2013-03-11 17:55:58 UTC
Can either of you clone the mainline git repo and do a bisection between kernels 3.7 and 3.8? I do not have a 4331 device and cannot duplicate this problem with my BCM4312.

To clarify, the USB US54E 11G Dongle does not have the same problem as the BCM4331. Does it work in both places? What is its ID as shown by lsusb? From the name, I guess that it is 802.11g, whereas the 4331 is 802.11n. That might be important.

Comment 9 Andreas John 2013-03-11 18:29:59 UTC
Well,
I can test again tomorrow at the customer's site. The problem is still there with 3.7.10. And BTW, I remember seeing incoming traffic in tcpdump on that interface (multicast?). So your crystal-ball diagnosis about the MAC layer might be right ;)

Comment 10 Andreas John 2013-03-12 13:56:16 UTC
Hello,
I downgraded to kernel 3.7.4 and observed the same buggy behavior as with 3.8.2, so I cannot bisect anything.

Here is what dmesg shows when I activate the "bad" NIC:

[11476.018358] wlan1: send auth to 00:3a:99:XX:XX:XX (try 1/3)
[11476.023871] wlan1: authenticated
[11476.024115] b43 bcma0:0 wlan1: disabling HT as WMM/QoS is not supported by the AP
[11476.024124] b43 bcma0:0 wlan1: disabling VHT as WMM/QoS is not supported by the AP
[11476.026236] wlan1: associate with 00:3a:99:XX:XX:XX (try 1/3)
[11476.032618] wlan1: RX AssocResp from 00:3a:99:XX:XX:XX (capab=0x431 status=0 aid=1)
[11476.038430] wlan1: associated
[11476.038595] cfg80211: Calling CRDA for country: DE
[11476.044158] cfg80211: Regulatory domain changed to country: DE
[11476.044187] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[11476.044191] cfg80211:   (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
[11476.044194] cfg80211:   (5150000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[11476.044197] cfg80211:   (5250000 KHz - 5350000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[11476.044200] cfg80211:   (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)
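Two fields in the log above are encoded. capab=0x431 in the association response is the AP's 802.11 Capability Information field, and the regulatory rules are printed in kHz and mBm (hundredths of a dBm). A minimal Python sketch to decode both follows; the helper names are my own, and the bit names follow the IEEE 802.11 Capability Information field.

```python
import re
from typing import List, Optional, Tuple

# IEEE 802.11 Capability Information field, indexed by bit position.
CAPABILITY_BITS = [
    "ESS", "IBSS", "CF-Pollable", "CF-Poll Request",
    "Privacy", "Short Preamble", "PBCC", "Channel Agility",
    "Spectrum Management", "QoS", "Short Slot Time", "APSD",
    "Radio Measurement", "DSSS-OFDM", "Delayed Block Ack", "Immediate Block Ack",
]


def decode_capab(value: int) -> List[str]:
    """List the names of the capability bits set in value (e.g. capab=0x431)."""
    return [name for bit, name in enumerate(CAPABILITY_BITS) if value & (1 << bit)]


# Matches rule lines such as "(2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)".
RULE_RE = re.compile(
    r"\((\d+) KHz - (\d+) KHz @ (\d+) KHz\), \(([^,]+), (\d+) mBm\)"
)


def parse_rule(line: str) -> Optional[Tuple[float, float, float, float]]:
    """Return (start MHz, end MHz, max bandwidth MHz, max EIRP dBm), or None."""
    m = RULE_RE.search(line)
    if m is None:
        return None  # header line or unrelated log line
    start, end, bandwidth = (int(m.group(i)) / 1000 for i in (1, 2, 3))  # kHz -> MHz
    max_eirp_dbm = int(m.group(5)) / 100  # 100 mBm == 1 dBm
    return start, end, bandwidth, max_eirp_dbm


if __name__ == "__main__":
    print(decode_capab(0x431))
    print(parse_rule("cfg80211:   (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)"))
```

Decoded this way, 0x431 sets ESS, Privacy, Short Preamble and Short Slot Time, with the QoS bit (bit 9) clear; that alone does not establish why mac80211 printed the "disabling HT" lines. The rules translate to 20 dBm maximum EIRP in the 2.4 GHz band and 26.98 dBm in 5470-5725 MHz.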


Here is what I get in iwconfig:

root@beacon:~# iwconfig wlan0
wlan0     IEEE 802.11bg  ESSID:"xxxx"  
          Mode:Managed  Frequency:2.412 GHz  Access Point: 6C:9C:ED:XX:XX:XX   
          Bit Rate=24 Mb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=36/100  Signal level=36/100  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:8  Invalid misc:16   Missed beacon:0

wlan1     IEEE 802.11bg  ESSID:"guestbykey"  
          Mode:Managed  Frequency:2.462 GHz  Access Point: 00:3A:99:XX:XX:XX   
          Bit Rate=18 Mb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=70/70  Signal level=0 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:89  Invalid misc:49   Missed beacon:0

(Strange that the "bad" NIC sometimes reports a link quality of 70/70, though not always.)

Any ideas?

rgds,
j

Comment 11 Larry Finger 2013-03-12 15:00:58 UTC
It seems that the two of you have different problems.

I pay very little attention to any link quality or signal strength values. The hardware does not report either in any reliable way.

Comment 12 Andreas John 2013-03-12 15:11:32 UTC
Hey,
should I open a second bug?

And: isn't there any newer firmware around, maybe not from Broadcom but inside the Mac OS X drivers?

Comment 13 Larry Finger 2013-03-12 15:26:52 UTC
I do not see any mention of what firmware you have, but we have never extracted firmware from either OS X or Windows drivers as Broadcom hides the details that describe what firmware is in which part of those drivers. The ELF-based drivers are much friendlier.

Comment 14 Andreas John 2013-03-12 16:29:02 UTC
Oh,
sorry, I thought everyone used b43-fwcutter with the file from your site....
 
The fact is, the driver loads:

[   15.127058] b43-phy0: Loading firmware version 666.2 (2011-02-23 01:15:07)


Is there anything else I could try?

Rgds,
j

Comment 15 Larry Finger 2013-03-12 16:42:59 UTC
Version 666.2 is the highest version number that fwcutter can extract. In any case, I doubt that the firmware is the problem.

Comment 16 Andreas John 2013-03-12 16:58:17 UTC
What did you mean by the ELF-based drivers? Are those available for my chipset?

Comment 17 Andreas John 2013-03-12 16:59:10 UTC
And: what kind of debugging can I enable to provide further information?

Comment 18 Larry Finger 2013-03-12 19:00:13 UTC
Those drivers are the ones used on routers, and similar SoC systems. The architecture is generally for MIPS or ARM host processors, which likely precludes you from using them directly.

If you want an alternative driver, use Broadcom's wl. Note that it will taint your kernel and stop any kernel developer from looking at your system.

If you have all the b43, mac80211, and cfg80211 debug options enabled, that is about the best we can do from your system. From the dmesg output you have provided, you have authenticated and associated, but no data is transmitted. Running a separate system and capturing the on-the-air data using wireshark or kismet would be very useful.

Comment 19 William Brown 2013-03-13 21:43:51 UTC
How do you enable the b43, mac80211, and cfg80211 debug options?

Additionally, I'm quite sure that in Fedora, loading the b43 firmware already taints the kernel, which blocks bug reporting anyway. (The b43 firmware is the only non-free component I load on my system.)

Comment 20 Larry Finger 2013-03-14 00:12:00 UTC
Those are kernel configuration variables. If Fedora does not enable them, then you need to build your own kernel, or perhaps Fedora provides a "debug" kernel.
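Whether a given kernel already has such options set can be checked against its build configuration, which Fedora typically installs as /boot/config-<version>. The minimal Python sketch below reads that file; the option names in DEBUG_OPTIONS are my assumptions about the relevant b43/mac80211/cfg80211 debug switches and may differ between kernel versions.

```python
from typing import Dict, Iterable

# Assumed option names; verify them against your kernel version's Kconfig.
DEBUG_OPTIONS = [
    "CONFIG_B43_DEBUG",
    "CONFIG_MAC80211_VERBOSE_DEBUG",
    "CONFIG_CFG80211_DEVELOPER_WARNINGS",
]

NOT_SET_SUFFIX = " is not set"


def option_states(config_text: str, options: Iterable[str]) -> Dict[str, str]:
    """Map each option to its value ('y', 'm', ...), 'not set', or 'absent'."""
    values: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # Kconfig records disabled options as "# CONFIG_FOO is not set".
            values[line[2 : -len(NOT_SET_SUFFIX)]] = "not set"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
    return {opt: values.get(opt, "absent") for opt in options}


if __name__ == "__main__":
    import os
    import platform

    path = f"/boot/config-{platform.release()}"  # usual Fedora location (assumption)
    if os.path.exists(path):
        with open(path) as f:
            print(option_states(f.read(), DEBUG_OPTIONS))
    else:
        print(f"{path} not found")
```

"absent" means the option does not appear in the file at all, which usually indicates that the name is wrong for that kernel version or that a parent option is disabled.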

As to the firmware tainting the kernel, my reply is "bull". The kernel is only tainted by loading *CODE* that does not have a GPL license. Firmware is *DATA* to the kernel, not a program.

Comment 21 William Brown 2013-03-14 00:17:18 UTC
They provide a debug kernel. I'll try that. Do I also need to add extra parameters? Will the debug output just land in dmesg?

Well, sadly, it may be "bull", but it's the only out-of-tree component loaded on my system (I only use in-kernel drivers), yet the kernel reports taint. If that's not meant to happen, I'll perhaps raise it in another bug.

Comment 22 Larry Finger 2013-03-14 00:27:10 UTC
The debug output will be in the dmesg output, and some in /var/log/messages.

What are the "taint" flags? If a previous warning has been issued, one of the taint flags (I think "G") will be set.

Comment 23 Larry Finger 2013-03-14 00:40:45 UTC
The meaning of each symbol is as follows:

*  'P' - Proprietary module has been loaded.
*  'F' - Module has been forcibly loaded.
*  'S' - SMP with CPUs not designed for SMP.
*  'R' - User forced a module unload.
*  'M' - System experienced a machine check exception.
*  'B' - System has hit bad_page.
*  'U' - Userspace-defined naughtiness.
*  'D' - Kernel has oopsed before.
*  'A' - ACPI table overridden.
*  'W' - Taint on warning.
*  'C' - Modules from drivers/staging are loaded.
*  'I' - Working around severe firmware bug.
*  'O' - Out-of-tree module has been loaded.
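The same flags are exposed as a decimal bitmask in /proc/sys/kernel/tainted. A minimal Python sketch to turn that number into letters and descriptions follows, assuming the bit order matches the list above (P at bit 0 through O at bit 12, as in kernels of this era). The kernel's own "Tainted:" line prints G when P is clear and, as I understand it, pads the other unset flags with spaces; this sketch drops that padding to produce the compact form such as "GW".

```python
import os
from typing import List

# (letter, meaning) in bit order; assumed to match the 3.x kernel layout listed above.
TAINT_FLAGS = [
    ("P", "Proprietary module has been loaded."),
    ("F", "Module has been forcibly loaded."),
    ("S", "SMP with CPUs not designed for SMP."),
    ("R", "User forced a module unload."),
    ("M", "System experienced a machine check exception."),
    ("B", "System has hit bad_page."),
    ("U", "Userspace-defined naughtiness."),
    ("D", "Kernel has oopsed before."),
    ("A", "ACPI table overridden."),
    ("W", "Taint on warning."),
    ("C", "Modules from drivers/staging are loaded."),
    ("I", "Working around severe firmware bug."),
    ("O", "Out-of-tree module has been loaded."),
]


def taint_string(mask: int) -> str:
    """Compact taint letters for mask, e.g. 512 -> 'GW'."""
    if mask == 0:
        return "Not tainted"
    letters = []
    for bit, (letter, _) in enumerate(TAINT_FLAGS):
        if mask & (1 << bit):
            letters.append(letter)
        elif bit == 0:
            letters.append("G")  # 'G': no proprietary module loaded
    return "".join(letters)


def describe_taint(mask: int) -> List[str]:
    """Meanings of every flag set in mask."""
    return [meaning for bit, (_, meaning) in enumerate(TAINT_FLAGS) if mask & (1 << bit)]


if __name__ == "__main__":
    path = "/proc/sys/kernel/tainted"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            mask = int(f.read())
        print(taint_string(mask), describe_taint(mask))
```

A mask of 512 (only bit 9 set) yields "GW" with the single meaning "Taint on warning.", which is the state reported in the next comment: no proprietary code, just an earlier warning.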

Comment 24 William Brown 2013-03-14 00:49:55 UTC
GW, which according to those flags means I got a warning and that set the taint bit. This makes more sense now, thanks. I'll try to track this down independently.

Comment 25 William Brown 2013-03-26 02:27:31 UTC
Trying this on a RADIUS WPA1 version of the same network works on 3.8.*. The WPA2 version doesn't show up in 'iwlist scanning'. WPA2 Personal does not appear to be affected. Setting the module parameter nohwcrypt=1 makes "some" of the previously missing access points show up in the iwlist scan (but not all). Is this a pointer to what changed between 3.7 and 3.8 that could have caused this?

Sadly, I haven't had time to run a debug kernel.

Comment 26 Fedora End Of Life 2013-04-03 15:55:17 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 27 Peter H. Jones 2013-04-11 17:58:36 UTC
Is Bug 694177 a duplicate of this one?

Comment 28 William Brown 2013-05-31 05:16:37 UTC
I don't think so, re Bug 694177.

Running 3.9, this issue appears to have been resolved now.

