Bug 989269

Summary: Connecting to WLAN causes kernel panic
Product: [Fedora] Fedora
Reporter: Chris <piecuch.krzysztof>
Component: kernel
Assignee: fedora-kernel-wireless-brcm80211
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 19
CC: ali, arend, charles, chref, daniel, dean, gansalmon, ignatenko, itamar, jonathan, jpesco, kcleveng, kernel-maint, kvolny, lepennec, madhu.chinakonda, mail, marcosmds, mechonbarsa, piecuch.krzysztof, robin, root, sanjay.ankur, scottt.tw, stas.ashirov
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Flags: charles: needinfo+
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: kernel-3.10.9-200.fc19
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-23 00:31:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (description: flags):
- avoid divide-by-zero error: none
- dmesg | grep brcms when connecting to WLAN after patch: none
- only print rate info for error case: none
- dmesg | grep brcms when connecting to WLAN after patch 2: none
- revert commit to obtain more info: none
- dmesg | grep brcms when connecting to WLAN after patch 3: none
- get more information on rate info conversion: none
- dmesg after comment #18 instructions: none
- dmesg after next system reinstallation: none
- dmesg: none
- Stack trace: none

Description Chris 2013-07-28 23:33:25 UTC
Whenever I connect to a WLAN, my computer gets a kernel panic. My network controller:
02:00.0 Network controller: Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller (rev 01)

Here is the whole gallery of kernel panics that I gathered.

http://imgur.com/a/4j0mM

As soon as I manage to connect to a network, the kernel panics. When I reboot and a network is in range, it panics before I see the login screen. To boot the computer I must either move out of Wi-Fi range before turning it on, turn off wireless, or run the rescue version without Wi-Fi.

It behaves like that on all kernels in Fedora 19.

When I reinstall the system it gives me a few minutes (or seconds) of Wi-Fi, just enough to write "it's working" and then panics. After some time it panics immediately after connecting to WLAN.

Comment 1 Arend van Spriel 2013-07-29 13:09:51 UTC
Looking at the screenshots, this could be a duplicate of bug 962211. It looks like a divide-by-zero exception in brcms_c_calc_frame_time().
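
As a rough illustration of this failure mode (not the driver's actual code), the Python sketch below shows how a frame-time calculation that divides by the data rate fails when the rate resolves to zero. In the kernel the equivalent fault is a divide-by-zero exception. The function names, the simplified formula, and the 1 Mbps fallback are assumptions made for illustration only, and they do not reproduce the contents of any patch attached to this bug.

```python
def frame_time_us(length_bytes, rate_500kbps):
    """Airtime of a payload in microseconds (hypothetical, simplified).

    rate_500kbps is the data rate in units of 500 kbps, so 2 means 1 Mbps.
    Preamble and inter-frame spacing are ignored in this sketch.
    """
    bits = length_bytes * 8
    # Rate in Mbps is rate_500kbps / 2, hence time_us = bits * 2 / rate_500kbps.
    return (bits * 2) // rate_500kbps


def frame_time_us_guarded(length_bytes, rate_500kbps, fallback_rate=2):
    """Same calculation, but a zero rate is replaced by a fallback (assumed 1 Mbps)."""
    if rate_500kbps == 0:
        rate_500kbps = fallback_rate
    return frame_time_us(length_bytes, rate_500kbps)
```

Calling frame_time_us(1500, 0) raises ZeroDivisionError, while frame_time_us_guarded(1500, 0) falls back to the assumed 1 Mbps rate. A guard of this kind keeps the system alive but does not explain where the bad rate comes from.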

Comment 2 Arend van Spriel 2013-07-29 13:29:45 UTC
Created attachment 779891 [details]
avoid divide-by-zero error

Please apply the patch and provide the log. The brcmsmac module must be loaded with the module parameter 'debug' set to 1.

Comment 3 Chris 2013-07-29 20:21:13 UTC
Adding this code solved the problem. Thanks for help.

Comment 4 Arend van Spriel 2013-07-29 21:42:42 UTC
(In reply to Chris from comment #3)
> Adding this code solved the problem. Thanks for help.

Can you still provide the log as indicated? I want more info on how we got into this condition.

Comment 5 Chris 2013-07-29 22:23:12 UTC
I tried loading the brcmsmac with debug=1, but here are the results:
modinfo returns nothing when asked for the brcmsmac parameters:
[root@localhost krzysztof]# modinfo --author --description --parameters brcmsmac
[root@localhost /]# modinfo -Fp brcmsmac
[root@localhost /]# modinfo -p brcmsmac
[root@localhost /]# 

Loading the module:
[root@localhost /]# modprobe brcmsmac debug=1
modprobe: ERROR: could not insert 'brcmsmac': Unknown symbol in module, or unknown parameter (see dmesg)

Asking dmesg:
[root@localhost /]# dmesg | grep brcmsmac
[ 3228.799050] brcmsmac: Unknown parameter `debug'

By the way, if the logs are not in dmesg, please tell me where I should look for the data you need.

When I load brcmsmac without any parameters, connect to WLAN and run
[root@localhost krzysztof]# dmesg | grep brcm
it gives me
http://pastie.org/8188061

Comment 6 Arend van Spriel 2013-07-30 08:38:15 UTC
I guess you do not have CONFIG_BRCMDBG set. Can you change the brcms_dbg_info() statement in the patch to brcms_err() instead and try without the debug parameter?
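
One way to check the guess about CONFIG_BRCMDBG is to look the option up in the running kernel's build configuration. The sketch below is a minimal helper for that, assuming the configuration is available as text. The function name kconfig_state is hypothetical, and the /boot/config-<release> location mentioned afterwards is a distribution convention rather than something stated in this report.

```python
import re


def kconfig_state(config_text, option):
    """Return 'y' or 'm' if the option is enabled in the config text, else None.

    Disabled options appear as '# CONFIG_FOO is not set' and yield None.
    """
    pattern = re.compile(rf"^{re.escape(option)}=(y|m)\s*$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None
```

On a system that ships its kernel config under /boot, the text could be read from the file named "/boot/config-" followed by platform.release() and passed in with "CONFIG_BRCMDBG" as the option. A result of None would be consistent with the "Unknown parameter `debug'" message in comment #5.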

Comment 7 Chris 2013-07-30 18:20:20 UTC
Created attachment 780791 [details]
dmesg | grep brcms when connecting to WLAN after patch

Comment 8 Chris 2013-07-30 18:49:30 UTC
About attachment 780791 [details]:
I changed line 620
	brcms_dbg_info(wlc->hw->d11core, "wl%d: ratespec %d preamb %d\n",
into
	brcms_err(wlc->hw->d11core, "wl%d: ratespec %d preamb %d\n",
and compiled the kernel. I hope I did it correctly.

The internet connection works, but is unstable: disconnects, sometimes gets really slow, sometimes I receive duplicate messages from friends, or just says that it is connected and refuses to do anything saying "Loading...". Generally, if you have bad luck you have about 20% of usability time.

Comment 9 Arend van Spriel 2013-07-30 19:04:28 UTC
(In reply to Chris from comment #8)
> About attachment 780791 [details]:
> I changed line 620
> 	brcms_dbg_info(wlc->hw->d11core, "wl%d: ratespec %d preamb %d\n",
> into
> 	brcms_err(wlc->hw->d11core, "wl%d: ratespec %d preamb %d\n",
> and compiled the kernel. I hope I did it correctly.

That was indeed the change I meant. You did well.

> The internet connection works, but is unstable: disconnects, sometimes gets
> really slow, sometimes I receive duplicate messages from friends, or just
> says that it is connected and refuses to do anything saying "Loading...".
> Generally, if you have bad luck you have about 20% of usability time.

The patch was not intended as a fix, but to keep the driver and system alive so we can get more debugging info. Will have a look at the new attachment.

Comment 10 Arend van Spriel 2013-07-30 19:09:50 UTC
That did not work as intended. The log entry in line 620 is printed twice per packet, which makes the driver so slow it does not even connect.

Comment 11 Arend van Spriel 2013-07-30 19:19:06 UTC
Created attachment 780809 [details]
only print rate info for error case

Please provide log with this patch applied.

Comment 12 Chris 2013-07-30 20:59:26 UTC
Created attachment 780839 [details]
dmesg | grep brcms when connecting to WLAN after patch 2

While gathering this data I connected to the internet, sat for a while, and then walked through a corridor in my university, so that the computer connected to different routers. I sat down there for a significantly longer time. At the end I reconnected and disconnected.
The connection seems stable, without any problems, but I haven't tried to use it for anything heavier.

Comment 13 Arend van Spriel 2013-07-31 08:11:41 UTC
(In reply to Chris from comment #12)
> Created attachment 780839 [details]
> dmesg | grep brcms when connecting to WLAN after patch 2
> 
> During gathering this data I connected to the internet, was sitting for a
> while and then walked through a corridor in my university, so that the
> computer was connecting to different routers. Sat down there for
> significantly longer time. At the end I reconnected and disconnected.
> It seems to work stable, without any problems, but I haven't tried to use
> the connection for something heavier.

Thanks for the data. I observed two values that are invalid. ratespec value 0 is invalid and the driver selects 1Mbps rate to do the calculation. The other value 134217838 is what triggers the divide-by-zero. The ratespec value is:
ratespec: 0x800006E
  RATE           110      (rate value [unit: 500Kbps or MCS index])
  MIMORATE       1        (RATE field represents MIMO MCS index)

This does not make sense, because the MCS index can only go up to 32. I suspect this should not be a MIMO rate, but 54Mbps. I am looking further into how we end up in this situation.
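
The decoding above can be reproduced with a short Python sketch. The two mask constants are assumptions inferred from the values quoted in this comment (134217838 splits into the flag 0x08000000 plus the rate field 110). Their names and the helper names are illustrative and are not taken from the driver source.

```python
RSPEC_RATE_MASK = 0x0000007F  # assumed: low 7 bits hold the rate or MCS index
RSPEC_MIMORATE = 0x08000000   # assumed: flag marking RATE as a MIMO MCS index
MAX_MCS_INDEX = 32            # upper bound stated in this comment


def decode_ratespec(rspec):
    """Split a ratespec word into its RATE field and MIMORATE flag."""
    return {
        "rate": rspec & RSPEC_RATE_MASK,
        "mimo": bool(rspec & RSPEC_MIMORATE),
    }


def mcs_index_in_range(decoded):
    """True unless the value is flagged as MIMO with an MCS index above the bound.

    Non-MIMO values are not checked here.
    """
    return not decoded["mimo"] or decoded["rate"] <= MAX_MCS_INDEX
```

For the logged value, decode_ratespec(134217838) gives a rate of 110 with the MIMO flag set, and mcs_index_in_range() reports it as out of range, which matches the observation that 110 cannot be a valid MCS index.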

Comment 14 Piotr Haber 2013-07-31 08:22:39 UTC
*** Bug 962211 has been marked as a duplicate of this bug. ***

Comment 15 Arend van Spriel 2013-07-31 10:27:42 UTC
Created attachment 781037 [details]
revert commit to obtain more info

Could you apply the following patch as well and provide the log output?

Comment 16 Chris 2013-07-31 18:23:18 UTC
Created attachment 781295 [details]
dmesg | grep brcms when connecting to WLAN after patch 3

This time I turned on the computer with Wi-Fi already on, so I suspended it, went to the router area and connected (there were problems with the password, but I connected to another network). Later I got disconnected and connected again, and at the end I turned off Wi-Fi.

When you give me the next patch, please tell me what to do to gather sufficient data; yesterday I sat for a long time to make sure everything was OK.

I have applied patch 3 without undoing patch 2.

Comment 17 Arend van Spriel 2013-08-01 11:01:54 UTC
Created attachment 781525 [details]
get more information on rate info conversion

Comment 18 Arend van Spriel 2013-08-01 11:06:50 UTC
Here are the instructions:

- apply patch attachment 780809 [details]
- apply patch attachment 781525 [details]
- insert brcmsmac module
- ensure the log has an entry saying "wl0: invalid mcs mapping"
- provide the full dmesg, i.e. without doing a grep over it

Comment 19 Chris 2013-08-01 20:59:03 UTC
Created attachment 781773 [details]
dmesg after comment #18 instructions

I have compiled the module, loaded it and had some problems with point 4, but after some tries I managed to do this.
1. Rebooted.
2. Unloaded a module.
3. Loaded a module.
4. Connected to the internet.
5. Got disconnected, tried to connect to some network (without listing any).
6. Asked for password, could not accept it, as the button was blocked, so I clicked "cancel".
7. Some weird stuff similar to 5 and 6.
8. Found a network, tried connecting.
9. Network list went clear, still connecting.
10. Turned off Wi-Fi (not sure, I can't see it in dmesg)
11. dmesg > dmesg

Comment 22 Chris 2013-08-07 10:14:19 UTC
Created attachment 783783 [details]
dmesg after next system reinstallation

Now I have reinstalled the system once again from the LiveUSB .iso image that was on the site on the 4th of July. It seems to work fine, i.e. I am working without any kernel panics without consciously applying the patch, just yum update.
However, my dmesg looks as shown in the attachment.

Comment 23 Christian Hesse 2013-08-16 12:25:26 UTC
Running Arch Linux here, but suffering the same issue. I have compiled the module with patches above, will generate the log as soon as I come home.

What I have noticed: Linux 3.10.3 (I think, possibly it was 3.10.4) crashed, 3.10.6 and 3.10.7 crash as well. Looks like 3.10.5 is stable so far, but I do not have an explanation for that.

Comment 24 Christian Hesse 2013-08-16 16:33:29 UTC
Created attachment 787360 [details]
dmesg

And here it is... Hope this helps to fix the problem. Let me know if there is anything else I can test/provide. Thanks a lot!

Comment 25 Thiago Coutinho 2013-08-16 19:20:42 UTC
Created attachment 787446 [details]
Stack trace

I'm having this problem too on Arch Linux, kernel 3.10.6-2-ARCH. Planning to buy an Intel card :)

Comment 26 Ankur Sinha (FranciscoD) 2013-08-18 02:09:09 UTC
I've filed a possible duplicate: https://bugzilla.redhat.com/show_bug.cgi?id=989269

Please do take a look and close the bug if need be. 

Ankur

Comment 27 Arend van Spriel 2013-08-19 08:49:10 UTC
(In reply to Ankur Sinha (FranciscoD) from comment #26)
> I've filed a possible duplicate:
> https://bugzilla.redhat.com/show_bug.cgi?id=989269
> 
> Please do take a look and close the bug if need be. 

The possible duplicate and this one are the same. Better not close it.

Comment 29 Christian Hesse 2013-08-19 19:34:51 UTC
Not sure if anybody reported this before... Looks like this commit (re-)introduced the problem for 3.10.6:

mac80211/minstrel_ht: fix cck rate sampling
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=ef47a5e4f1aaf1d0e2e6875e34b2c9595897bef6
(upstream commit 1cd158573951f737fbc878a35cb5eb47bf9af3d5)

Comment 30 Arend van Spriel 2013-08-19 21:29:34 UTC
In 3.10 cck rate support was added to minstrel_ht. I suspect there is an integration issue between minstrel_ht and brcmsmac causing this panic and possibly phy tx errors that people complained about in other bug reports. Still investigating this.

Comment 31 Christian Hesse 2013-08-21 07:42:49 UTC
But 998080 is a duplicate of this one.

Comment 32 Josh Boyer 2013-08-21 12:40:37 UTC
*** Bug 998080 has been marked as a duplicate of this bug. ***

Comment 33 Josh Boyer 2013-08-21 12:54:26 UTC
I've started a scratch build with a patch from Felix Fietkau that only enables cck to certain drivers.  This was tested upstream by one person and successfully solved the crash and Arend has suggested we pick it up temporarily until brcmsmac is fixed (and/or Felix's patch is upstreamed).  Please test this scratch build when it completes and let us know if it resolves the issues:

http://koji.fedoraproject.org/koji/taskinfo?taskID=5836737

Comment 34 Christian Hesse 2013-08-21 13:00:46 UTC
If anybody is interested in an Arch kernel to test Felix' patch:
http://dl.mylinuxtime.de/arch/eworm/x86_64/linux-3.10.9-1-x86_64.pkg.tar.xz

Works for me.

Comment 35 Chris 2013-08-21 13:09:13 UTC
I am having the laptop repaired at the moment, so I cannot check your patch.

PS. They do not touch my Wi-Fi.

Comment 36 Josh Boyer 2013-08-21 13:22:32 UTC
There are multiple people impacted by this bug and on CC.  If you cannot test or haven't tested yet, please don't clear the needinfo flag.

Comment 37 Chris 2013-08-21 14:12:34 UTC
I am sorry for it, it was not done on purpose.

Comment 38 Charles 2013-08-21 15:34:58 UTC
It is an issue with Bugzilla. When you cc yourself, it seems to clear needinfo.

Comment 39 Dean Brettle 2013-08-21 17:16:26 UTC
Works for me.

Comment 40 Erwan LE PENNEC 2013-08-21 17:19:23 UTC
The new scratch build seems to solve the issue for me. No kernel panic after 5 minutes of use...

Comment 41 Josh Boyer 2013-08-21 17:38:28 UTC
OK, thanks.  I'll get the patch into the next official build.

Comment 42 Ankur Sinha (FranciscoD) 2013-08-22 01:29:35 UTC
(In reply to Josh Boyer from comment #33)
> I've started a scratch build with a patch from Felix Fietkau that only
> enables cck to certain drivers.  This was tested upstream by one person and
> successfully solved the crash and Arend has suggested we pick it up
> temporarily until brcmsmac is fixed (and/or Felix's patch is upstreamed). 
> Please test this scratch build when it completes and let us know if it
> resolves the issues:
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=5836737

Confirming that this build doesn't cause the kernel to crash any more. 

[asinha@ankur  ~]$ uname -a
Linux ankur.pc 3.10.9-200.fc19.x86_64 #1 SMP Wed Aug 21 19:27:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Thanks for the fix, folks. 

Warm regards,
Ankur

Comment 43 Fedora Update System 2013-08-22 05:06:04 UTC
kernel-3.10.9-200.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.10.9-200.fc19

Comment 44 Fedora Update System 2013-08-22 05:06:39 UTC
kernel-3.10.9-100.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.10.9-100.fc18

Comment 45 Josh Boyer 2013-08-22 16:48:14 UTC
*** Bug 982264 has been marked as a duplicate of this bug. ***

Comment 47 Fedora Update System 2013-08-23 00:31:13 UTC
kernel-3.10.9-100.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 48 Fedora Update System 2013-08-23 00:43:50 UTC
kernel-3.10.9-200.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 49 Marcos Martins da Silva 2013-08-23 23:10:27 UTC
The new update to kernel 3.10.9-200.fc19.x86_64 restored sanity to my system. This version is working well for me. Thank you!