Bug 989269
Summary: | Connecting to WLAN causes kernel panic | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Chris <piecuch.krzysztof> |
Component: | kernel | Assignee: | fedora-kernel-wireless-brcm80211 |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 19 | CC: | ali, arend, charles, chref, daniel, dean, gansalmon, ignatenko, itamar, jonathan, jpesco, kcleveng, kernel-maint, kvolny, lepennec, madhu.chinakonda, mail, marcosmds, mechonbarsa, piecuch.krzysztof, robin, root, sanjay.ankur, scottt.tw, stas.ashirov |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | Flags: | charles: needinfo+ |
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | kernel-3.10.9-200.fc19 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-08-23 00:31:13 UTC | Type: | Bug |
Description
Chris
2013-07-28 23:33:25 UTC
Looking at the screenshots, this could be a duplicate of bug 962211. It looks like a divide-by-zero exception in brcms_c_calc_frame_time().

Created attachment 779891 [details]
avoid divide-by-zero error
Please apply the patch and provide log. The brcmsmac must be loaded with module parameter 'debug' set to 1.
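For readers who have not seen the attachment, the general shape of a divide-by-zero guard can be sketched as follows. This is a user-space illustration in Python, not the attached patch and not brcmsmac code: the function names, the fallback to 1 Mbps, and the use of a plain payload-time formula are all assumptions made for the example. Only the rate table (802.11n MCS 0 to 7, 20 MHz, long guard interval) is standard data.

```python
# Illustrative only: names, the fallback value and the formula are assumptions,
# not the brcmsmac implementation or the attached patch.
MCS_RATE_KBPS = [6500, 13000, 19500, 26000, 39000, 52000, 58500, 65000]
FALLBACK_RATE_KBPS = 1000  # hypothetical fallback: 1 Mbps


def lookup_rate_kbps(mcs_index):
    """Return the rate for an MCS index, or 0 when the index is unknown."""
    if 0 <= mcs_index < len(MCS_RATE_KBPS):
        return MCS_RATE_KBPS[mcs_index]
    return 0


def payload_time_us(mcs_index, payload_bytes):
    """Microseconds needed to send the payload, guarding against a zero rate."""
    rate_kbps = lookup_rate_kbps(mcs_index)
    if rate_kbps == 0:
        # Without this guard the division below would raise ZeroDivisionError,
        # the user-space analogue of a kernel divide-by-zero exception.
        print(f"invalid mcs index {mcs_index}, using {FALLBACK_RATE_KBPS} kbps")
        rate_kbps = FALLBACK_RATE_KBPS
    bits = payload_bytes * 8
    # kbps equals bits per millisecond, so bits * 1000 / kbps is microseconds.
    return -(-bits * 1000 // rate_kbps)  # ceiling division
```

The guard keeps the calculation alive and logs the offending input, which matches the stated goal of collecting more information rather than fixing the root cause.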
Adding this code solved the problem. Thanks for help.

(In reply to Chris from comment #3)
> Adding this code solved the problem. Thanks for help.
Can you still provide me the log as indicated? I want more info on how we got into this condition.

I tried loading brcmsmac with debug=1, but here are the results.
Modinfo gives no important data when asked for brcmsmac parameters:
[root@localhost krzysztof]# modinfo --author --description --parameters brcmsmac
[root@localhost /]# modinfo -Fp brcmsmac
[root@localhost /]# modinfo -p brcmsmac
[root@localhost /]#
Loading the module:
[root@localhost /]# modprobe brcmsmac debug=1
modprobe: ERROR: could not insert 'brcmsmac': Unknown symbol in module, or unknown parameter (see dmesg)
Asking dmesg:
[root@localhost /]# dmesg | grep brcmsmac
[ 3228.799050] brcmsmac: Unknown parameter `debug'
By the way, if the logs are not in dmesg, please tell me where I should search for the data you need.
When I load brcmsmac without any parameters, connect to WLAN and run
[root@localhost krzysztof]# dmesg | grep brcm
it gives me http://pastie.org/8188061

I guess you do not have CONFIG_BRCMDBG set. Can you change the brcms_dbg_info() statement in the patch to brcms_err() instead and try without the debug parameter?

Created attachment 780791 [details]
dmesg | grep brcms when connecting to WLAN after patch
About attachment 780791 [details]:
I changed line 620
brcms_dbg_info(wlc->hw->d11core, "wl%d: ratespec %d preamb %d\n",
into
brcms_err(wlc->hw->d11core, "wl%d: ratespec %d preamb %d\n",
and compiled the kernel. I hope I did it correctly.
The internet connection works, but is unstable: disconnects, sometimes gets really slow, sometimes I receive duplicate messages from friends, or just says that it is connected and refuses to do anything saying "Loading...". Generally, if you have bad luck you have about 20% of usability time.
(In reply to Chris from comment #8)
> About attachment 780791 [details]:
> I changed line 620
> brcms_dbg_info(wlc->hw->d11core, "wl%d: ratespec %d preamb %d\n",
> into
> brcms_err(wlc->hw->d11core, "wl%d: ratespec %d preamb %d\n",
> and compiled the kernel. I hope I did it correctly.
That was indeed the change I meant. You did well.
> The internet connection works, but is unstable: disconnects, sometimes gets
> really slow, sometimes I receive duplicate messages from friends, or just
> says that it is connected and refuses to do anything saying "Loading...".
> Generally, if you have bad luck you have about 20% of usability time.
The patch was not intended as a fix, but to keep the driver and system alive so we can get more debugging info. Will have a look at the new attachment.

That did not work as intended. The log entry in line 620 is printed twice per packet, which makes the driver so slow it does not even connect.

Created attachment 780809 [details]
only print rate info for error case
Please provide log with this patch applied.
Created attachment 780839 [details]
dmesg | grep brcms when connecting to WLAN after patch 2
During gathering this data I connected to the internet, was sitting for a while and then walked through a corridor in my university, so that the computer was connecting to different routers. Sat down there for significantly longer time. At the end I reconnected and disconnected.
It seems to work stable, without any problems, but I haven't tried to use the connection for something heavier.
(In reply to Chris from comment #12)
> Created attachment 780839 [details]
> dmesg | grep brcms when connecting to WLAN after patch 2
>
> During gathering this data I connected to the internet, was sitting for a
> while and then walked through a corridor in my university, so that the
> computer was connecting to different routers. Sat down there for
> significantly longer time. At the end I reconnected and disconnected.
> It seems to work stable, without any problems, but I haven't tried to use
> the connection for something heavier.
Thanks for the data. I observed two values that are invalid. ratespec value 0 is invalid and the driver selects 1Mbps rate to do the calculation. The other value 134217838 is what triggers the divide-by-zero. The ratespec value is:
ratespec: 0x800006E
  RATE 110 (rate value [unit: 500Kbps or MCS index])
  MIMORATE 1 (RATE field represents MIMO MCS index)
This does not make sense, because MCS index can only go up to 32. I suspect this should not be a mimo rate, but 54Mbps. Looking further how we end up in this situation.

*** Bug 962211 has been marked as a duplicate of this bug. ***

Created attachment 781037 [details]
revert commit to obtain more info
Could you apply the following patch as well and provide log output.
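The ratespec decoding discussed earlier in the thread (value 134217838, i.e. 0x800006E, giving RATE 110 and MIMORATE 1) can be reproduced with a few bit operations. The sketch below is Python for illustration, not driver code. The two masks are inferred from the decoded value quoted in the thread and should be treated as assumptions; the function name and the returned dictionary are hypothetical.

```python
# Masks inferred from the decoded value quoted in the thread; assumptions,
# not copied from the driver headers.
RSPEC_RATE_MASK = 0x0000007F  # low 7 bits: rate in 500 Kbps units or MCS index
RSPEC_MIMORATE = 0x08000000   # set when the rate field holds a MIMO MCS index
MAX_MCS_INDEX = 32            # upper bound for the MCS index stated in the thread


def decode_ratespec(ratespec):
    """Split a ratespec into its rate field and MIMO flag and sanity-check it."""
    rate = ratespec & RSPEC_RATE_MASK
    is_mimo = bool(ratespec & RSPEC_MIMORATE)
    # A zero ratespec is invalid, and a MIMO rate must be a legal MCS index.
    valid = ratespec != 0 and (not is_mimo or rate <= MAX_MCS_INDEX)
    return {"hex": hex(ratespec), "rate": rate, "mimo": is_mimo, "valid": valid}


if __name__ == "__main__":
    print(decode_ratespec(134217838))
    print(decode_ratespec(0))
```

Running it on the two values from the log flags both as invalid: 0 because it is empty, and 134217838 because a MIMO rate field of 110 exceeds the MCS limit.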
Created attachment 781295 [details]
dmesg | grep brcms when connecting to WLAN after patch 3
This time I turned on the computer with Wi-Fi already on, so I suspended it, went to the router area and connected (there were problems with the password, but I connected to another network). Later I got disconnected, connected again, and at the end I turned off Wi-Fi.
When you give me the next patch, please tell me what to do to gather sufficient data; yesterday, for example, I sat for a long time to make sure everything was OK.
I have applied patch 3 without undoing patch 2.
Created attachment 781525 [details]
get more information on rate info conversion
Here are the instructions:
- apply patch attachment 780809 [details]
- apply patch attachment 781525 [details]
- insert brcmsmac module
- assure log has entry saying "wl0: invalid mcs mapping"
- provide full dmesg, i.e. not doing a grep over it

Created attachment 781773 [details]
dmesg after comment #18 instructions

I have compiled the module, loaded it and had some problems with point 4, but after some tries I managed to do this.
1. Rebooted.
2. Unloaded a module.
3. Loaded a module.
4. Connected to the internet.
5. Got disconnected, tried to connect to some network (without listing any).
6. Asked for password, could not accept it, as the button was blocked, so I clicked "cancel".
7. Some weird stuff similar to 5 and 6.
8. Found a network, tried connecting.
9. Network list went clear, still connecting.
10. Turned off Wi-Fi (not sure, I can't see it in dmesg)
11. dmesg > dmesg

Created attachment 783783 [details]
dmesg after next system reinstallation
Now I have reinstalled the system once again from the LiveUSB .iso image that was on the site on the 4th of July. It seems to work fine, i.e. I am working without any kernel panics and without consciously applying the patch, just yum update.
However, my dmesg looks like the one in the attachment.
Running Arch Linux here, but suffering the same issue. I have compiled the module with the patches above and will generate the log as soon as I come home.
What I have noticed: Linux 3.10.3 (I think, possibly it was 3.10.4) crashed, 3.10.6 and 3.10.7 crash as well. Looks like 3.10.5 is stable so far, but I do not have an explanation for that.

Created attachment 787360 [details]
dmesg
And here it is... Hope this helps to fix the problem. Let me know if there is anything else I can test/provide. Thanks a lot!
Created attachment 787446 [details]
Stack trace
I'm having this problem too on Arch Linux, kernel 3.10.6-2-ARCH. Planning to buy an Intel card :)
I've filed a possible duplicate:
https://bugzilla.redhat.com/show_bug.cgi?id=989269
Please do take a look and close the bug if need be.
Ankur

(In reply to Ankur Sinha (FranciscoD) from comment #26)
> I've filed a possible duplicate:
> https://bugzilla.redhat.com/show_bug.cgi?id=989269
>
> Please do take a look and close the bug if need be.
The possible duplicate and this one are the same. Better not close it.

Not sure if anybody reported this before... Looks like this commit (re-)introduced the problem for 3.10.6:
mac80211/minstrel_ht: fix cck rate sampling
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=ef47a5e4f1aaf1d0e2e6875e34b2c9595897bef6
(upstream commit 1cd158573951f737fbc878a35cb5eb47bf9af3d5)

In 3.10 cck rate support was added to minstrel_ht. I suspect there is an integration issue between minstrel_ht and brcmsmac causing this panic and possibly phy tx errors that people complained about in other bug reports. Still investigating this. But 998080 is a duplicate of this one.

*** Bug 998080 has been marked as a duplicate of this bug. ***

I've started a scratch build with a patch from Felix Fietkau that only enables cck to certain drivers. This was tested upstream by one person and successfully solved the crash and Arend has suggested we pick it up temporarily until brcmsmac is fixed (and/or Felix's patch is upstreamed). Please test this scratch build when it completes and let us know if it resolves the issues:
http://koji.fedoraproject.org/koji/taskinfo?taskID=5836737

If anybody is interested in an Arch kernel to test Felix' patch:
http://dl.mylinuxtime.de/arch/eworm/x86_64/linux-3.10.9-1-x86_64.pkg.tar.xz
Works for me.

I am having the laptop repaired at the moment so I can not check your patch.
PS. They do not touch my Wi-Fi.

There are multiple people impacted by this bug and on CC. If you cannot test or haven't tested yet, please don't clear the needinfo flag.

I am sorry for it, it was not done on purpose.
It is an issue with the bugzilla. When you cc yourself, it seems to clear needinfo.

Works for me. The new scratch build seems to solve the issue for me. No kernel panic after 5 minutes of use...

OK, thanks. I'll get the patch into the next official build.

(In reply to Josh Boyer from comment #33)
> I've started a scratch build with a patch from Felix Fietkau that only
> enables cck to certain drivers. This was tested upstream by one person and
> successfully solved the crash and Arend has suggested we pick it up
> temporarily until brcmsmac is fixed (and/or Felix's patch is upstreamed).
> Please test this scratch build when it completes and let us know if it
> resolves the issues:
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=5836737
Confirming that this build doesn't cause the kernel to crash any more.
[asinha@ankur ~]$ uname -a
Linux ankur.pc 3.10.9-200.fc19.x86_64 #1 SMP Wed Aug 21 19:27:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Thanks for the fix, folks.
Warm regards,
Ankur

kernel-3.10.9-200.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.10.9-200.fc19

kernel-3.10.9-100.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.10.9-100.fc18

*** Bug 982264 has been marked as a duplicate of this bug. ***

kernel-3.10.9-100.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

kernel-3.10.9-200.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.

New update to kernel 3.10.9-200.fc19.x86_64 restored sanity to my system.

This version is working well for me. Thank you!