Created attachment 717851 [details] dmesg output I get a lot of [ 4810.333836] brcmsmac bcma0:0: phyerr 0x10, rate 0x37 [ 4810.333840] brcmsmac bcma0:0: brcms_c_ampdu_dotxstatus_complete: ampdu tx phy error (0x10) in my dmesg log. Sometimes wifi stops working altogether until the module is unloaded and loaded again or the system is rebooted. kernel-3.9.0-0.rc4.git0.2.fc20.x86_64 from the rawhide-nodebug repository 03:00.0 Network controller: Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller (rev 01) Subsystem: Dell Inspiron M5010 / XPS 8300 Flags: bus master, fast devsel, latency 0, IRQ 17 Memory at f0500000 (64-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: bcma-pci-bridge Suspicious lines just before wifi disconnected: [ 4060.896679] brcmsmac bcma0:0: wl0: PSM microcode watchdog fired at 3914 (seconds). Resetting. [ 4060.896772] brcms_c_dpc : PSM Watchdog, chipid 0x4313, chiprev 0x1 [ 4060.896813] brcmsmac bcma0:0: wl0: fatal error, reinitializing [ 4060.898821] ieee80211 phy0: Hardware restart was requested [ 4060.957003] brcmsmac bcma0:0: brcms_ops_config: change power-save mode: false (implement) [ 4060.957137] brcmsmac bcma0:0: brcmsmac: brcms_ops_bss_info_changed: associated [ 4060.957141] brcmsmac bcma0:0: changing basic rates failed: -22
Wow..a lot going on in this log. Where to begin.. It might be a number of things: firmware/driver/AP. Is this really repeatable or intermittent? Appears this may be a single AP: does it occur on all APs or just this location?
PSM ucode watchdog is never a good sign. It seems the hardware restart mechanism with mac80211 behaves different than expected. Need to look into mac80211 code or ask Johannes about this. Also the phy error messages are a concern and I am wondering whether these were introduced in 3.9. Could you try a 3.8 kernel and see how that behaves? Can you disable bluetooth (rfkill block bluetooth)?
1) repeatable 2) It occurs on more than one AP. 3) bluetooth is disabled in the BIOS 4) Since this bug was reported I've given the machine to my brother, so testing on 3.8 will have to wait at least few days Also, if it's relevant, I logged to the machine remotely now, and it seems that the dmesg log buffer is completely filled with "[20388.400584] brcmsmac bcma0:0: START: tid 2 is not agg'able" and nothing else.
Elad, A couple more bits might help. Can you post the output of lspci -nn -nn will give us the device and vendor ID of the chip that might help. Appears to be "PSM Watchdog, chipid 0x4313, chiprev 0x1" Can you say what signal levels are when the problems occur: seems from logs you describe they occur nearly all the time. Do you notice a low signal level in these locations?
Created attachment 759790 [details] lspci -nn output
Thanks Elad.. 03:00.0 Network controller [0280]: Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller [14e4:4727] (rev 01)
Did you have the opportunity to test with a 3.8 kernel? Could you also provide the details found in /sys/kernel/debug/brcmsmac/bcma*/hardware?
board vendor: 1028 board type: 10 board revision: 2211 board flags: 8002a00 board flags2: 800 firmware revision: 262032c I still only have remote access to the machine, hopefully I'll be able to test 3.8 during the weekend. Note that recently when logging in to check the logs I noticed no "phyerr" messages, but they might be there and just hiding because "brcmsmac bcma0:0: START: tid 1 is not agg'able" is flooding the kernel log buffer so much that it's the only think I can see in dmesg. According to journactl -ba those messages are constantly logged every few seconds (it varies, but it seems that 3 messages per second is the average rate in which those are filling the logs. That is 3 messages per second since connecting to wifi, so it's pretty much constant) Also, scanning the dmesg log from today, apart from all those tid message I see few "brcmsmac bcma0:0: brcms_c_ampdu_dotxstatus_complete: Pkt tx suppressed, illegal channel possibly 7" messages, when the channel number changes (noticed 7, 5, 2 and 11). I don't know if this is relevant tho.
Arend: Thanks for stepping in.. I have a question: this area has a number of messages in this driver (~4 places in this source file) that create significant log spam. While good to know about them they seem to need to be toned down. I've posted a patch for one to do that, is this the right tact from your point of view. Most are not really an issue with driver performance. Is just reducing the volume in line with upstream view? I'll do so and post upstream, just want to get your take first.
I'm running 3.9.6-301.fc19.x86_64 and the messages have grown to 16MB in 2 hours. Same chip - Broadcom Corporation BCM4313 ---------------------------------------------- [ 1535.308928] brcmsmac bcma0:0: phyerr 0x10, rate 0x14 [ 1535.308945] brcmsmac bcma0:0: brcms_c_ampdu_dotxstatus_complete: ampdu tx phy error (0x10) [ 1535.309889] brcmsmac bcma0:0: phyerr 0x10, rate 0x14 [ 1535.309904] brcmsmac bcma0:0: brcms_c_ampdu_dotxstatus_complete: ampdu tx phy error (0x10)
So, here's a roll up of messages of high log spammers I've seen here and other BZs: brcmsmac bcma0:0: brcms_c_ampdu_dotxstatus_complete: Pkt tx suppressed, illegal channel possibly 7" (Patched: wireless-next 99e9494 brcmsmac: Reduce log spam in heavy tx, make err print in debug Other candidates for a same workaround patch are..wish I'd have got 'em all with a single patch.. brcmsmac bcma0:0: START: tid 2 is not agg'able" and nothing else. BZ 918396 has this on F19 brcmsmac bcma0:0: phyerr 0x10, rate 0x14 brcmsmac bcma0:0: brcms_c_ampdu_dotxstatus_complete: ampdu tx phy error (0x10) I'll propose upstream to make this stuff only debug visible (a la 99e9494 mentioned above) before to eliminate huge log issue elaborated here..till a better cause of the issue appearing in 3.9..
Arend, Looking through the 3.8 to 3.9 update, this appears to be the only realistic thing that be causing all these PHY issues. Added at 3.8 was commit b6fc28a158076ca2764edc9a6d1e1402f56e1c0c brcmsmac: support 4313iPA Add support for 4313 iPA variant. It is a variant of already supported 4313 ePA and needs some PHY changes to work properly. Was reverted in 3.9 commit 54683441a92ebe20c5282465ea6f21e5e74d2974 Author: John W. Linville <linville> Date: Wed Mar 27 10:52:11 2013 -0400 Revert "brcmsmac: support 4313iPA" This reverts commit b6fc28a158076ca2764edc9a6d1e1402f56e1c0c. This commit is reported to cause a regression in the support for some revisions of 4313 ePA devices. http://marc.info/?l=linux-wireless&m=136360340200943&w=2 So, having looked though the entire list of changes: does this change for 4313iPA support look to be part of the problem, whether included or not? It would appear to be a problem for some if in, others if not. Or is there something else at work here? Working on the patch to mute the other offenders in Comment 11. Will be testing shortly.
Patch submitted upstream to move this message to debug. Doesn't fix the problem, but keeps non-debug logs cleaner.
Hello, I've got the exact same problem, with the differece that I use Arch Linux and an vanilla kernel, but this is the only bugreport I've found on this issue. Also my Notebook is not by Dell, I own an Lenovo E535. However, the used wi-fi controller is the same: 02:00.0 Network controller [0280]: Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller [14e4:4727] (rev 01) To keep it short, I've found a workaround: My router uses mode 11bgn mixed, but if I switch to 11g only or 11bg mixed, the problem is no more appearing! As soon as "n" comes into play (tested 11n only too), the problem occur. Even the dmesg-spam disappears (which was only present on 3.9.x kernels, on 3.8.x there's the same problem, but nothing in dmesg). I hope this could be helpfull information. If you need more details, just ask.
(In reply to Matthias Tingelhoff from comment #14) > To keep it short, I've found a workaround: My router uses mode 11bgn mixed, > but if I switch to 11g only or 11bg mixed, the problem is no more appearing! > As soon as "n" comes into play (tested 11n only too), the problem occur. > Even the dmesg-spam disappears (which was only present on 3.9.x kernels, on Thanks for the info Matthias. Yes, this is true: the ampdu is a 802.11n enhancement and doing as you suggest will disable this code and fix the issue. Lower thruputs will result, which may be ok, but I suspect most want 802.11 for that reason. Will upload my upstream patch here, hardware vendor investigating these issues. Will fix the spam issue, but underlying problem still will be there.
Created attachment 778800 [details] WORKAROUND to reduce log spam for to errors Submitted upstream, not included there as yet.
In brcmsmac's main.c, here is another recent source of spam for me: if (rate == 0) { brcms_err(wlc->hw->d11core, "wl%d: WAR: using rate of 1 mbps\n", wlc->pub->unit); rate = BRCM_RATE_1M; } I guess this has been around for a while according to http://permalink.gmane.org/gmane.linux.kernel.wireless.general/98898 - but I never got spammed by this until upgrading to kernel 3.10.9 (or so). I am using the patch just above, but since this new message above is still quite verbose, I have commented out the brcms_err above too. However, my BCM43224 seems to be a lot more stable than what I've read from BCM4313 users, so suppressing it isn't as worrisome. Bandwidth intensive processes still give me tid %d is not agg'able errors. If somebody knows of a way to set the verbosity of these messages when invoking `dmesg` or can write a patch to do so, it would be greatly appreciated. I apologize if this was slightly off-topic.
(In reply to tod.jackson from comment #17) > In brcmsmac's main.c, here is another recent source of spam for me: > > if (rate == 0) { > brcms_err(wlc->hw->d11core, "wl%d: WAR: using rate of 1 > mbps\n", > wlc->pub->unit); > rate = BRCM_RATE_1M; > } > > I guess this has been around for a while according to > http://permalink.gmane.org/gmane.linux.kernel.wireless.general/98898 - but I > never got spammed by this until upgrading to kernel 3.10.9 (or so). > > I am using the patch just above, but since this new message above is still > quite verbose, I have commented out the brcms_err above too. However, my > BCM43224 seems to be a lot more stable than what I've read from BCM4313 > users, so suppressing it isn't as worrisome. Bandwidth intensive processes > still give me tid %d is not agg'able errors. Patches I've done simply move these from brcms_err to brcms_dbg. They will still be logged, but invisible to "normal" logs. > If somebody knows of a way to set the verbosity of these messages when > invoking `dmesg` or can write a patch to do so, it would be greatly > appreciated. I apologize if this was slightly off-topic. Look at man page for dmesg and dmesg --help. dmesg -n <1 to 8> will select the *display level* of messages. 1 is less 8 is more.
This bug appears to have been reported against 'rawhide' during the Fedora 20 development cycle. Changing version to '20'. More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora20
I see the workaround patch changes here are now in kernel 3.11 (there are a couple patches that fix these noise makers: one is in 3.10 other in 3.11 ), so I'm closing this one as Fixed..Please feel free to reopen or open a new issue if you have other problems.
The needinfo flag was set a while ago. Guess it is not really relevant any more.