Bug 1658178

Summary: brcmfmac-based Wi-Fi stops working several times a day
Product: Fedora
Reporter: Cajus Pollmeier <cajus>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 29
CC: airlied, artem.silenkov, bskeggs, ewk, hcamp, hdegoede, ichavero, itamar, jarodwilson, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, luzpaz, mchehab, mjg59, paulegan, saturns_rings, steved
Hardware: x86_64   
OS: Linux   
Last Closed: 2019-09-17 20:06:17 UTC
Type: Bug
Attachments:
Attachment 1514631: Full dmesg from boot to wifi fail (flags: none)

Description Cajus Pollmeier 2018-12-11 13:05:56 UTC
Description of problem:

With the current Fedora kernel (see the uname output below), my Wi-Fi connection stalls after some time, and I have to reboot or reload the brcmfmac module manually to get the network running again. This behaviour is new; I did not experience it before. I can't tell whether it is really kernel-related (i.e. a driver bug) or caused by something else.

My log shows:

Dec 11 13:00:28 statler kernel: brcmfmac: brcmf_msgbuf_tx_ioctl: Failed to reserve space in commonring
Dec 11 13:00:28 statler kernel: brcmfmac: brcmf_run_escan: error (-12)
Dec 11 13:00:28 statler kernel: brcmfmac: brcmf_cfg80211_scan: scan error (-12)


Version-Release number of selected component (if applicable):

$ uname -a
Linux statler 4.19.6-300.fc29.x86_64 #1 SMP Sun Dec 2 17:33:14 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ lspci -vv -d 14e4:43a3
3a:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4350 802.11ac Wireless Network Adapter (rev 08)
	Subsystem: Dell Device 0021
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 148
	Region 0: Memory at dc400000 (64-bit, non-prefetchable) [size=32K]
	Region 2: Memory at dc000000 (64-bit, non-prefetchable) [size=4M]
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=2 PME-
	Capabilities: [58] MSI: Enable+ Count=1/16 Maskable- 64bit+
		Address: 00000000fee00598  Data: 0000
	Capabilities: [68] Vendor Specific Information: Len=44 <?>
	Capabilities: [ac] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <32us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via WAKE#
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [13c v1] Device Serial Number 00-00-cb-ff-ff-83-30-52
	Capabilities: [150 v1] Power Budgeting <?>
	Capabilities: [160 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [1b0 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [220 v1] Resizable BAR <?>
	Capabilities: [240 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=0us PortTPowerOnTime=50us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+
			   T_CommonMode=0us LTR1.2_Threshold=163840ns
		L1SubCtl2: T_PwrOn=50us
	Kernel driver in use: brcmfmac
	Kernel modules: brcmfmac


How reproducible:

I have no reliable reproducer. I switch on my Dell XPS and work on Wi-Fi the whole day, and the connection drops a couple of times a day.

If I can provide any further information or try anything, please let me know.

Comment 1 Artem Silenkov 2018-12-12 15:26:16 UTC
This bug affects me too. The hardware is exactly the same: Dell XPS 13 9350.

There are two possible points of failure here: linux-firmware and the kernel. I tried different version combinations, upgrading and downgrading each.

The last working kernel for me is 4.18.16-300.fc29.x86_64 #1 SMP Sat Oct 20 23:24:08 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
With this kernel there are no issues, whichever linux-firmware package is installed.

Kernels newer than this one break my Wi-Fi on a regular basis.

```
[14553.667734] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14553.667742] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -5
[14555.715751] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14557.763741] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14557.763748] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -5
[14559.811660] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14561.859709] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14561.859716] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -5
[14563.908605] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14565.955611] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14565.955623] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -5
[14568.003577] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14570.051608] brcmfmac: brcmf_msgbuf_query_dcmd: Timeout on response for query command
[14570.051614] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -5
```
The workaround is to stay on the 4.18.16-300.fc29.x86_64 kernel.
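Until the cause is found, the manual module reload from the original description could be automated. The following is a minimal Python sketch, not something proposed in this report. It assumes systemd's `journalctl` is available and that the script runs as root so `modprobe` can unload and reload `brcmfmac`. The function names, failure threshold, log window, and polling interval are hypothetical choices, and treating the two quoted error strings as proof that the link is dead is itself an assumption.

```python
import subprocess
import time
from typing import Iterable, List

# Substrings taken from the brcmfmac errors quoted in this report.
# Treating them as "the link is dead" is an assumption, not a diagnosis.
FAILURE_SIGNATURES = (
    "Failed to reserve space in commonring",
    "Timeout on response for query command",
)
LOG_WINDOW = "-2min"      # relative --since value understood by journalctl
POLL_SECONDS = 60
COOLDOWN_SECONDS = 180    # longer than LOG_WINDOW, so pre-reload errors age out


def is_brcmfmac_failure(line: str) -> bool:
    """Return True if a kernel log line matches a known brcmfmac failure."""
    return "brcmfmac" in line and any(sig in line for sig in FAILURE_SIGNATURES)


def count_failures(lines: Iterable[str]) -> int:
    """Count matching failure lines in an iterable of log lines."""
    return sum(1 for line in lines if is_brcmfmac_failure(line))


def recent_kernel_log() -> List[str]:
    """Read kernel messages from the last LOG_WINDOW via journalctl."""
    out = subprocess.run(
        ["journalctl", "-k", "--since", LOG_WINDOW, "--no-pager", "-o", "cat"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def reload_brcmfmac() -> None:
    """Unload and reload the driver, mirroring the manual workaround (needs root)."""
    subprocess.run(["modprobe", "-r", "brcmfmac"], check=True)
    subprocess.run(["modprobe", "brcmfmac"], check=True)


def watch(threshold: int = 3) -> None:
    """Poll the kernel log forever and reload the driver once failures pile up."""
    while True:
        if count_failures(recent_kernel_log()) >= threshold:
            reload_brcmfmac()
            time.sleep(COOLDOWN_SECONDS)
        else:
            time.sleep(POLL_SECONDS)
```

Calling `watch()` from a root shell or a system service would start the loop. This only papers over the symptom; it does nothing about whatever changed after 4.18.17.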

Comment 2 Hans de Goede 2018-12-13 13:33:03 UTC
(In reply to Artem Silenkov from comment #1)
> Workaround is to stick to 4.18.16-300.fc29.x86_64 kernel.

So does this first break with 4.18.17?
https://koji.fedoraproject.org/koji/buildinfo?buildID=1160572

Or did you not test that one?

If this broke between 4.18.16 and 4.18.17, it should be relatively easy
to figure out what broke it.

Comment 3 Artem Silenkov 2018-12-13 22:28:23 UTC
Nope, I didn't test that one.

$ dnf list installed "kernel-core"
Installed Packages
kernel-core.x86_64    4.18.16-300.fc29    @anaconda
kernel-core.x86_64    4.19.7-300.fc29     @updates
kernel-core.x86_64    4.19.8-300.fc29     @updates

The Fedora installer ships 4.18.16-300 (good), and the updates repo goes directly to 4.19.7-300 (broken).

The problem is that this is hard to reproduce: it happens randomly, even when the laptop is completely idle.

I will test 4.18.17 tomorrow.

Comment 4 Artem Silenkov 2018-12-14 08:51:59 UTC
(In reply to Hans de Goede from comment #2)
> (In reply to Artem Silenkov from comment #1)

> So this first breaks with 4.18.17?  :
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1160572
> 
> Or did you not test that one ?
> 
> If this broke between 4.18.16 and 4.18.17 it should be relatively easy
> to figure out what broke it.

Kernel 4.18.17 works for me, with no Wi-Fi issues. Switching to 4.19.7 triggers the errors again.

I'm trying to bisect the kernel sources to find the offending commit, but there are so many changes that I already feel lost.
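The narrowing suggested in comment 2 generalizes to a binary search over kernel builds ordered by version: with 4.18.17 known good and 4.19.7 known bad, each test roughly halves the remaining range. The following is a minimal Python sketch of that search. The `is_bad` callback is a stand-in for "boot this build and wait for the Wi-Fi to fail", and the intermediate build list in the example is hypothetical. Because the failure is intermittent, a build that simply has not failed yet could be misjudged as good and send the search the wrong way. Inside a kernel source tree, `git bisect` applies the same idea to individual commits.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered list of builds for the first bad one.

    Assumes builds are ordered oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and once the regression appears it stays.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return builds[bad]


# Hypothetical build list between the last known-good and a known-bad kernel.
builds = ["4.18.17", "4.18.18", "4.18.19", "4.18.20", "4.19.2", "4.19.3", "4.19.7"]
# Stand-in test: pretend every 4.19 build shows the Wi-Fi failure.
print(first_bad(builds, lambda v: v.startswith("4.19")))  # prints 4.19.2
```

With seven builds this needs at most three test boots instead of five, which matters when each "test" means waiting hours for an intermittent failure.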

Comment 5 Artem Silenkov 2018-12-15 16:45:52 UTC
Created attachment 1514631 [details]
Full dmesg from boot to wifi fail

Kernel 4.19.8-300.fc29.x86_64: full dmesg from a fresh boot until the Wi-Fi failure.

Comment 6 Artem Silenkov 2018-12-15 17:08:49 UTC
Comment on attachment 1514631 [details]
Full dmesg from boot to wifi fail

Cross-post from the kernel.org Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=201853

Comment 7 Artem Silenkov 2018-12-22 14:38:02 UTC
The new update kernel, 4.19.10-300.fc29.x86_64, is still affected.

Sadly, I need a recent kernel for development tasks, and this bug breaks core functionality for me.

$ iw dev wlp58s0 set power_save off

I could use this workaround, but disabling power saving is bad for battery life, which is terrible when travelling.
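One compromise between stability and battery life is to disable power saving only while on mains power, accepting the risk of the failure on battery. The following is a minimal Python sketch of that policy, not something tested in this report. It wraps the `iw` command from this comment and assumes the interface is `wlp58s0` as above, that it runs with the privileges `iw ... set` needs, and that `iw dev <if> get power_save` prints a line of the form `Power save: on`. It detects mains power through the `type` and `online` attributes under `/sys/class/power_supply`. The helper names and the policy itself are hypothetical. A setting made with `iw` may not survive a reconnect; NetworkManager's `wifi.powersave` connection property is the persistent alternative.

```python
import subprocess
from pathlib import Path

IFACE = "wlp58s0"  # interface name from this report; adjust for your system


def parse_power_save(iw_output: str) -> bool:
    """Parse `iw dev <if> get power_save` output ("Power save: on"/"off")."""
    for line in iw_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Power save":
            return value.strip() == "on"
    raise ValueError("no 'Power save:' line in iw output")


def get_power_save(iface: str = IFACE) -> bool:
    """Return the current power-save state reported by iw."""
    out = subprocess.run(
        ["iw", "dev", iface, "get", "power_save"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_power_save(out)


def set_power_save(enabled: bool, iface: str = IFACE) -> None:
    """Same as `iw dev <iface> set power_save on|off` (needs root)."""
    state = "on" if enabled else "off"
    subprocess.run(["iw", "dev", iface, "set", "power_save", state], check=True)


def on_mains_power(sysfs: Path = Path("/sys/class/power_supply")) -> bool:
    """True if any power supply of type "Mains" reports online == 1."""
    for supply in sysfs.glob("*"):
        try:
            if (supply / "type").read_text().strip() != "Mains":
                continue
            if (supply / "online").read_text().strip() == "1":
                return True
        except OSError:
            continue  # attribute missing or unreadable; skip this supply
    return False


def apply_policy(iface: str = IFACE) -> None:
    """Hypothetical policy: power saving off on mains, on when on battery."""
    want = not on_mains_power()
    if get_power_save(iface) != want:
        set_power_save(want, iface)
```

Calling `apply_policy()` periodically, for example from a systemd timer, would keep the setting in line with the current power source.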

Comment 8 Hans de Goede 2019-01-11 15:19:09 UTC
Thank you for the bug report. I've sent an email to the upstream brcmfmac maintainers asking them for help with this.

Comment 9 Justin M. Forbes 2019-01-29 16:25:08 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.

Fedora 28 has now been rebased to 4.20.5-100.fc28.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.

If you experience different issues, please open a new bug report for those.

Comment 10 Ian 2019-02-06 22:12:07 UTC
Still present under F29 kernel 4.20.5-200.fc29.x86_64.

Comment 11 Cajus Pollmeier 2019-02-07 07:39:49 UTC
Same here.

Comment 12 Justin M. Forbes 2019-08-20 17:41:59 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 5.2.9-100.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.

If you experience different issues, please open a new bug report for those.

Comment 13 Justin M. Forbes 2019-09-17 20:06:17 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 14 Red Hat Bugzilla 2023-09-14 04:43:37 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days