Bug 1505058 - Wifi breaks on Surface Pro 3 by mwifiex_pcie
Summary: Wifi breaks on Surface Pro 3 by mwifiex_pcie
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 27
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-21 15:34 UTC by redhat
Modified: 2018-07-25 18:57 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-25 18:57:37 UTC
Type: Bug
Embargoed:


Attachments
journalctl log entries for a connection try when it's broken. (2.81 KB, text/plain)
2017-10-21 15:34 UTC, redhat
dmesg output from mwifiex_pcie (2.40 KB, text/plain)
2018-03-16 10:54 UTC, Alan

Description redhat 2017-10-21 15:34:45 UTC
Created attachment 1341603 [details]
journalctl log entries for a connection try when it's broken.

Description of problem:
Wifi network connections break or don't work anymore after some kind of status change.

Version-Release number of selected component (if applicable):
Linux Surface 4.13.5-200.fc26.x86_64 #1 SMP Thu Oct 5 16:53:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
It always happens within 24 hours of connecting to a network: sometimes directly after a reboot, sometimes only after a longer time.

It happens more readily on connections that roam and change the connected AP more frequently.

Steps to Reproduce:
1. Use a Surface Pro 3 and connect to a wifi network
2. Wait for a while
3. Connection stops working by itself after a while and won't reconnect again

Actual results:
The connection stops working and won't connect anymore.

Expected results:
The network connection stays working or at least reconnects.

Additional info:
It's possible to reconnect the network after reloading the kernel module using `sudo modprobe -r mwifiex_pcie && sudo modprobe mwifiex_pcie`. But sometimes even this doesn't work and causes the need to reboot.
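
For anyone who wants to script the reload workaround above, here is a minimal sketch in Python. It only wraps the two modprobe commands quoted in this report; the function names are made up for illustration, and it has to run as root. As noted above, the reload itself does not always help, so a failure here may still mean a reboot is needed.

```python
import subprocess

MODULE = "mwifiex_pcie"  # driver name taken from this bug report


def reload_commands(module=MODULE):
    """Return the two modprobe invocations used in the workaround."""
    return [["modprobe", "-r", module], ["modprobe", module]]


def reload_module(module=MODULE):
    """Unload and reload the module (needs root). True if both steps succeed."""
    for cmd in reload_commands(module):
        if subprocess.run(cmd).returncode != 0:
            return False
    return True

# Usage (as root): reload_module()
```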

Comment 1 Alan 2017-11-06 19:42:09 UTC
The problem detailed in the log included here is probably due to NetworkManager MAC address randomization. You can try adding the following to a file in /etc/NetworkManager/conf.d:

[device-mac-randomization]
# "yes" is the default for scanning
wifi.scan-rand-mac-address=no

[connection-mac-randomization]
## "random" is the default for both
ethernet.cloned-mac-address=permanent
wifi.cloned-mac-address=permanent
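
If it helps, the same settings can be generated with a small Python sketch instead of being typed by hand. This only renders the text; the section and key names are copied from the snippet above, while the function name and the file name in the usage note are hypothetical choices, not something NetworkManager requires.

```python
import configparser
import io


def render_dropin():
    """Render the MAC randomization settings above as keyfile text."""
    cfg = configparser.ConfigParser()
    cfg["device-mac-randomization"] = {"wifi.scan-rand-mac-address": "no"}
    cfg["connection-mac-randomization"] = {
        "ethernet.cloned-mac-address": "permanent",
        "wifi.cloned-mac-address": "permanent",
    }
    buf = io.StringIO()
    # Write "key=value" without spaces, matching the snippet above.
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()
```

The result could then be written (as root) to a file such as /etc/NetworkManager/conf.d/00-mac-randomization.conf, where the file name is only an example.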

Comment 2 Alan 2017-11-06 19:45:47 UTC
That being said, there is also a problem with power management in the mwifiex_pcie module. I see the following in dmesg:

[32954.401211] mwifiex_pcie 0000:01:00.0: Firmware wakeup failed
[32954.402947] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[32954.403022] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed

[33760.085919] mwifiex_pcie 0000:01:00.0: WLAN FW already running! Skip FW dnld
[33760.085924] mwifiex_pcie 0000:01:00.0: WLAN FW is active
[33760.204043] mwifiex_pcie 0000:01:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p77) 

whenever the card is woken from sleep.  I've tried all the different f26 kernels and the newest firmware from mrvl.

The problem entirely goes away if I issue a:
sudo iwconfig wlp1s0 power off

which turns off power management on the wifi adapter. Unfortunately this also means not being able to put the device to sleep, so it is only a workaround.
-alan
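
To check whether a given machine hits the same wake-up failure, something like the following Python sketch could be run over saved dmesg output. The marker strings are copied from the excerpt above; the function name and the regular expression are assumptions for illustration and may need adjusting if the log format differs.

```python
import re

# Message fragments copied from the dmesg excerpt in this comment.
FAILURE_MARKERS = (
    "Firmware wakeup failed",
    "PREP_CMD: FW in reset state",
    "PREP_CMD: card is removed",
)

# Matches lines such as "[32954.401211] mwifiex_pcie 0000:01:00.0: <message>".
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+mwifiex_pcie\s+\S+:\s+(?P<msg>.*)$"
)


def find_wakeup_failures(dmesg_text):
    """Return (timestamp, message) pairs for mwifiex_pcie failure lines."""
    hits = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and any(marker in m.group("msg") for marker in FAILURE_MARKERS):
            hits.append((float(m.group("ts")), m.group("msg").strip()))
    return hits
```

Feeding it the text of `dmesg` (for example from a saved file) gives a list that is empty when none of the markers appear.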

Comment 3 Andy Shevchenko 2018-01-01 11:55:18 UTC
Marvell does not seem to care much about Linux, so the best recommendation is to avoid anything Marvell produces. Vote with your feet and your money!

P.S. A new discussion has been started on the mailing list; perhaps we could also create a PR wave about this issue: https://www.spinics.net/lists/linux-wireless/msg168245.html

Comment 4 Laura Abbott 2018-02-28 03:36:35 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale. The kernel moves very fast so bugs may get fixed as part of a kernel update. Due to this, we are doing a mass bug update across all of the Fedora 26 kernel bugs.
 
Fedora 26 has now been rebased to 4.15.4-200.fc26.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 27, and are still experiencing this issue, please change the version to Fedora 27.
 
If you experience different issues, please open a new bug report for those.

Comment 5 Alan 2018-02-28 22:36:54 UTC
I can confirm it's still present in F27 using kernel 4.15.3-300.fc27.x86_64

Comment 6 redhat 2018-03-12 13:23:09 UTC
(In reply to Laura Abbott from comment #4)
> We apologize for the inconvenience.  There is a large number of bugs to go
> through and several of them have gone stale. The kernel moves very fast so
> bugs may get fixed as part of a kernel update. Due to this, we are doing a
> mass bug update across all of the Fedora 26 kernel bugs.
>  
> Fedora 26 has now been rebased to 4.15.4-200.fc26.  Please test this kernel
> update (or newer) and let us know if your issue has been resolved or if it is
> still present with the newer kernel.
>  
> If you have moved on to Fedora 27, and are still experiencing this issue,
> please change the version to Fedora 27.
>  
> If you experience different issues, please open a new bug report for those.

No problem. I currently can't test it, as my Surface's SSD died, which makes the device pretty useless. But I migrated it to F27 a while ago, so I'll update the bug report.

Alan seems to be in a position to test fixes.

Comment 7 Alan 2018-03-16 10:54:57 UTC
Created attachment 1408715 [details]
dmesg output from mwifiex_pcie

dmesg showing a complete cycle of mwifiex turning on, connecting, entering power save, then failing to wake up

Comment 8 Alan 2018-03-16 11:01:14 UTC
I attached my latest dmesg running Fedora 27 with kernel 4.15.8-300.fc27.x86_64

this leads to a NetworkManager failure:
NetworkManager[789]: <info>  [1521172459.7414] device (wlp1s0): state change: activated -> unmanaged (reason 'removed', internal state 'removed')

and wi-fi becomes unusable after that.
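
For spotting this in the journal automatically, here is a rough Python sketch that picks the device state change out of a NetworkManager log line. The pattern is modelled only on the single line quoted above, so treat it as an assumption about the format; the function names are illustrative.

```python
import re

# Modelled on the one NetworkManager line quoted in this comment.
STATE_RE = re.compile(
    r"device \((?P<iface>[^)]+)\): state change: "
    r"(?P<old>\S+) -> (?P<new>\S+) \(reason '(?P<reason>[^']*)'"
)


def parse_state_change(line):
    """Return iface/old/new/reason as a dict, or None if the line does not match."""
    m = STATE_RE.search(line)
    return m.groupdict() if m else None


def is_device_removed(line):
    """True when the state change was logged with reason 'removed'."""
    info = parse_state_change(line)
    return bool(info) and info["reason"] == "removed"
```

A line for which `is_device_removed` returns True for the wifi interface corresponds to the failure described here.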

Comment 9 Andy Shevchenko 2018-07-23 12:01:16 UTC
The behaviour has not changed for a while.
The workaround is to disable power management for the Wi-Fi interface and not allow system suspend.

I'm pretty sure it's a bug (or even bugs) in the firmware of Marvell products.

Comment 10 Justin M. Forbes 2018-07-23 14:55:49 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 11 Alan 2018-07-25 07:11:23 UTC
I upgraded to Fedora 28 and the kernel 4.17 series with mwifiex 1.0 (15.68.7.p154) and noticed an improvement right away. Now on 4.17.7 I have yet to see a problem at all. The firmware has been sleeping and wifi has been coming back to life pretty reliably.
I did experience a bug where the touchscreen module needed to be removed and re-inserted after sleep, but that seems totally unrelated.

-alan

Comment 12 Justin M. Forbes 2018-07-25 18:57:37 UTC
Thanks for the update. Closing this for now; feel free to reopen if you notice it coming back.

