Bug 1030504 - Kernel update causes wireless (RT5390) to disable
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Assigned To: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance

Reported: 2013-11-14 10:16 EST by Don Levey
Modified: 2014-06-18 09:40 EDT
CC List: 9 users

Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-01-07 05:45:39 EST


Attachments
lsmod, lspci and dmesg output (129.12 KB, text/plain), 2013-11-17 20:14 EST, Don Levey
dmesg output under non-working kernel (root permissions) (89.52 KB, text/plain), 2013-11-18 13:38 EST, Don Levey
dmidecode output under non-working kernel (root permissions) (9.71 KB, text/plain), 2013-11-18 13:39 EST, Don Levey
messages extract of one day (data for expanding log side-problem) (544.67 KB, text/plain), 2013-11-21 17:05 EST, Don Levey
messages extract of one day (data for expanding log side-problem, revised) (370.45 KB, text/plain), 2013-11-24 22:20 EST, Don Levey
0001-asus-nb-wmi-set-wapf-4-for-ASUSTeK-COMPUTER-INC.-X75.patch (1.01 KB, text/plain), 2013-12-02 09:00 EST, Stanislaw Gruszka


External Trackers
Linux Kernel 64941

Description Don Levey 2013-11-14 10:16:18 EST
Description of problem:
(Fedora 19)

My wife's laptop was running fine on kernel 3.10.11-200:

uname -a:
Linux laptop.example.com 3.10.11-200.fc19.x86_64 #1 SMP Mon Sep 9
13:03:01 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

lspci -s 03:00.0:
03:00.0 Network controller: Ralink corp. RT5390 Wireless 802.11n 1T/1R PCIe

rfkill list wlan:
1: phy0: Wireless LAN
	Soft blocked: no
	Hard blocked: no

I neither had nor needed any additional kernel modules.  However, when I
upgraded kernels:

uname -a:
Linux laptop.example.com 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27
19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

lspci -s 03:00.0:
03:00.0 Network controller: Ralink corp. RT5390 Wireless 802.11n 1T/1R PCIe


rfkill list wlan:
0: asus-wlan: Wireless LAN
	Soft blocked: no
	Hard blocked: no
1: phy0: Wireless LAN
	Soft blocked: no
	Hard blocked: yes

Now I've got an additional "asus-wlan" entry, while my original device (phy0)
is listed as "hard blocked." The problem is that "asus-wlan" doesn't show up
as an available interface, and I cannot connect.
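The two `rfkill list` outputs above differ in one decisive field: after the upgrade, phy0 (the RT5390 radio itself) reports `Hard blocked: yes`. An rfkill entry is a radio switch rather than a network interface, so the new asus-wlan entry cannot stand in for it. As a minimal sketch (not part of the original report), the Python below parses that plain-text output and flags which radios are unblocked. It assumes the exact layout shown here, which may differ between rfkill versions, and the names `RfkillDevice` and `parse_rfkill_list` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class RfkillDevice:
    """One entry from `rfkill list` (hypothetical helper type)."""
    index: int
    name: str
    kind: str
    soft_blocked: bool = False
    hard_blocked: bool = False

    @property
    def unblocked(self) -> bool:
        # A radio can be used only when neither block is set.
        return not (self.soft_blocked or self.hard_blocked)


# Header lines look like "1: phy0: Wireless LAN".
_HEADER = re.compile(r"^(\d+):\s*([^:]+):\s*(.+)$")
# Status lines look like "<tab>Hard blocked: yes".
_STATUS = re.compile(r"^\s*(Soft|Hard) blocked:\s*(yes|no)\s*$")


def parse_rfkill_list(text: str) -> list[RfkillDevice]:
    """Parse the plain-text output of `rfkill list` into device records."""
    devices: list[RfkillDevice] = []
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            idx, name, kind = header.groups()
            devices.append(RfkillDevice(int(idx), name.strip(), kind.strip()))
            continue
        status = _STATUS.match(line)
        if status and devices:
            blocked = status.group(2) == "yes"
            # Status lines apply to the most recently seen header.
            if status.group(1) == "Soft":
                devices[-1].soft_blocked = blocked
            else:
                devices[-1].hard_blocked = blocked
    return devices
```

Fed the 3.11 output above, this reports asus-wlan as unblocked and phy0 as hard-blocked, which matches the symptom: the one switch that is clear does not correspond to a usable network device.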


How reproducible:
It is easily reproduced by booting any kernel I've tried that is newer than the last working kernel (3.10.11-200, above).
Comment 1 Michele Baldessari 2013-11-17 13:56:00 EST
What modules do you have loaded? 

This seems somewhat similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=1021036

or:
https://bugzilla.redhat.com/show_bug.cgi?id=1028737
Comment 2 Don Levey 2013-11-17 20:14:23 EST
Created attachment 825349 [details]
lsmod, lspci and dmesg output
Comment 3 Don Levey 2013-11-17 20:15:21 EST
I did not *consciously* load any (related) modules, but that doesn't mean it didn't happen anyway.  I've attached output from lsmod, lspci, and dmesg (all one file).
Comment 4 Stanislaw Gruszka 2013-11-18 05:16:50 EST
Don, please also provide dmesg output from the broken 3.11 kernel.
Comment 5 Stanislaw Gruszka 2013-11-18 05:19:50 EST
Additionally, please provide the output of the dmidecode command (run with root permissions).
Comment 6 Don Levey 2013-11-18 13:38:27 EST
Created attachment 825781 [details]
dmesg output under non-working kernel (root permissions)
Comment 7 Don Levey 2013-11-18 13:39:03 EST
Created attachment 825783 [details]
dmidecode output under non-working kernel (root permissions)
Comment 8 Stanislaw Gruszka 2013-11-19 09:53:46 EST
On 3.10 kernel we have:

> [   11.785965] ACPI Error: [IIA0] Namespace lookup failure, AE_ALREADY_EXISTS (20130328/dsfield-211)
> [   11.785971] ACPI Error: Method parse/execution failed [\_SB_.ATKD.WMNB] (Node ffff8801a5cab168), AE_ALREADY_EXISTS (20130328/psparse-537)
> [   11.786111] ACPI: Marking method WMNB as Serialized because of AE_ALREADY_EXISTS error
> [   11.805117] asus-nb-wmi: probe of asus-nb-wmi failed with error -5

Things work because the asus rfkill device is not created, due to some ACPI problems (note the asus-nb-wmi probe failure above). On the updated 3.11 kernel, the ACPI problems were fixed, so we get the new asus rfkill interface.

The phy0 hard block is the result of reading the RFKILL bit from a PCI register on the RT5390 device. It could be a bug in the rt2x00 driver, but it is more probable that ACPI/BIOS sets that bit on the PCI device, since we have a similar problem with a Broadcom device in bug 1028737. Hence this issue is most likely a bug in either ACPI/BIOS or the asus_wmi driver.

Don, could you check whether adding the wapf=4 module option helps:

echo 'options asus-nb-wmi wapf=4' > /etc/modprobe.d/asus.conf

(you have to restart the system for the new option to take effect).

If not, you can blacklist the asus_wmi driver, but that will also disable all special-key functionality on your laptop.

echo 'blacklist asus_wmi' > /etc/modprobe.d/asus.conf

(a restart is also needed).
Comment 9 Don Levey 2013-11-21 10:01:24 EST
I have tried the first suggestion (adding wapf=4) and it seems to be a qualified success, though I've not yet had the chance to do extensive testing.  I am seeing a possible side-effect: I have twice been blocked from keyboard input - once while at the login screen, and once when trying to enter a password for screen lock.  I need to test further this evening to see if I can discern a better pattern.  Right now I have remote access to the laptop.

I have also noticed that my /var/log/messages file is filling up to hundreds of gigabytes.  While some of my reading suggests that this is related to kernel messages about the wireless, I can't say whether it is related to this problem.  Any thoughts or suggestions?

Also, should this come to a reasonable resolution, should I report this back to the kernel Bugzilla?  I had originally reported it there, and they referred me here.  Is this Fedora-specific, or should the wider community know about it?
Comment 10 Stanislaw Gruszka 2013-11-21 10:27:25 EST
(In reply to Don Levey from comment #9)
> I have also noticed that my /var/log/messages file is filling up, hundreds
> of gigabytes.  While some of my reading has suggested that this is in
> relation to kernel messages about the wireless, I can't say if it is related
> to this problem.  Any thoughts/suggestions?
Perhaps the rt2x00 driver floods dmesg with messages; if you show us the messages, we can say more.

> Also, should this come to a reasonable resolution, should I report this back
> to the kernel Bugzilla?  I had originally reported it there, and they
> referred me here.  Is this Fedora-specific, or should the wider community
> know about it?
For now, let's try to handle the issue here. It is possible that all we have to do is make wapf=4 the default when your type of laptop is detected. I'll email the upstream maintainers (CC: you) if that is needed.
Comment 11 Don Levey 2013-11-21 16:56:09 EST
I'm having a little difficulty extracting the relevant lines out of a 120 GB file...  However, I'm seeing the following in my logwatch output:

 --------------------- Kernel Begin ------------------------ 

 6907899 Time(s): mei_me 0000:00:16.0: reset: unexpected enumeration response hbm.
 6909252 Time(s): mei_me 0000:00:16.0: reset: wrong host start response
 1 Time(s): mei_me 0000:00:16.0: unexpected reset: dev_state = 
 13817506 Time(s): mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
 52984 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000000.
 22983 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000001.
 20703 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000002.
 20146 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000003.
 19367 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000004.
 18643 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000005.
 16480 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000006.
 17083 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000007.
 18859 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000008.
 
 ---------------------- Kernel End ------------------------- 

Given the number of messages it is mentioning, this must be at least part of the problem.  What I'm seeing when researching suggests that blacklisting the mei_me module should take care of the problem.  I'll try this, though I don't know what negative effects are likely.
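Logwatch produces its `N Time(s): message` summary by counting identical lines. When a log is too large to open in an editor, the same kind of tally can be computed by streaming the file one line at a time. The sketch below is an illustration under stated assumptions, not a logwatch replacement: it assumes the classic syslog layout `Mon DD HH:MM:SS host kernel: message` seen in the log excerpts in this report, strips an optional `[ seconds.micro ]` kernel timestamp, and the function names `tally_kernel_messages` and `summarize` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed syslog prefix, e.g. "Nov 22 08:22:11 croweflies kernel: ".
_SYSLOG_PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s*")
# Optional "[   11.785965] " timestamp that kernel messages may carry.
_KERNEL_TS = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def tally_kernel_messages(lines: Iterable[str], needle: str = "") -> Counter:
    """Count identical kernel messages that contain `needle`.

    Lines are consumed one at a time, so a huge file is never loaded
    into memory; only the counter of distinct messages is kept.
    """
    counts: Counter = Counter()
    for line in lines:
        prefix = _SYSLOG_PREFIX.match(line)
        if not prefix:
            continue  # not a kernel line
        message = _KERNEL_TS.sub("", line[prefix.end():]).rstrip("\n")
        if needle in message:
            counts[message] += 1
    return counts


def summarize(counts: Counter, top: int = 10) -> list[str]:
    """Format the most frequent messages like logwatch's 'N Time(s): msg'."""
    return [f"{n} Time(s): {msg}" for msg, n in counts.most_common(top)]
```

Passing an open file object for /var/log/messages (opened with `errors="replace"` to tolerate stray bytes) together with `needle="mei_me"` would list the most frequent mei_me messages without extracting them first.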
Comment 12 Don Levey 2013-11-21 17:05:08 EST
Created attachment 827462 [details]
messages extract of one day (data for expanding log side-problem)

While it seems quite plausible that the mei_me module is what is inflating my messages log file (and is thus out of the scope of this problem report), I am attaching it here to confirm.
Comment 13 Stanislaw Gruszka 2013-11-22 02:50:36 EST
The only things I see in large volume in the logs are these:

>Nov 16 15:54:34 croweflies abrt-server[3495]: Unlocked '/var/tmp/abrt/ccpp-2013-10-30-21:46:27-4608/.lock' (no or corrupted 'time' file)
>Nov 16 15:54:34 croweflies abrt-server[3495]: File '/var/tmp/abrt/ccpp-2013-10-30-21:46:27-4608/time' doesn't contain valid unix time stamp ('')

and

> Nov 16 15:22:00 croweflies gnome-session[1014]: JS LOG: Removing an access point that was never added
> Nov 16 15:32:00 croweflies gnome-session[1014]: JS LOG: Removing an access point that was never added

But these logs are from the 3.10 kernel. Please attach dmesg from the 3.11 kernel with the asus-nb-wmi wapf=4 option, so we can see whether it causes the flood of messages.
Comment 14 Don Levey 2013-11-24 22:20:29 EST
Created attachment 828495 [details]
messages extract of one day (data for expanding log side-problem, revised)

Oops, didn't think of that.  Here are what should be the messages from the latest kernel I have installed, a 3.11 kernel.
Comment 15 Stanislaw Gruszka 2013-12-02 07:33:28 EST
Besides the errors already mentioned, there are a lot of messages like these in the logs:

> Nov 22 08:22:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
> Nov 22 08:24:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
> Nov 22 08:26:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted

This is a NetworkManager bug: it does not talk correctly via the nl80211 interface. It is not directly related to the asus rfkill problem.

I am going to create a patch that makes wapf=4 the default for your system, as the fix for this bug.
Comment 16 Stanislaw Gruszka 2013-12-02 09:00:17 EST
Created attachment 831588 [details]
0001-asus-nb-wmi-set-wapf-4-for-ASUSTeK-COMPUTER-INC.-X75.patch

Proposed fix for this bug.

A kernel build with the patch is here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=6247211

Please remove the /etc/modprobe.d/asus.conf file, install the above kernel, and test whether it fixes the problem.
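The attached patch's file name says it makes wapf=4 the default for ASUSTeK COMPUTER INC. X75 machines; the patch body is not reproduced here. Kernel platform drivers typically recognize a machine by its DMI strings, which Linux also exports under /sys/class/dmi/id. The Python sketch below only illustrates that idea and is not the driver's code (asus-nb-wmi is written in C). The quirk table, the product-name prefix match, the rule that an explicit wapf= module option overrides the quirk, and the names `DmiQuirk`, `read_dmi` and `effective_wapf` are all assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class DmiQuirk:
    """One entry of a hypothetical quirk table keyed on DMI strings."""
    sys_vendor: str
    product_prefix: str
    wapf: int


# Hypothetical table modeled on the patch's file name; the real fix is C code.
QUIRKS = (
    DmiQuirk(sys_vendor="ASUSTeK COMPUTER INC.", product_prefix="X75", wapf=4),
)


def read_dmi(field: str, root: Path = Path("/sys/class/dmi/id")) -> str:
    """Read one DMI string exported by the kernel, e.g. 'sys_vendor'."""
    try:
        return (root / field).read_text().strip()
    except OSError:
        return ""  # field missing or unreadable


def effective_wapf(sys_vendor: str, product_name: str,
                   option: Optional[int] = None) -> Optional[int]:
    """Pick a wapf value: an explicit module option wins, then a DMI quirk."""
    if option is not None:
        return option
    for quirk in QUIRKS:
        if (sys_vendor == quirk.sys_vendor
                and product_name.startswith(quirk.product_prefix)):
            return quirk.wapf
    return None  # no quirk matched: leave the driver's default alone
```

On a running system, `effective_wapf(read_dmi("sys_vendor"), read_dmi("product_name"))` would return 4 on a matching machine and None elsewhere. Under the assumed precedence, keeping the modprobe.d workaround in place would mask whether the built-in default works, which is consistent with the request to remove it before testing.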
Comment 17 Stanislaw Gruszka 2014-01-07 05:45:39 EST
Closing due to lack of response ...
Comment 18 Don Levey 2014-06-18 09:40:55 EDT
A kernel update seems to have fixed it.
