Bug 1030504

Summary: Kernel update causes wireless (RT5390) to disable
Product: [Fedora] Fedora
Reporter: Don Levey <bugzilla.fedora>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 19
CC: bugzilla.fedora, gansalmon, itamar, jonathan, kernel-maint, linville, madhu.chinakonda, michele, mjg59
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2014-01-07 10:45:39 UTC
Type: Bug
Attachments (description: flags):
lsmod, lspci and dmesg output: none
dmesg output under non-working kernel (root permissions): none
dmidecode output under non-working kernel (root permissions): none
messages extract of one day (data for expanding log side-problem): none
messages extract of one day (data for expanding log side-problem, revised): none
0001-asus-nb-wmi-set-wapf-4-for-ASUSTeK-COMPUTER-INC.-X75.patch: none

Description Don Levey 2013-11-14 15:16:18 UTC
Description of problem:
(Fedora 19)

My wife's laptop was running fine on kernel 3.10.11-200:

uname -a:
Linux laptop.example.com 3.10.11-200.fc19.x86_64 #1 SMP Mon Sep 9 13:03:01 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

lspci -s 03:00.0:
03:00.0 Network controller: Ralink corp. RT5390 Wireless 802.11n 1T/1R PCIe

rfkill list wlan:
1: phy0: Wireless LAN
	Soft blocked: no
	Hard blocked: no

I neither had nor needed any additional kernel modules.  However, when I
upgraded kernels:

uname -a:
Linux laptop.example.com 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27 19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

lspci -s 03:00.0:
03:00.0 Network controller: Ralink corp. RT5390 Wireless 802.11n 1T/1R PCIe


rfkill list wlan:
0: asus-wlan: Wireless LAN
	Soft blocked: no
	Hard blocked: no
1: phy0: Wireless LAN
	Soft blocked: no
	Hard blocked: yes

Now I've got "asus-wlan" while my original one is listed as "hard
blocked." The problem is that the "asus-wlan" doesn't show as an
available interface, and I cannot connect.
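
The same soft/hard block state that rfkill prints can also be read straight from sysfs, which is handy for comparing the two kernels. A minimal sketch, assuming the standard /sys/class/rfkill layout with name, soft and hard attributes per switch; the function name and its optional directory argument are hypothetical and exist only for illustration and testing:

```shell
# List every rfkill switch with its soft/hard block state, read from sysfs.
# The optional first argument (hypothetical, for testing) overrides the
# sysfs directory; it defaults to /sys/class/rfkill.
rfkill_summary() {
    base="${1:-/sys/class/rfkill}"
    for dev in "$base"/rfkill*; do
        # Skip the unexpanded glob when no rfkill switch exists.
        [ -d "$dev" ] || continue
        name=$(cat "$dev/name")
        soft=$(cat "$dev/soft")
        hard=$(cat "$dev/hard")
        printf '%s: soft=%s hard=%s\n' "$name" "$soft" "$hard"
    done
}
```

Calling rfkill_summary with no argument on each kernel gives one line per switch, where 1 means blocked and 0 means not blocked.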


How reproducible:
It is easily reproduced by booting any kernel I've found that is newer than the last reported working kernel (above).

Comment 1 Michele Baldessari 2013-11-17 18:56:00 UTC
What modules do you have loaded? 

This seems somewhat similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=1021036

or:
https://bugzilla.redhat.com/show_bug.cgi?id=1028737

Comment 2 Don Levey 2013-11-18 01:14:23 UTC
Created attachment 825349 [details]
lsmod, lspci and dmesg output

Comment 3 Don Levey 2013-11-18 01:15:21 UTC
I did not *consciously* load any (related) modules, but that doesn't mean it didn't happen anyway.  I've attached output from lsmod, lspci, and dmesg (all one file).

Comment 4 Stanislaw Gruszka 2013-11-18 10:16:50 UTC
Don, please also provide dmesg from the broken 3.11 kernel.

Comment 5 Stanislaw Gruszka 2013-11-18 10:19:50 UTC
Additionally, please provide the output of the dmidecode command (run with root permissions).

Comment 6 Don Levey 2013-11-18 18:38:27 UTC
Created attachment 825781 [details]
dmesg output under non-working kernel (root permissions)

Comment 7 Don Levey 2013-11-18 18:39:03 UTC
Created attachment 825783 [details]
dmidecode output under non-working kernel (root permissions)

Comment 8 Stanislaw Gruszka 2013-11-19 14:53:46 UTC
On the 3.10 kernel we have:

> [   11.785965] ACPI Error: [IIA0] Namespace lookup failure, AE_ALREADY_EXISTS (20130328/dsfield-211)
> [   11.785971] ACPI Error: Method parse/execution failed [\_SB_.ATKD.WMNB] (Node ffff8801a5cab168), AE_ALREADY_EXISTS (20130328/psparse-537)
> [   11.786111] ACPI: Marking method WMNB as Serialized because of AE_ALREADY_EXISTS error
> [   11.805117] asus-nb-wmi: probe of asus-nb-wmi failed with error -5

Things work there because the asus rfkill interface is not created, due to some ACPI problems. On the updated 3.11 kernel, the ACPI problems were fixed and we have the new asus rfkill interface.

The phy0 hard block is the result of reading the RFKILL bit from a PCI register on the RT5390 device. It could be a bug in the rt2x00 driver, but more probably ACPI/BIOS sets that bit on the PCI device, since we have a similar problem with a Broadcom device in bug 1028737. Hence this issue is most likely a bug either in ACPI/BIOS or in the asus_wmi driver.

Don, could you check whether adding the wapf=4 module option helps:

echo 'options asus-nb-wmi wapf=4' > /etc/modprobe.d/asus.conf

(You have to restart the system for the new option to take effect.)

If not, you can blacklist the asus_wmi driver, but that will also disable all special-key functionality on your laptop.

echo 'blacklist asus_wmi' > /etc/modprobe.d/asus.conf

(A restart is needed here as well.)
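
After the reboot it can be worth confirming that the wapf option was actually picked up. A minimal sketch, assuming the asus_nb_wmi module exposes the parameter read-only at /sys/module/asus_nb_wmi/parameters/wapf (an assumption about this kernel, not something confirmed in this report); the function name and its optional directory argument are hypothetical:

```shell
# Report the wapf value the loaded asus_nb_wmi module is using.
# Assumption: the parameter is readable under
# /sys/module/asus_nb_wmi/parameters/wapf. The optional argument
# (hypothetical, for testing) overrides the /sys/module directory.
current_wapf() {
    param="${1:-/sys/module}/asus_nb_wmi/parameters/wapf"
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "asus_nb_wmi not loaded or wapf not exposed" >&2
        return 1
    fi
}
```

If the option took effect, current_wapf should print the value written to /etc/modprobe.d/asus.conf; a non-zero exit status suggests the module is not loaded or is blacklisted.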

Comment 9 Don Levey 2013-11-21 15:01:24 UTC
I have tried the first suggestion (adding wapf=4) and it seems to be a qualified success, though I've not yet had the chance to do extensive testing.  I am seeing a possible side-effect: I have twice been blocked from keyboard input - once while at the login screen, and once when trying to enter a password for screen lock.  I need to test further this evening to see if I can discern a better pattern.  Right now I have remote access to the laptop.

I have also noticed that my /var/log/messages file is filling up, hundreds of gigabytes.  While some of my reading has suggested that this is in relation to kernel messages about the wireless, I can't say if it is related to this problem.  Any thoughts/suggestions?

Also, should this come to a reasonable resolution, should I report this back to the kernel Bugzilla?  I had originally reported it there, and they referred me here.  Is this Fedora-specific, or should the wider community know about it?

Comment 10 Stanislaw Gruszka 2013-11-21 15:27:25 UTC
(In reply to Don Levey from comment #9)
> I have also noticed that my /var/log/messages file is filling up, hundreds
> of gigabytes.  While some of my reading has suggested that this is in
> relation to kernel messages about the wireless, I can't say if it is related
> to this problem.  Any thoughts/suggestions?
Perhaps the rt2x00 driver is flooding dmesg with messages; if you show us the messages, we can say more.

> Also, should this come to a reasonable resolution, should I report this back
> to the kernel Bugzilla?  I had originally reported it there, and they
> referred me here.  Is this Fedora-specific, or should the wider community
> know about it?
For now, let's try to handle the issue here. It is possible that all we have to do is make wapf=4 the default when your type of laptop is detected. I'll email the upstream maintainers (CC'ing you) if that is needed.

Comment 11 Don Levey 2013-11-21 21:56:09 UTC
I'm having a little difficulty extracting the relevant lines out of a 120 GB file...  However, I'm seeing the following in my logwatch output:

 --------------------- Kernel Begin ------------------------ 

 6907899 Time(s): mei_me 0000:00:16.0: reset: unexpected enumeration response hbm.
 6909252 Time(s): mei_me 0000:00:16.0: reset: wrong host start response
 1 Time(s): mei_me 0000:00:16.0: unexpected reset: dev_state = 
 13817506 Time(s): mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
 52984 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000000.
 22983 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000001.
 20703 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000002.
 20146 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000003.
 19367 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000004.
 18643 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000005.
 16480 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000006.
 17083 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000007.
 18859 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000008.
 
 ---------------------- Kernel End ------------------------- 

Given the number of messages it is mentioning, this must be at least part of the problem.  What I'm seeing when researching suggests that blacklisting the mei_me module should take care of the problem.  I'll try this, though I don't know what negative effects are likely.
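
For a log this large, a streaming count per program tag avoids having to open the file in an editor. A minimal sketch, assuming the classic "Mon DD HH:MM:SS host tag[pid]: message" syslog layout seen in the extracts in this report; the function name is hypothetical:

```shell
# Count syslog lines per program tag, reading stdin line by line so a
# very large file is never held in memory. Assumes the tag is the fifth
# whitespace-separated field, as in "Nov 16 15:54:34 host tag[pid]: msg".
count_by_tag() {
    awk '{
        tag = $5
        sub(/\[[0-9]+\]:$/, "", tag)   # drop a trailing "[pid]:"
        sub(/:$/, "", tag)             # drop a bare trailing ":"
        count[tag]++
    }
    END { for (t in count) print count[t], t }' | sort -rn
}
```

Usage would be along the lines of count_by_tag < /var/log/messages | head. All kernel messages are grouped under the single tag "kernel", so a follow-up such as grep -c mei_me /var/log/messages is still needed to attribute them to one driver.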

Comment 12 Don Levey 2013-11-21 22:05:08 UTC
Created attachment 827462 [details]
messages extract of one day (data for expanding log side-problem)

While it seems quite plausible that the mei_me module is what is inflating my messages log file (and thus out of the scope of this problem report) I am attaching it here to confirm.

Comment 13 Stanislaw Gruszka 2013-11-22 07:50:36 UTC
The only things I can see in vast amounts in the logs are:

>Nov 16 15:54:34 croweflies abrt-server[3495]: Unlocked '/var/tmp/abrt/ccpp-2013-10-30-21:46:27-4608/.lock' (no or corrupted 'time' file)
>Nov 16 15:54:34 croweflies abrt-server[3495]: File '/var/tmp/abrt/ccpp-2013-10-30-21:46:27-4608/time' doesn't contain valid unix time stamp ('')

and

> Nov 16 15:22:00 croweflies gnome-session[1014]: JS LOG: Removing an access point that was never added
> Nov 16 15:32:00 croweflies gnome-session[1014]: JS LOG: Removing an access point that was never added

But those logs are from the 3.10 kernel. Please attach dmesg from the 3.11 kernel with the asus-nb-wmi wapf=4 option, so we can see whether it causes a flood of messages.

Comment 14 Don Levey 2013-11-25 03:20:29 UTC
Created attachment 828495 [details]
messages extract of one day (data for expanding log side-problem, revised)

Oops, didn't think of that.  Here is what should be the messages under the latest kernel I've got installed, a 3.11 kernel.

Comment 15 Stanislaw Gruszka 2013-12-02 12:33:28 UTC
Apart from the errors already mentioned, there are a lot of the messages below in the logs.

> Nov 22 08:22:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
> Nov 22 08:24:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
> Nov 22 08:26:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted

This is a NetworkManager bug: it does not talk correctly over the nl80211 interface. It is not directly related to the asus rfkill problems.

I am going to create a patch that sets wapf=4 as the default for your system, as the fix for this bug ...

Comment 16 Stanislaw Gruszka 2013-12-02 14:00:17 UTC
Created attachment 831588 [details]
0001-asus-nb-wmi-set-wapf-4-for-ASUSTeK-COMPUTER-INC.-X75.patch

Proposed fix for this bug.

A kernel build with the patch is here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=6247211

Please remove the /etc/modprobe.d/asus.conf file, install the above kernel, and test whether it fixes the problem.

Comment 17 Stanislaw Gruszka 2014-01-07 10:45:39 UTC
Closing due to lack of response ...

Comment 18 Don Levey 2014-06-18 13:40:55 UTC
Kernel update seems to have fixed...