Bug 1030504
Summary: | Kernel update causes wireless (RT5390) to disable | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Don Levey <bugzilla.fedora> |
Component: | kernel | Assignee: | Stanislaw Gruszka <sgruszka> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 19 | CC: | bugzilla.fedora, gansalmon, itamar, jonathan, kernel-maint, linville, madhu.chinakonda, michele, mjg59 |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2014-01-07 10:45:39 UTC | Type: | Bug |
Description
Don Levey
2013-11-14 15:16:18 UTC
What modules do you have loaded? This seems somewhat similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=1021036
or:
https://bugzilla.redhat.com/show_bug.cgi?id=1028737

Created attachment 825349 [details]
lsmod, lspci and dmesg output
I did not *consciously* load any (related) modules, but that doesn't mean it didn't happen anyway. I've attached output from lsmod, lspci, and dmesg (all one file).

Don, please also provide dmesg from the broken 3.11 kernel. Additionally, please provide the output of the dmidecode command (run with root permissions).

Created attachment 825781 [details]
dmesg output under non-working kernel (root permissions)
Created attachment 825783 [details]
dmidecode output under non-working kernel (root permissions)
On the 3.10 kernel we have:

> [ 11.785965] ACPI Error: [IIA0] Namespace lookup failure, AE_ALREADY_EXISTS (20130328/dsfield-211)
> [ 11.785971] ACPI Error: Method parse/execution failed [\_SB_.ATKD.WMNB] (Node ffff8801a5cab168), AE_ALREADY_EXISTS (20130328/psparse-537)
> [ 11.786111] ACPI: Marking method WMNB as Serialized because of AE_ALREADY_EXISTS error
> [ 11.805117] asus-nb-wmi: probe of asus-nb-wmi failed with error -5

Things work there because asus-rfkill is not created, due to some ACPI problems. On the updated 3.11 kernel the ACPI problems were fixed and we have a new asus rfkill interface.

"Phy 0 hard blocked" is the result of reading the RFKILL bit from a PCI register on the RT5390 device. It could be a bug in the rt2x00 driver, but it is more probable that ACPI/BIOS sets that bit on the PCI device, since we have a similar problem with a Broadcom device in bug 1028737. Hence this issue is most likely a bug in either ACPI/BIOS or the asus_wmi driver.

Don, could you check whether adding the wapf=4 module option helps:

    echo 'options asus-nb-wmi wapf=4' > /etc/modprobe.d/asus.conf

(you have to restart the system for the new option to take effect).

If not, you can blacklist the asus_wmi driver, but that will also disable all special-key functionality on your laptop:

    echo 'blacklist asus_wmi' > /etc/modprobe.d/asus.conf

(a restart is needed here as well).

I have tried the first suggestion (adding wapf=4) and it seems to be a qualified success, though I've not yet had the chance to do extensive testing. I am seeing a possible side-effect: I have twice been blocked from keyboard input - once while at the login screen, and once when trying to enter a password for screen lock. I need to test further this evening to see if I can discern a better pattern. Right now I have remote access to the laptop.

I have also noticed that my /var/log/messages file is filling up, hundreds of gigabytes. While some of my reading has suggested that this is in relation to kernel messages about the wireless, I can't say if it is related to this problem. Any thoughts/suggestions?

Also, should this come to a reasonable resolution, should I report this back to the kernel Bugzilla? I had originally reported it there, and they referred me here. Is this Fedora-specific, or should the wider community know about it?

(In reply to Don Levey from comment #9)
> I have also noticed that my /var/log/messages file is filling up, hundreds
> of gigabytes. While some of my reading has suggested that this is in
> relation to kernel messages about the wireless, I can't say if it is related
> to this problem. Any thoughts/suggestions?

Perhaps the rt2x00 driver is flooding dmesg with messages; if you show us the messages, we can say more.

> Also, should this come to a reasonable resolution, should I report this back
> to the kernel Bugzilla? I had originally reported it there, and they
> referred me here. Is this Fedora-specific, or should the wider community
> know about it?

For now let's try to handle the issue here. It is possible that all we have to do is set the wapf=4 option by default when your type of laptop is detected. I'll email the upstream maintainers (CC: you) if that is needed.

I'm having a little difficulty extracting the relevant lines out of a 120 GB file... However, I'm seeing the following in my logwatch output:

--------------------- Kernel Begin ------------------------
6907899 Time(s): mei_me 0000:00:16.0: reset: unexpected enumeration response hbm.
6909252 Time(s): mei_me 0000:00:16.0: reset: wrong host start response
1 Time(s): mei_me 0000:00:16.0: unexpected reset: dev_state =
13817506 Time(s): mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
52984 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000000.
22983 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000001.
20703 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000002.
20146 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000003.
19367 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000004.
18643 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000005.
16480 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000006.
17083 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000007.
18859 Time(s): mei_me 0000:00:16.0: we can't read the message slots =00000008.
---------------------- Kernel End -------------------------

Given the number of messages it is mentioning, this must be at least part of the problem. What I'm seeing when researching suggests that blacklisting the mei_me module should take care of the problem. I'll try this, though I don't know what negative effects are likely.

Created attachment 827462 [details]
messages extract of one day (data for expanding log side-problem)
While it seems quite plausible that the mei_me module is what is inflating my messages log file (and thus out of the scope of this problem report) I am attaching it here to confirm.
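
For anyone else trying to pull the relevant lines out of a log this large, here is a minimal sketch that streams the file once and prints the most frequent kernel messages, without loading the whole file into memory. It assumes the traditional syslog layout ("Mon DD HH:MM:SS host kernel: message"); the function name top_kernel_messages is made up for illustration and was not used by anyone in this report.

```shell
# Summarize the most frequent kernel messages in a syslog-style file.
# Assumption: lines look like "Nov 16 15:54:34 host kernel: [ 11.78] text".
# usage: top_kernel_messages LOGFILE [HOW_MANY]
top_kernel_messages() {
    logfile=$1
    limit=${2:-10}
    awk -F' kernel: ' '
        NF > 1 {
            msg = $2
            # Drop a leading "[  11.785965] " timestamp if there is one,
            # so identical messages are counted together.
            sub(/^\[ *[0-9.]+\] */, "", msg)
            count[msg]++
        }
        END { for (m in count) printf "%d %s\n", count[m], m }
    ' "$logfile" | sort -rn | head -n "$limit"
}
```

Calling it as `top_kernel_messages /var/log/messages 20` should list the twenty most repeated kernel lines with their counts. Only the distinct messages are held in memory, so this stays cheap as long as the flood consists of a small set of repeated lines.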
The only things I can see in vast amounts in the logs are:

>Nov 16 15:54:34 croweflies abrt-server[3495]: Unlocked '/var/tmp/abrt/ccpp-2013-10-30-21:46:27-4608/.lock' (no or corrupted 'time' file)
>Nov 16 15:54:34 croweflies abrt-server[3495]: File '/var/tmp/abrt/ccpp-2013-10-30-21:46:27-4608/time' doesn't contain valid unix time stamp ('')

and

> Nov 16 15:22:00 croweflies gnome-session[1014]: JS LOG: Removing an access point that was never added
> Nov 16 15:32:00 croweflies gnome-session[1014]: JS LOG: Removing an access point that was never added

But these logs are from the 3.10 kernel. Please attach dmesg from the 3.11 kernel with the asus-nb-wmi wapf=4 option, so we can see whether it causes a flood of messages.

Created attachment 828495 [details]
messages extract of one day (data for expanding log side-problem, revised)
Oops, didn't think of that. Here is what should be the messages under the latest kernel I've got installed, a 3.11 kernel.
Apart from the errors already mentioned, there are a lot of the following messages in the logs:
> Nov 22 08:22:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
> Nov 22 08:24:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
> Nov 22 08:26:11 croweflies NetworkManager[522]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
This is a NetworkManager bug (it does not talk correctly over the nl80211 interface) and is not directly related to the asus rfkill problems.
I am going to create a patch that sets wapf=4 as the default for your system, as the fix for this bug ...
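
When testing either the modprobe.d option or a patched kernel, it can help to confirm which wapf value the loaded module actually ended up with. A small sketch follows. It assumes the parameter is exported under the usual /sys/module/<name>/parameters/ layout, which is not confirmed anywhere in this report, and the function name show_wapf is invented for illustration.

```shell
# Print the wapf value in use by the loaded asus-nb-wmi module.
# Assumption: the parameter is readable at the default sysfs path below;
# pass a different path as the first argument if it lives elsewhere.
show_wapf() {
    param_file=${1:-/sys/module/asus_nb_wmi/parameters/wapf}
    if [ -r "$param_file" ]; then
        printf 'wapf=%s\n' "$(cat "$param_file")"
    else
        printf 'wapf unavailable (asus-nb-wmi not loaded or parameter not exported)\n'
    fi
}
```

Running `show_wapf` after a reboot, together with `rfkill list` to see whether phy0 is still reported as hard blocked, should show whether the setting took effect.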
Created attachment 831588 [details]
0001-asus-nb-wmi-set-wapf-4-for-ASUSTeK-COMPUTER-INC.-X75.patch

Proposed fix for this bug. A kernel build with the patch is here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=6247211

Please remove the /etc/modprobe.d/asus.conf file, install the above kernel and test whether it fixes the problem.

Closing due to lack of response ...

Kernel update seems to have fixed...