Bug 820005

Summary: wifi shuts off intermittently, as if switched off in hardware
Product: [Fedora] Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Keywords: Reopened
Type: Bug
Doc Type: Bug Fix
Reporter: J. Bruce Fields <bfields>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bfields, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, nada81, pkrul, vgaikwad
Last Closed: 2020-01-08 17:12:02 UTC
Attachments (description, flags):
  /var/log/messages, none
  acpi trace, none
  thinkpad_acpi_disable_wlsw_param.patch, none

Description J. Bruce Fields 2012-05-08 20:49:41 UTC
Created attachment 583071
/var/log/messages

Periodically the wireless goes out.  Network manager says it's turned off in hardware; dmesg shows "RF_KILL bit toggled to disable radio."

Sometimes it comes back up on its own.  Sometimes a "rmmod iwlwifi; modprobe iwlwifi" seems to help.

The laptop is a Thinkpad x201, and lspci lists the wireless as
"Intel Corporation Centrino Ultimate-N 6300 (rev 35)".

Apologies for being somewhat vague.  Wireless has been unreliable in general on this laptop since about Fedora 16.  I installed Fedora 17 just this weekend and will add more if I notice clearer patterns.

In the most recent case, wireless went out, I tried rmmod/modprobe, and also tried toggling the hardware wireless switch, to no avail.  I brought up ethernet instead, and the machine froze soon after.

Attaching /var/log/messages, covering that incident as well as the current boot (during which wireless went down and came back up at least once).

Comment 1 J. Bruce Fields 2012-05-08 20:53:57 UTC
Note also, from the attached /var/log/messages, a list-debugging warning, which I overlooked at first, that triggered before that last freeze.  I don't know whether it's related to the wireless problem:

May  8 15:42:33 pad kernel: [69806.342292] ------------[ cut here ]------------
May  8 15:42:33 pad kernel: [69806.342301] WARNING: at lib/list_debug.c:30 __list_add+0x8e/0x90()
May  8 15:42:33 pad kernel: [69806.342304] Hardware name: 3680B45
May  8 15:42:33 pad kernel: [69806.342305] list_add corruption. prev->next should be next (ffff8802301c1580), but was           (null). (prev=ffff88022bc24e78).
May  8 15:42:33 pad kernel: [69806.342307] Modules linked in: iwlwifi tun fuse ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat iptable_mangle rfcomm nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack bnep snd_hda_codec_hdmi snd_hda_codec_conexant arc4 microcode btusb bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core i2c_i801 videodev media iTCO_wdt intel_ips iTCO_vendor_support mac80211 snd_hda_intel snd_hda_codec snd_hwdep cfg80211 snd_pcm snd_page_alloc snd_timer e1000e thinkpad_acpi snd soundcore rfkill uinput wmi i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: iwlwifi]
May  8 15:42:33 pad kernel: [69806.342357] Pid: 566, comm: dbus-daemon Not tainted 3.3.4-4.fc17.x86_64 #1
May  8 15:42:33 pad kernel: [69806.342359] Call Trace:
May  8 15:42:33 pad kernel: [69806.342361]  <IRQ>  [<ffffffff810568af>] warn_slowpath_common+0x7f/0xc0
May  8 15:42:33 pad kernel: [69806.342373]  [<ffffffff810569a6>] warn_slowpath_fmt+0x46/0x50
May  8 15:42:33 pad kernel: [69806.342379]  [<ffffffff815eb939>] ? _raw_write_unlock_bh+0x19/0x20
May  8 15:42:33 pad kernel: [69806.342382]  [<ffffffff812d105e>] __list_add+0x8e/0x90
May  8 15:42:33 pad kernel: [69806.342387]  [<ffffffff81064d23>] internal_add_timer+0x113/0x130
May  8 15:42:33 pad kernel: [69806.342391]  [<ffffffff810668e1>] mod_timer+0x151/0x2a0
May  8 15:42:33 pad kernel: [69806.342397]  [<ffffffff815925c5>] fib6_force_start_gc+0x35/0x40
May  8 15:42:33 pad kernel: [69806.342399]  [<ffffffff8158e87b>] icmp6_dst_alloc+0x16b/0x2e0
May  8 15:42:33 pad kernel: [69806.342403]  [<ffffffff815a0f73>] mld_sendpack+0x163/0x2e0
May  8 15:42:33 pad kernel: [69806.342406]  [<ffffffff815a150f>] ? add_grec+0x41f/0x4b0
May  8 15:42:33 pad kernel: [69806.342409]  [<ffffffff815a258c>] mld_ifc_timer_expire+0x18c/0x280
May  8 15:42:33 pad kernel: [69806.342412]  [<ffffffff815a2400>] ? igmp6_mcf_seq_start+0x140/0x140
May  8 15:42:33 pad kernel: [69806.342416]  [<ffffffff81066011>] run_timer_softirq+0x141/0x340
May  8 15:42:33 pad kernel: [69806.342419]  [<ffffffff8105dc90>] __do_softirq+0xc0/0x1e0
May  8 15:42:33 pad kernel: [69806.342424]  [<ffffffff8101aba3>] ? native_sched_clock+0x13/0x80
May  8 15:42:33 pad kernel: [69806.342429]  [<ffffffff815f4e1c>] call_softirq+0x1c/0x30
May  8 15:42:33 pad kernel: [69806.342434]  [<ffffffff81015465>] do_softirq+0x75/0xb0
May  8 15:42:33 pad kernel: [69806.342437]  [<ffffffff8105e055>] irq_exit+0xb5/0xc0
May  8 15:42:33 pad kernel: [69806.342440]  [<ffffffff815f576e>] smp_apic_timer_interrupt+0x6e/0x99
May  8 15:42:33 pad kernel: [69806.342443]  [<ffffffff815f441e>] apic_timer_interrupt+0x6e/0x80
May  8 15:42:33 pad kernel: [69806.342445]  <EOI>  [<ffffffff815f3a90>] ? sysret_audit+0x17/0x21
May  8 15:42:33 pad kernel: [69806.342450] ---[ end trace 2e7173430c364b75 ]---

Comment 2 J. Bruce Fields 2012-05-11 21:11:44 UTC
Also: occasionally the interface enters a state where I'm periodically asked for the WEP key, and this continues indefinitely without any successful connection.  In the most recent case, 'rmmod iwlwifi && modprobe iwlwifi' fixed the problem.

Apologies if this is actually a symptom of an entirely different bug.

Comment 3 Josh Boyer 2012-07-10 17:10:28 UTC
iwlwifi has been kind of broken for a few kernel releases now.  Do you still have this issue with the 3.4.4-5 kernel update?

Comment 4 J. Bruce Fields 2012-07-10 18:29:24 UTC
I haven't seen the same oops again.

I'm still seeing other problems (periodically it stops working and NetworkManager shows it as disabled in hardware).  I'm still on 3.4.4-3.  I can upgrade to 3.4.4-5 now and report back.  Or, if that would be more useful, we can close this bug as unreproducible and I could open another with the current symptoms if they persist.

Comment 6 Tibbs Brookside 2012-10-16 18:21:46 UTC
I can confirm that I've had this problem (not the oops) for some time on my HP Pavilion dv8. I'm currently running 3.6.1-1.fc17.x86_64; the wireless device is an Intel 5100 AGN.

Comment 7 Fedora End Of Life 2013-07-04 03:22:43 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 J. Bruce Fields 2013-07-04 19:54:50 UTC
The sporadic "RF_KILL bit toggled to disable radio" behavior still occurs with Fedora 18.

I should try a BIOS update as well, I suppose.

Comment 9 J. Bruce Fields 2013-08-13 20:25:15 UTC
I'm still seeing the sporadic RF_KILL bit toggles now on F19, and I've updated to the latest BIOS:

uname -a: 
  Linux pad.fieldses.org 3.10.5-201.fc19.x86_64 #1 SMP Wed Aug 7 16:25:24 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
dmidecode -s bios-version:
  6QET70WW (1.40 )
dmidecode -t 11:
  # dmidecode 2.12
  SMBIOS 2.6 present.
  
  Handle 0x0027, DMI type 11, 5 bytes
  OEM Strings
  	String 1: IBM ThinkPad Embedded Controller -[6QHT34WW-1.15    ]-

Comment 10 Justin M. Forbes 2013-10-18 20:57:31 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.

Comment 11 J. Bruce Fields 2013-10-18 21:06:03 UTC
Still seeing the same symptoms on F19.

Comment 12 Commandant Perno 2013-11-18 10:04:55 UTC
I have a Dell XPS 16 with Intel 5100 wifi, and I have the same problem.

kernel : 3.11.8-200

Comment 13 Stanislaw Gruszka 2013-11-20 14:34:10 UTC
RF_KILL events seen by the iwlwifi PCI driver are generated by the hardware or firmware. The only way this could be a kernel bug is if we communicate badly with the firmware via ACPI and that somehow generates the event. Let's try to check that...

J. Bruce, please do the following:

mount -t debugfs debugfs /sys/kernel/debug/
cd /sys/kernel/debug/tracing/
echo function_graph > current_tracer
echo ":mod:thinkpad_acpi" > set_ftrace_filter

# may want to flush old trace buffers
cat trace_pipe 

# start tracing
echo 1 > tracing_on
cat trace_pipe | tee ~/trace.log

You can check that tracing works by pressing some laptop hotkey.

Then wait for the problem to occur and provide trace.log.

Those who have a Dell laptop can try this kernel (currently compiling):
http://koji.fedoraproject.org/koji/taskinfo?taskID=6205526
Does it fix the problem for you?

Comment 14 J. Bruce Fields 2013-11-20 18:18:39 UTC
(In reply to Stanislaw Gruszka from comment #13)
> J. Bruce, please do the following:

Thanks!  I have the trace running but unfortunately (?) I'm not seeing the problem today.  I'll try some more tonight on a different network and see if that makes a difference.

> You can check if tracing work using some laptop hotkey.

I'm not sure what you're suggesting here--flipping the hardware "airline mode" switch to verify whether that produces trace events?

Comment 15 Stanislaw Gruszka 2013-11-21 08:33:14 UTC
(In reply to J. Bruce Fields from comment #14)
> I'm not sure what you're suggesting here--flipping the hardware "airline
> mode" switch to verify whether that produces trace events?

Yes.

Comment 16 J. Bruce Fields 2013-11-28 20:57:08 UTC
Created attachment 830385
acpi trace

Apologies for the delay, I was having trouble reproducing for a while but it seems to be happening more often where I am now.

When I looked up and saw that network manager had labeled the interface "hardware disabled", I believe the trace was at about line 237, so the problem happened a little before that.  Then I believe the interface came back up before I stopped the trace.

Comment 17 Vishal Gaikwad 2013-11-29 05:41:29 UTC
Kernel 3.11.9-200.fc19.x86_64: this issue occurs every 5-10 minutes. I moved to 3.11.8-200.fc19.x86_64 and the interval grew to roughly an hour. I looked through the changelogs but didn't find any significant changes. I can try to collect the trace and post it here.

Comment 18 Stanislaw Gruszka 2013-12-05 09:29:44 UTC
What I can tell from the trace is that the RFKILL event came from the BIOS. I'm not sure whether, or how, the kernel could influence the firmware to generate that event.

I found an old upstream bug report about this, but it reached no conclusion:
http://marc.info/?t=130902706300003&r=1&w=2

I can provide a patch that adds a thinkpad_acpi module parameter to make the driver ignore BIOS RFKILL events. Beyond that, I can't really do much. For a real fix, if one is possible at all, the bug should be reported upstream to ibm-acpi-devel.net with the following information (according to
Documentation/laptops/thinkpad-acpi.txt):

        - ThinkPad model name
        - a copy of your ACPI tables, using the "acpidump" utility
        - a copy of the output of dmidecode, with serial numbers
          and UUIDs masked off
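For anyone forwarding this upstream, gathering the three items above might look roughly like the sketch below. It assumes acpidump and dmidecode are installed and that it runs as root; the function names (mask_ids, collect_report) and the output directory are hypothetical, and the masking only touches lines labelled "Serial Number" or "UUID", so the result should still be reviewed by hand before posting.

```shell
#!/bin/sh
# Hypothetical helper: mask serial numbers and UUIDs in dmidecode output
# before it is posted publicly. Reads stdin, writes stdout.
mask_ids() {
    sed -E 's/^([[:space:]]*(Serial Number|UUID)):.*/\1: <masked>/'
}

# Hypothetical helper: collect the three requested items into one directory.
# Must run as root; assumes acpidump and dmidecode are installed.
collect_report() {
    dir=${1:-thinkpad-report}
    mkdir -p "$dir" || return 1
    # On ThinkPads the model name (e.g. "ThinkPad X201") is typically
    # stored in the DMI system version string.
    dmidecode -s system-version > "$dir/model.txt" || return 1
    # ACPI tables in acpidump's default text format.
    acpidump > "$dir/acpidump.txt" || return 1
    # Full DMI output with serial numbers and UUIDs masked off.
    dmidecode | mask_ids > "$dir/dmidecode.txt"
}
```

Usage, as root, after saving the functions to a file (collect.sh is a hypothetical name): `. ./collect.sh && collect_report x201-report`.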

Comment 19 Stanislaw Gruszka 2013-12-05 09:42:10 UTC
Created attachment 833062
thinkpad_acpi_disable_wlsw_param.patch

Adds a module parameter that allows ignoring BIOS RFKILL events. It can be used as in the example below:

echo "options thinkpad_acpi disable_wlsw=1" > /etc/modprobe.d/thinkpad_acpi.conf

Note that I have only compiled the patch, not tested it.

Comment 20 J. Bruce Fields 2013-12-10 16:49:27 UTC
Thanks, I've cc'd sgruszka on a report to ibm-acpi-devel.net (hope that's OK).

I'm also running around with your patch, and see no ill effects (though I also haven't yet been running long enough to be positive whether it's functioning as a workaround).

Comment 21 J. Bruce Fields 2013-12-13 17:42:02 UTC
I don't know what the expected behavior is when running with your workaround.  I do still see temporary drops, with journal output like:

Dec 13 12:28:15 pad.fieldses.org kernel: iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
Dec 13 12:28:15 pad.fieldses.org kernel: wlp2s0: deauthenticating from 00:12:17:05:cc:5c by local choice (reason=3)
Dec 13 12:28:15 pad.fieldses.org NetworkManager[409]: <warn> Connection disconnected (reason -3)
Dec 13 12:28:15 pad.fieldses.org kernel: iwlwifi 0000:02:00.0: Not sending command - RF KILL
Dec 13 12:28:15 pad.fieldses.org kernel: iwlwifi 0000:02:00.0: Not sending command - RF KILL
Dec 13 12:28:15 pad.fieldses.org kernel: iwlwifi 0000:02:00.0: Not sending command - RF KILL
Dec 13 12:28:15 pad.fieldses.org kernel: iwlwifi 0000:02:00.0: Not sending command - RF KILL
Dec 13 12:28:15 pad.fieldses.org kernel: iwlwifi 0000:02:00.0: Not sending command - RF KILL
Dec 13 12:28:15 pad.fieldses.org kernel: thinkpad_acpi: unhandled HKEY event 0x7000
Dec 13 12:28:15 pad.fieldses.org kernel: thinkpad_acpi: please report the conditions when this event happened to ibm-acpi-devel.net
Dec 13 12:28:15 pad.fieldses.org NetworkManager[409]: <info> WiFi now disabled by radio killswitch
Dec 13 12:28:15 pad.fieldses.org NetworkManager[409]: <info> (wlp2s0): device state change: activated -> unavailable (reason 'none') [100 20 0]
Dec 13 12:28:15 pad.fieldses.org NetworkManager[409]: <info> (wlp2s0): deactivating device (reason 'none') [0]
Dec 13 12:28:15 pad.fieldses.org kernel: iwlwifi 0000:02:00.0: Not sending command - RF KILL

Comment 22 Justin M. Forbes 2014-01-03 22:03:42 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 23 Stanislaw Gruszka 2014-01-07 10:43:46 UTC
This is clearly not fixed.

Comment 24 Stanislaw Gruszka 2014-01-07 11:03:22 UTC
(In reply to J. Bruce Fields from comment #21)
> I don't know what the expected behavior is when running with your
> workaround.  I do still see temporary drops, with journal output like:

Not quite.

> Dec 13 12:28:15 pad.fieldses.org kernel: iwlwifi 0000:02:00.0: Not sending
> command - RF KILL
> Dec 13 12:28:15 pad.fieldses.org kernel: thinkpad_acpi: unhandled HKEY event
> 0x7000

The thinkpad_acpi driver ignores the 0x7000 event (wireless switch RFKILL), so the patch works as intended. Unfortunately, the iwlwifi device (PCI hardware) still sees the event. That's odd; it seems the workaround will not prevent the wireless device from being switched off randomly.

I'm not sure upstream developers will provide any fix for that (probably not); this really looks like a hardware problem, with the rfkill signal being generated at random.

Comment 25 J. Bruce Fields 2014-01-07 17:18:45 UTC
If it's a hardware problem then I'm curious why it didn't seem to happen for the first few months I owned the laptop.

I'll try to find an opportunity to work for a while from an F14-ish live USB and see whether the problem was reproducible there.

If it's hardware, that also suggests Windows users might see the same problem.  Googling for "thinkpad x201 wifi problems" does turn up some reports with similar-sounding symptoms.

Many suggest a power-management problem:

  http://superuser.com/questions/523666/lenovo-thinkpad-x201-wireless-switching-off

The only linux knob that looks comparable is the iwlwifi.power_save option which is already off.

A few users report problems with the physical switch (e.g. that it slides out of position on its own).  Hard to know what to make of those reports.

Comment 26 Stanislaw Gruszka 2014-01-08 15:19:08 UTC
(In reply to J. Bruce Fields from comment #25)
> If it's a hardware problem then I'm curious why it didn't seem to happen for
> the first few months I owned the laptop.

Good point. Hardware can start to break after some time due to mechanical or thermal stress, but that's a poor explanation.

> If it's hardware that also suggests Windows users might see the same
> problem.  Googling for "thinkpad x201 wifi problems" does turn up some with
> similar-sound symptoms.
> 
> Many suggest a power-management problem:
> 
> http://superuser.com/questions/523666/lenovo-thinkpad-x201-wireless-switching-off
> 
> The only linux knob that looks comparable is the iwlwifi.power_save option
> which is already off.
> 
> A few users report problems with the physical switch (e.g. that it slides
> out of position on its own).  Hard to know what to make of those reports.

The Windows option seems to correspond to our power save. But perhaps this is something different: maybe the BIOS somehow controls the radio output and triggers the RFKILL signal when there is no traffic. That would be very strange, though. Anyway, perhaps there is a BIOS option for that?

Comment 27 J. Bruce Fields 2014-01-08 16:46:20 UTC
(In reply to Stanislaw Gruszka from comment #26)
> Windows options seems to be our power save. But perhaps this is something
> different, maybe somehow BIOS control radio output and trigger RFKILL signal
> where there is no traffic. But that would be very strange. Anyway perhaps
> there is BIOS option for that ?

The only BIOS options that looked possibly relevant were options to turn off PCI and PCI Express power management.  (I believe the wireless adapter is PCI Express).  I did try turning both off but was still able to reproduce the problem.

Comment 28 J. Bruce Fields 2014-01-20 15:18:40 UTC
(In reply to J. Bruce Fields from comment #25)
> The only linux knob that looks comparable is the iwlwifi.power_save option
> which is already off.

Just for kicks I tried turning iwlwifi.power_save *on*.  ('echo "options iwlwifi power_save=Y" >/etc/modprobe.d/private-iwlwifi-hack.conf' and reboot).

I've been running with that setting for about two weeks, using wireless all the time, on several different networks.  I've noticed only a couple of brief disconnections in that time.  There are still some "RF_KILL bit toggled to disable radio" messages in the logs, each always followed by a corresponding "enable" message within a few seconds.  Overall the wireless is much more reliable.

So that's kind of an ambiguous result, but perhaps enough to suggest a connection with power management.  Seemed worth at least noting here in case it's useful as a workaround to anyone else.
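For anyone trying the same workaround, one way to check the "disable followed by enable within a few seconds" pattern is to pair the two messages and compute the gap. This is a sketch under stated assumptions: syslog-style timestamps (as in /var/log/messages and journalctl's default output), and an enable message reading "RF_KILL bit toggled to enable radio.", mirroring the disable message quoted in this bug. The function name rfkill_outages is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pair iwlwifi RF_KILL "disable" and "enable" messages
# read from stdin and print how long each outage lasted. Assumes syslog-style
# lines ("Dec 13 12:28:15 host kernel: ..."), so field 3 is HH:MM:SS.
rfkill_outages() {
    awk '
    # Convert an HH:MM:SS field to seconds since midnight.
    function secs(hms,    t) {
        split(hms, t, ":")
        return t[1] * 3600 + t[2] * 60 + t[3]
    }
    /RF_KILL bit toggled to disable radio/ {
        if (!down) { start = secs($3); stamp = $1 " " $2 " " $3; down = 1 }
        next
    }
    /RF_KILL bit toggled to enable radio/ && down {
        d = secs($3) - start
        if (d < 0) d += 86400    # outage crossed midnight
        printf "%s  radio off for %ds\n", stamp, d
        down = 0
    }
    END { if (down) print stamp "  radio still off at end of input" }
    '
}
```

Usage: `journalctl -k | rfkill_outages` for kernel messages from the current boot, or `rfkill_outages < /var/log/messages`. Because the timestamps carry no year and only one midnight rollover is assumed, an outage longer than 24 hours would be misreported.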

Comment 30 J. Bruce Fields 2020-01-08 17:12:02 UTC
It's unclear whether any reporters still have the hardware (I don't), and after a fair amount of investigation we're still not even sure whether this is a Fedora bug or a hardware bug.  Maybe this isn't worth keeping open any longer.