Bug 1244350 - [regression] Intel iwlwifi N 6300 randomly dies and fails to load firmware with Linux 4.x, until a complete system poweroff
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 23
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-07-17 21:17 UTC by Jean-François Fortin Tam
Modified: 2016-12-05 15:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-26 16:40:36 UTC
Type: Bug
Embargoed:

Description Jean-François Fortin Tam 2015-07-17 21:17:32 UTC
According to lspci, on my Thinkpad X201 I have:
02:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)

For the past two weeks or so, the Intel wifi on my Thinkpad X201 has been dying at random: the connection drops and I can't reconnect to any network (nor, I think, even scan for networks from gnome-control-center) until I completely power off the laptop. Toggling the hardware radio killswitch does not solve the issue.

This is what I see in journalctl when it happens:

jui 17 10:39:15 NetworkManager[789]: <error> [1437143955.095457] [platform/nm-linux-platform.c:2357] link_change(): Netlink error chang
jui 17 10:39:15 NetworkManager[789]: <info>  (wlp2s0): device state change: config -> failed (reason 'config-failed') [50 120 4]
jui 17 10:39:15 NetworkManager[789]: <info>  NetworkManager state is now CONNECTED_LOCAL
jui 17 10:39:15 NetworkManager[789]: <info>  Disabling autoconnect for connection 'le wifi'.
jui 17 10:39:15 NetworkManager[789]: <warn>  (wlp2s0): Activation: failed for connection 'le wifi'
jui 17 10:39:15 NetworkManager[789]: <info>  (wlp2s0): Activation: Stage 2 of 5 (Device Configure) complete.
jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: Failed to load firmware chunk!
jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: Could not load the [0] uCode section
jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: Failed to run INIT ucode: -110
jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: Unable to initialize device.
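For reference, the -110 in the "Failed to run INIT ucode" line is a negated Linux errno, and 110 is ETIMEDOUT on Linux, i.e. the firmware load timed out. Below is a minimal sketch, in Python, of how one might pick that line out of a saved journal and decode it. The helper name `decode_init_failure` and the small errno table are illustrative assumptions, not part of any existing tool.

```python
import re

# Assumption: Linux errno numbering, where 110 is ETIMEDOUT. The value is
# hardcoded because Python's errno module reflects the host platform.
LINUX_ERRNO = {110: "ETIMEDOUT"}

# Matches the INIT ucode failure line exactly as it appears in the log above.
INIT_FAIL = re.compile(r"iwlwifi (\S+): Failed to run INIT ucode: (-?\d+)")


def decode_init_failure(line):
    """Return (pci_address, errno_name) for an INIT ucode failure line, else None.

    Hypothetical helper for illustration only.
    """
    m = INIT_FAIL.search(line)
    if m is None:
        return None
    code = abs(int(m.group(2)))
    # Fall back to the raw number for codes not in the small table above.
    return m.group(1), LINUX_ERRNO.get(code, "errno %d" % code)


if __name__ == "__main__":
    sample = ("jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: "
              "Failed to run INIT ucode: -110")
    print(decode_init_failure(sample))
```

Feeding it the output of `journalctl -k` line by line would flag each occurrence; a timeout on its own does not say whether the card or the driver is at fault.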




On the web I have found:
https://bugzilla.kernel.org/show_bug.cgi?id=95811
which leads to https://bugzilla.kernel.org/show_bug.cgi?id=91171
and this discussion thread: https://lkml.org/lkml/2015/4/22/601

The thing is, unlike the other reporters, I am not "moving" the computer when this happens; it just happens out of the blue while I am sitting and working with it.

I have never had this issue before (I've used Fedora 15 through 22, and have run it on this particular laptop since at least Fedora 19, applying stable software updates as they come), and I doubt it's a hardware issue: if it were, why would a poweroff+poweron solve it without the computer being moved?

What I suspect is that some changes somewhere in the Linux 4.x series trigger a bug in the driver/firmware, or maybe there were updates to the iwl firmwares in recent times that broke the driver/kernel.

Let me know if there is some more info I can provide to investigate this.

Comment 1 Justin M. Forbes 2015-10-20 19:07:41 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 2 Fedora Kernel Team 2015-11-23 17:08:30 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 3 Jean-François Fortin Tam 2016-04-29 19:48:21 UTC
This still happens. It stopped for a few months and then came back with recent kernel/system updates in Fedora 23.

Comment 4 James (purpleidea) 2016-04-29 21:51:07 UTC
@jeff

jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: Failed to load firmware chunk!
jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: Could not load the [0] uCode section
jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: Failed to run INIT ucode: -110
jui 17 10:39:15 kernel: iwlwifi 0000:02:00.0: Unable to initialize device.

I had this exact problem, and I think it's due to a hardware problem. I switched the wifi card for a new one (that in and of itself was a hard problem because of stupid Lenovo BIOS locks) and now everything works fine again...

It's probably not useful to complain on Twitter: https://twitter.com/nekohayo/status/726137016061169666 about it being Fedora's fault, but I think there is a really good diagnosis you can do: please install Debian or CentOS or whatever else, and let us know if you get the same issues after a few weeks. If so, then it's likely to be hardware after all!

Cheers!

Comment 5 Jean-François Fortin Tam 2016-04-29 22:48:01 UTC
Well, the 4.x kernels have been disastrous in this regard, and that's what gets shipped with bleeding edge distros like Fedora. Then, from one point release to another, the problem might go away for a while and then come back with another kernel release because some code moved around and some sort of race condition started occurring again, so it's complete guesswork whether my laptop will "work" or not from one month to another. Hence my point about thinking about switching to a distro "frozen in time" (ex: CentOS) instead of Fedora, especially when my bug reports go almost systematically unanswered.

That said, I have reasons to think this is not a hardware problem: it happens without the machine moving *at all*, without particular system load or without thermal problems, etc. I've seen it happen when I was just sitting in front of it and looking at the screen. I've seen it happen overnight where I would come up to the machine in the morning and its wifi would be dead. All of that since recent linux kernels (I've been using that machine for years). I might be wrong of course, but my geek intuition is usually pretty good when it comes to heisenbugs.

In your case, you "switched the wifi card to a new one", but it sounds like you used a different card/chip? (since you also mention the BIOS complained about the new card)... If so, that changes a million other variables and does not exclude the possibility of a kernel/driver bug, IMHO.

Comment 6 Jean-François Fortin Tam 2016-04-29 22:53:43 UTC
Oh, and the fact that a poweroff/poweron, without changing anything physically (not opening up the machine and messing with the wifi card, for example), solves the issue is another reason why I'm thinking "this sounds like a software problem".

Comment 7 Jean-François Fortin Tam 2016-04-29 23:02:29 UTC
This is the trace I got from the kernel just now, in dmesg:


[10341.481698] WARNING: CPU: 1 PID: 5264 at net/mac80211/driver-ops.c:39 drv_stop+0xfe/0x110 [mac80211]()
[10341.481700] Modules linked in: ccm qmi_wwan cdc_wdm usbnet mii fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables wacom_w8001 coretemp kvm_intel kvm arc4 iTCO_wdt iTCO_vendor_support iwldvm snd_hda_codec_hdmi irqbypass mac80211 snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel uvcvideo snd_hda_codec iwlwifi videobuf2_vmalloc snd_hda_core videobuf2_memops snd_hwdep snd_seq videobuf2_v4l2 videobuf2_core cfg80211 snd_seq_device qcserial v4l2_common videodev joydev intel_ips usb_wwan i2c_i801 snd_pcm media lpc_ich shpchp snd_timer mei_me mei thinkpad_acpi acpi_cpufreq snd wmi soundcore rfkill tpm_tis tpm nfsd auth_rpcgss
[10341.481759]  nfs_acl lockd grace sunrpc dm_crypt i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit crc32c_intel drm_kms_helper drm e1000e serio_raw ptp pps_core fjes video
[10341.481787] CPU: 1 PID: 5264 Comm: kworker/1:1 Tainted: G        W       4.4.7-300.fc23.x86_64 #1
[10341.481789] Hardware name: LENOVO 2985FJG/2985FJG, BIOS 6QET68WW (1.38 ) 12/01/2011
[10341.481811] Workqueue: events_freezable ieee80211_restart_work [mac80211]
[10341.481814]  0000000000000286 00000000d534482f ffff88003d6bfb78 ffffffff813b5f8e
[10341.481818]  0000000000000000 ffffffffa05fea06 ffff88003d6bfbb0 ffffffff810a4122
[10341.481822]  ffff8800b4cf8700 ffff8800b4cf8700 0000000000000000 ffff8800b4cf8e90
[10341.481826] Call Trace:
[10341.481832]  [<ffffffff813b5f8e>] dump_stack+0x63/0x85
[10341.481837]  [<ffffffff810a4122>] warn_slowpath_common+0x82/0xc0
[10341.481841]  [<ffffffff810a426a>] warn_slowpath_null+0x1a/0x20
[10341.481863]  [<ffffffffa058c8de>] drv_stop+0xfe/0x110 [mac80211]
[10341.481894]  [<ffffffffa05bc4a3>] ieee80211_stop_device+0x43/0x50 [mac80211]
[10341.481923]  [<ffffffffa059fe44>] ieee80211_do_stop+0x4a4/0x7c0 [mac80211]
[10341.481927]  [<ffffffff817a0c5a>] ? _raw_write_unlock_bh+0x1a/0x20
[10341.481931]  [<ffffffff817a0c6e>] ? _raw_spin_unlock_bh+0xe/0x10
[10341.481958]  [<ffffffffa05a017a>] ieee80211_stop+0x1a/0x20 [mac80211]
[10341.481963]  [<ffffffff81683c29>] __dev_close_many+0x99/0x100
[10341.481967]  [<ffffffff81683d17>] dev_close_many+0x87/0x130
[10341.481971]  [<ffffffff81685e15>] dev_close.part.77+0x45/0x70
[10341.481975]  [<ffffffff81685e5a>] dev_close+0x1a/0x20
[10341.481996]  [<ffffffffa0427ab1>] cfg80211_shutdown_all_interfaces+0x41/0xa0 [cfg80211]
[10341.482027]  [<ffffffffa05b9e48>] ieee80211_handle_reconfig_failure+0x98/0xb0 [mac80211]
[10341.482060]  [<ffffffffa05bc698>] ieee80211_reconfig+0x1e8/0xf80 [mac80211]
[10341.482065]  [<ffffffff8179ed82>] ? mutex_lock+0x12/0x30
[10341.482086]  [<ffffffffa05891eb>] ieee80211_restart_work+0x6b/0xa0 [mac80211]
[10341.482091]  [<ffffffff810bc5c6>] process_one_work+0x156/0x430
[10341.482095]  [<ffffffff810bc8ee>] worker_thread+0x4e/0x450
[10341.482100]  [<ffffffff8179c905>] ? __schedule+0x3a5/0xa00
[10341.482104]  [<ffffffff810bc8a0>] ? process_one_work+0x430/0x430
[10341.482108]  [<ffffffff810bc8a0>] ? process_one_work+0x430/0x430
[10341.482111]  [<ffffffff810c2678>] kthread+0xd8/0xf0
[10341.482115]  [<ffffffff810c25a0>] ? kthread_worker_fn+0x160/0x160
[10341.482120]  [<ffffffff817a144f>] ret_from_fork+0x3f/0x70
[10341.482123]  [<ffffffff810c25a0>] ? kthread_worker_fn+0x160/0x160
[10341.482126] ---[ end trace e6ef10d25b91b55a ]---
[10341.507063] cfg80211: World regulatory domain updated:
[10341.507068] cfg80211:  DFS Master region: unset
[10341.507070] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[10341.507073] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[10341.507075] cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz, 92000 KHz AUTO), (N/A, 2000 mBm), (N/A)
[10341.507077] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[10341.507080] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)
[10341.507082] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)
[10341.507084] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[10341.507086] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[10341.507088] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[10341.510681] cfg80211: Regulatory domain changed to country: CA
[10341.510686] cfg80211:  DFS Master region: FCC
[10341.510688] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[10341.510692] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm), (N/A)
[10341.510696] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 1700 mBm), (N/A)
[10341.510699] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2400 mBm), (0 s)
[10341.510703] cfg80211:   (5490000 KHz - 5600000 KHz @ 80000 KHz), (N/A, 2400 mBm), (0 s)
[10341.510705] cfg80211:   (5650000 KHz - 5730000 KHz @ 80000 KHz), (N/A, 2400 mBm), (0 s)
[10341.510708] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm), (N/A)
[11754.312843] iwlwifi 0000:02:00.0: L1 Disabled - LTR Disabled
[11754.369480] iwlwifi 0000:02:00.0: L1 Disabled - LTR Disabled
[11754.444777] iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1
[11760.159207] iwlwifi 0000:02:00.0: Failed to load firmware chunk!
[11760.159215] iwlwifi 0000:02:00.0: Could not load the [0] uCode section
[11762.118840] iwlwifi 0000:02:00.0: Failed to run INIT ucode: -110
[11762.118875] iwlwifi 0000:02:00.0: Unable to initialize device.
[11774.961442] iwlwifi 0000:02:00.0: L1 Disabled - LTR Disabled
[11775.018025] iwlwifi 0000:02:00.0: L1 Disabled - LTR Disabled
[11775.093576] iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1
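The dmesg timestamps above also show how long each firmware load attempt runs before it fails: roughly 5.7 seconds pass between the "Radio type" line at 11754.444777 and the "Failed to load firmware chunk!" line at 11760.159207. A minimal Python sketch for extracting that interval from a saved dmesg follows; the function name and the choice of those two messages as start and end markers are assumptions made for illustration, not something the driver documents.

```python
import re

# Matches dmesg lines of the form "[ 1234.567890] iwlwifi <pci-addr>: <message>".
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+iwlwifi \S+: (.*)$")


def load_attempt_durations(lines):
    """Return seconds from each 'Radio type' line to the next firmware chunk failure.

    Hypothetical helper: treating 'Radio type' as the start of a load attempt
    is an assumption based only on the ordering seen in the trace above.
    """
    durations = []
    started = None
    for line in lines:
        m = DMESG_LINE.match(line.strip())
        if m is None:
            continue  # not an iwlwifi line
        ts, msg = float(m.group(1)), m.group(2)
        if msg.startswith("Radio type"):
            started = ts
        elif msg.startswith("Failed to load firmware chunk") and started is not None:
            durations.append(round(ts - started, 3))
            started = None
    return durations


if __name__ == "__main__":
    sample = [
        "[11754.444777] iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1",
        "[11760.159207] iwlwifi 0000:02:00.0: Failed to load firmware chunk!",
    ]
    print(load_attempt_durations(sample))
```

Comparing these durations across several failures could show whether the load always gives up after the same delay, which would be consistent with the -110 timeout rather than an immediate error.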

Comment 8 James (purpleidea) 2016-04-30 02:05:24 UTC
(In reply to Jean-François Fortin Tam from comment #5)
> Hence my point about
> thinking about switching to a distro "frozen in time" (ex: CentOS) instead
> of Fedora, especially when my bug reports go almost systematically
> unanswered.

I feel your pain. I really do. I suppose this can mostly be attributed to the fact that Apple and M$ have 99% of the desktop market, leaving few resources for everyone else.

> 
> That said, I have reasons to think this is not a hardware problem: it
> happens without the machine moving *at all*, without particular system load
> or without thermal problems, etc. I've seen it happen when I was just
> sitting in front of it and looking at the screen. I've seen it happen
> overnight where I would come up to the machine in the morning and its wifi
> would be dead. All of that since recent linux kernels (I've been using that
> machine for years). I might be wrong of course, but my geek intuition is
> usually pretty good when it comes to heisenbugs.

I had virtually identical behaviour. I found at least one search result that suggested a hardware problem. I'd be happy if you found this was a bug and a fix, but I've moved on with a new card which is working fine now.

> 
> In your case, you "switched the wifi card to a new one", but it sounds like
> you used a different card/chip? (since you also mention the BIOS complained
> about the new card)... If so, that changes a million other variables and
> does not exclude the possibility of a kernel/driver bug, IMHO.

I actually got a new card from an X220, which didn't work, so then I bought one on eBay from an X201, which did work.

My advice: save yourself the weeks of pain I went through and buy a new card. I paid US$9.98 with shipping to Canada.

Comment 9 Jean-François Fortin Tam 2016-05-03 18:54:17 UTC
Correction, FWIW: it just happened to me again, and this time a soft reboot (instead of powering off completely before powering on again) fixed it, so it sounds like a kernel/driver issue.

Comment 10 Laura Abbott 2016-09-23 19:19:24 UTC
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.

Comment 11 Laura Abbott 2016-10-26 16:40:36 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
