Bug 590333 - iwlagn extremely unstable (microcode errors, wrong command queue, etc)
Summary: iwlagn extremely unstable (microcode errors, wrong command queue, etc)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
low
high
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-05-08 21:50 UTC by Thomas Walker
Modified: 2011-04-14 15:42 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-16 14:34:48 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
syslog w/ debug turned on (131.09 KB, application/octet-stream)
2010-05-11 11:47 UTC, Thomas Walker
no flags Details
More syslog (9.46 KB, application/octet-stream)
2010-05-15 19:23 UTC, Thomas Walker
no flags Details
MAC80211 debugging info as well (90.71 KB, application/octet-stream)
2010-05-25 11:57 UTC, Thomas Walker
no flags Details
Crashes as of June 22nd (1.07 MB, text/plain)
2010-06-23 00:00 UTC, Thomas Walker
no flags Details
Get more info (1.92 KB, patch)
2010-06-28 22:15 UTC, reinette chatre
no flags Details | Diff
Using "Get more info" and stock kernel (34.93 KB, application/octet-stream)
2010-07-03 15:32 UTC, Thomas Walker
no flags Details
More crashes with increased debugging (184.68 KB, text/plain)
2010-08-24 21:21 UTC, Thomas Walker
no flags Details

Description Thomas Walker 2010-05-08 21:50:53 UTC
iwlagn has been extremely unstable since cutting over to fc13/rawhide a week back (each of these is followed by a disconnect and, occasionally, a complete failure to reconnect until either the modules are reloaded or the machine is rebooted)

[walkert@tthink ~]$ sudo zgrep "wrong command queue" /var/log/messages* 
/var/log/messages:May  6 20:07:05 tthink kernel: wrong command queue 0, sequence 0x0 readp=160 writep=168
/var/log/messages:May  6 21:45:42 tthink kernel: wrong command queue 0, sequence 0x0 readp=112 writep=120
/var/log/messages:May  6 23:25:42 tthink kernel: wrong command queue 0, sequence 0x0 readp=39 writep=47
/var/log/messages:May  7 07:15:42 tthink kernel: wrong command queue 0, sequence 0x0 readp=116 writep=124
/var/log/messages:May  7 08:01:36 tthink kernel: wrong command queue 0, sequence 0x0 readp=130 writep=138
/var/log/messages:May  8 12:25:35 tthink kernel: wrong command queue 0, sequence 0x0 readp=117 writep=125
/var/log/messages:May  8 12:55:35 tthink kernel: wrong command queue 0, sequence 0x0 readp=191 writep=199
/var/log/messages:May  8 14:15:34 tthink kernel: wrong command queue 0, sequence 0x0 readp=54 writep=62
/var/log/messages:May  8 17:01:23 tthink kernel: wrong command queue 0, sequence 0x0 readp=106 writep=114
/var/log/messages-20100418:Apr 18 11:09:53 tthink kernel: wrong command queue 0, sequence 0x0 readp=79 writep=82
/var/log/messages-20100425:Apr 22 07:28:13 tthink kernel: wrong command queue 0, sequence 0x0 readp=11 writep=17
/var/log/messages-20100502:Apr 25 21:47:15 tthink kernel: wrong command queue 0, sequence 0x0 readp=181 writep=189

(these are spread across 2.6.33.2-41.fc13.x86_64, 2.6.33.2-57.fc13.x86_64, 2.6.33.3-72.fc13.x86_64, 2.6.33.3-79.fc13.x86_64, 2.6.33.3-85.fc13.x86_64)

Connecting to a linksys wrt610n router w/ ddwrt on it.  Almost always the 5ghz bands (a/n) but it occasionally reconnects to 2.4ghz.  Seems to happen mostly under very light usage.

Thinking something microcode related I turned on debugging and it helpfully crashed immediately after associating (log attached).

Perhaps related, perhaps not, I'm also getting a ton of "failed to remove key from hw" messages:

May  2 18:43:53 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  3 23:12:21 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 00:01:19 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 08:36:53 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 08:47:15 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 08:47:18 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 20:41:39 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 20:41:45 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 20:41:54 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 20:55:44 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 20:55:48 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 20:56:49 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 21:00:11 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 21:00:15 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 21:02:34 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 21:02:40 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 21:03:02 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 21:03:32 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  4 21:04:16 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 21:04:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 21:15:38 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 21:15:55 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 21:28:58 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 21:29:33 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  4 21:29:51 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  6 20:07:06 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  6 21:45:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  6 22:05:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  6 22:15:43 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  6 22:55:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  6 23:25:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  6 23:29:13 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  7 06:35:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  7 06:55:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  7 07:15:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  7 07:55:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  7 08:01:37 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  7 08:06:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  7 08:15:14 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  7 09:16:12 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  7 10:04:24 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  8 11:50:32 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  8 11:55:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  8 12:25:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  8 12:55:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  8 14:15:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  8 14:25:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  8 14:52:25 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22)
May  8 14:55:34 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  8 17:01:23 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)
May  8 17:04:17 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22)


Rather reproducible (although may need to wait an hour or two) so I'm more than willing to collect more data if you need.

Comment 1 Thomas Walker 2010-05-08 21:53:57 UTC
Correction (obviously) this goes back more than a week, wrong machine :)  This one I converted a few weeks ago and iwlagn has been unstable since day one.

Comment 2 John W. Linville 2010-05-10 18:01:02 UTC
Please attach the output of 'lspci -n'.  Also, a shot in the dark -- is the appropriate firmware package up-to-date?

Comment 3 Thomas Walker 2010-05-10 22:03:40 UTC
(In reply to comment #2)
> Please attach the output of 'lspci -n'.  Also, a shot in the dark -- is the
> appropriate firmware package up-to-date?    

[walkert@tthink ~]$ lspci -n
00:00.0 0600: 8086:27a0 (rev 03)
00:02.0 0300: 8086:27a2 (rev 03)
00:02.1 0380: 8086:27a6 (rev 03)
00:1b.0 0403: 8086:27d8 (rev 02)
00:1c.0 0604: 8086:27d0 (rev 02)
00:1c.1 0604: 8086:27d2 (rev 02)
00:1c.2 0604: 8086:27d4 (rev 02)
00:1c.3 0604: 8086:27d6 (rev 02)
00:1d.0 0c03: 8086:27c8 (rev 02)
00:1d.1 0c03: 8086:27c9 (rev 02)
00:1d.2 0c03: 8086:27ca (rev 02)
00:1d.3 0c03: 8086:27cb (rev 02)
00:1d.7 0c03: 8086:27cc (rev 02)
00:1e.0 0604: 8086:2448 (rev e2)
00:1f.0 0601: 8086:27b9 (rev 02)
00:1f.1 0101: 8086:27df (rev 02)
00:1f.2 0106: 8086:27c5 (rev 02)
00:1f.3 0c05: 8086:27da (rev 02)
02:00.0 0200: 8086:109a
03:00.0 0280: 8086:4229 (rev 61)
15:00.0 0607: 1180:0476 (rev b4)
15:00.1 0c00: 1180:0552 (rev 09)
15:00.2 0805: 1180:0822 (rev 18)
[walkert@tthink ~]$ rpm -qa | grep iwl4965-firmware
iwl4965-firmware-228.61.2.24-2.fc12.noarch
[walkert@tthink ~]$ rpm -ql iwl4965-firmware
/lib/firmware/iwlwifi-4965-1.ucode
/lib/firmware/iwlwifi-4965-2.ucode
/usr/share/doc/iwl4965-firmware-228.61.2.24
/usr/share/doc/iwl4965-firmware-228.61.2.24/LICENSE
/usr/share/doc/iwl4965-firmware-228.61.2.24/README
[walkert@tthink ~]$ rpm -qV iwl4965-firmware
[walkert@tthink ~]$ 

At first the fc12 in the package version looks funny but I did go check the repo and that is indeed what it says.

Comment 4 reinette chatre 2010-05-10 23:37:18 UTC
(In reply to comment #0)
> 
> (these are spread across 2.6.33.2-41.fc13.x86_64, 2.6.33.2-57.fc13.x86_64,
> 2.6.33.3-72.fc13.x86_64, 2.6.33.3-79.fc13.x86_64, 2.6.33.3-85.fc13.x86_64)

Could somebody please remind me where to get these sources?

> Thinking something microcode related I turned on debugging and it helpfully
> crashed immediately after associating (log attached).

I cannot find the attachment. Could you please point me to it?

> 
> May  2 18:43:53 tthink kernel: mac80211-phy0: failed to remove key (0,
> 00:22:6b:4f:f4:0d) from hardware (-22)
> May  3 23:12:21 tthink kernel: mac80211-phy0: failed to remove key (0,
> 00:22:6b:4f:f4:0c) from hardware (-22)

This is typically not a problem since mac80211 asks driver to remove key as it is going down and driver does not send commands to device while it is going down.

Comment 5 Thomas Walker 2010-05-11 11:45:56 UTC
Sorry, thought I attached the log.  Here you go.

Comment 6 Thomas Walker 2010-05-11 11:47:05 UTC
Created attachment 413114 [details]
syslog w/ debug turned on

Comment 7 John W. Linville 2010-05-11 14:37:36 UTC
For Fedora kernel sources, you can start here:

http://koji.fedoraproject.org/koji/packageinfo?packageID=8

Pick the specific version of interest and navigate to find the src rpm.  If you need guidance on dealing with those, I can provide that too.

Comment 8 Thomas Walker 2010-05-15 19:21:42 UTC
Adding a couple more crashes in case it is helpful.

Comment 9 Thomas Walker 2010-05-15 19:23:10 UTC
Created attachment 414281 [details]
More syslog

Comment 10 reinette chatre 2010-05-18 20:58:52 UTC
Thomas,

Are you perhaps using fragmentation? Does this happen when you run a particular application (which may be using fragmentation)?

Comment 11 Thomas Walker 2010-05-19 12:01:38 UTC
nothing I can think of... some basic web browsing, ssh, nfs (v4 over tcp).  I was expecting a counter or something in netstat -s but no luck.  Any way to tell for sure without a packet capture when it happens?

Comment 12 reinette chatre 2010-05-19 21:09:16 UTC
It seems that you can reproduce this easily. Could you please add some flags to debugging that will capture more data? Please note that this will generate a lot of data so if you were using dmesg to capture the log, please set up things to use syslog instead (route all kernel messages, kern.*, to a file). 

Please reproduce with debug flag of 0x843fff

In addition to this, please compile your mac80211 module with CONFIG_MAC80211_VERBOSE_DEBUG

Thanks!

Comment 13 Thomas Walker 2010-05-25 11:57:16 UTC
Created attachment 416370 [details]
MAC80211 debugging info as well

Comment 14 Thomas Walker 2010-05-25 11:58:06 UTC
Something about the additional delay added by all that debugging makes this happen with much reduced frequency.  Got one last night while the machine was completely idle though (attached).
The other odd thing I've noticed is that the card seems to hit these a lot immediately after associating (like within the same second).

Comment 15 reinette chatre 2010-05-26 05:31:43 UTC
This is very strange. If I read the log correctly the issue is in the scanning code. What seems to happen is a normal scan is requested and for some reason does not complete (notification containing results for last channel does not arrive). To deal with this the driver send an "abort scan" command to the device, at which point you get the firmware sysassert. The strange thing is that the sysassert shows no log of receipt of the scan abort command and actually shows the error to be somewhere in a TX command. This is hinting towards some memory corruption.

Since the FC specific patches did modify the scanning code, could you please check if this issue is present in vanilla 2.6.33.3 also?

Comment 16 John W. Linville 2010-05-26 14:25:05 UTC
Thomas, I don't know if you are in a position to build/use kernels from upstream sources, so I put together a test kernel that removes all the iwlwifi patches from an otherwise stock F-13 kernel:

http://koji.fedoraproject.org/koji/taskinfo?taskID=2210924

Testing that isn't really conclusive (i.e. some other Fedora patch could be at fault), but it eliminates the iwlwifi patches without destabilizing anything else your system might rely upon.  And, it is probably a lot easier for you to install. :-)

Comment 17 John W. Linville 2010-05-26 15:55:13 UTC
Grrr...that build failed, for no particular reason -- sometimes I hate Koji...

I think this one will make it...

http://koji.fedoraproject.org/koji/taskinfo?taskID=2211020

Comment 18 Thomas Walker 2010-05-26 18:12:07 UTC
No worries, I do this stuff for a living.  I'll take it if it goes through or else just build my own later today.

Comment 19 Thomas Walker 2010-05-28 14:38:58 UTC
With the koji kernel from #17:

May 27 19:04:04 tthink kernel: WARNING: at drivers/net/wireless/iwlwifi/iwl-tx.c:1239 iwl_tx_cmd_complete+0x8d/0x30c [iwlcore]()
May 27 19:04:04 tthink kernel: Hardware name: 1709CTO
May 27 19:04:04 tthink kernel: wrong command queue 0, sequence 0x0 readp=6 writep=12
May 27 19:04:04 tthink kernel: Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss fuse ipt_MASQUERADE iptable_nat nf_nat rfcomm sco bridge stp llc bnep l2cap autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput snd_hda_codec_analog snd_hda_intel arc4 snd_hda_codec ecb snd_hwdep snd_seq iwlagn snd_seq_device iwlcore snd_pcm mac80211 snd_timer
May 27 19:04:04 tthink kernel: iwlagn 0000:03:00.0: Error setting new RXON (-5)
May 27 19:04:04 tthink kernel: thinkpad_acpi sdhci_pci sdhci btusb snd mmc_core cfg80211 irda soundcore snd_page_alloc iTCO_wdt bluetooth e1000e iTCO_vendor_support i2c_i801 crc_ccitt rfkill microcode sha256_generic aes_x86_64 aes_generic cbc dm_crypt firewire_ohci firewire_core crc_itu_t yenta_socket rsrc_nonstatic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
May 27 19:04:04 tthink kernel: Pid: 1402, comm: NetworkManager Not tainted 2.6.33.4-107.bz590333.fc13.x86_64 #1
May 27 19:04:04 tthink kernel: Call Trace:
May 27 19:04:04 tthink kernel: <IRQ>  [<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f
May 27 19:04:04 tthink kernel: [<ffffffff8104b5bd>] warn_slowpath_fmt+0x3c/0x3e
May 27 19:04:04 tthink kernel: [<ffffffffa02a7b21>] ? iwl_add_sta_callback+0x67/0xe7 [iwlcore]
May 27 19:04:04 tthink kernel: [<ffffffffa02a4712>] iwl_tx_cmd_complete+0x8d/0x30c [iwlcore]
May 27 19:04:04 tthink kernel: [<ffffffffa02e5aac>] iwl_rx_handle+0x36e/0x4a6 [iwlagn]
May 27 19:04:04 tthink kernel: [<ffffffffa02e77ee>] iwl_irq_tasklet_legacy+0x4fc/0x75d [iwlagn]
May 27 19:04:04 tthink kernel: [<ffffffffa029b8ee>] ? __iwl_read32.clone.5+0xab/0xba [iwlcore]
May 27 19:04:04 tthink kernel: [<ffffffff810502ad>] tasklet_action+0x80/0xdf
May 27 19:04:04 tthink kernel: [<ffffffff810516c5>] __do_softirq+0xe0/0x1a1
May 27 19:04:04 tthink kernel: [<ffffffff8109a0de>] ? handle_IRQ_event+0x5b/0x11c
May 27 19:04:04 tthink kernel: [<ffffffff8100aa1c>] call_softirq+0x1c/0x30
May 27 19:04:04 tthink kernel: [<ffffffff8100c21d>] do_softirq+0x41/0x7e
May 27 19:04:04 tthink kernel: [<ffffffff81051518>] irq_exit+0x36/0x78
May 27 19:04:04 tthink kernel: [<ffffffff8100b957>] do_IRQ+0xa7/0xbe
May 27 19:04:04 tthink kernel: [<ffffffff8142b2d3>] ret_from_intr+0x0/0x11
May 27 19:04:04 tthink kernel: <EOI>  [<ffffffff811ceb3a>] ? avtab_search_node+0x47/0x7b
May 27 19:04:04 tthink kernel: [<ffffffff811d5c77>] cond_compute_av+0x21/0x86
May 27 19:04:04 tthink kernel: [<ffffffff811d5131>] context_struct_compute_av+0x18c/0x27e
May 27 19:04:04 tthink kernel: [<ffffffff8104658f>] ? try_to_wake_up+0x2c4/0x2d6
May 27 19:04:04 tthink kernel: [<ffffffff811d52b6>] security_compute_av_core+0x93/0xc1
May 27 19:04:04 tthink kernel: [<ffffffff811d5424>] security_compute_av+0xc7/0x20e
May 27 19:04:04 tthink kernel: [<ffffffff811c32ab>] avc_has_perm_noaudit+0x10b/0x39c
May 27 19:04:04 tthink kernel: [<ffffffff8142a1a8>] ? mutex_lock+0x24/0x4b
May 27 19:04:04 tthink kernel: [<ffffffff8110ed53>] ? might_fault+0x1c/0x1e
May 27 19:04:04 tthink kernel: [<ffffffff81400de9>] ? unix_poll+0x19/0x95
May 27 19:04:04 tthink kernel: [<ffffffff811c3572>] avc_has_perm+0x36/0x69
May 27 19:04:04 tthink kernel: [<ffffffff811c461a>] socket_has_perm+0x6e/0x84
May 27 19:04:04 tthink kernel: [<ffffffff811c46d6>] selinux_socket_sendmsg+0x1a/0x1c
May 27 19:04:04 tthink kernel: [<ffffffff811be6a9>] security_socket_sendmsg+0x11/0x13
May 27 19:04:04 tthink kernel: [<ffffffff8137a126>] __sock_sendmsg+0x3e/0x64
May 27 19:04:04 tthink kernel: [<ffffffff8137a43e>] sock_sendmsg+0xa3/0xbc
May 27 19:04:04 tthink kernel: [<ffffffff810fbab6>] ? try_get_mem_cgroup_from_mm+0x34/0x44
May 27 19:04:04 tthink kernel: [<ffffffff810fcaaa>] ? __mem_cgroup_try_charge+0x49/0x22b
May 27 19:04:04 tthink kernel: [<ffffffff8137b653>] ? __sys_recvmsg+0x18e/0x241
May 27 19:04:04 tthink kernel: [<ffffffff8137a4bf>] ? sockfd_lookup_light+0x1b/0x53
May 27 19:04:04 tthink kernel: [<ffffffff8137bfef>] sys_sendto+0x10b/0x14d
May 27 19:04:04 tthink kernel: [<ffffffff811c750e>] ? selinux_file_permission+0xa2/0xae
May 27 19:04:04 tthink kernel: [<ffffffff811be0ef>] ? security_file_permission+0x11/0x13
May 27 19:04:04 tthink kernel: [<ffffffff8142dc05>] ? do_page_fault+0x2bd/0x2ed
May 27 19:04:04 tthink kernel: [<ffffffff81096337>] ? audit_syscall_entry+0x119/0x145
May 27 19:04:04 tthink kernel: [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
May 27 19:04:04 tthink kernel: ---[ end trace 60828a819afe8124 ]---
May 27 19:04:04 tthink kernel: iwl data: 00000000: 20 00 00 00 00 00 00 00 01 00 00 00 1f 80 00 00   ...............
May 27 19:04:04 tthink kernel: iwl data: 00000010: 00 00 00 00 01 00 00 00 ea 9b 00 00 87 01 00 00  ................


Do you want me to turn back on the verbose debug (seems to make it take longer to happen but obviously more useful output)?  MAC80211 as well?

Comment 20 Paul Lange 2010-05-30 16:27:11 UTC
Don't know if this is the same issue but iwlagn is also unstable for me.

lcspi:
lspci
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.3 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 03)
01:00.0 VGA compatible controller: nVidia Corporation G96 [GeForce 9600M GT] (rev a1)
02:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
06:00.0 FireWire (IEEE 1394): JMicron Technology Corp. IEEE 1394 Host Controller
06:00.1 System peripheral: JMicron Technology Corp. SD/MMC Host Controller
06:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller
06:00.3 System peripheral: JMicron Technology Corp. MS Host Controller
06:00.4 System peripheral: JMicron Technology Corp. xD Host Controller


from dmesg:
iwlagn 0000:02:00.0: Microcode SW error detected.  Restarting 0x2000000.
iwlagn 0000:02:00.0: Start IWL Error Log Dump:
iwlagn 0000:02:00.0: Status: 0x000212E4, count: 5
iwlagn 0000:02:00.0: Desc                               Time       data1      data2      line
iwlagn 0000:02:00.0: SYSASSERT                    (#05) 2079665153 0x4180803D 0x00008907 1364
iwlagn 0000:02:00.0: blink1  blink2  ilink1  ilink2
iwlagn 0000:02:00.0: 0x039B8 0x039B8 0x008B2 0x00000
iwlagn 0000:02:00.0: Start IWL Event Log Dump: display last 20 entries
iwlagn 0000:02:00.0: EVT_LOGT:0000546992:0x01002118:0211
iwlagn 0000:02:00.0: EVT_LOGT:0000546998:0x00000000:0212
iwlagn 0000:02:00.0: EVT_LOGT:0000547054:0x00000000:0215
iwlagn 0000:02:00.0: EVT_LOGT:0000547058:0x00000008:0220
iwlagn 0000:02:00.0: EVT_LOGT:0000547416:0x020f001c:0206
iwlagn 0000:02:00.0: EVT_LOGT:0000547418:0x00000001:0204
iwlagn 0000:02:00.0: EVT_LOGT:0000547421:0x00000001:0214
iwlagn 0000:02:00.0: EVT_LOGT:0000547422:0x01002118:0209
iwlagn 0000:02:00.0: EVT_LOGT:0000547567:0x00000000:0210
iwlagn 0000:02:00.0: EVT_LOGT:0000547578:0x00000000:0207
iwlagn 0000:02:00.0: EVT_LOGT:0000547580:0x01002118:0211
iwlagn 0000:02:00.0: EVT_LOGT:0000547586:0x00000000:0212
iwlagn 0000:02:00.0: EVT_LOGT:0000547654:0x00000000:0215
iwlagn 0000:02:00.0: EVT_LOGT:0000547658:0x00000008:0220
iwlagn 0000:02:00.0: EVT_LOGT:0000547675:0x00000000:0302
iwlagn 0000:02:00.0: EVT_LOGT:0000547704:0x000000d4:0303
iwlagn 0000:02:00.0: EVT_LOGT:0000547709:0x00001010:0217
iwlagn 0000:02:00.0: EVT_LOGT:0000547709:0x020f001c:0217
iwlagn 0000:02:00.0: EVT_LOGT:0000547752:0x0431004e:0401
iwlagn 0000:02:00.0: EVT_LOGT:0000547764:0x00000000:0125
iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:fe:45:a4:07 tid = 0

Comment 21 Thomas Walker 2010-06-20 23:42:48 UTC
Any more ideas here?  Removing the Fedora patches definitely made the crashes a bit less frequent (as does turning on successive layers of debugging).  Not exactly a solution though.  I have noticed that a number of the errors seem to occur either when the interface is coming up or going down (one happened during a suspend).  Still do have some that appear out of the blue (machine doing nothing or very light work) though...

Anyway, let me know if there is anything else I can provide.

Comment 22 reinette chatre 2010-06-22 22:34:08 UTC
Thomas, are you saying that after moving to the Fedora kernel without the iwlwifi patches the firmware sysassert is not showing up anymore? 

Is the warning ("May 27 19:04:04 tthink kernel: WARNING: at
drivers/net/wireless/iwlwifi/iwl-tx.c:1239 iwl_tx_cmd_complete+0x8d/0x30c
[iwlcore]()") the only issue you are seeing right now?

Do you see the warning only on 5GHz?

Comment 23 Thomas Walker 2010-06-22 22:49:47 UTC
It still shows up but less frequently... I can't tell whether it is exactly the same error every time but it certainty  looks similar.  The timing of adding the debug output makes it even less likely.  Not a real solution by any means.

Any other ideas of anything to try?

Comment 24 reinette chatre 2010-06-22 23:03:11 UTC
Could you please send a log of the error you are seeing now? Debug should not be needed at this point to confirm that it is the same issue as before.

Comment 25 Thomas Walker 2010-06-23 00:00:08 UTC
[walkert@tthink ~]$ sudo zgrep -A800 'Microcode SW error' messages* > /tmp/crashes-0622

All of the crashes in recent history that I have (the last of which was today, with debugging on and the fedora kernel minus iwl patches).  FWIW- SSID "XXXXXXXXXX" is 5ghz and "XXXXXXXXXX-alt" is 2.4ghz.

Comment 26 Thomas Walker 2010-06-23 00:00:55 UTC
Created attachment 426116 [details]
Crashes as of June 22nd

Comment 27 Thomas Walker 2010-06-23 00:02:28 UTC
Excuse me, last was the 17th because I've had the very verbose debugging turned on since then (today's crash was actually rhythymbox, for once not the kernel).

Comment 28 reinette chatre 2010-06-28 22:15:15 UTC
Created attachment 427520 [details]
Get more info

Since debugging makes this harder to reproduce, could you please try with this patch to help get some more data? Ideally ... having this patch _and_ debugging will help a lot, but if you cannot reproduce with debugging, please try this patch on its own.

Thank you

Comment 29 Thomas Walker 2010-07-03 15:30:33 UTC
I applied your extra patch to the current stock FC13 kernel (reverting back to including the FC iwl patches now).  Got a crash with debugging enabled last night... Wasn't sure how much context to give you around it in syslog so let me know if you need anything more.

Comment 30 Thomas Walker 2010-07-03 15:32:05 UTC
Created attachment 429268 [details]
Using "Get more info" and stock kernel

Comment 31 reinette chatre 2010-07-07 17:08:07 UTC
Unfortunately debug was not enabled during this encounter ... and to add to this this encounter was for a different issue that the patch I provided is not relevant for. In this run you encountered an NMI error where previously it was a SYSASSERT.

Comment 32 Thomas Walker 2010-07-09 01:02:21 UTC
My mistake, I thought I had re-enabled it.  Even the stack traces you added seemed to make it much less common.  Debugging (0x43fff) re-enabled.

Comment 33 John W. Linville 2010-08-10 17:58:54 UTC
Any more information available here?

Comment 34 Thomas Walker 2010-08-10 22:43:06 UTC
For whatever reason, these problems have tailed off to the point where I've only had about 1 a week since early July (this is without debug even enabled so I should have been seeing them with some frequency).  Been running 2.6.33.6-147 or 2.6.33.6-147.2.4 during that time (unpatched).

Comment 35 John W. Linville 2010-08-24 17:56:03 UTC
OK, please reopen if/when this again becomes enough of a problem to collect the requested information...thanks!

Comment 36 Thomas Walker 2010-08-24 21:21:46 UTC
Created attachment 440775 [details]
More crashes with increased debugging

MAC80211 debug, getmoreinfo patch, and iwlagn debug=0x43fff.

Comment 37 Nicolae Ghimbovschi 2010-10-06 19:01:32 UTC
I'm experiencing the same problem with the latest updates on Fedora 13 x86_64

03:00.0 Network controller: Intel Corporation WiFi Link 5100
	Subsystem: Intel Corporation WiFi Link 5100 AGN
	Flags: bus master, fast devsel, latency 0, IRQ 35
	Memory at fcffe000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: iwlagn
	Kernel modules: iwlagn

Linux jupiter64 2.6.34.7-56.fc13.x86_64 #1 SMP Wed Sep 15 03:36:55 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Before the latest two kernel updates it worked much better.

Comment 38 Paul Lange 2010-10-07 11:53:01 UTC
The latest kernel updates made it much worse for me too. If you need any logs please tell me how I can get them. So far I can only report the crashs via abrt.

Comment 39 Thomas Walker 2010-10-30 14:13:43 UTC
I, on the other hand, have virtually seen this problem go away with more recent kernels (FC13 and then FC14 alpha/beta/GA).  About a month ago I got lazy and stopped rebuilding mac80211 with the debug patches and noticed that I didn't see the number of crashes increase (as had often happened when I decreased debug output).  About 3 weeks ago, I completely removed the iwlagn modprobe debug options which should have returned me to multiple crashes a day.  Not had anything in that period...

Not sure what changed but 2.6.33+ has been pretty rock solid for me.

Comment 40 Nicolae Ghimbovschi 2010-11-01 17:59:46 UTC
I've also noticed improvements in stability.
But is not rock solid here, I had two crashes.

Once I was able to make it work without restarting the laptop, 
by switching off/on the wifi switch.

The other time it didn't work, I had to restart my laptop.

Comment 41 Christopher 2011-01-05 20:30:50 UTC
I'm seeing exactly the same "Microcode SW Error Detected" problem messages from iwlagn in dmesg as described in #20 under the newest Fedora 14 x86_64 kernel.  Machine is a Lenovo W500 Thinkpad.

[christopher@aries ~]$ uname -a
Linux aries 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux


[christopher@aries ~]$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
00:03.3 Serial controller: Intel Corporation Mobile 4 Series Chipset AMT SOL Redirection (rev 07)
00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network Connection (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M-E LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3650
03:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
15:00.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev ba)
15:00.1 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 04)
15:00.2 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 21)
15:00.4 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 11)
15:00.5 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 11)

Comment 42 John W. Linville 2011-01-13 19:03:42 UTC
After the Microcode SW Error, does the device recover?  Or does it require a manual modprobe or reboot?

Comment 43 John W. Linville 2011-02-16 14:34:48 UTC
Closing based on comment 39 and comment 40 and lack of response to comment 42...

Comment 44 Christopher 2011-04-14 15:42:39 UTC
Bug shouldn't be closed, as it isn't even close to being resolved.

Current kernel of my machine (#41) has been updated to:
Linux aries 2.6.35.12-88.fc14.x86_64 #1 SMP Thu Mar 31 21:21:57 UTC 2011 x86_64
x86_64 x86_64 GNU/Linux

My dmesg log continues to fill up with these error events, with a new entry set every couple of minutes whenever I'm connected to any wireless AP.

The device recovers on its own...however, it makes sustained network communication impossible.  I first discovered the problem because the machine could not maintain a usable Skype connection, exhibiting bad quality and frequent momentary call "freezes" and outright disconnects.


Note You need to log in before you can comment on or make changes to this bug.