iwlagn has been extremely unstable since cutting over to fc13/rawhide a week back (each of these is followed by a disconnect and, occasionally, a complete failure to reconnect until either the modules are reloaded or the machine is rebooted) [walkert@tthink ~]$ sudo zgrep "wrong command queue" /var/log/messages* /var/log/messages:May 6 20:07:05 tthink kernel: wrong command queue 0, sequence 0x0 readp=160 writep=168 /var/log/messages:May 6 21:45:42 tthink kernel: wrong command queue 0, sequence 0x0 readp=112 writep=120 /var/log/messages:May 6 23:25:42 tthink kernel: wrong command queue 0, sequence 0x0 readp=39 writep=47 /var/log/messages:May 7 07:15:42 tthink kernel: wrong command queue 0, sequence 0x0 readp=116 writep=124 /var/log/messages:May 7 08:01:36 tthink kernel: wrong command queue 0, sequence 0x0 readp=130 writep=138 /var/log/messages:May 8 12:25:35 tthink kernel: wrong command queue 0, sequence 0x0 readp=117 writep=125 /var/log/messages:May 8 12:55:35 tthink kernel: wrong command queue 0, sequence 0x0 readp=191 writep=199 /var/log/messages:May 8 14:15:34 tthink kernel: wrong command queue 0, sequence 0x0 readp=54 writep=62 /var/log/messages:May 8 17:01:23 tthink kernel: wrong command queue 0, sequence 0x0 readp=106 writep=114 /var/log/messages-20100418:Apr 18 11:09:53 tthink kernel: wrong command queue 0, sequence 0x0 readp=79 writep=82 /var/log/messages-20100425:Apr 22 07:28:13 tthink kernel: wrong command queue 0, sequence 0x0 readp=11 writep=17 /var/log/messages-20100502:Apr 25 21:47:15 tthink kernel: wrong command queue 0, sequence 0x0 readp=181 writep=189 (these are spread across 2.6.33.2-41.fc13.x86_64, 2.6.33.2-57.fc13.x86_64, 2.6.33.3-72.fc13.x86_64, 2.6.33.3-79.fc13.x86_64, 2.6.33.3-85.fc13.x86_64) Connecting to a linksys wrt610n router w/ ddwrt on it. Almost always the 5ghz bands (a/n) but it occasionally reconnects to 2.4ghz. Seems to happen mostly under very light usage. Thinking something microcode related I turned on debugging and it helpfully crashed immediately after associating (log attached). Perhaps related, perhaps not, I'm also getting a ton of "failed to remove key from hw" messages: May 2 18:43:53 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 3 23:12:21 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 00:01:19 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 08:36:53 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 08:47:15 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 08:47:18 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 20:41:39 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 20:41:45 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 20:41:54 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 20:55:44 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 20:55:48 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 20:56:49 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 21:00:11 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 21:00:15 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 21:02:34 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 21:02:40 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 21:03:02 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 21:03:32 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 4 21:04:16 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 21:04:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 21:15:38 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 21:15:55 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 21:28:58 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 21:29:33 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 4 21:29:51 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 6 20:07:06 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 6 21:45:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 6 22:05:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 6 22:15:43 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 6 22:55:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 6 23:25:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 6 23:29:13 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 7 06:35:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 7 06:55:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 7 07:15:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 7 07:55:42 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 7 08:01:37 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 7 08:06:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 7 08:15:14 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 7 09:16:12 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 7 10:04:24 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 8 11:50:32 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 8 11:55:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 8 12:25:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 8 12:55:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 8 14:15:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 8 14:25:35 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 8 14:52:25 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0c) from hardware (-22) May 8 14:55:34 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 8 17:01:23 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) May 8 17:04:17 tthink kernel: mac80211-phy0: failed to remove key (0, 00:22:6b:4f:f4:0d) from hardware (-22) Rather reproducible (although may need to wait an hour or two) so I'm more than willing to collect more data if you need.
Correction (obviously) this goes back more than a week, wrong machine :) This one I converted a few weeks ago and iwlagn has been unstable since day one.
Please attach the output of 'lspci -n'. Also, a shot in the dark -- is the appropriate firmware package up-to-date?
(In reply to comment #2) > Please attach the output of 'lspci -n'. Also, a shot in the dark -- is the > appropriate firmware package up-to-date? [walkert@tthink ~]$ lspci -n 00:00.0 0600: 8086:27a0 (rev 03) 00:02.0 0300: 8086:27a2 (rev 03) 00:02.1 0380: 8086:27a6 (rev 03) 00:1b.0 0403: 8086:27d8 (rev 02) 00:1c.0 0604: 8086:27d0 (rev 02) 00:1c.1 0604: 8086:27d2 (rev 02) 00:1c.2 0604: 8086:27d4 (rev 02) 00:1c.3 0604: 8086:27d6 (rev 02) 00:1d.0 0c03: 8086:27c8 (rev 02) 00:1d.1 0c03: 8086:27c9 (rev 02) 00:1d.2 0c03: 8086:27ca (rev 02) 00:1d.3 0c03: 8086:27cb (rev 02) 00:1d.7 0c03: 8086:27cc (rev 02) 00:1e.0 0604: 8086:2448 (rev e2) 00:1f.0 0601: 8086:27b9 (rev 02) 00:1f.1 0101: 8086:27df (rev 02) 00:1f.2 0106: 8086:27c5 (rev 02) 00:1f.3 0c05: 8086:27da (rev 02) 02:00.0 0200: 8086:109a 03:00.0 0280: 8086:4229 (rev 61) 15:00.0 0607: 1180:0476 (rev b4) 15:00.1 0c00: 1180:0552 (rev 09) 15:00.2 0805: 1180:0822 (rev 18) [walkert@tthink ~]$ rpm -qa | grep iwl4965-firmware iwl4965-firmware-228.61.2.24-2.fc12.noarch [walkert@tthink ~]$ rpm -ql iwl4965-firmware /lib/firmware/iwlwifi-4965-1.ucode /lib/firmware/iwlwifi-4965-2.ucode /usr/share/doc/iwl4965-firmware-228.61.2.24 /usr/share/doc/iwl4965-firmware-228.61.2.24/LICENSE /usr/share/doc/iwl4965-firmware-228.61.2.24/README [walkert@tthink ~]$ rpm -qV iwl4965-firmware [walkert@tthink ~]$ At first the fc12 in the package version looks funny but I did go check the repo and that is indeed what it says.
(In reply to comment #0) > > (these are spread across 2.6.33.2-41.fc13.x86_64, 2.6.33.2-57.fc13.x86_64, > 2.6.33.3-72.fc13.x86_64, 2.6.33.3-79.fc13.x86_64, 2.6.33.3-85.fc13.x86_64) Could somebody please remind me where to get these sources? > Thinking something microcode related I turned on debugging and it helpfully > crashed immediately after associating (log attached). I cannot find the attachment. Could you please point me to it? > > May 2 18:43:53 tthink kernel: mac80211-phy0: failed to remove key (0, > 00:22:6b:4f:f4:0d) from hardware (-22) > May 3 23:12:21 tthink kernel: mac80211-phy0: failed to remove key (0, > 00:22:6b:4f:f4:0c) from hardware (-22) This is typically not a problem since mac80211 asks driver to remove key as it is going down and driver does not send commands to device while it is going down.
Sorry, thought I attached the log. Here you go.
Created attachment 413114 [details] syslog w/ debug turned on
For Fedora kernel sources, you can start here: http://koji.fedoraproject.org/koji/packageinfo?packageID=8 Pick the specific version of interest and navigate to find the src rpm. If you need guidance on dealing with those, I can provide that too.
Adding a couple more crashes in case it is helpful.
Created attachment 414281 [details] More syslog
Thomas, Are you perhaps using fragmentation? Does this happen when you run a particular application (which may be using fragmentation)?
nothing I can think of... some basic web browsing, ssh, nfs (v4 over tcp). I was expecting a counter or something in netstat -s but no luck. Any way to tell for sure without a packet capture when it happens?
It seems that you can reproduce this easily. Could you please add some flags to debugging that will capture more data? Please note that this will generate a lot of data so if you were using dmesg to capture the log, please set up things to use syslog instead (route all kernel messages, kern.*, to a file). Please reproduce with debug flag of 0x843fff In addition to this, please compile your mac80211 module with CONFIG_MAC80211_VERBOSE_DEBUG Thanks!
Created attachment 416370 [details] MAC80211 debugging info as well
Something about the additional delay added by all that debugging makes this happen with much reduced frequency. Got one last night while the machine was completely idle though (attached). The other odd thing I've noticed is that the card seems to hit these a lot immediately after associating (like within the same second).
This is very strange. If I read the log correctly the issue is in the scanning code. What seems to happen is a normal scan is requested and for some reason does not complete (notification containing results for last channel does not arrive). To deal with this the driver send an "abort scan" command to the device, at which point you get the firmware sysassert. The strange thing is that the sysassert shows no log of receipt of the scan abort command and actually shows the error to be somewhere in a TX command. This is hinting towards some memory corruption. Since the FC specific patches did modify the scanning code, could you please check if this issue is present in vanilla 2.6.33.3 also?
Thomas, I don't know if you are in a position to build/use kernels from upstream sources, so I put together a test kernel that removes all the iwlwifi patches from an otherwise stock F-13 kernel: http://koji.fedoraproject.org/koji/taskinfo?taskID=2210924 Testing that isn't really conclusive (i.e. some other Fedora patch could be at fault), but it eliminates the iwlwifi patches without destabilizing anything else your system might rely upon. And, it is probably a lot easier for you to install. :-)
Grrr...that build failed, for no particular reason -- sometimes I hate Koji... I think this one will make it... http://koji.fedoraproject.org/koji/taskinfo?taskID=2211020
No worries, I do this stuff for a living. I'll take it if it goes through or else just build my own later today.
With the koji kernel from #17: May 27 19:04:04 tthink kernel: WARNING: at drivers/net/wireless/iwlwifi/iwl-tx.c:1239 iwl_tx_cmd_complete+0x8d/0x30c [iwlcore]() May 27 19:04:04 tthink kernel: Hardware name: 1709CTO May 27 19:04:04 tthink kernel: wrong command queue 0, sequence 0x0 readp=6 writep=12 May 27 19:04:04 tthink kernel: Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss fuse ipt_MASQUERADE iptable_nat nf_nat rfcomm sco bridge stp llc bnep l2cap autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput snd_hda_codec_analog snd_hda_intel arc4 snd_hda_codec ecb snd_hwdep snd_seq iwlagn snd_seq_device iwlcore snd_pcm mac80211 snd_timer May 27 19:04:04 tthink kernel: iwlagn 0000:03:00.0: Error setting new RXON (-5) May 27 19:04:04 tthink kernel: thinkpad_acpi sdhci_pci sdhci btusb snd mmc_core cfg80211 irda soundcore snd_page_alloc iTCO_wdt bluetooth e1000e iTCO_vendor_support i2c_i801 crc_ccitt rfkill microcode sha256_generic aes_x86_64 aes_generic cbc dm_crypt firewire_ohci firewire_core crc_itu_t yenta_socket rsrc_nonstatic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] May 27 19:04:04 tthink kernel: Pid: 1402, comm: NetworkManager Not tainted 2.6.33.4-107.bz590333.fc13.x86_64 #1 May 27 19:04:04 tthink kernel: Call Trace: May 27 19:04:04 tthink kernel: <IRQ> [<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f May 27 19:04:04 tthink kernel: [<ffffffff8104b5bd>] warn_slowpath_fmt+0x3c/0x3e May 27 19:04:04 tthink kernel: [<ffffffffa02a7b21>] ? iwl_add_sta_callback+0x67/0xe7 [iwlcore] May 27 19:04:04 tthink kernel: [<ffffffffa02a4712>] iwl_tx_cmd_complete+0x8d/0x30c [iwlcore] May 27 19:04:04 tthink kernel: [<ffffffffa02e5aac>] iwl_rx_handle+0x36e/0x4a6 [iwlagn] May 27 19:04:04 tthink kernel: [<ffffffffa02e77ee>] iwl_irq_tasklet_legacy+0x4fc/0x75d [iwlagn] May 27 19:04:04 tthink kernel: [<ffffffffa029b8ee>] ? __iwl_read32.clone.5+0xab/0xba [iwlcore] May 27 19:04:04 tthink kernel: [<ffffffff810502ad>] tasklet_action+0x80/0xdf May 27 19:04:04 tthink kernel: [<ffffffff810516c5>] __do_softirq+0xe0/0x1a1 May 27 19:04:04 tthink kernel: [<ffffffff8109a0de>] ? handle_IRQ_event+0x5b/0x11c May 27 19:04:04 tthink kernel: [<ffffffff8100aa1c>] call_softirq+0x1c/0x30 May 27 19:04:04 tthink kernel: [<ffffffff8100c21d>] do_softirq+0x41/0x7e May 27 19:04:04 tthink kernel: [<ffffffff81051518>] irq_exit+0x36/0x78 May 27 19:04:04 tthink kernel: [<ffffffff8100b957>] do_IRQ+0xa7/0xbe May 27 19:04:04 tthink kernel: [<ffffffff8142b2d3>] ret_from_intr+0x0/0x11 May 27 19:04:04 tthink kernel: <EOI> [<ffffffff811ceb3a>] ? avtab_search_node+0x47/0x7b May 27 19:04:04 tthink kernel: [<ffffffff811d5c77>] cond_compute_av+0x21/0x86 May 27 19:04:04 tthink kernel: [<ffffffff811d5131>] context_struct_compute_av+0x18c/0x27e May 27 19:04:04 tthink kernel: [<ffffffff8104658f>] ? try_to_wake_up+0x2c4/0x2d6 May 27 19:04:04 tthink kernel: [<ffffffff811d52b6>] security_compute_av_core+0x93/0xc1 May 27 19:04:04 tthink kernel: [<ffffffff811d5424>] security_compute_av+0xc7/0x20e May 27 19:04:04 tthink kernel: [<ffffffff811c32ab>] avc_has_perm_noaudit+0x10b/0x39c May 27 19:04:04 tthink kernel: [<ffffffff8142a1a8>] ? mutex_lock+0x24/0x4b May 27 19:04:04 tthink kernel: [<ffffffff8110ed53>] ? might_fault+0x1c/0x1e May 27 19:04:04 tthink kernel: [<ffffffff81400de9>] ? unix_poll+0x19/0x95 May 27 19:04:04 tthink kernel: [<ffffffff811c3572>] avc_has_perm+0x36/0x69 May 27 19:04:04 tthink kernel: [<ffffffff811c461a>] socket_has_perm+0x6e/0x84 May 27 19:04:04 tthink kernel: [<ffffffff811c46d6>] selinux_socket_sendmsg+0x1a/0x1c May 27 19:04:04 tthink kernel: [<ffffffff811be6a9>] security_socket_sendmsg+0x11/0x13 May 27 19:04:04 tthink kernel: [<ffffffff8137a126>] __sock_sendmsg+0x3e/0x64 May 27 19:04:04 tthink kernel: [<ffffffff8137a43e>] sock_sendmsg+0xa3/0xbc May 27 19:04:04 tthink kernel: [<ffffffff810fbab6>] ? try_get_mem_cgroup_from_mm+0x34/0x44 May 27 19:04:04 tthink kernel: [<ffffffff810fcaaa>] ? __mem_cgroup_try_charge+0x49/0x22b May 27 19:04:04 tthink kernel: [<ffffffff8137b653>] ? __sys_recvmsg+0x18e/0x241 May 27 19:04:04 tthink kernel: [<ffffffff8137a4bf>] ? sockfd_lookup_light+0x1b/0x53 May 27 19:04:04 tthink kernel: [<ffffffff8137bfef>] sys_sendto+0x10b/0x14d May 27 19:04:04 tthink kernel: [<ffffffff811c750e>] ? selinux_file_permission+0xa2/0xae May 27 19:04:04 tthink kernel: [<ffffffff811be0ef>] ? security_file_permission+0x11/0x13 May 27 19:04:04 tthink kernel: [<ffffffff8142dc05>] ? do_page_fault+0x2bd/0x2ed May 27 19:04:04 tthink kernel: [<ffffffff81096337>] ? audit_syscall_entry+0x119/0x145 May 27 19:04:04 tthink kernel: [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b May 27 19:04:04 tthink kernel: ---[ end trace 60828a819afe8124 ]--- May 27 19:04:04 tthink kernel: iwl data: 00000000: 20 00 00 00 00 00 00 00 01 00 00 00 1f 80 00 00 ............... May 27 19:04:04 tthink kernel: iwl data: 00000010: 00 00 00 00 01 00 00 00 ea 9b 00 00 87 01 00 00 ................ Do you want me to turn back on the verbose debug (seems to make it take longer to happen but obviously more useful output)? MAC80211 as well?
Don't know if this is the same issue but iwlagn is also unstable for me. lcspi: lspci 00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07) 00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07) 00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03) 00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03) 00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03) 00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03) 00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03) 00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03) 00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03) 00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03) 00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03) 00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03) 00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03) 00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03) 00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03) 00:1d.3 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03) 00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93) 00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03) 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03) 00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 03) 01:00.0 VGA compatible controller: nVidia Corporation G96 [GeForce 9600M GT] (rev a1) 02:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02) 06:00.0 FireWire (IEEE 1394): JMicron Technology Corp. IEEE 1394 Host Controller 06:00.1 System peripheral: JMicron Technology Corp. SD/MMC Host Controller 06:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller 06:00.3 System peripheral: JMicron Technology Corp. MS Host Controller 06:00.4 System peripheral: JMicron Technology Corp. xD Host Controller from dmesg: iwlagn 0000:02:00.0: Microcode SW error detected. Restarting 0x2000000. iwlagn 0000:02:00.0: Start IWL Error Log Dump: iwlagn 0000:02:00.0: Status: 0x000212E4, count: 5 iwlagn 0000:02:00.0: Desc Time data1 data2 line iwlagn 0000:02:00.0: SYSASSERT (#05) 2079665153 0x4180803D 0x00008907 1364 iwlagn 0000:02:00.0: blink1 blink2 ilink1 ilink2 iwlagn 0000:02:00.0: 0x039B8 0x039B8 0x008B2 0x00000 iwlagn 0000:02:00.0: Start IWL Event Log Dump: display last 20 entries iwlagn 0000:02:00.0: EVT_LOGT:0000546992:0x01002118:0211 iwlagn 0000:02:00.0: EVT_LOGT:0000546998:0x00000000:0212 iwlagn 0000:02:00.0: EVT_LOGT:0000547054:0x00000000:0215 iwlagn 0000:02:00.0: EVT_LOGT:0000547058:0x00000008:0220 iwlagn 0000:02:00.0: EVT_LOGT:0000547416:0x020f001c:0206 iwlagn 0000:02:00.0: EVT_LOGT:0000547418:0x00000001:0204 iwlagn 0000:02:00.0: EVT_LOGT:0000547421:0x00000001:0214 iwlagn 0000:02:00.0: EVT_LOGT:0000547422:0x01002118:0209 iwlagn 0000:02:00.0: EVT_LOGT:0000547567:0x00000000:0210 iwlagn 0000:02:00.0: EVT_LOGT:0000547578:0x00000000:0207 iwlagn 0000:02:00.0: EVT_LOGT:0000547580:0x01002118:0211 iwlagn 0000:02:00.0: EVT_LOGT:0000547586:0x00000000:0212 iwlagn 0000:02:00.0: EVT_LOGT:0000547654:0x00000000:0215 iwlagn 0000:02:00.0: EVT_LOGT:0000547658:0x00000008:0220 iwlagn 0000:02:00.0: EVT_LOGT:0000547675:0x00000000:0302 iwlagn 0000:02:00.0: EVT_LOGT:0000547704:0x000000d4:0303 iwlagn 0000:02:00.0: EVT_LOGT:0000547709:0x00001010:0217 iwlagn 0000:02:00.0: EVT_LOGT:0000547709:0x020f001c:0217 iwlagn 0000:02:00.0: EVT_LOGT:0000547752:0x0431004e:0401 iwlagn 0000:02:00.0: EVT_LOGT:0000547764:0x00000000:0125 iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:fe:45:a4:07 tid = 0
Any more ideas here? Removing the Fedora patches definitely made the crashes a bit less frequent (as does turning on successive layers of debugging). Not exactly a solution though. I have noticed that a number of the errors seem to occur either when the interface is coming up or going down (one happened during a suspend). Still do have some that appear out of the blue (machine doing nothing or very light work) though... Anyway, let me know if there is anything else I can provide.
Thomas, are you saying that after moving to the Fedora kernel without the iwlwifi patches the firmware sysassert is not showing up anymore? Is the warning ("May 27 19:04:04 tthink kernel: WARNING: at drivers/net/wireless/iwlwifi/iwl-tx.c:1239 iwl_tx_cmd_complete+0x8d/0x30c [iwlcore]()") the only issue you are seeing right now? Do you see the warning only on 5GHz?
It still shows up but less frequently... I can't tell whether it is exactly the same error every time but it certainty looks similar. The timing of adding the debug output makes it even less likely. Not a real solution by any means. Any other ideas of anything to try?
Could you please send a log of the error you are seeing now? Debug should not be needed at this point to confirm that it is the same issue as before.
[walkert@tthink ~]$ sudo zgrep -A800 'Microcode SW error' messages* > /tmp/crashes-0622 All of the crashes in recent history that I have (the last of which was today, with debugging on and the fedora kernel minus iwl patches). FWIW- SSID "XXXXXXXXXX" is 5ghz and "XXXXXXXXXX-alt" is 2.4ghz.
Created attachment 426116 [details] Crashes as of June 22nd
Excuse me, last was the 17th because I've had the very verbose debugging turned on since then (today's crash was actually rhythymbox, for once not the kernel).
Created attachment 427520 [details] Get more info Since debugging makes this harder to reproduce, could you please try with this patch to help get some more data? Ideally ... having this patch _and_ debugging will help a lot, but if you cannot reproduce with debugging, please try this patch on its own. Thank you
I applied your extra patch to the current stock FC13 kernel (reverting back to including the FC iwl patches now). Got a crash with debugging enabled last night... Wasn't sure how much context to give you around it in syslog so let me know if you need anything more.
Created attachment 429268 [details] Using "Get more info" and stock kernel
Unfortunately debug was not enabled during this encounter ... and to add to this this encounter was for a different issue that the patch I provided is not relevant for. In this run you encountered an NMI error where previously it was a SYSASSERT.
My mistake, I thought I had re-enabled it. Even the stack traces you added seemed to make it much less common. Debugging (0x43fff) re-enabled.
Any more information available here?
For whatever reason, these problems have tailed off to the point where I've only had about 1 a week since early July (this is without debug even enabled so I should have been seeing them with some frequency). Been running 2.6.33.6-147 or 2.6.33.6-147.2.4 during that time (unpatched).
OK, please reopen if/when this again becomes enough of a problem to collect the requested information...thanks!
Created attachment 440775 [details] More crashes with increased debugging MAC80211 debug, getmoreinfo patch, and iwlagn debug=0x43fff.
I'm experiencing the same problem with the latest updates on Fedora 13 x86_64 03:00.0 Network controller: Intel Corporation WiFi Link 5100 Subsystem: Intel Corporation WiFi Link 5100 AGN Flags: bus master, fast devsel, latency 0, IRQ 35 Memory at fcffe000 (64-bit, non-prefetchable) [size=8K] Capabilities: <access denied> Kernel driver in use: iwlagn Kernel modules: iwlagn Linux jupiter64 2.6.34.7-56.fc13.x86_64 #1 SMP Wed Sep 15 03:36:55 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux Before the latest two kernel updates it worked much better.
The latest kernel updates made it much worse for me too. If you need any logs please tell me how I can get them. So far I can only report the crashs via abrt.
I, on the other hand, have virtually seen this problem go away with more recent kernels (FC13 and then FC14 alpha/beta/GA). About a month ago I got lazy and stopped rebuilding mac80211 with the debug patches and noticed that I didn't see the number of crashes increase (as had often happened when I decreased debug output). About 3 weeks ago, I completely removed the iwlagn modprobe debug options which should have returned me to multiple crashes a day. Not had anything in that period... Not sure what changed but 2.6.33+ has been pretty rock solid for me.
I've also noticed improvements in stability. But is not rock solid here, I had two crashes. Once I was able to make it work without restarting the laptop, by switching off/on the wifi switch. The other time it didn't work, I had to restart my laptop.
I'm seeing exactly the same "Microcode SW Error Detected" problem messages from iwlagn in dmesg as described in #20 under the newest Fedora 14 x86_64 kernel. Machine is a Lenovo W500 Thinkpad. [christopher@aries ~]$ uname -a Linux aries 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux [christopher@aries ~]$ lspci 00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07) 00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07) 00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) 00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07) 00:03.3 Serial controller: Intel Corporation Mobile 4 Series Chipset AMT SOL Redirection (rev 07) 00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network Connection (rev 03) 00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03) 00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03) 00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03) 00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03) 00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03) 00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03) 00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03) 00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03) 00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03) 00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03) 00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03) 00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03) 00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93) 00:1f.0 ISA bridge: Intel Corporation ICH9M-E LPC Interface Controller (rev 03) 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03) 01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3650 03:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection 15:00.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev ba) 15:00.1 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 04) 15:00.2 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 21) 15:00.4 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 11) 15:00.5 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 11)
After the Microcode SW Error, does the device recover? Or does it require a manual modprobe or reboot?
Closing based on comment 39 and comment 40 and lack of response to comment 42...
Bug shouldn't be closed, as it isn't even close to being resolved. Current kernel of my machine (#41) has been updated to: Linux aries 2.6.35.12-88.fc14.x86_64 #1 SMP Thu Mar 31 21:21:57 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux My dmesg log continues to fill up with these error events, with a new entry set every couple of minutes whenever I'm connected to any wireless AP. The device recovers on its own...however, it makes sustained network communication impossible. I first discovered the problem because the machine could not maintain a usable Skype connection, exhibiting bad quality and frequent momentary call "freezes" and outright disconnects.