My wireless died mid-use. I unloaded the module and then reloaded it to try to get it working again. The wireless card is an Intel Corporation Wireless WiFi Link 5300. Here's the output from dmesg:

iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
wlan0: disassociating by local choice (reason=3)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Aborted scan still in progress after 100ms
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: PCI INT A disabled
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:687 dma_debug_device_change+0x14b/0x192() (Not tainted)
Hardware name: 2777CTO
pci 0000:03:00.0: DMA-API: device driver has pending DMA allocations while released from device [count=42]
Modules linked in: tun fuse rfcomm sco bridge stp llc bnep l2cap sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput arc4 ecb iwlagn(-) snd_hda_codec_conexant iwlcore uvcvideo snd_hda_intel videodev v4l1_compat v4l2_compat_ioctl32 snd_hda_codec mac80211 snd_hwdep joydev snd_pcm i2c_i801 cfg80211 snd_timer thinkpad_acpi hwmon iTCO_wdt iTCO_vendor_support btusb bluetooth e1000e snd rfkill soundcore snd_page_alloc wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: microcode]
Pid: 13472, comm: rmmod Not tainted 2.6.31.1-56.fc12.x86_64 #1
Call Trace:
[<ffffffff8106422c>] warn_slowpath_common+0x95/0xc3
[<ffffffff810642e7>] warn_slowpath_fmt+0x50/0x66
[<ffffffff8128e771>] ? dma_debug_device_change+0xd6/0x192
[<ffffffff8128e7e6>] dma_debug_device_change+0x14b/0x192
[<ffffffff81086db1>] ? __blocking_notifier_call_chain+0x4c/0x8e
[<ffffffff81509a35>] notifier_call_chain+0x72/0xba
[<ffffffff81086db1>] ? __blocking_notifier_call_chain+0x4c/0x8e
[<ffffffff81086dc8>] __blocking_notifier_call_chain+0x63/0x8e
[<ffffffff81086e1a>] blocking_notifier_call_chain+0x27/0x3d
[<ffffffff81351bd7>] __device_release_driver+0xc3/0xde
[<ffffffff81351ca2>] driver_detach+0xb0/0xe6
[<ffffffff813509d6>] bus_remove_driver+0xb8/0x10d
[<ffffffff81352524>] driver_unregister+0x7b/0x9a
[<ffffffff81298032>] pci_unregister_driver+0x57/0xb7
[<ffffffffa025b1a8>] iwl_exit+0x28/0x43 [iwlagn]
[<ffffffff810a2667>] sys_delete_module+0x1e3/0x279
[<ffffffff81505e96>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81011f42>] system_call_fastpath+0x16/0x1b
---[ end trace 0e80a0cf85f72dbf ]---
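The "DMA-API: device driver has pending DMA allocations while released from device [count=42]" warning comes from the kernel's DMA-API debugging code (lib/dma-debug.c, as the trace shows). That code records the DMA mappings each device creates and, when the driver is detached from the device (here via rmmod), reports how many are still outstanding. Below is a toy Python model of that bookkeeping. The class and method names are hypothetical, and the model is not the kernel's actual data structure. It only illustrates the idea of counting live mappings per device and checking that count at unbind time.

```python
from collections import defaultdict


class DmaDebugModel:
    """Toy model of per-device DMA-mapping bookkeeping (hypothetical names)."""

    def __init__(self):
        self._live = defaultdict(set)  # device name -> handles still mapped
        self._next_handle = 0

    def map_buffer(self, dev):
        """Record a new mapping for `dev` and return its handle."""
        self._next_handle += 1
        self._live[dev].add(self._next_handle)
        return self._next_handle

    def unmap_buffer(self, dev, handle):
        """Forget a mapping once the driver releases it."""
        self._live[dev].discard(handle)

    def driver_released(self, dev):
        """Called when the driver unbinds; returns a warning or None."""
        count = len(self._live[dev])
        if count:
            return (f"{dev}: DMA-API: device driver has pending DMA allocations "
                    f"while released from device [count={count}]")
        return None


if __name__ == "__main__":
    dbg = DmaDebugModel()
    dev = "pci 0000:03:00.0"
    for _ in range(42):          # a driver that maps buffers but never unmaps them
        dbg.map_buffer(dev)
    print(dbg.driver_released(dev))
```

In this model, a driver that unmaps everything in its teardown path would get `None` back from `driver_released`.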
Reinette thinks this is a dupe of: http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
There appear to be a bunch of issues with the Intel 5300 on F12. I'm having tons of problems staying online. I can unload and reload the module; sometimes it'll stay up for a little while, and other times it'll die almost immediately. I get a bunch of different errors along the way, such as the one listed above, and sometimes:

iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: Error setting new RXON (-28)
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: Error setting new RXON (-28)
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28

I tried building the compat-wireless drivers (2010-02-23), but then I lost access to my a/n access point, so I backed down to the drivers in kernel 2.6.31.12-174.2.22.fc12.x86_64.
Also seeing:

iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
...

Which error I see seems to vary from run to run, but it all winds up in the same condition: I have to unload/reload the driver and then hope it will stay up long enough to do something before it dies again.
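The negative numbers at the end of these messages are Linux errno values. -110 is ETIMEDOUT, which matches the "time out after 500ms" lines. -28 is ENOSPC, which fits the "No space for Tx" lines beside it. The small Python sketch below pulls the code off a log line and names it. The function name is hypothetical. The table holds only the two values seen in this report, and it is hard-coded with the Linux numbering so the result does not depend on the host OS.

```python
import re

# Linux errno numbering (asm-generic errno headers); only the codes seen here.
LINUX_ERRNO = {
    28: ("ENOSPC", "No space left on device"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches a trailing "(-110)" or "failed: -28" on an iwlagn message.
_CODE_RE = re.compile(r"(?:\(|failed: )-(\d+)\)?\s*$")


def decode_line(line):
    """Return (errno, name, description) for a trailing negative code, else None."""
    m = _CODE_RE.search(line)
    if not m:
        return None
    code = int(m.group(1))
    name, desc = LINUX_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return code, name, desc


if __name__ == "__main__":
    for line in (
        "iwlagn 0000:03:00.0: Error setting new RXON (-110)",
        "iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28",
    ):
        print(decode_line(line))
```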
And then, following that error, I got:

iwlagn 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX

and the network is unresponsive. I'm reloading with "modprobe iwlagn debug50=0x40000" to see if I can get more data.
... but of course this hasn't given me any data. :-( It's crashed twice so far and I've seen no firmware error messages.
Created attachment 395852: microcode debug output. Aha, I finally got a firmware crash. Here's the dump that ended up in my dmesg log. I hope this helps debug this issue; I'm tired of restarting my network every 5-10 minutes!
This has been plaguing us to varying degrees for several kernel releases (both in Fedora and upstream)... :-(
Well, I've certainly got a good test-case scenario at home... It's kinda keeping me from getting work done, to the point where I'm considering buying a USB 802.11 token or running a very long Ethernet cable. My home AP is a WRT610N running dd-wrt, and I'm connecting via 802.11(a/n) using WPA2/TKIP. I'm certainly willing to help debug this any way I can, including running test kernels or running in debug mode. I seem to have found ways to tickle the bug pretty consistently: I can make it happen within about 15-30 minutes at most, and sometimes in as little as 5 minutes! So, John, is there anything I can do to help?
I hope so, but I'll rely on Reinette to advise. The logs in comment 7 seem like they might be helpful. Any reason you are using TKIP? CCMP is generally better (i.e. more secure). I wonder whether using CCMP has any effect on reproducibility. FWIW, I see this irregularly on a WEP network here.
(In reply to comment #10) > I hope so, but I'll rely on Reinette to advise. The logs in comment 7 seem > like they might be helpful. The logs still point to a bug that is haunting us also, http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037. The logs in comment 7 mention that the ucode error follows some other errors, but that log does not contain those other errors. Once there is a problem it does not really help much to trace ucode errors that occur when we are already in problem state.
(In reply to comment #11) > The logs still point to a bug that is haunting us also, > http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037. This bug report has been updated with some patches that address this issue. See http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037#c113 Also see your own bug report https://bugzilla.redhat.com/show_bug.cgi?id=573029 which may be a duplicate of this one. I did update that bug report with some details similar to the update to the intellinuxwireless.org bug. See https://bugzilla.redhat.com/show_bug.cgi?id=573029#c16
Please try the test kernels here (when the build completes): http://koji.fedoraproject.org/koji/taskinfo?taskID=2068739 These contain backports of the patches Reinette identified. Do these improve the situation?
(In reply to comment #13) > Please try the test kernels here (when the build completes): > > http://koji.fedoraproject.org/koji/taskinfo?taskID=2068739 > > These contain backports of the patches Reinette identified. Do these improve > the situation? In addition to this Zhu Yi just created a patch to address the DMA warnings. This has not been pushed upstream yet, but if you are interested you can try out http://git.kernel.org/?p=linux/kernel/git/iwlwifi/iwlwifi-2.6.git;a=commit;h=7c9e64c19c02ab9f9450cceb2c2372143d3fa38e
Any word on the kernels from comment 13? I don't know how much longer Koji will keep them available...
These messages started to appear in /var/log/messages when the laptop lost wireless:

Apr 9 21:15:34 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: time out after 500ms.
Apr 9 21:15:34 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-110)

Then these:

Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
Apr 9 21:22:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx

And after that:

Apr 9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr 9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
Apr 9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
Apr 9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF

Do we still have a test kernel? Please let me know if full logs and hardware details are needed.
Nitin, what kernel are you using? Did you try the ones from comment 13?
John, I am using: kernel-firmware-2.6.32.10-90.fc12.noarch kernel-2.6.32.10-90.fc12.x86_64 Looks like koji does not have test kernels any more.
Updated test kernels building now: http://koji.fedoraproject.org/koji/taskinfo?taskID=2110994
John, koji is displaying an error "BuildError: error building package (arch noarch), mock exited with status 1; see build.log for more information"
Ugh -- Koji hiccup... I think this one will make it: http://koji.fedoraproject.org/koji/taskinfo?taskID=2111335
Build completed -- please test! :-)
John, so far it's good: I was running on wireless the whole day and it did not disconnect. (After locking the screen with Ctrl+Alt+L, the system was very sluggish when I tried to resume, but I think that was due to Compiz; after disabling Compiz I can't reproduce that behavior.) Let me know if you need any information from this system.
kernel-2.6.32.11-102.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/kernel-2.6.32.11-102.fc12
kernel-2.6.32.11-102.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.32.11-102.fc12
Adel, can you detail the problems you saw w/ kernel-2.6.32.11-104.fc12 (almost certainly would be the same w/kernel-2.6.32.11-102.fc12) ? FWIW, I've been using -102.fc12 on several boxes w/ no apparent problems over the past few days.
(In reply to comment #26) > Adel, can you detail the problems you saw w/ kernel-2.6.32.11-104.fc12 (almost > certainly would be the same w/kernel-2.6.32.11-102.fc12) ? > > FWIW, I've been using -102.fc12 on several boxes w/ no apparent problems over > the past few days.

OK; my setup is: HP 6930p; iwlagn 5300; F-12 x86_64.

When using -99 there are no issues at all (i.e. everything is fine). With -104 I get two problems:

1) After ~1-2 hours of 802.11n usage, the driver restarts the firmware 3-4 times; after that the connection is either _very_ slow or comes to a complete halt (no data being sent). After disconnecting I can no longer connect to the AP; the only way to get a working connection back is to reload the module (or reboot).

2) When resuming from suspend, the card simply can't scan (no errors in dmesg, though); scanning just reports that the device is busy. I tried disable_hw_scan=1, which seemed to fix it at first, but after a second suspend/resume cycle it happened again. After playing with it for a while, it seems to be not 100% but something like 95% reproducible.

Again, neither 1) nor 2) happens with -99. The former seems to be caused by the firmware restart patches; I have no idea what could have caused the latter. If you need any more information, feel free to ask. (I don't have the exact dmesg output at hand right now.)
http://koji.fedoraproject.org/koji/taskinfo?taskID=2126607 This kernel contains "mac80211: fix deferred hardware scan requests", which I suspect might address some part of the scan-related issues you are experiencing. Could you give those a try (once the build completes)?
(In reply to comment #28) > http://koji.fedoraproject.org/koji/taskinfo?taskID=2126607 > > This kernel contains "mac80211: fix deferred hardware scan requests", which I > suspect might address some part of the scan-related issues you are > experiencing. Could you give those a try (once the build completes)? I just downloaded the x86_64 build and tested with it. I could not reproduce the scan issue even after 6 suspend / resume cycles. (wpa_supplicant segfaulted once but it seems unrelated).
I tried kernel-2.6.32.11-102.fc12, and although it is less frequent, I am still seeing:

Apr 19 23:03:40 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
Apr 19 23:03:40 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
Apr 19 23:03:40 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
Apr 19 23:03:45 localhost kernel: iwlagn 0000:0c:00.0: Could not load the INST uCode section
Apr 19 23:03:45 localhost kernel: iwlagn 0000:0c:00.0: Unable to set up bootstrap uCode: -110
Apr 19 23:03:45 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF

After this the wireless disconnects, and the only solution is to reboot the machine.
kernel-2.6.32.11-105.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/kernel-2.6.32.11-105.fc12
Nitin, I'm not sure that the "MAC is in deep sleep!" issue is the same as the "Error sending REPLY_RXON" issue. You may want to submit a separate bug for that one.
(In reply to comment #32) > Nitin, I'm not sure that the "MAC is in deep sleep!" issue is the same as the > "Error sending REPLY_RXON" issue. You may want to submit a separate bug for > that one. Yes ... different problem ... when driver starts getting all 1s when reading from device it means that the device disconnected itself from the PCI bus. We have one patch floating around addressing the issue, please see http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037#c112 ... you could maybe respond with your test results of that patch when you submit a new bug report.
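The all-ones value is the sign described above. A PCI read from a device that has dropped off the bus usually completes with every bit set, so "CSR_GP_CNTRL = 0xFFFFFFFF" says more about the bus than about the MAC's sleep state. The minimal Python sketch below applies that check to log lines from this report. The function names are hypothetical. An all-ones result is only a hint, because some registers can legitimately read that way.

```python
import re

# Pulls the register value out of "MAC is in deep sleep!. CSR_GP_CNTRL = 0x..."
_REG_RE = re.compile(r"CSR_GP_CNTRL = (0x[0-9A-Fa-f]+)")


def reads_all_ones(value, width_bits=32):
    """True if a register read came back with every bit set."""
    return value == (1 << width_bits) - 1


def device_looks_gone(log_line):
    """Heuristic: the logged CSR_GP_CNTRL read returned all ones."""
    m = _REG_RE.search(log_line)
    return bool(m) and reads_all_ones(int(m.group(1), 16))


if __name__ == "__main__":
    line = ("iwlagn 0000:0c:00.0: MAC is in deep sleep!. "
            "CSR_GP_CNTRL = 0xFFFFFFFF")
    print(device_looks_gone(line))
```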
(In reply to comment #27) > (In reply to comment #26) > > Adel, can you detail the problems you saw w/ kernel-2.6.32.11-104.fc12 (almost > > certainly would be the same w/kernel-2.6.32.11-102.fc12) ? > > > > FWIW, I've been using -102.fc12 on several boxes w/ no apparent problems over > > the past few days. > > OK; my setup is: > > HP 6930p; iwlagn 5300; F-12 x86_64 > > When using -99 there are no issues at all (i.e everything is fine). > > With -104 I get two problems: > > 1) After ~1-2 hours of 80211n usage the driver restarts the firmware 3-4 times; > after that the connection is either _very_ slow or comes to a complete halt (no > data being sent). Disconnecting means I can no longer connect to the AP, the > only way to get back a working connection is to reload the module (or reboot).

Here is the log output when this happens:

------------
iwlagn 0000:02:00.0: low ack count detected, restart firmware
iwlagn 0000:02:00.0: On demand firmware reload
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
iwlagn 0000:02:00.0: low ack count detected, restart firmware
iwlagn 0000:02:00.0: On demand firmware reload
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
iwlagn 0000:02:00.0: low ack count detected, restart firmware
iwlagn 0000:02:00.0: On demand firmware reload
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
iwlagn 0000:02:00.0: low ack count detected, restart firmware
iwlagn 0000:02:00.0: On demand firmware reload
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
--------------
No timestamps... How often are those firmware restarts? After the restarts, do you recover connectivity, or is it lost forever? The whole point of the patchset introduced here is to allow restarting the firmware rather than just dying on "Error sending REPLY_RXON" or similar errors.
(In reply to comment #35) > Not timestamps...how often are those firmware restarts?

Here are some timestamps:

-------------
Apr 20 21:18:38 localhost kernel: iwlagn 0000:02:00.0: low ack count detected, restart firmware
Apr 20 21:18:38 localhost kernel: iwlagn 0000:02:00.0: On demand firmware reload
Apr 20 21:18:38 localhost kernel: Registered led device: iwl-phy0::radio
Apr 20 21:18:38 localhost kernel: Registered led device: iwl-phy0::assoc
Apr 20 21:18:38 localhost kernel: Registered led device: iwl-phy0::RX
Apr 20 21:18:38 localhost kernel: Registered led device: iwl-phy0::TX
Apr 20 21:18:38 localhost kernel: iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
Apr 20 21:18:38 localhost kernel: iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
Apr 20 21:18:38 localhost kernel: iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
Apr 20 21:18:38 localhost kernel: iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
Apr 20 21:19:14 localhost kernel: iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
Apr 20 21:19:22 localhost kernel: iwlagn 0000:02:00.0: low ack count detected, restart firmware
Apr 20 21:19:22 localhost kernel: iwlagn 0000:02:00.0: On demand firmware reload
Apr 20 21:19:22 localhost kernel: Registered led device: iwl-phy0::radio
Apr 20 21:19:22 localhost kernel: Registered led device: iwl-phy0::assoc
Apr 20 21:19:22 localhost kernel: Registered led device: iwl-phy0::RX
Apr 20 21:19:22 localhost kernel: Registered led device: iwl-phy0::TX
Apr 20 21:19:22 localhost kernel: iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
Apr 20 21:19:22 localhost kernel: iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
Apr 20 21:19:35 localhost kernel: iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
Apr 20 21:19:46 localhost kernel: iwlagn 0000:02:00.0: low ack count detected, restart firmware
Apr 20 21:19:46 localhost kernel: iwlagn 0000:02:00.0: On demand firmware reload
Apr 20 21:19:46 localhost kernel: Registered led device: iwl-phy0::radio
Apr 20 21:19:46 localhost kernel: Registered led device: iwl-phy0::assoc
Apr 20 21:19:46 localhost kernel: Registered led device: iwl-phy0::RX
Apr 20 21:19:46 localhost kernel: Registered led device: iwl-phy0::TX
Apr 20 21:19:46 localhost kernel: iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
Apr 20 21:19:46 localhost kernel: iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
Apr 20 21:20:43 localhost kernel: iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
Apr 20 21:20:55 localhost kernel: iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
Apr 20 21:21:03 localhost kernel: iwlagn 0000:02:00.0: low ack count detected, restart firmware
Apr 20 21:21:03 localhost kernel: iwlagn 0000:02:00.0: On demand firmware reload
Apr 20 21:21:03 localhost kernel: Registered led device: iwl-phy0::radio
Apr 20 21:21:03 localhost kernel: Registered led device: iwl-phy0::assoc
Apr 20 21:21:03 localhost kernel: Registered led device: iwl-phy0::RX
Apr 20 21:21:03 localhost kernel: Registered led device: iwl-phy0::TX
Apr 20 21:21:03 localhost kernel: iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
Apr 20 21:21:03 localhost kernel: iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
Apr 20 21:21:20 localhost kernel: iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
Apr 20 21:21:42 localhost kernel: iwlagn 0000:02:00.0: queue 10 stuck 3 time. Fw reload.
Apr 20 21:21:42 localhost kernel: iwlagn 0000:02:00.0: On demand firmware reload
Apr 20 21:21:43 localhost kernel: Registered led device: iwl-phy0::radio
Apr 20 21:21:43 localhost kernel: Registered led device: iwl-phy0::assoc
Apr 20 21:21:43 localhost kernel: Registered led device: iwl-phy0::RX
Apr 20 21:21:43 localhost kernel: Registered led device: iwl-phy0::TX
Apr 20 21:21:43 localhost kernel: iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
Apr 20 21:21:43 localhost kernel: iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
Apr 20 21:21:50 localhost kernel: iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
Apr 20 21:22:15 localhost kernel: iwlagn 0000:02:00.0: queue 10 stuck 3 time. Fw reload.
Apr 20 21:22:15 localhost kernel: iwlagn 0000:02:00.0: On demand firmware reload
Apr 20 21:22:15 localhost kernel: Registered led device: iwl-phy0::radio
Apr 20 21:22:15 localhost kernel: Registered led device: iwl-phy0::assoc
Apr 20 21:22:15 localhost kernel: Registered led device: iwl-phy0::RX
Apr 20 21:22:15 localhost kernel: Registered led device: iwl-phy0::TX
Apr 20 21:22:15 localhost kernel: iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
Apr 20 21:22:15 localhost kernel: iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
Apr 20 21:22:21 localhost kernel: iwlagn 0000:02:00.0: queue 2 stuck 3 time. Fw reload.
Apr 20 21:22:21 localhost kernel: iwlagn 0000:02:00.0: On demand firmware reload
----------------

> After the restarts, do you recover connectivity? Or is it lost forever?

It becomes very slow and then comes to a complete halt (no data is transferred). When I try to reconnect to the AP, the driver is pretty much dead; I have to reload the module to get it working again.

> The whole point of the patchset introduced here is to allow for restarting the > firmware rather than just dying on the "Error sending REPLY_RXON" or similar > errors.
It seems to be a bit too aggressive when deciding whether a restart is needed (and restarts don't seem to be "free", so they should only be done when really needed).
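The timestamped log above shows how often the restarts happen. The Python sketch below collects the times of the "On demand firmware reload" lines and returns the gaps between them. It assumes the syslog prefix shown above (month, day, time). Syslog omits the year, so the sketch prepends an assumed year (2010, the year of this report) only to make parsing unambiguous, which means gaps spanning New Year would come out wrong. The function name is hypothetical.

```python
from datetime import datetime

RELOAD_MARKER = "On demand firmware reload"


def reload_intervals(lines, year=2010):
    """Seconds between consecutive firmware reloads in syslog-style lines.

    `year` is an assumption: syslog lines like "Apr 20 21:18:38 ..." carry
    no year, so one is supplied just to parse the timestamp.
    """
    stamps = []
    for line in lines:
        if RELOAD_MARKER in line:
            # "Apr 20 21:18:38 localhost kernel: ..." -> "Apr 20 21:18:38"
            stamp = " ".join(line.split()[:3])
            stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    times = ["21:18:38", "21:19:22", "21:19:46", "21:21:03",
             "21:21:42", "21:22:15", "21:22:21"]
    sample = [f"Apr 20 {t} localhost kernel: iwlagn 0000:02:00.0: {RELOAD_MARKER}"
              for t in times]
    print(reload_intervals(sample))  # [44, 24, 77, 39, 33, 6]
```

For the seven reload lines in the log above, the gaps are 44, 24, 77, 39, 33 and 6 seconds. That is 223 seconds for six gaps, or a reload roughly every 37 seconds.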
Ugh... :-( Does anyone want to review my backports of those patches to see if I screwed something up?
kernel-2.6.32.11-105.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.32.11-105.fc12
(In reply to comment #36) > It seems to be a bit too aggressive when deciding whether a restart is needed > or not (and it seems restart aren't "free" so they should only be done when > really needed.) If you load your module with "debug=0x80" it will print the details of what is actually used to decide the low ack count and give an idea of what problem in environment is causing the system to reset itself.
(In reply to comment #39) > (In reply to comment #36) > > > It seems to be a bit too aggressive when deciding whether a restart is needed > > or not (and it seems restart aren't "free" so they should only be done when > > really needed.) > > If you load your module with "debug=0x80" it will print the details of what is > actually used to decide the low ack count and give an idea of what problem in > environment is causing the system to reset itself.

---------------
ieee80211 phy0: I iwl_good_ack_health agg ba_timeout delta = 6
ieee80211 phy0: I iwl_good_ack_health actual_ack_cnt delta = 7, expected_ack_cnt = 29
ieee80211 phy0: I iwl_good_ack_health agg ba_timeout delta = 6
ieee80211 phy0: I iwl_good_ack_health actual_ack_cnt delta = 0, expected_ack_cnt = 96
ieee80211 phy0: I iwl_good_ack_health agg ba_timeout delta = 51
iwlagn 0000:02:00.0: low ack count detected, restart firmware
iwlagn 0000:02:00.0: On demand firmware reload
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
ieee80211 phy0: I iwl_good_ack_health actual_ack_cnt delta = 0, expected_ack_cnt = 32
ieee80211 phy0: I iwl_good_ack_health agg ba_timeout delta = 16
iwlagn 0000:02:00.0: low ack count detected, restart firmware
iwlagn 0000:02:00.0: On demand firmware reload
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
ieee80211 phy0: I iwl_check_stuck_queue queue 10, not read 1 time
ieee80211 phy0: I iwl_check_stuck_queue queue 10, not read 2 time
ieee80211 phy0: I iwl_check_stuck_queue queue 10, not read 3 time
ieee80211 phy0: I iwl_check_stuck_queue queue 10, not read 1 time
ieee80211 phy0: I iwl_check_stuck_queue queue 10, not read 2 time
ieee80211 phy0: I iwl_check_stuck_queue queue 10, not read 3 time
iwlagn 0000:02:00.0: queue 10 stuck 3 time. Fw reload.
iwlagn 0000:02:00.0: On demand firmware reload
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
iwlagn 0000:02:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:02:00.0: queue number out of range: 0, must be 10 to 19
ieee80211 phy0: I iwl_check_stuck_queue queue 2, not read 1 time
ieee80211 phy0: I iwl_check_stuck_queue queue 2, not read 2 time
ieee80211 phy0: I iwl_check_stuck_queue queue 2, not read 3 time
iwlagn 0000:02:00.0: queue 2 stuck 3 time. Fw reload.
iwlagn 0000:02:00.0: iwl_tx_agg_start on ra = 00:24:b2:d8:20:82 tid = 0
-------------

Also, it seems it is not triggered by time but by opening and closing the lid. It is not 100% reproducible, and I have it configured to only blank the screen on lid close, so it does not suspend the system.
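The debug=0x80 output above shows the inputs to the low-ack decision: per-interval deltas of actual versus expected ack counts, and of block-ack (BA) timeouts. The reloads follow the samples with zero acks and many BA timeouts (0 of 96 with 51 timeouts, 0 of 32 with 16). The 7-of-29 sample with 6 timeouts does not trigger a reload. The Python sketch below implements a rule consistent with those three samples. It also includes a counter-based stuck-queue watchdog of the kind the "queue 10, not read N time" messages suggest. All thresholds and names are assumptions chosen to fit this log, not values taken from the iwlagn source. The real stuck-queue logic evidently differs in detail, since the log shows two runs of three misses before the reload.

```python
from dataclasses import dataclass

# Hypothetical thresholds, picked to agree with the debug output above.
ACK_RATIO_PCT = 50    # actual/expected ack ratio (%) below which we get suspicious
BA_TIMEOUT_WARN = 5   # BA timeouts per interval worth reporting
BA_TIMEOUT_MAX = 16   # with zero acks, this many BA timeouts forces a reload
STUCK_LIMIT = 3       # consecutive checks without progress on a busy queue


def ack_health_ok(actual_delta, expected_delta, ba_timeout_delta, agg_active=True):
    """Return False when the ack statistics look bad enough to reload firmware.

    `agg_active` (assumed) restricts the check to periods with aggregation on.
    """
    if not agg_active or expected_delta <= 0:
        return True
    ratio = actual_delta * 100 // expected_delta
    if ratio < ACK_RATIO_PCT and ba_timeout_delta > BA_TIMEOUT_WARN:
        # Suspicious, but only a total loss of acks plus many timeouts is fatal.
        return not (actual_delta == 0 and ba_timeout_delta >= BA_TIMEOUT_MAX)
    return True


@dataclass
class QueueWatch:
    """Counts consecutive checks in which a non-empty Tx queue made no progress."""
    last_read_ptr: int = -1
    strikes: int = 0

    def check(self, read_ptr, pending):
        """Return True when the queue should be declared stuck."""
        if pending and read_ptr == self.last_read_ptr:
            self.strikes += 1
        else:
            self.strikes = 0
        self.last_read_ptr = read_ptr
        return self.strikes >= STUCK_LIMIT


if __name__ == "__main__":
    for sample in ((7, 29, 6), (0, 96, 51), (0, 32, 16)):
        print(sample, "ok" if ack_health_ok(*sample) else "reload")
```

In a model like this, raising BA_TIMEOUT_MAX or STUCK_LIMIT makes reloads less aggressive. The cost is that the driver stays wedged longer when the firmware really is stuck.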
I have been using 2.6.32.12-110.rc2.fc12.x86_64 (checked out from CVS and built locally) for a day now, and the problem has not shown up until now. I don't know why, though (the patches/changes seem unrelated); it might be a coincidence, or an unrelated patch fixed it. I will keep using this kernel and report back on whether it happens again.
(In reply to comment #41) > I have been using 2.6.32.12-110.rc2.fc12.x86_64 (checked out from cvs and built > locally) for a day now and the problem has not show up until now. Err.. badly worded it has not shown up at all (yet?).
(In reply to comment #42) > (In reply to comment #41) > > I have been using 2.6.32.12-110.rc2.fc12.x86_64 (checked out from cvs and built > > locally) for a day now and the problem has not show up until now. > > Err.. badly worded it has not shown up at all (yet?). No problems today either so I'd say 2.6.32.12-110.rc2.fc12.x86_64 is fine.
kernel-2.6.32.12-114.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/kernel-2.6.32.12-114.fc12