Description John W. Linville
2010-04-13 19:33:47 UTC
I'm not entirely sure if this is a problem w/ F-13, but the patches that fixed it in F-12 seem to more-or-less apply to F-13...
+++ This bug was initially created as a clone of Bug #527824 +++
My wireless died mid-use. I unloaded the module and then reloaded it to try to get it working again. The wireless card is an Intel Corporation Wireless WiFi Link 5300. Here's the output from dmesg:
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
wlan0: disassociating by local choice (reason=3)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Aborted scan still in progress after 100ms
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: PCI INT A disabled
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:687 dma_debug_device_change+0x14b/0x192() (Not tainted)
Hardware name: 2777CTO
pci 0000:03:00.0: DMA-API: device driver has pending DMA allocations while released from device [count=42]
Modules linked in: tun fuse rfcomm sco bridge stp llc bnep l2cap sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput arc4 ecb iwlagn(-) snd_hda_codec_conexant iwlcore uvcvideo snd_hda_intel videodev v4l1_compat v4l2_compat_ioctl32 snd_hda_codec mac80211 snd_hwdep joydev snd_pcm i2c_i801 cfg80211 snd_timer thinkpad_acpi hwmon iTCO_wdt iTCO_vendor_support btusb bluetooth e1000e snd rfkill soundcore snd_page_alloc wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: microcode]
Pid: 13472, comm: rmmod Not tainted 2.6.31.1-56.fc12.x86_64 #1
Call Trace:
[<ffffffff8106422c>] warn_slowpath_common+0x95/0xc3
[<ffffffff810642e7>] warn_slowpath_fmt+0x50/0x66
[<ffffffff8128e771>] ? dma_debug_device_change+0xd6/0x192
[<ffffffff8128e7e6>] dma_debug_device_change+0x14b/0x192
[<ffffffff81086db1>] ? __blocking_notifier_call_chain+0x4c/0x8e
[<ffffffff81509a35>] notifier_call_chain+0x72/0xba
[<ffffffff81086db1>] ? __blocking_notifier_call_chain+0x4c/0x8e
[<ffffffff81086dc8>] __blocking_notifier_call_chain+0x63/0x8e
[<ffffffff81086e1a>] blocking_notifier_call_chain+0x27/0x3d
[<ffffffff81351bd7>] __device_release_driver+0xc3/0xde
[<ffffffff81351ca2>] driver_detach+0xb0/0xe6
[<ffffffff813509d6>] bus_remove_driver+0xb8/0x10d
[<ffffffff81352524>] driver_unregister+0x7b/0x9a
[<ffffffff81298032>] pci_unregister_driver+0x57/0xb7
[<ffffffffa025b1a8>] iwl_exit+0x28/0x43 [iwlagn]
[<ffffffff810a2667>] sys_delete_module+0x1e3/0x279
[<ffffffff81505e96>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81011f42>] system_call_fastpath+0x16/0x1b
---[ end trace 0e80a0cf85f72dbf ]---
--- Additional comment from dcbw on 2009-10-07 16:03:02 EDT ---
Reinette thinks this is a dupe of:
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037
--- Additional comment from fedora-triage-list on 2009-11-16 08:23:18 EST ---
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
--- Additional comment from warlord on 2010-02-23 17:35:53 EST ---
There appear to be a bunch of issues with the Intel 5300 on F12. I'm having tons of problems staying online. I can unload and reload the module and sometimes it'll stay up for a little while and other times it'll die almost immediately. I get a bunch of different errors along the way, such as the one listed above, and sometimes:
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: Error setting new RXON (-28)
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: Error setting new RXON (-28)
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
I tried building the compat-wireless drivers (2010-02-23), but then I lost access to my a/n access point, so I backed down to the drivers in kernel 2.6.31.12-174.2.22.fc12.x86_64.
--- Additional comment from warlord on 2010-02-23 17:43:12 EST ---
Also seeing:
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
...
Which error I see seems to vary from run to run. But it all winds up in the same condition -- I have to unload/reload the driver and then hope it will stay up long enough to do something before it dies again.
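The negative numbers in the messages above look like kernel return codes, i.e. negated errno values. A minimal sketch for decoding them, assuming the Linux asm-generic errno numbering (where 28 is ENOSPC and 110 is ETIMEDOUT). The table and the helper name `decode_return_code` are hypothetical and only cover the two codes quoted in this report:

```python
import re

# Assumption: Linux asm-generic errno numbering. Only the two codes that
# appear in this report are listed; this is not a complete table.
LINUX_ERRNO = {
    28: ("ENOSPC", "No space left on device"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches a trailing negative return code such as "(-110)" or "failed: -28".
CODE_RE = re.compile(r"-(\d+)\)?\s*$")


def decode_return_code(line):
    """Return (number, name, description) for a trailing negative code, or None."""
    match = CODE_RE.search(line)
    if match is None:
        return None
    number = int(match.group(1))
    name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in this table"))
    return number, name, text


if __name__ == "__main__":
    samples = [
        "iwlagn 0000:03:00.0: Error setting new RXON (-110)",
        "iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28",
        "iwlagn 0000:03:00.0: No space for Tx",
    ]
    for sample in samples:
        print(sample, "->", decode_return_code(sample))
```

Read this way, the -110 lines are consistent with the "time out after 500ms" messages, and the -28 lines are consistent with the "No space for Tx" messages that precede them.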
--- Additional comment from warlord on 2010-02-23 17:47:09 EST ---
And then, right after that error, I got:
iwlagn 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
and the network is unresponsive. I'm reloading with modprobe iwlagn debug50=0x40000 to see if I can get more data.
--- Additional comment from warlord on 2010-02-23 17:58:48 EST ---
... but of course this hasn't given me any data. :-( It's crashed twice so far and I've seen no firmware error messages.
--- Additional comment from warlord on 2010-02-23 18:46:20 EST ---
Created an attachment (id=395852)
microcode debug output
Aha, I finally got a firmware crash. Here's the dump I got in my dmesg log. Hope this helps debug this issue; I'm tired of restarting my network every 5-10 minutes!
--- Additional comment from linville on 2010-02-24 09:58:43 EST ---
This has been plaguing us to varying degrees for several kernel releases (both in Fedora and upstream)... :-(
--- Additional comment from warlord on 2010-02-24 10:09:01 EST ---
Well, I've certainly got a good test-case scenario at home... It's kinda keeping me from getting work done, to the point where I'm considering buying a USB 802.11 token or running a very long ethernet cable. My home AP is a WRT610N running dd-wrt and I'm connecting via 802.11(a/n) using WPA2/TKIP. I'm certainly willing to help debug this any way I can, including running test kernels or running in debug mode. I seem to have found ways to tickle the bug pretty consistently, such that I can cause it to happen within about 15-30 minutes (at most; sometimes in as little as 5 minutes!)
So, John, is there anything I can do to help?
--- Additional comment from linville on 2010-02-24 10:31:40 EST ---
I hope so, but I'll rely on Reinette to advise. The logs in comment 7 seem like they might be helpful.
Any reason you are using TKIP? CCMP is generally better (i.e. more secure). I wonder if using CCMP has any effect on reproducibility? FWIW I see this irregularly on a WEP network here.
--- Additional comment from reinette.chatre on 2010-02-24 12:13:11 EST ---
(In reply to comment #10)
> I hope so, but I'll rely on Reinette to advise. The logs in comment 7 seem
> like they might be helpful.
The logs still point to a bug that is haunting us also, http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037. The logs in comment 7 mention that the ucode error follows some other errors, but that log does not contain those other errors. Once there is a problem it does not really help much to trace ucode errors that occur when we are already in problem state.
--- Additional comment from reinette.chatre on 2010-03-19 18:33:13 EDT ---
(In reply to comment #11)
> The logs still point to a bug that is haunting us also,
> http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037.
This bug report has been updated with some patches that address this issue. See http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037#c113
Also see your own bug report https://bugzilla.redhat.com/show_bug.cgi?id=573029 which may be a duplicate of this one. I did update that bug report with some details similar to the update to the intellinuxwireless.org bug. See https://bugzilla.redhat.com/show_bug.cgi?id=573029#c16
--- Additional comment from linville on 2010-03-22 17:11:50 EDT ---
Please try the test kernels here (when the build completes):
http://koji.fedoraproject.org/koji/taskinfo?taskID=2068739
These contain backports of the patches Reinette identified. Do these improve the situation?
--- Additional comment from reinette.chatre on 2010-03-23 16:17:34 EDT ---
(In reply to comment #13)
> Please try the test kernels here (when the build completes):
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2068739
>
> These contain backports of the patches Reinette identified. Do these improve
> the situation?
In addition to this, Zhu Yi just created a patch to address the DMA warnings. This has not been pushed upstream yet, but if you are interested you can try out http://git.kernel.org/?p=linux/kernel/git/iwlwifi/iwlwifi-2.6.git;a=commit;h=7c9e64c19c02ab9f9450cceb2c2372143d3fa38e
--- Additional comment from linville on 2010-03-31 12:37:51 EDT ---
Any word on the kernels from comment 13? I don't know how much longer Koji will keep them available...
--- Additional comment from nbansal on 2010-04-09 12:49:28 EDT ---
These messages started to appear in the messages log when the laptop lost wireless:
Apr 9 21:15:34 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: time out after 500ms.
Apr 9 21:15:34 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-110)
Then these:
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx
Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
Apr 9 21:22:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx
And after that:
Apr 9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr 9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
Apr 9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
Apr 9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
Do we still have a test kernel? Please let me know if full logs and hardware details are needed.
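When comparing logs like these between kernels, it may help to count which host commands fail and why. A rough sketch, assuming every line of interest has the shape "iwlagn <pci-address>: Error sending <COMMAND>: <reason>" as quoted above. The function name `tally_send_errors` is hypothetical, and the sample lines are copied from this comment:

```python
import re
from collections import Counter

# Assumed line shape, taken from the messages quoted in this report:
#   "... iwlagn <pci-address>: Error sending <COMMAND>: <reason>"
# A trailing period, if present, is dropped from the reason.
SEND_ERROR_RE = re.compile(r"iwlagn \S+: Error sending (\w+): (.+?)\.?\s*$")


def tally_send_errors(lines):
    """Count (command, reason) pairs across the given log lines."""
    counts = Counter()
    for line in lines:
        match = SEND_ERROR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    log = [
        "Apr 9 21:15:34 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: time out after 500ms.",
        "Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28",
        "Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx",
        "Apr 9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28",
    ]
    for (command, reason), count in tally_send_errors(log).most_common():
        print(f"{count:3d}  {command}: {reason}")
```

Lines that do not match the assumed shape, such as "No space for Tx", are skipped rather than guessed at.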
--- Additional comment from linville on 2010-04-09 13:14:07 EDT ---
Nitin, what kernel are you using? Did you try the ones from comment 13?
--- Additional comment from nbansal on 2010-04-11 03:24:50 EDT ---
John, I am using:
kernel-firmware-2.6.32.10-90.fc12.noarch
kernel-2.6.32.10-90.fc12.x86_64
Looks like koji does not have test kernels any more.
--- Additional comment from linville on 2010-04-12 10:20:55 EDT ---
Updated test kernels building now:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2110994
--- Additional comment from nbansal on 2010-04-12 12:58:39 EDT ---
John, koji is displaying an error "BuildError: error building package (arch noarch), mock exited with status 1; see build.log for more information"
--- Additional comment from linville on 2010-04-12 13:57:15 EDT ---
Ugh -- Koji hiccup...
I think this one will make it:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2111335
--- Additional comment from linville on 2010-04-12 16:23:55 EDT ---
Build completed -- please test! :-)
--- Additional comment from nbansal on 2010-04-13 10:20:18 EDT ---
John, so far it's good. I was running on wireless the whole day and it did not disconnect. (After locking the screen with ctrl+alt+L, system response was very sluggish when I tried to resume. I think that was due to compiz; after disabling compiz I can't reproduce that behavior.) Let me know if you need any information from this system.
--- Additional comment from updates on 2010-04-13 14:47:19 EDT ---
kernel-2.6.32.11-102.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/kernel-2.6.32.11-102.fc12