Bug 581996 - Wireless dies, followed by kernel DMA API warning when unloading the module
Summary: Wireless dies, followed by kernel DMA API warning when unloading the module
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 527824
Blocks:
 
Reported: 2010-04-13 19:33 UTC by John W. Linville
Modified: 2010-04-20 17:52 UTC
CC List: 12 users

Fixed In Version: kernel-2.6.33.2-57.fc13
Clone Of: 527824
Environment:
Last Closed: 2010-04-20 17:52:11 UTC
Type: ---
Embargoed:



Description John W. Linville 2010-04-13 19:33:47 UTC
I'm not entirely sure if this is a problem w/ F-13, but the patches that fixed it in F-12 seem to more-or-less apply to F-13...

+++ This bug was initially created as a clone of Bug #527824 +++

My wireless died mid-use. I unloaded the module and then reloaded it to try to get it working again. The wireless card is an Intel Corporation Wireless WiFi Link 5300. Here's the output from dmesg:

iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
wlan0: disassociating by local choice (reason=3)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Aborted scan still in progress after 100ms
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_ABORT_CMD: time out after 500ms.
iwlagn 0000:03:00.0: PCI INT A disabled
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:687 dma_debug_device_change+0x14b/0x192() (Not tainted)
Hardware name: 2777CTO
pci 0000:03:00.0: DMA-API: device driver has pending DMA allocations while released from device [count=42]
Modules linked in: tun fuse rfcomm sco bridge stp llc bnep l2cap sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput arc4 ecb iwlagn(-) snd_hda_codec_conexant iwlcore uvcvideo snd_hda_intel videodev v4l1_compat v4l2_compat_ioctl32 snd_hda_codec mac80211 snd_hwdep joydev snd_pcm i2c_i801 cfg80211 snd_timer thinkpad_acpi hwmon iTCO_wdt iTCO_vendor_support btusb bluetooth e1000e snd rfkill soundcore snd_page_alloc wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: microcode]
Pid: 13472, comm: rmmod Not tainted 2.6.31.1-56.fc12.x86_64 #1
Call Trace:
 [<ffffffff8106422c>] warn_slowpath_common+0x95/0xc3
 [<ffffffff810642e7>] warn_slowpath_fmt+0x50/0x66
 [<ffffffff8128e771>] ? dma_debug_device_change+0xd6/0x192
 [<ffffffff8128e7e6>] dma_debug_device_change+0x14b/0x192
 [<ffffffff81086db1>] ? __blocking_notifier_call_chain+0x4c/0x8e
 [<ffffffff81509a35>] notifier_call_chain+0x72/0xba
 [<ffffffff81086db1>] ? __blocking_notifier_call_chain+0x4c/0x8e
 [<ffffffff81086dc8>] __blocking_notifier_call_chain+0x63/0x8e
 [<ffffffff81086e1a>] blocking_notifier_call_chain+0x27/0x3d
 [<ffffffff81351bd7>] __device_release_driver+0xc3/0xde
 [<ffffffff81351ca2>] driver_detach+0xb0/0xe6
 [<ffffffff813509d6>] bus_remove_driver+0xb8/0x10d
 [<ffffffff81352524>] driver_unregister+0x7b/0x9a
 [<ffffffff81298032>] pci_unregister_driver+0x57/0xb7
 [<ffffffffa025b1a8>] iwl_exit+0x28/0x43 [iwlagn]
 [<ffffffff810a2667>] sys_delete_module+0x1e3/0x279
 [<ffffffff81505e96>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81011f42>] system_call_fastpath+0x16/0x1b
---[ end trace 0e80a0cf85f72dbf ]---
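For context on the warning above: with CONFIG_DMA_API_DEBUG enabled, lib/dma-debug.c keeps a record of each device's outstanding DMA allocations and complains if the driver is unbound while some are still live; here iwlagn left 42 behind. The sketch below is a hypothetical Python model of that bookkeeping (the class and method names are invented for illustration and are not kernel APIs). It only shows why the count is reported at unload time rather than at the moment the allocations were leaked.

```python
from collections import defaultdict


class DmaDebugModel:
    """Toy model of per-device DMA bookkeeping (hypothetical, not kernel code)."""

    def __init__(self):
        # device name -> set of DMA handles that are still outstanding
        self._outstanding = defaultdict(set)

    def alloc(self, device, handle):
        # Record a new allocation/mapping for this device.
        self._outstanding[device].add(handle)

    def free(self, device, handle):
        # Freeing a handle we never saw is itself a driver bug.
        if handle not in self._outstanding[device]:
            raise ValueError(f"{device}: freeing unknown DMA handle {handle:#x}")
        self._outstanding[device].remove(handle)

    def release_driver(self, device):
        """Called when the driver is unbound; returns a warning string or None."""
        count = len(self._outstanding.pop(device, set()))
        if count:
            return (f"{device}: DMA-API: device driver has pending DMA "
                    f"allocations while released from device [count={count}]")
        return None
```

Because the check only runs when the driver is released, allocations leaked while the card was failing stay silent until rmmod, which is when the warning appears in the trace above.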

--- Additional comment from dcbw on 2009-10-07 16:03:02 EDT ---

Reinette thinks this is a dupe of:

http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037

--- Additional comment from fedora-triage-list on 2009-11-16 08:23:18 EST ---


This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

--- Additional comment from warlord on 2010-02-23 17:35:53 EST ---

There appear to be a bunch of issues with the Intel 5300 on F12.  I'm having tons of problems staying online.   I can unload and reload the module and sometimes it'll stay up for a little while and other times it'll die almost immediately.  I get a bunch of different errors along the way, such as the one listed above, and sometimes:

iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: Error setting new RXON (-28)
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: Error setting new RXON (-28)
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28


I tried building the compat-wireless drivers (2010-02-23), but then I lost access to my a/n access point, so I went back to the drivers in kernel 2.6.31.12-174.2.22.fc12.x86_64.
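The negative numbers at the end of several of these lines are kernel errno values: on Linux, -110 is ETIMEDOUT (the command got no answer in time) and -28 is ENOSPC (consistent with the neighbouring "No space for Tx" lines). A small, hypothetical helper for decoding them while scanning logs might look like this; the mapping is hard-coded for the two codes that appear in this report, using x86/x86_64 Linux numbering, and the function name is invented for illustration.

```python
import re

# Linux errno values seen in these logs (x86/x86_64 numbering), with glibc strerror text.
LINUX_ERRNO = {
    28: ("ENOSPC", "No space left on device"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches the "(-110)" and "failed: -28" suffixes used in the iwlagn messages.
_ERR_RE = re.compile(r"(?:\(|failed: )-(\d+)\)?\s*$")


def decode_iwl_error(line):
    """Return (code, name, description) for a trailing errno, or None if absent."""
    m = _ERR_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    name, desc = LINUX_ERRNO.get(code, (f"E{code}?", "unknown"))
    return code, name, desc
```

For example, "Error setting new RXON (-110)" decodes to ETIMEDOUT, while "time out after 500ms." lines carry no errno and return None.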

--- Additional comment from warlord on 2010-02-23 17:43:12 EST ---

Also seeing:

iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
...

Which error I see seems to vary from run to run.  But it all winds up in the same condition -- I have to unload/reload the driver and then hope it will stay up long enough to do something before it dies again.
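One way to picture why timeouts (-110) and "No space for Tx" (-28) errors show up in the same failure: if commands that time out keep occupying slots in a fixed-size host command queue, stalled firmware first produces timeouts and then, once the queue is full, immediate enqueue failures. The toy model below illustrates that reading only; it is an assumption, not taken from the iwlagn source, and the class name and queue size are hypothetical.

```python
ENOSPC = 28


class HostCmdQueue:
    """Hypothetical fixed-size command ring; not the iwlagn implementation."""

    def __init__(self, size):
        self.size = size
        self.pending = 0            # slots written by the host, not yet reclaimed
        self.firmware_alive = True  # flip to False to simulate stalled firmware

    def send_sync(self, name):
        """Send one synchronous command; return an error message or None."""
        # With no free slot the command cannot even be queued.
        if self.pending >= self.size:
            return f"Error sending {name}: enqueue_hcmd failed: -{ENOSPC}"
        self.pending += 1
        if self.firmware_alive:
            # Firmware answered, so the slot is reclaimed immediately.
            self.pending -= 1
            return None
        # No answer: the caller gives up after its deadline (500 ms in these
        # logs), but in this model the slot stays occupied.
        return f"Error sending {name}: time out after 500ms."
```

With a three-slot queue and stalled firmware, the first three commands time out and every later one fails with -28, so only a reset (here, an unload/reload) clears the queue.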

--- Additional comment from warlord on 2010-02-23 17:47:09 EST ---

And then, after that error, I got:

iwlagn 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX

and the network is unresponsive.  I'm reloading with modprobe iwlagn debug50=0x40000 to see if I can get more data.

--- Additional comment from warlord on 2010-02-23 17:58:48 EST ---

... but of course this hasn't given me any data.  :-(  It's crashed twice so far and I've seen no firmware error messages.

--- Additional comment from warlord on 2010-02-23 18:46:20 EST ---

Created an attachment (id=395852)
microcode debug output

Aha, I finally got a firmware crash. Here's the dump I got in my dmesg log. I hope this helps debug this issue; I'm tired of restarting my network every 5-10 minutes!

--- Additional comment from linville on 2010-02-24 09:58:43 EST ---

This has been plaguing us to varying degrees for several kernel releases (both in Fedora and upstream)... :-(

--- Additional comment from warlord on 2010-02-24 10:09:01 EST ---

Well, I've certainly got a good test-case scenario at home... It's kind of keeping me from getting work done, to the point where I'm considering buying a USB 802.11 token or running a very long Ethernet cable. My home AP is a WRT610N running dd-wrt, and I'm connecting via 802.11a/n using WPA2/TKIP. I'm certainly willing to help debug this any way I can, including running test kernels or running in debug mode. I seem to have found ways to tickle the bug pretty consistently: I can make it happen within 15-30 minutes at most, and sometimes in as little as 5 minutes.

So, John, is there anything I can do to help?

--- Additional comment from linville on 2010-02-24 10:31:40 EST ---

I hope so, but I'll rely on Reinette to advise.  The logs in comment 7 seem like they might be helpful.

Any reason you are using TKIP? CCMP is generally better (i.e. more secure). I wonder if using CCMP has any effect on reproducibility? FWIW I see this irregularly on a WEP network here.

--- Additional comment from reinette.chatre on 2010-02-24 12:13:11 EST ---

(In reply to comment #10)
> I hope so, but I'll rely on Reinette to advise.  The logs in comment 7 seem
> like they might be helpful.

The logs still point to a bug that is also haunting us, http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037. The logs in comment 7 mention that the ucode error follows some other errors, but that log does not contain those other errors. Once there is a problem, it does not help much to trace ucode errors that occur when we are already in a problem state.

--- Additional comment from reinette.chatre on 2010-03-19 18:33:13 EDT ---

(In reply to comment #11)
> The logs still point to a bug that is haunting us also,
> http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037. 

This bug report has been updated with some patches that address this issue. See http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037#c113 

Also see your own bug report https://bugzilla.redhat.com/show_bug.cgi?id=573029 which may be a duplicate of this one. I did update that bug report with some details similar to the update to the intellinuxwireless.org bug. See https://bugzilla.redhat.com/show_bug.cgi?id=573029#c16

--- Additional comment from linville on 2010-03-22 17:11:50 EDT ---

Please try the test kernels here (when the build completes):

http://koji.fedoraproject.org/koji/taskinfo?taskID=2068739

These contain backports of the patches Reinette identified.  Do these improve the situation?

--- Additional comment from reinette.chatre on 2010-03-23 16:17:34 EDT ---

(In reply to comment #13)
> Please try the test kernels here (when the build completes):
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2068739
> 
> These contain backports of the patches Reinette identified.  Do these improve
> the situation?    

In addition to this Zhu Yi just created a patch to address the DMA warnings. This has not been pushed upstream yet, but if you are interested you can try out http://git.kernel.org/?p=linux/kernel/git/iwlwifi/iwlwifi-2.6.git;a=commit;h=7c9e64c19c02ab9f9450cceb2c2372143d3fa38e

--- Additional comment from linville on 2010-03-31 12:37:51 EDT ---

Any word on the kernels from comment 13?  I don't know how much longer Koji will keep them available...

--- Additional comment from nbansal on 2010-04-09 12:49:28 EDT ---

These messages started to appear in the logs when the laptop lost wireless:

Apr  9 21:15:34 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: time out after 500ms.
Apr  9 21:15:34 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-110)


Then these:


Apr  9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Apr  9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr  9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx
Apr  9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Apr  9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr  9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx
Apr  9 21:20:30 localhost kernel: iwlagn 0000:0c:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
Apr  9 21:22:30 localhost kernel: iwlagn 0000:0c:00.0: No space for Tx


And after that:

Apr  9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: Error setting new RXON (-28)
Apr  9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!.  CSR_GP_CNTRL = 0xFFFFFFFF
Apr  9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!.  CSR_GP_CNTRL = 0xFFFFFFFF
Apr  9 21:44:26 localhost kernel: iwlagn 0000:0c:00.0: MAC is in deep sleep!.  CSR_GP_CNTRL = 0xFFFFFFFF

Do we still have a test kernel? Please let me know if full logs and hardware details are needed.
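A note on the "MAC is in deep sleep!" lines: a 32-bit register read that returns 0xFFFFFFFF is commonly a sign that the PCI device did not respond at all (for example, it has dropped off the bus or lost power), rather than a meaningful register value. The hypothetical helper below flags such reads when scanning logs; the function name and the interpretation are illustrative assumptions, not part of the driver.

```python
import re

# Captures the hex value printed after CSR_GP_CNTRL in the iwlagn message.
_CSR_RE = re.compile(r"CSR_GP_CNTRL = (0x[0-9A-Fa-f]+)")


def csr_read_looks_dead(line):
    """True if the logged CSR_GP_CNTRL value is all ones, False if not, None if absent."""
    m = _CSR_RE.search(line)
    if m is None:
        return None
    # All ones from an MMIO read usually means nobody answered on the bus.
    return int(m.group(1), 16) == 0xFFFFFFFF
```

Seeing this value right after a run of -28 errors suggests the card was no longer reachable, which would explain why only a reload or reboot helps.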

--- Additional comment from linville on 2010-04-09 13:14:07 EDT ---

Nitin, what kernel are you using?  Did you try the ones from comment 13?

--- Additional comment from nbansal on 2010-04-11 03:24:50 EDT ---

John, I am using:

kernel-firmware-2.6.32.10-90.fc12.noarch
kernel-2.6.32.10-90.fc12.x86_64

It looks like Koji no longer has the test kernels.

--- Additional comment from linville on 2010-04-12 10:20:55 EDT ---

Updated test kernels building now:

http://koji.fedoraproject.org/koji/taskinfo?taskID=2110994

--- Additional comment from nbansal on 2010-04-12 12:58:39 EDT ---

John, koji is displaying an error "BuildError: error building package (arch noarch), mock exited with status 1; see build.log for more information"

--- Additional comment from linville on 2010-04-12 13:57:15 EDT ---

Ugh -- Koji hiccup...

I think this one will make it:

http://koji.fedoraproject.org/koji/taskinfo?taskID=2111335

--- Additional comment from linville on 2010-04-12 16:23:55 EDT ---

Build completed -- please test! :-)

--- Additional comment from nbansal on 2010-04-13 10:20:18 EDT ---

John, so far it's good: I was running on wireless the whole day and it did not disconnect. (After locking the screen with Ctrl+Alt+L, the system was very sluggish when I tried to resume, but I think that was due to Compiz; after disabling Compiz I can't reproduce that behavior.) Let me know if you need any information from this system.

--- Additional comment from updates on 2010-04-13 14:47:19 EDT ---

kernel-2.6.32.11-102.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/kernel-2.6.32.11-102.fc12
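Fedora kernel packages are versioned as version-release (for example 2.6.32.10-90.fc12), and the F-12 update announced above is kernel-2.6.32.11-102.fc12. A quick, hypothetical way to check whether an installed kernel is at least that build is sketched below. It compares only the numeric version and the leading release number, so it is a simplification of RPM's real comparison (rpmvercmp) and is only meaningful within one Fedora branch.

```python
import re


def kernel_key(nvr):
    """Hypothetical helper: '2.6.32.10-90.fc12.x86_64' -> ((2, 6, 32, 10), 90)."""
    m = re.match(r"(?:kernel-)?(\d+(?:\.\d+)*)-(\d+)", nvr)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {nvr!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    release = int(m.group(2))
    return version, release


def has_fix(installed, fixed):
    """True if `installed` is the same as or newer than `fixed` (simplified ordering)."""
    return kernel_key(installed) >= kernel_key(fixed)
```

For example, the 2.6.32.10-90.fc12.x86_64 kernel reported earlier in this thread compares as older than kernel-2.6.32.11-102.fc12, so has_fix returns False for it.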

Comment 1 John W. Linville 2010-04-13 20:00:28 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=2113697

F-13 testers welcome! :-)

