Bug 493018

Summary: Intel 5300agn disconnects and does not work until the iwlagn module is reloaded
Product: [Fedora] Fedora
Reporter: Bernie Innocenti <bernie+fedora>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 12
CC: aurel.opiatra, brentley, cbm, dominik, dominik.stadler, emcnabb, george, kernel-maint, nbarriga, reinette.chatre, tcallawa, thesource, turgut, twillber, valtri, warlord
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-05-26 14:58:11 UTC
Attachments:
  /var/log/message with modprobe iwlagn debug50=0x43fff (flags: none)

Description Bernie Innocenti 2009-03-31 11:10:59 UTC
The card occasionally hangs and then disconnects from the AP.  The only way to recover from this is "rmmod iwlagn; modprobe iwlagn".

The bug *seems* to trigger much more frequently while transferring large amounts of data in both directions, but I've seen it a couple of times when the wifi link was not particularly loaded.


Version-Release number of selected component (if applicable):
kernel-2.6.29-16.fc11.x86_64 
kernel-firmware-2.6.29-16.fc11.noarch

How reproducible:
Every few hours, more often under high load.

Steps to Reproduce:
1. Connect to an AP.
2. Run a couple of large rsync jobs in both directions
3. Wait...
  
Actual results:
Card hangs

Expected results:
Card should not hang, please

Additional info:


iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_LINK_QUALITY_CMD: enqueue_hcmd failed: -28
iwlagn: Read index for DMA queue txq_id (2) index 108 is out of range [0-256] 141 136
iwlagn: Read index for DMA queue txq_id (2) index 109 is out of range [0-256] 141 136
iwlagn: Read index for DMA queue txq_id (2) index 110 is out of range [0-256] 141 136
wlan0: No ProbeResp from current AP 00:18:84:27:85:81 - assume out of range
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_RXON: time out after 500ms.
iwlagn: Error setting new RXON (-110)
wlan0: direct probe to AP 00:18:84:27:85:81 try 1
wlan0: direct probe to AP 00:18:84:27:85:81 try 2
wlan0: direct probe to AP 00:18:84:27:85:81 try 3
wlan0: direct probe to AP 00:18:84:27:85:81 timed out
iwlagn: Error sending REPLY_RXON: time out after 500ms.
iwlagn: Error setting new RXON (-110)
iwlagn: Error sending REPLY_RXON: time out after 500ms.
iwlagn: Error setting new RXON (-110)
wlan0: direct probe to AP 00:18:84:27:85:81 try 1
wlan0: direct probe to AP 00:18:84:27:85:81 try 2
wlan0: direct probe to AP 00:18:84:27:85:81 try 3
wlan0: direct probe to AP 00:18:84:27:85:81 timed out
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_RXON: time out after 500ms.
iwlagn: Error setting new RXON (-110)
iwlagn: Error sending REPLY_RXON: time out after 500ms.
iwlagn: Error setting new RXON (-110)
wlan0: direct probe to AP 00:18:84:27:85:81 try 1
wlan0: direct probe to AP 00:18:84:27:85:81 try 2
wlan0: direct probe to AP 00:18:84:27:85:81 try 3
wlan0: direct probe to AP 00:18:84:27:85:81 timed out
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn: Error setting new RXON (-28)
iwlagn: No space for Tx
iwlagn: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn: Error setting new RXON (-28)
wlan0: direct probe to AP 00:18:84:27:85:81 try 1
wlan0: direct probe to AP 00:18:84:27:85:81 try 2
wlan0: direct probe to AP 00:18:84:27:85:81 try 3
wlan0: direct probe to AP 00:18:84:27:85:81 timed out
iwlagn: No space for Tx
iwlagn: Error sending REPLY_SCAN_CMD: enqueue_hcmd failed: -28
iwlagn: No space for Tx
iwlagn: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, 1.3.27kds
iwlagn: Copyright(c) 2003-2008 Intel Corporation
iwlagn 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:03:00.0: setting latency timer to 64
iwlagn: Detected Intel Wireless WiFi Link 5300AGN REV=0x24
iwlagn: Tunable channels: 13 802.11bg, 24 802.11a channels
wmaster0 (iwlagn): not using net_device_ops yet
phy7: Selected rate control algorithm 'iwl-agn-rs'
wlan0 (iwlagn): not using net_device_ops yet
cfg80211: Calling CRDA for country: IT
iwlagn 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:03:00.0: irq 30 for MSI/MSI-X
iwlagn 0000:03:00.0: firmware: requesting iwlwifi-5000-1.ucode
iwlagn loaded firmware version 5.4.1.16
Registered led device: iwl-phy7:radio
Registered led device: iwl-phy7:assoc
Registered led device: iwl-phy7:RX
Registered led device: iwl-phy7:TX
ADDRCONF(NETDEV_UP): wlan0: link is not ready
wlan0: authenticate with AP 00:18:84:27:85:81
wlan0: authenticated
wlan0: associate with AP 00:18:84:27:85:81
wlan0: RX AssocResp from 00:18:84:27:85:81 (capab=0x421 status=0 aid=1)
wlan0: associated
wlan0: disassociating by local choice (reason=3)
wlan0: authenticate with AP 00:18:84:27:85:81
wlan0: authenticated
wlan0: associate with AP 00:18:84:27:85:81
wlan0: RX AssocResp from 00:18:84:27:85:81 (capab=0x421 status=0 aid=1)
wlan0: associated
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: no IPv6 routers present
thinkpad_acpi: EC reports that Thermal Table has changed
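
For triage, the failure signatures in a log like the one above can be tallied with a short filter. The following is only a minimal sketch: it assumes the kernel log is fed on standard input, and the function name iwl_summarize is made up for illustration. The codes -28 and -110 in the messages are the kernel's ENOSPC and ETIMEDOUT errno values.

```bash
#!/bin/bash
# Hypothetical helper (not part of any driver tooling): tally iwlagn
# failure signatures found in a kernel log read from stdin.
iwl_summarize() {
    awk '
        /enqueue_hcmd failed: -28/        { nospc++ }    # -28 is -ENOSPC, follows "No space for Tx"
        /time out after 500ms/            { timeout++ }  # host command not answered in time
        /direct probe to AP .* timed out/ { probe++ }    # mac80211 gave up probing the AP
        END {
            printf "enqueue_failed=%d cmd_timeout=%d probe_timeout=%d\n", nospc, timeout, probe
        }
    '
}
```

It could be run as "iwl_summarize < /var/log/messages"; a log with no matching lines yields all-zero counts.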

Comment 1 reinette chatre 2009-04-14 22:25:52 UTC
Which hardware is this?

Comment 2 reinette chatre 2009-04-14 22:26:59 UTC
(In reply to comment #1)
> Which hardware is this?  

ah .. sorry ... missed this in the title

we discovered a problem in interrupt handling of this device and are working on a patch to address it. This should resolve this issue.

Comment 3 John W. Linville 2009-05-05 15:11:06 UTC
Reinette, any news on the aforementioned patch?

Comment 4 reinette chatre 2009-05-05 21:50:01 UTC
John, it is currently being tested internally. We hope to make it public soon.

Comment 5 reinette chatre 2009-05-22 21:50:29 UTC
(In reply to comment #3)
> Reinette, any news on the aforementioned patch?  

These patches can now be found in wireless-testing. Not all of them are relevant to this specific issue, but they are all worth looking into, as they address core issues in the driver that result in subtle bugs (like this one).

The major ones are:
commit 6269d02c2e8391775605933007af6df75a94716b
Author: Mohamed Abbas <mohamed.abbas>
Date:   Fri May 22 11:01:47 2009 -0700

    iwlcore: register locks

commit 2556889b30137053180e97febb84f943d74003b7
Author: Mohamed Abbas <mohamed.abbas>
Date:   Fri May 22 11:01:50 2009 -0700

    iwlcore: support ICT interrupt

commit c8881170b7ce9897a7b9087d37ca3f281b347707
Author: Mohamed Abbas <mohamed.abbas>
Date:   Fri May 22 11:01:51 2009 -0700

    iwlcore: Allow skb allocation from tasklet.

commit 8538087cb9dc0304e7884b76d4758b881ba7c6bd
Author: Mohamed Abbas <mohamed.abbas>
Date:   Fri May 22 11:01:52 2009 -0700

    iwlcore: Add support for periodic RX interrupt

commit 1a8b5e120d4f1131fdc8a1969568eecb82fd0246
Author: Mohamed Abbas <mohamed.abbas>
Date:   Fri May 22 11:01:53 2009 -0700

    iwlcore: Set rb_timeout to 0x10 for devices with ICT

Comment 6 Bug Zapper 2009-06-09 12:50:16 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 John W. Linville 2009-07-22 17:37:43 UTC
Bernie, can you try a rawhide kernel?

   yum --enablerepo=devel update kernel

Does that work better for you?

Comment 8 František Dvořák 2009-08-24 20:08:16 UTC
I've bumped into this bug too, and it looks like it's working fine now with kernel* from updates-testing (2.6.30.5-32.fc11.x86_64). So far, 24 hours of quite intensive networking without an outage.

Comment 9 Chuck Ebbert 2009-08-24 21:46:10 UTC
Marking 'modified' because of report that 2.6.30.5 fixes this bug.

Comment 10 Nicolas A. Barriga 2009-10-03 22:32:00 UTC
I have this bug (or a similar-looking one) in kernel 2.6.30.8-64.fc11.x86_64. This started about 6 weeks ago, but I cannot remember exactly with which kernel version.

Comment 11 Nicolas A. Barriga 2009-10-03 22:34:15 UTC
BTW, my card is an intel 5100agn.

Comment 12 Tom "spot" Callaway 2009-10-07 14:56:20 UTC
I'm seeing this bug (or something similar) on 2.6.31.1-56.fc12:

iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: Error sending REPLY_RXON: time out after 500ms.
iwlagn 0000:03:00.0: Error setting new RXON (-110)
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28

Comment 13 Tom "spot" Callaway 2009-10-07 14:56:59 UTC
03:00.0 Network controller: Intel Corporation Wireless WiFi Link 5300
	Subsystem: Intel Corporation Device 1011
	Physical Slot: 1
	Flags: bus master, fast devsel, latency 0, IRQ 35
	Memory at f4300000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: iwlagn
	Kernel modules: iwlagn

Lenovo Thinkpad T500, x86_64

Comment 14 filadel 2009-12-29 07:32:07 UTC
Same error, same controller, same Thinkpad.

System: Lenovo Thinkpad T500
Operating System: Fedora 12
Kernel: 2.6.31.9-174.fc12.x86_64
Controller: Intel Corporation Wireless WiFi Link 5300


iwlagn 0000:03:00.0: Error sending REPLY_SCAN_CMD: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: No space for Tx
iwlagn 0000:03:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
iwlagn 0000:03:00.0: Error setting new RXON (-28)
iwlagn 0000:03:00.0: No space for Tx

Comment 15 Dominik Sandjaja 2010-01-01 14:52:15 UTC
Seeing the same with a
 Network controller: Intel Corporation Wireless WiFi Link 5100
on kernel
 2.6.31.9-174.fc12.i686.PAE

This bug could be changed to F12.

From messages:
Jan  1 14:57:51 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:57:51 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:57:51 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
Jan  1 14:57:52 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:57:52 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:57:52 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:57:52 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:57:57 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:57:57 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:57:57 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
Jan  1 14:57:58 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:57:58 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:57:58 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:57:58 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:03 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:03 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:03 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
Jan  1 14:58:04 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:04 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:04 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:04 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:09 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:09 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:09 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
Jan  1 14:58:10 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:10 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:10 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:10 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:15 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:15 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:15 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms.
Jan  1 14:58:16 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:16 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:16 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms.
Jan  1 14:58:16 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-110)
Jan  1 14:58:16 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:16 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-28)
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_SCAN_CMD: enqueue_hcmd failed: -28
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-28)
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-28)
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:20 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-28)
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_SCAN_CMD: enqueue_hcmd failed: -28
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-28)
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: Error setting new RXON (-28)
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: No space for Tx
Jan  1 14:58:25 miniserver kernel: iwlagn 0000:02:00.0: Error sending REPLY_TX_POWER_DBM_CMD: enqueue_hcmd failed: -28
...

Comment 16 Dominik Sandjaja 2010-01-03 08:55:16 UTC
In addition to my previous comment:

The problem seems to appear only under load, an rsync-based backup in my case.

This makes Fedora unsuitable for my headless server, as I can't connect it by cable (which I would prefer).

Comment 17 Stanislaw Gruszka 2010-01-05 15:16:24 UTC
I'm trying to reproduce this with a 5300 adapter, with no luck so far. Could you provide the following information:
- network type (abg or n) and encryption
- how many devices are connected to the network
- how long it takes before the disconnect
- any other information you think may be useful for reproducing the bug

I'm not sure I will be able to reproduce the bug at all; it could, for example, be an issue with some particular APs. So I would like to get verbose debug messages from you. Debugging can be enabled with the "rmmod iwlagn; modprobe iwlagn debug50=0x47f43fff" command. It is very noisy, can slow down the machine, and may make the bug not reproducible. Less verbose but still useful messages can be obtained with debug mask 0x43fff. Please provide the log from module load until the connection dies. If that would be too big, provide it from some time before the error. Thanks.
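
The reload step can be wrapped in a small function so that the debug mask is easy to switch between runs. This is only a sketch under stated assumptions: the function name reload_iwlagn_debug and the RUN dry-run variable are invented for illustration, while the debug50 parameter and the two mask values are the ones quoted in this comment. Running it for real requires root.

```bash
#!/bin/bash
# Hypothetical wrapper: reload iwlagn with a given debug mask.
# Set RUN=echo to print the commands instead of executing them.
reload_iwlagn_debug() {
    # Default to the less verbose mask; pass 0x47f43fff for full debugging.
    local mask="${1:-0x43fff}"
    ${RUN:-} rmmod iwlagn
    ${RUN:-} modprobe iwlagn "debug50=${mask}"
}
```

For example, "RUN=echo reload_iwlagn_debug 0x47f43fff" only prints the two commands, which is a cheap way to check the mask before reloading the module.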

Comment 18 Tom "spot" Callaway 2010-01-05 16:02:05 UTC
Reproducing this is... erratic. I'm not sure what causes it, I thought that a high network load would do the trick, but even that's not reliable. Sometimes I'll hit it multiple times in one day, sometimes not at all. :/ Maybe someone else will have a better reproducer.

Comment 19 Stanislaw Gruszka 2010-01-06 10:23:01 UTC
Ok. Can anyone else provide the debug messages? Info about the wifi type, encryption and connected devices is also useful, so please provide it.

Comment 20 Stanislaw Gruszka 2010-01-06 15:17:04 UTC
It looks like this bug is a duplicate of the upstream bug:
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037

Comment 21 Dominik Sandjaja 2010-01-06 19:41:42 UTC
OK, I will try my best here ... The server is headless, so it's a little hard, but here we go ...

The network is a 802.11b/g with a Linksys WRT54G2 as router/access point. There are only two laptops (xp/f12) and that server with the problem connected to it. All via wireless, no devices attached via ethernet.

I now tried out with debug 0x47f43fff and an rdiff-backup-session. I tried to get as much information as possible, but it didn't crash this time. Seems to be related to the speed, which was significantly lower compared to the runs without debugging. I will do a 0x43fff debug session now.

Comment 22 Dominik Sandjaja 2010-01-06 21:53:44 UTC
Here we go with debug level 0x43fff.

The file being transferred at the time of the crash was several GB in size.

Last statistics from iptraf:
Total rates: 1293,4 kbytes/sec
Incoming rates: 1246,5 kbytes/sec
Outgoing rates: 46,8 kbytes/sec

The last lines from top:
top - 21:34:27 up 5 days,  5:57,  3 users,  load average: 0.60, 1.17, 1.16
Tasks: 121 total,   1 running, 120 sleeping,   0 stopped,   0 zombie
Cpu(s): 21.4%us, 11.8%sy,  0.0%ni, 42.3%id, 13.2%wa,  4.1%hi,  7.3%si,  0.0%st
Mem:   2051804k total,  1977712k used,    74092k free,   304208k buffers
Swap:  4128760k total,        0k used,  4128760k free,  1495832k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
16836 ds        20   0 12360 3492 1464 S 34.6  0.2  10:41.58 sshd
 1088 root      20   0 40484 5356  944 S 27.9  0.3  58:35.44 rsyslogd
13266 root      20   0  5468 1992  848 S 13.5  0.1  10:44.76 iptraf
16837 ds        20   0 30376  23m 3048 S  2.9  1.2  15:36.21 rdiff-backup
16245 root      15  -5     0    0    0 S  1.9  0.0   0:19.02 iwlagn

The attached excerpt from /var/log/messages hopefully contains all relevant data, but earlier messages are the same as the first ones and the error messages continue like the ones which can be seen.

Comment 23 Dominik Sandjaja 2010-01-06 21:54:58 UTC
Created attachment 382095 [details]
/var/log/message with modprobe iwlagn debug50=0x43fff

The messages are logs when the iwlagn module was loaded with
modprobe iwlagn debug50=0x43fff

Comment 24 Stanislaw Gruszka 2010-01-07 14:16:45 UTC
Thanks for the logs, I'm looking at them. There are lots of other logs on Intel bugzilla 2037, which I didn't know about before. Unfortunately Intel (who have the best knowledge about the hardware, firmware and driver) were unable to figure out the problem based on the provided logs :( Anyway I will look at them and try to analyse.

Ideally if we can figure out how to reproduce. Does others who enter this issue have also WRT54, this router was mentioned by Dominik and at Intel bugzilla as well. Note: I'm trying to reproduce on my WRT610N on ag network, but all works fine.

It looks like the firmware bug is hard to figure out; Intel's current effort is to make the driver recover from a firmware crash. Could you please apply the patch:
http://bugzilla.intellinuxwireless.org/attachment.cgi?id=2127
and see what happens? It was reported that the patch does not help, but since then there may have been other changes in the Linux mac80211 stack that allow the device to recover properly. Please test the patch with an F11, F12 or rawhide kernel, or a vanilla 2.6.31 or later kernel.

Comment 25 Stanislaw Gruszka 2010-01-08 08:41:01 UTC
(In reply to comment #24)
> Ideally if we can figure out how to reproduce. Does others who enter this issue
> have also WRT54, this router was mentioned by Dominik and at Intel bugzilla as
> well. Note: I'm trying to reproduce on my WRT610N on ag network, but all works
> fine

Tom, filadel, Nicolas, Bernie, please tell us what modem do You use.

Comment 26 Stanislaw Gruszka 2010-01-08 13:05:31 UTC
Nicolas pointed out in bug 530695 that the problem may be related to the device overheating.

Could you run the simple script below to log the device temperature once a minute? We will then see whether the problem happens when the temperature becomes high.

#!/bin/bash
# Log the wireless adapter temperature to syslog once a minute.
while true; do
    logger "wlan0 temp $(cat /sys/class/net/wlan0/device/temperature)"
    sleep 60
done

Change the logging frequency if you think some other value is more appropriate. If your device is not wlan0, change that as well.
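
Once the logger loop above has run for a while, the readings can be pulled back out of the system log to check whether the temperature climbs before a failure. A minimal sketch, assuming the log entries contain "wlan0 temp <number>" exactly as the script writes them and that the value is a plain integer (its unit is not stated in this thread); the function name max_wlan_temp is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the highest "wlan0 temp N" reading found on stdin.
# Prints nothing if no reading is present.
max_wlan_temp() {
    awk '
        {
            # Look for the three consecutive fields: wlan0 temp <integer>
            for (i = 1; i <= NF - 2; i++) {
                if ($i == "wlan0" && $(i + 1) == "temp" && $(i + 2) ~ /^-?[0-9]+$/) {
                    t = $(i + 2) + 0
                    if (!seen || t > max) { max = t; seen = 1 }
                }
            }
        }
        END { if (seen) print max }
    '
}
```

It could be used as "max_wlan_temp < /var/log/messages", ideally on the portion of the log leading up to the first iwlagn error.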

Comment 27 Tom "spot" Callaway 2010-01-08 15:38:28 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > Ideally if we can figure out how to reproduce. Does others who enter this issue
> > have also WRT54, this router was mentioned by Dominik and at Intel bugzilla as
> > well. Note: I'm trying to reproduce on my WRT610N on ag network, but all works
> > fine
> 
> Tom, filadel, Nicolas, Bernie, please tell us what modem do You use.  

When you say "modem", surely you mean "Wireless Access Point" ? :) I'll look when I get home, but it is a Linksys something-or-other. :)

Comment 28 filadel 2010-01-09 21:27:10 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > Ideally if we can figure out how to reproduce. Does others who enter this issue
> > have also WRT54, this router was mentioned by Dominik and at Intel bugzilla as
> > well. Note: I'm trying to reproduce on my WRT610N on ag network, but all works
> > fine
> 
> Tom, filadel, Nicolas, Bernie, please tell us what modem do You use.    

Hi evbdy

Wireless Access Point
Product Name:		ST585      aka SpeedTouch 585 with ADSL WAN
Software Release:	6.2.17.5

Comment 29 Stanislaw Gruszka 2010-01-13 13:39:01 UTC
*** Bug 530695 has been marked as a duplicate of this bug. ***

Comment 30 Stanislaw Gruszka 2010-01-14 08:13:18 UTC
Can someone confirm or deny that this is a temperature problem? Dominik, maybe you, since you seem to have a reliable way to reproduce this bug? Please note that if this is an overheating problem, there is a workaround for it (see bug 530695; the workaround does not work in the 2.6.31 kernel, but works in 2.6.32 and <=2.6.30). Sometimes the temperature cannot be read (i.e. the sensor is not connected); if so, please try the mentioned workaround and see if you are still able to reproduce.

Comment 31 Stanislaw Gruszka 2010-01-18 08:32:39 UTC
Reporters on the Intel bugzilla deny that the problem is related to temperature. It seems to be an issue with the PCIe bus, which becomes non-operational for some reason (probably related to the graphics card type), leaving the driver unable to communicate with the firmware. Unfortunately Intel is unable to reproduce the issue, and so far nobody knows what can cause the PCIe bus to fail.

To help solve this problem, please comment on
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037

Comment 32 Stanislaw Gruszka 2010-01-18 09:42:19 UTC
*** Bug 511128 has been marked as a duplicate of this bug. ***

Comment 33 Derek Atkins 2010-01-25 13:52:15 UTC
I've seen this a few times.

Looking at the dmesg logs I see a stacktrace due to a failed allocation and then the driver starts to fail sometime thereafter.  As others have reported, unloading and reloading the driver seems to work.  (I didn't reload with debugging this time because I only found this bug after the fact).

Hardware: Lenovo Thinkpad T500
OS:       Fedora-12 x86_64
Kernel:   kernel-2.6.31.9-174.fc12.x86_64
AP:       WRT610Nv2 w/ DD-WRT   (I'm using 5GHz AN with WPA2/TKIP)

03:00.0 Network controller: Intel Corporation Wireless WiFi Link 5300
	Subsystem: Intel Corporation Device 1011
	Flags: bus master, fast devsel, latency 0, IRQ 36
	Memory at f4200000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: iwlagn
	Kernel modules: iwlagn

I don't think this is temperature related...  The network died overnight last night (around 2am).  There was NOTHING happening around 2am on my laptop.

I've only seen this error happen twice since I've owned this laptop (which means 2 weeks), and I've seen both of these instances on my AN network.  Next time I'll try to get more debugging for you and intellinuxwireless.

Comment 34 Stanislaw Gruszka 2010-01-26 12:04:07 UTC
(In reply to comment #33)
> Looking at the dmesg logs I see a stacktrace due to a failed allocation and
> then the driver starts to fail sometime thereafter. 

If you see allocation failures, you may be affected by this bug 
https://bugzilla.redhat.com/show_bug.cgi?id=551937
Please comment there, if so.

Comment 35 Stanislaw Gruszka 2010-04-02 12:50:02 UTC
Hi 

Intel provided patches for driver recovery from firmware failures. A Fedora kernel with the backported patches is below; could you please test it and see if it helps with this problem?
http://koji.fedoraproject.org/koji/taskinfo?taskID=2090837

Comment 36 Dominik Sandjaja 2010-04-02 21:06:48 UTC
I just checked out the kernel mentioned in comment #35 and I didn't get any of the problems I had before! Whatever was changed, at least for me it did the trick. Obviously I will do some more stress tests but so far it looks really well. Thanks.

Comment 37 Bernie Innocenti 2010-04-03 00:51:30 UTC
(In reply to comment #36)
> I just checked out the kernel mentioned in comment #35 and I didn't get any of
> the problems I had before! Whatever was changed, at least for me it did the
> trick. Obviously I will do some more stress tests but so far it looks really
> well. Thanks.    

Sorry, I can't test as I no longer own an Intel 5300agn card.

Comment 38 Bug Zapper 2010-04-27 13:22:45 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 39 Stanislaw Gruszka 2010-04-27 13:44:02 UTC
Bug is still present in Fedora 12

Comment 40 The Source 2010-04-27 13:52:04 UTC
I'm using 2.6.32.11-105.fc12.x86_64 and everything seems to be fine. Problem didn't appear for a long time and with several previous kernels too.

Comment 41 Stanislaw Gruszka 2010-05-14 13:26:49 UTC
We have to make sure all the needed patches also go into F13 and F14, and then we can close this bug. Thanks for the info.

Comment 42 Stanislaw Gruszka 2010-05-26 14:58:11 UTC
F-13 has the needed patches applied. Rawhide is up to date with upstream, so it has this issue fixed too.

Closing according to comment 40.