Bug 708747 - Extremely slow network with Intel "Ultimate N WiFi Link 5300" (iwlagn) after upgrade from Fedora 14
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: https://fedoraproject.org/wiki/Common...
Duplicates: 736374
Depends On:
Blocks: 735721 804259
Reported: 2011-05-29 10:12 UTC by Håvard Wigtil
Modified: 2012-04-15 21:42 UTC (History)

Fixed In Version: kernel-2.6.40.4-5.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 735721 804259
Environment:
Last Closed: 2011-09-09 08:33:23 UTC
Type: ---
Embargoed:


Attachments
Kernel patch to workaround this issue (533 bytes, patch)
2011-08-23 23:25 UTC, Adam Williamson

Description Håvard Wigtil 2011-05-29 10:12:26 UTC
Description of problem:
After upgrading from Fedora 14, the wireless network is so slow that it's literally unusable. Even simple web pages time out before they can load, I can't check mail, etc. Wired networking on the upgraded machine works as normal, as does the wireless network from other devices.

Version-Release number of selected component (if applicable):
kernel-2.6.38.6-27.fc15.x86_64

How reproducible: always


Steps to Reproduce:
1. Connect to wireless network
2. Access any network-based service
  
Actual results:
Timeouts or extremely slow responses


Expected results:
Internet!

Additional info:
I'd be happy to provide any extra information you may need. I'm connecting to a D-Link DAP-1522.

Output from lspci:
03:00.0 Network controller: Intel Corporation Ultimate N WiFi Link 5300
	Subsystem: Intel Corporation Device 1011
	Physical Slot: 1
	Flags: bus master, fast devsel, latency 0, IRQ 49
	Memory at f4300000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-16-ea-ff-ff-e5-76-72
	Kernel driver in use: iwlagn
	Kernel modules: iwlagn

Output from iwconfig:
wlan0     IEEE 802.11abgn  ESSID:"<masked>"  
          Mode:Managed  Frequency:2.412 GHz  Access Point: <masked>   
          Bit Rate=58.5 Mb/s   Tx-Power=15 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=65/70  Signal level=-45 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:254  Invalid misc:35   Missed beacon:0

Comment 1 Håvard Wigtil 2011-06-02 09:12:51 UTC
This only seems to be a problem when connecting to a wireless N network. I don't see this when connecting to several G-based networks.

Comment 2 wey-yi.w.guy 2011-06-06 14:20:11 UTC
Did you see a firmware reload in dmesg? Also, what firmware version are you using?

Thanks
Wey

Comment 3 Håvard Wigtil 2011-06-13 09:59:44 UTC
Sorry for the late reply, I haven't had the computer together with the N network for a while. Here's what I see in dmesg:

[  154.805031] iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:d
[  154.805039] iwlagn: Copyright(c) 2003-2010 Intel Corporation
[  154.805219] iwlagn 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[  154.805236] iwlagn 0000:03:00.0: setting latency timer to 64
[  154.805336] iwlagn 0000:03:00.0: Detected Intel(R) Ultimate N WiFi Link 5300 AGN, REV=0x24
[  154.825947] iwlagn 0000:03:00.0: device EEPROM VER=0x11e, CALIB=0x4
[  154.825954] iwlagn 0000:03:00.0: Device SKU: 0Xb
[  154.825960] iwlagn 0000:03:00.0: Valid Tx ant: 0X7, Valid Rx ant: 0X7
[  154.848041] iwlagn 0000:03:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels
[  154.848167] iwlagn 0000:03:00.0: irq 48 for MSI/MSI-X
[  154.893660] iwlagn 0000:03:00.0: loaded firmware version 8.83.5.1 build 33692

Comment 4 Daryll 2011-06-13 21:40:32 UTC
I just installed F15 and seem to be having the same problem with my Intel 1000N. If I tell the router to do BG only, it works fine, but if I allow N the performance is so bad as to be unusable.

Comment 5 Stanislaw Gruszka 2011-06-15 11:01:04 UTC
So you are using the latest firmware. Let's try the latest driver version :-)
Please test compat-wireless-next from http://people.redhat.com/sgruszka/compat_wireless.html . It contains a fix for a tx power setting bug. This bug can manifest itself as bad tx performance.

Comment 6 Håvard Wigtil 2011-06-20 20:05:40 UTC
Apologies for the late reply, I've been away for a few days. I've installed kmod-compat-wireless-next-2011_06_14-0.fc15.2.x86_64 and rebooted, but the problem appears as before. Here's from dmesg, this includes a change to the problematic network after login:

[   19.355037] iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:d
[   19.355045] iwlagn: Copyright(c) 2003-2011 Intel Corporation
[   19.355231] iwlagn 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[   19.355248] iwlagn 0000:03:00.0: setting latency timer to 64
[   19.355353] iwlagn 0000:03:00.0: Detected Intel(R) Ultimate N WiFi Link 5300 AGN, REV=0x24
[   19.375893] iwlagn 0000:03:00.0: device EEPROM VER=0x11e, CALIB=0x4
[   19.375900] iwlagn 0000:03:00.0: Device SKU: 0Xb
[   19.375936] iwlagn 0000:03:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels
[   19.376081] iwlagn 0000:03:00.0: irq 49 for MSI/MSI-X
[   19.398003] iwlagn 0000:03:00.0: loaded firmware version 8.83.5.1 build 33692
[   23.249178] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   30.111953] wlan0: authenticate with 00:22:07:0a:fa:a2 (try 1)
[   30.114964] wlan0: authenticated
[   30.128999] wlan0: associate with 00:22:07:0a:fa:a2 (try 1)
[   30.131596] wlan0: RX AssocResp from 00:22:07:0a:fa:a2 (capab=0x411 status=0 aid=2)
[   30.131601] wlan0: associated
[   30.148143] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   40.626273] wlan0: no IPv6 routers present
[   73.314721] wlan0: deauthenticating from 00:22:07:0a:fa:a2 by local choice (reason=3)
[   76.189813] wlan0: authenticate with f0:7d:68:fd:39:94 (try 1)
[   76.191548] wlan0: authenticated
[   76.206506] wlan0: associate with f0:7d:68:fd:39:94 (try 1)
[   76.215434] wlan0: RX AssocResp from f0:7d:68:fd:39:94 (capab=0xc31 status=0 aid=2)
[   76.215443] wlan0: associated
[   87.563750] iwlagn 0000:03:00.0: Tx aggregation enabled on ra = f0:7d:68:fd:39:94 tid = 0

Comment 7 Stanislaw Gruszka 2011-06-22 20:16:17 UTC
Hmm, this needs more investigation ... I think using the module parameter 11n_disable=1 should work around the problem here.
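
One way to make this workaround persistent is a modprobe configuration entry. The following is a minimal sketch, assuming the driver module is named iwlagn (as in the lspci output in comment 0); the file name `iwlagn-11n.conf` and both helper functions are hypothetical and only illustrate the `options` line syntax.

```python
from pathlib import Path


def modprobe_option_line(module: str, param: str, value: int) -> str:
    """Build one 'options' line in modprobe.d syntax."""
    return f"options {module} {param}={value}\n"


def write_11n_workaround(conf_dir: str, filename: str = "iwlagn-11n.conf") -> Path:
    """Write the 11n_disable=1 option for iwlagn into conf_dir.

    The file name is a hypothetical choice; pass /etc/modprobe.d as
    conf_dir (as root) to apply it to a real system.
    """
    path = Path(conf_dir) / filename
    path.write_text(modprobe_option_line("iwlagn", "11n_disable", 1))
    return path


if __name__ == "__main__":
    # Only print the line here, so running the sketch changes nothing.
    print(modprobe_option_line("iwlagn", "11n_disable", 1), end="")
```

The module has to be reloaded (or the machine rebooted) before the option takes effect.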

Comment 8 Dawid Lorenz 2011-08-05 13:51:37 UTC
I am not sure whether this is related, however after upgrading kernel to 2.6.40-4.fc15.x86_64, my Intel 1000N wireless adapter has started crashing my home router (TP-Link TL-WR1043ND with dd-wrt firmware) as soon as it gets associated and some heavier network traffic starts. For example, pinging router's local IP address would work fine but when I try to load a website, router immediately freezes to the point I need to switch its power off.

I couldn't find anything useful that could suggest a reason in either /var/log/messages on my laptop or the router's internal syslog. It just silently fails.

The workaround for this is to either use the 11n_disable=1 driver option or put the router into 802.11bg mode. Also, it used to work fine on the 802.11n standard with no noticeable performance problems with the 2.6.38.8-35.fc15.x86_64 kernel.

Comment 9 Dawid Lorenz 2011-08-05 13:53:30 UTC
There are additional reports of this behaviour in this thread:
http://www.dd-wrt.com/phpBB2/viewtopic.php?t=140461

Comment 10 wey-yi.w.guy 2011-08-05 15:02:44 UTC
Yes, we believe we can also reproduce the failure in-house here, and we have an engineer looking into this now. This is an important and high-priority bug for us.

Thanks
Wey

Comment 11 Adam Williamson 2011-08-10 05:19:30 UTC
+1!

Bought a wndr-3700 today, installed dd-wrt on it, nearly smashed the thing against the wall because of this bug...

Comment 12 Don Fry 2011-08-18 16:12:38 UTC
I have tracked down the cause for the dd-wrt crash.  The crash is caused by a commit to 2.6.38-rc1+.  To fix:

Reverting the commit by Johannes 9b7688328422b88a7a15dc0dc123ad9ab1a6e22d will
fix the problem from my testing with a Netgear DGN3500.  If you can, comment out
the line in iwl-agn.c iwl_mac_setup_register() which says:

hw->max_tx_aggregation_subframes = LINK_QUAL_AGG_FRAME_LIMIT_DEF;

I do not know if this will fix the general slow response, but it is worth testing.  Please let me know how this affects the original problem.

Comment 13 Vincent Batts 2011-08-18 20:48:58 UTC
(In reply to comment #12)
> I have tracked down the cause for the dd-wrt crash.  The crash is caused by a
> commit to 2.6.38-rc1+  To fix:
> 
> Reverting the commit by Johannes 9b7688328422b88a7a15dc0dc123ad9ab1a6e22d will
> fix the problem from my testing with a Netgear DGN3500.  If you can comment out
> the line in iwl-agn.c iwl_mac_setup_register() which says:
> 
> hw->max_tx_aggregation_subframes = LINK_QUAL_AGG_FRAME_LIMIT_DEF;
> 
> I do not know if this will fix the general slow response, but it is worth
> testing.  Please let me know how this affects the original problem.

I have tried the revert of this commit on linux 3.0.3, and confirm it allows me to connect to the wireless access point without causing the access point to become unresponsive.

However, on the first connection, when I pulled a large tarball as a test (`wget ftp://ftp.kernel.org/pub/linux/kernel/v3.0/linux-3.0.3.tar.gz`), the connection stalled out after 5 MB of progress. I brought down the interface, brought it up again, re-established the WPA authentication, etc., and on the second connection, I was able to successfully pull the entire 74M tarball.

Comment 14 wey-yi.w.guy 2011-08-18 21:09:18 UTC
Thank you for testing. We are still trying to understand why this commit causes the problem. Once we root-cause the problem, we will submit a patch to fix it.

Wey

Comment 15 Adam Williamson 2011-08-19 16:43:01 UTC
I'm at LinuxCon so I won't be able to test this for a bit, but can it really be a commit made to a 2.6.38 rc? I'm using 2.6.38.8 as a way to work around this problem, so it seemed like this must have been caused by a 2.6.39 or 3.0.0 change (I don't have a 2.6.39 kernel to try).

Anyway, I'll test the proposed fix later. Thanks!



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 16 Vincent Batts 2011-08-20 01:33:48 UTC
Adam, the setter of "hw->max_tx_aggregation_subframes" is not present in 2.6.38.8, nor in 2.39.6

Comment 17 Vincent Batts 2011-08-20 01:38:56 UTC
err. I did the same slip-up, and pulled 2.6.29.6 
It for sure is not in 2.6.38.8, and after pulling 2.6.39.4, I've found that it *is* present there. I am building it to verify the same behavior.

Take care,
vb

Comment 18 Adam Williamson 2011-08-20 05:25:56 UTC
ah, I see, so the offending code was committed in 2.6.38-rc1 but nothing called it till 2.6.39...makes sense, I guess.

Comment 19 Stanislaw Gruszka 2011-08-22 15:22:49 UTC
(In reply to comment #14)
> thank you for testing. We still try to understand why this commit cause the
> problem. once we root cause the problem, we will submit patch to fix it.

If commit 9b7688328422b88a7a15dc0dc123ad9ab1a6e22d does not cause the iwlwifi device to do something that breaks the 802.11 specification, it is OK. The bug in *WRT should be fixed, since it crashes. It should be fixed even if iwlwifi does something that does not conform to 802.11, since it is a denial-of-service security issue.

Someone should pass the information about the "bad" commit to the *WRT developers, to help them fix the bug on their side.

Also, this crash problem seems to be slightly related to the performance issues originally reported here in comment 0.

Comment 20 wey-yi.w.guy 2011-08-22 17:53:03 UTC
Agreed, but we still need to root-cause the reason on both sides and make sure iwlwifi does the right thing.

Wey

Comment 21 Adam Williamson 2011-08-22 18:36:20 UTC
and, seriously, from a practical standpoint, DoSing probably the single most popular third-party router firmware isn't really a smart thing to do, whether it's an 802.11 compliance issue or not...

I'm going to test the fix in a sec.

Comment 22 Adam Williamson 2011-08-22 18:37:20 UTC
note that dd-wrt 'fixes' are problematic because most dd-wrt users do not use the latest code, as it often has instabilities or regressions; there are recommended, 'known good' versions on the dd-wrt site, and indeed it's sometimes hard to get support if you use a newer version than the known-good.

Comment 23 Adam Williamson 2011-08-22 20:36:39 UTC
Confirming Vincent's result, patching the current F15 kernel git (3.0.3) to comment out the specified line seems to resolve the issue. I'm able to transfer large amounts of data over the local network at speeds of 14MB/sec, too.

Comment 24 Dave Jones 2011-08-22 21:11:01 UTC
added the revert for the next f15 build.
We should probably add it in f16 too, lacking a better fix, unless the Intel guys have any better ideas?

Comment 25 Dawid Lorenz 2011-08-22 21:46:45 UTC
(In reply to comment #24)
> added the revert for the next f15 build.

Does that mean the next stock kernel update for F15 should have that patch applied?

Comment 26 Dave Jones 2011-08-22 21:50:18 UTC
Yes.

Comment 27 Vincent Batts 2011-08-23 02:31:07 UTC
In response to Wey-yi, I too would like to help find a root cause. True enough, that reverting that line does allow the connection not to die off.

For testing sake, I saw that the legacy driver had LINK_QUAL_AGG_FRAME_LIMIT_DEF set to (31), instead of (63) as the iwl-agn driver does. Neither of these values work, but leaving max_tx_aggregation_subframes to its default (0) does work.

This max_tx_aggregation_subframes is used in only one other place in the kernel, and that place does not assign anything to it; it only reads its value.

Also, as a further note, I have upgraded my router to the latest available firmware from its manufacturer (ActionTec). It did not help. I sent this report to them. They responded that they would notify their developers, *BUT* that the ownership of any fixes to the firmware would have to come from the OEM of the device, which is Verizon, and that the report should go to them instead.
I can find no place to submit a report to Verizon's OEM router firmware team, and judging by personal experience I think it would be a cold day before they would take action on it.


Take care,
vb

Comment 28 Adam Williamson 2011-08-23 02:38:55 UTC
vincent: so, your router isn't actually running dd-wrt? or are actiontec / verizon using dd-wrt?

Comment 29 Vincent Batts 2011-08-23 12:48:55 UTC
Adam: The router I have is the standard issue from Verizon for Fios, an ActionTec MI424-WR. For the past year it has been using its firmware version 20.10.7.5. After having these difficulties, I updated its firmware to 20.19.8, using the builtin utility on its webmin.
I have done no sort of custom flashing to this device.

Comment 30 Adam Williamson 2011-08-23 16:29:31 UTC


Comment 31 Adam Williamson 2011-08-23 23:25:31 UTC
Created attachment 519535 [details]
Kernel patch to workaround this issue

This patch works around the issue.

Comment 32 Stanislaw Gruszka 2011-08-24 09:25:16 UTC
(In reply to comment #21)
> and, seriously, from a practical standpoint, DoSing probably the single most
> popular third-party router firmware isn't really a smart thing to do, whether
> it's an 802.11 compliance issue or not...

I thought working around the WRT problem could cause other problems with the driver, but it seems to be fully safe to make the iwlwifi change. Actually, it looks like it could really be a problem in the iwlwifi driver, i.e. it sends more subframes in an aggregate frame than it advertises, which can overflow the AP buffer.
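
The overflow theory above can be illustrated with a toy model. This is only a sketch of the general idea (a sender capping each aggregate at the smaller of its own limit and the peer's buffer), not the iwlwifi code, and the theory itself is unconfirmed in this report. The function name is hypothetical; the value 63 is the iwl-agn limit quoted in comment 27, and the peer buffer size of 32 is an assumption chosen for illustration.

```python
from typing import List


def split_into_aggregates(n_frames: int, sender_limit: int, peer_buffer: int) -> List[int]:
    """Return the sizes of successive aggregates for n_frames queued frames.

    Each aggregate carries at most min(sender_limit, peer_buffer)
    subframes, so the receiver's buffer is never exceeded.
    """
    if sender_limit <= 0 or peer_buffer <= 0:
        raise ValueError("limits must be positive")
    limit = min(sender_limit, peer_buffer)
    sizes = []
    remaining = n_frames
    while remaining > 0:
        size = min(limit, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


if __name__ == "__main__":
    # Sender limit 63 (comment 27), hypothetical peer buffer of 32 frames.
    print(split_into_aggregates(100, 63, 32))  # [32, 32, 32, 4]
```

A sender that ignored the peer's buffer size in this model would emit aggregates of 63 subframes, which is the kind of mismatch the comment speculates about.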

Comment 33 Josh Boyer 2011-08-24 12:42:57 UTC
I added this to f16 and rawhide as well.  The next builds there will contain this.

Comment 34 Sune Mølgaard 2011-08-31 19:28:11 UTC
Ubuntu user here.

My gf has a Buffalo router running dd-wrt, and I upgraded to 11.04 on her connection, seeing the router crash shortly after booting up the upgraded system.

What is interesting, however, is that *I ran the same kernel* before and after the upgrade - to the best of my knowledge, it was 2.6.39, possibly rc-something.

This leads me to believe that a (then) new version of wpa_supplicant or network-manager is at least partially responsible, possibly triggering the code above, where earlier versions didn't...

Just my two cents,

Sune Mølgaard

Comment 35 Fedora Update System 2011-09-01 11:06:33 UTC
kernel-2.6.40.4-5.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.4-5.fc15

Comment 36 ventero 2011-09-01 12:02:34 UTC
Even after applying the patch from comment #31 to 3.0.3, I'm still able to crash my TP-Link TL-WR1043ND with stock firmware.
The crash happens whenever I create a lot of traffic (e.g. by using iperf) for a longer period of time (about 50-60 seconds, but the exact time varies).

Comment 37 Adam Williamson 2011-09-01 16:43:01 UTC
That sounds like it may be a different bug, ventero - the symptom all of us see for the bug reported here, and apparently fixed by the patch, is the router going down as soon as virtually any traffic is transmitted.

Comment 38 Håvard Wigtil 2011-09-04 21:31:32 UTC
I've tested kernel-2.6.40.4-5.fc15.x86_64, and the issue as *originally* *reported* persists. The "router kill" problem that first appeared in comment #8 is most likely another issue, as I never had any problems with the wireless router, and it still works for other devices at the same time that I see these problems in Fedora 15.

Comment 39 Stanislaw Gruszka 2011-09-05 07:47:43 UTC
I cloned this bug to 735721, as we started to track the router hang here.

Comment 40 Fedora Update System 2011-09-07 00:00:40 UTC
kernel-2.6.40.4-5.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 41 Dawid Lorenz 2011-09-07 13:09:52 UTC
Just upgraded to 2.6.40.4-5.fc15.x86_64, rebooted machine without iwlagn 11n_disable=1 option and it seems to work fine so far, at least my router didn't freeze yet (it used to freeze within just seconds after connecting and starting some traffic over wifi).

I'll report here if I spot any other issues.

Comment 42 Stanislaw Gruszka 2011-09-08 15:04:53 UTC
*** Bug 736374 has been marked as a duplicate of this bug. ***

Comment 43 bung 2011-09-09 21:32:54 UTC
(In reply to comment #41)
> Just upgraded to 2.6.40.4-5.fc15.x86_64, rebooted machine without iwlagn
> 11n_disable=1 option and it seems to work fine so far, at least my router
> didn't freeze yet (it used to freeze within just seconds after connecting and
> starting some traffic over wifi).
> 
> I'll report here if I spot any other issues.

What does iwconfig say regarding Tx excessive retries and Invalid misc, respectively?
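
For anyone collecting these numbers repeatedly, the counters can be pulled out of the iwconfig text with a small script. This is a minimal sketch, assuming the output layout shown in comment 0; the function name is hypothetical and the sample text is an abridged copy of that output.

```python
import re

# Abridged from the iwconfig output in comment 0.
SAMPLE = """\
wlan0     IEEE 802.11abgn  ESSID:"<masked>"
          Bit Rate=58.5 Mb/s   Tx-Power=15 dBm
          Link Quality=65/70  Signal level=-45 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:254  Invalid misc:35   Missed beacon:0
"""


def parse_iwconfig_counters(text: str) -> dict:
    """Extract selected error counters from iwconfig output."""
    wanted = ("Tx excessive retries", "Invalid misc", "Missed beacon")
    counters = {}
    for label in wanted:
        # Accept either ':' or '=' between the label and the number.
        match = re.search(re.escape(label) + r"\s*[:=]\s*(\d+)", text)
        if match:
            counters[label] = int(match.group(1))
    return counters


if __name__ == "__main__":
    print(parse_iwconfig_counters(SAMPLE))
```

Comparing the values from two runs a minute apart shows whether the retry counter is still climbing under load.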

Comment 44 Dawid Lorenz 2011-09-21 17:12:37 UTC
(In reply to comment #41)
> Just upgraded to 2.6.40.4-5.fc15.x86_64, rebooted machine without iwlagn
> 11n_disable=1 option and it seems to work fine so far, at least my router
> didn't freeze yet (it used to freeze within just seconds after connecting and
> starting some traffic over wifi).
> 
> I'll report here if I spot any other issues.

OK, so after a couple of weeks I can say that there's still something wrong with wireless "n" mode. Not sure if I should report it here or in #735721, but anyway - since I've re-enabled "n" mode in the driver, my WLAN router no longer freezes as described previously. However, I am experiencing intermittent performance issues where wireless gets slow as hell and virtually unusable, to the point where the router just crashes and reboots by itself. Pinging the WLAN router results in massive packet loss and long response times:

adl@v3350 ~$ ping tplink.adlnet
PING tplink.adlnet (192.168.0.254) 56(84) bytes of data.
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=1 ttl=64 time=1535 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=2 ttl=64 time=2502 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=3 ttl=64 time=4922 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=4 ttl=64 time=5420 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=5 ttl=64 time=6123 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=6 ttl=64 time=6176 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=7 ttl=64 time=6302 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=8 ttl=64 time=6187 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=9 ttl=64 time=5833 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=11 ttl=64 time=10492 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=13 ttl=64 time=11581 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=17 ttl=64 time=10931 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=20 ttl=64 time=10296 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=21 ttl=64 time=9498 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=22 ttl=64 time=9349 ms
64 bytes from tplink.adlnet (192.168.0.254): icmp_req=24 ttl=64 time=7604 ms
^C
--- tplink.adlnet ping statistics ---
32 packets transmitted, 16 received, 50% packet loss, time 33179ms
rtt min/avg/max/mdev = 1535.332/7172.330/11581.424/2872.532 ms, pipe 12
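
Since the problem is intermittent, it can help to log the ping summary over time rather than reading it by eye. This is a minimal sketch that parses the two summary lines above; the function name is hypothetical, and it assumes the iputils summary format shown here (other ping implementations print different text).

```python
import re

# The two summary lines from the ping run above.
SAMPLE = """\
32 packets transmitted, 16 received, 50% packet loss, time 33179ms
rtt min/avg/max/mdev = 1535.332/7172.330/11581.424/2872.532 ms, pipe 12
"""


def parse_ping_summary(text: str):
    """Return packet counts, loss and RTT figures, or None if no summary is found."""
    packets = re.search(
        r"(\d+) packets transmitted, (\d+) received,.*?(\d+(?:\.\d+)?)% packet loss",
        text,
    )
    if packets is None:
        return None
    summary = {
        "transmitted": int(packets.group(1)),
        "received": int(packets.group(2)),
        "loss_percent": float(packets.group(3)),
    }
    # The rtt line is missing when no replies were received at all.
    rtt = re.search(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", text)
    if rtt is not None:
        keys = ("rtt_min_ms", "rtt_avg_ms", "rtt_max_ms", "rtt_mdev_ms")
        for key, value in zip(keys, rtt.groups()):
            summary[key] = float(value)
    return summary


if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE))
```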


This problem is intermittent - it happens totally randomly at various times of the day (or night). Sometimes just forcing a reconnect via NetworkManager seems to work around the issue and things get back to normal, but sometimes I wait until the router surrenders and reboots by itself, so that the subsequent connection is stable again.

Nonetheless, I've switched "n" mode off again for a few days and no such issue occurred, so it's still somehow related to "n" mode.

/var/log/messages doesn't say anything interesting, maybe apart from things like:
iwlagn 0000:09:00.0: Aggregation not enabled for tid 6 because load = 3

But I'm not certain if that's related.

