Bug 504316
Summary: | Wireless speed drops | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Georgi Hristozov <hristozov> | ||||||||
Component: | kernel | Assignee: | Stanislaw Gruszka <sgruszka> | ||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | low | ||||||||||
Version: | 11 | CC: | itamar, jieshengz, kernel-maint, linville, marc, reinette.chatre | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2010-04-06 11:11:36 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Georgi Hristozov
2009-06-05 14:57:37 UTC
Sounds like there might be a problem with the iwl3945 rate control algorithm... Please try the following upstream patches: commit a5f847d75e50f89d9c524274780ee5c0b99d96b7 Author: Abhijeet Kolekar <abhijeet.kolekar> Date: Wed Mar 11 11:17:56 2009 -0700 iwl3945 : fix rate scaling commit 82127493a656f6293ffb1566410b5753f29991ef Author: Abhijeet Kolekar <abhijeet.kolekar> Date: Tue Mar 17 21:51:51 2009 -0700 iwl3945: control rate decrease As far as I can see, the patches are included in 2.6.30. I compared the diffs roughly and they seem to be in the kernel source. I compiled it, with the config from the f11 kernel, and the bug persists. I intentionally overloaded the network link and the speed dropped again. The wlan settings seemed ok after the crash, the bit rate was still 54Mb/s, but the actual speed was terrible. wlan0 IEEE 802.11abg ESSID:"gh" Mode:Managed Frequency:2.462 GHz Access Point: 00:1D:7E:C6:DB:03 Bit Rate=54 Mb/s Tx-Power=15 dBm Retry min limit:7 RTS thr:off Fragment thr:off Power Management:off Link Quality=57/70 Signal level=-53 dBm Noise level=-127 dBm Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0 Tx excessive retries:0 Invalid misc:0 Missed beacon:0 Here's the result of pinging the gateway: 7 packets transmitted, 6 received, 14% packet loss, time 6093ms rtt min/avg/max/mdev = 684.297/736.163/822.311/51.182 ms I don't know whether it's connected with the problem, but Firefox (and, as it seems, crond too) crashed at the exactly the same moment when the speed dropped. The system hanged for nearly a whole minute. crond invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0 crond cpuset=/ mems_allowed=0 Pid: 3098, comm: crond Not tainted 2.6.30devtest #1 Call Trace: [<c0484e09>] oom_kill_process+0x6e/0x1f2 [<c04853b8>] ? select_bad_process+0x87/0xce [<c0485472>] __out_of_memory+0x73/0x82 [<c04854e4>] out_of_memory+0x63/0x88 [<c04877d7>] __alloc_pages_internal+0x2e8/0x38a [<c0489365>] __do_page_cache_readahead+0xb7/0x16e [<c0489462>] do_page_cache_readahead+0x46/0x50 [<c04832ae>] filemap_fault+0x14d/0x320 [<c0495262>] __do_fault+0x52/0x385 [<c04261a5>] ? kmap_atomic_prot+0xfc/0x11b [<c04958c2>] handle_mm_fault+0x32d/0x7a3 [<c0727abf>] do_page_fault+0x210/0x21f [<c07278af>] ? do_page_fault+0x0/0x21f [<c0726102>] error_code+0x72/0x78 [<c07278af>] ? do_page_fault+0x0/0x21f Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 163 CPU 1: hi: 186, btch: 31 usd: 154 HighMem per-cpu: CPU 0: hi: 186, btch: 31 usd: 185 CPU 1: hi: 186, btch: 31 usd: 96 Active_anon:305401 active_file:36 inactive_anon:161383 inactive_file:1944 unevictable:2 dirty:0 writeback:1638 unstable:0 free:12205 slab:7851 mapped:40 pagetables:3298 bounce:0 DMA free:8088kB min:64kB low:80kB high:96kB active_anon:640kB inactive_anon:684kB active_file:4kB inactive_file:340kB unevictable:0kB present:15868kB pages_scanned:960 all_unreclaimable? no lowmem_reserve[]: 0 865 2007 2007 Normal free:40248kB min:3728kB low:4660kB high:5592kB active_anon:356560kB inactive_anon:356680kB active_file:80kB inactive_file:2084kB unevictable:0kB present:885944kB pages_scanned:35938 all_unreclaimable? no lowmem_reserve[]: 0 0 9134 9134 HighMem free:484kB min:512kB low:1740kB high:2972kB active_anon:864404kB inactive_anon:288168kB active_file:60kB inactive_file:5352kB unevictable:8kB present:1169232kB pages_scanned:5408 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 0*8kB 1*16kB 2*32kB 1*64kB 2*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8088kB Normal: 478*4kB 332*8kB 164*16kB 124*32kB 74*64kB 40*128kB 19*256kB 6*512kB 5*1024kB 1*2048kB 1*4096kB = 40216kB HighMem: 20*4kB 6*8kB 1*16kB 1*32kB 1*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB 40308 total pagecache pages 154 pages in swap cache Swap cache stats: add 141839, delete 141685, find 3052/4137 Free swap = 0kB Total swap = 524280kB 521936 pages RAM 294610 pages HighMem 24900 pages reserved 39711 pages shared 444189 pages non-shared Out of memory: kill process 5535 (run-mozilla.sh) score 4542560 or a child Killed process 5547 (firefox) There weren't other applications running on the notebook, except Skype and a few consoles. Seems pretty strange on a machine with 2GB RAM. (In reply to comment #3) > > Here's the result of pinging the gateway: > > 7 packets transmitted, 6 received, 14% packet loss, time 6093ms > rtt min/avg/max/mdev = 684.297/736.163/822.311/51.182 ms This looks really bad. Did you take this capture before or after the speed dropped? > > I don't know whether it's connected with the problem, but Firefox (and, as it > seems, crond too) crashed at the exactly the same moment when the speed > dropped. The system hanged for nearly a whole minute. > > crond invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0 If your system is running out of memory it will affect everything running on that system. > There weren't other applications running on the notebook, except Skype and a > few consoles. Seems pretty strange on a machine with 2GB RAM. Potentially a memory leak on something running on that machine. (In reply to comment #4) > (In reply to comment #3) > > > > Here's the result of pinging the gateway: > > > > 7 packets transmitted, 6 received, 14% packet loss, time 6093ms > > rtt min/avg/max/mdev = 684.297/736.163/822.311/51.182 ms > > This looks really bad. Did you take this capture before or after the speed > dropped? > Yes, this is after the speed dropped. Normally it is something like: 37 packets transmitted, 36 received, 2% packet loss, time 36058ms rtt min/avg/max/mdev = 0.995/1.228/3.335/0.423 ms > Potentially a memory leak on something running on that machine. Well, I hope it is a coincidence. :) Normally this bug doesn't affect the performance of the system, only the network. Are you still seeing this issue with the latest F11 kernels? What is the latest kernel you have tried? The problem persists with 2.6.30.5-43.fc11.x86_64. Today I compiled and tested the new 2.6.31 i686 kernel on the same hardware, this time running Slackware (I know it's not Red Hat related, but the problem seems to be reproducible in Fedora and you are upstream developers as well). The speed dropped again, in exactly the same way. I decided to turn on IWLWIFI_DEBUG in order to collect debugging info. I've checked IWLWIFI_SPECTRUM_MEASUREMENT, IWLWIFI_LEDS, IWL3945_SPECTRUM_MEASUREMENT as well, although they seem completely irrelevant to the rate scaling. After recompiling, the issue seems to have disappeared. In the last few hours I've transferred more than 20GB with stable speed of ~20-24MBit. Tonight I'm going to further play with the options, hoping to reproduce the problem. By the way, can you please recommend me a debug mask which can produce useful info about the problem? I've tried just setting all the bits, but it's producing a huge amount of information and syslogd and klogd are under big pressure (60% CPU usage). I'm sticking to 0x00FF0000 for now (I think that IWL_DL_RF_KILL and IWL_DL_RATE are especially interesting), but I quite don't know what I'm doing, so I will appreciate some advice. :) I've just managed to reproduce the problem without touching the configuration. Seems that the additional config options do not matter. The log can be found at http://forkbomb.nl/pub/temp/iwl_msg.txt . Please note that the speed drop happened around 16:48 and after that the driver stopped outputting anything at all in syslog. It became responsive again after reconnecting to the AP. Georgi, Just to clarify: problem is fixed in vanilla 2.6.31, right? Using just IWL_DL_RATE should be fine to debug, is problem is really in rate control algorithm. Turning on all flags except IWL_DL_TX_REPLY, IWL_DL_RX, IWL_DL_TX, IWL_DL_INFO should give good results. But if still some flags generate too much output and mess up things instead of giving any useful information, you can just disable this flag and try again. File you provide in http://forkbomb.nl/pub/temp/iwl_msg.txt is somehow corrupted (?), I can not see any interesting things there. To exclude that problem is related with memory leakage/fragmentation you could check output of /proc/buddyinfo and /proc/slabinfo . If output of these files are much different when speed drops happens and after driver restart to normal functionality, that would suggest problem is somewhere there. No, the problem is reproducible on vanilla 2.6.31, in exactly the same way as I've written in the beginning of my bug report. I didn't make this clear enough, sorry. :) Anyway, I triggered the bug again. This time I've collected the log from the right place and using debug_level 0xBE7FFFFE (turned off the bits you told me). It's pretty big because of the somewhat "liberal" mask, but I hope it can be useful to you. The last message in the file is just ~10 seconds after the problem occurred. I've monitored slabinfo too, but nothing suspicious changed. The system has 2GB RAM, has been up for just a few hours when the speed dropped, and only KDE with two text editors and two shells were running. So I think it's probably not a memory-related issue. The log can be found at http://forkbomb.nl/pub/temp/log.tar.bz2 (843K compressed, over 100MB uncompressed) I've uploaded the kernel config too - http://forkbomb.nl/pub/temp/kernel_config (In reply to comment #11) > Anyway, I triggered the bug again. This time I've collected the log from the > right place and using debug_level 0xBE7FFFFE (turned off the bits you told me). > It's pretty big because of the somewhat "liberal" mask, but I hope it can be > useful to you. The last message in the file is just ~10 seconds after the > problem occurred. I forgot to mention to also remove IWL_DL_IO flag :-/ , but it's fine, I can filter out not necessary things using sed (and by the way refresh my sed skill :) Thanks! Unfortunately I can not find any direct bug cause from provided logs. We can see that driver activity is decreased in two times: one at about 34:29 and second time 34:58 . $ for min in 34 35 ; do for sec in `seq -f "%02g" 0 59` ; do echo -ne "$min:$sec\t" ; grep "^Sep 14 15:$min:$sec Firebat" syslog | wc -l ; done ; [snip] 34:14 0 34:15 0 34:16 18084 34:17 33737 34:18 35589 34:19 34845 34:20 34517 34:21 32174 34:22 33491 34:23 34031 34:24 32465 34:25 35098 34:26 34863 34:27 35377 34:28 34739 34:29 33262 34:30 35008 34:31 33031 34:32 35023 34:33 33605 34:34 32579 34:35 34061 34:36 34377 34:37 34373 34:38 33179 34:39 31291 34:40 33543 34:41 16692 34:42 7843 34:43 8152 34:44 7872 34:45 9305 34:46 7599 34:47 8057 34:48 8709 34:49 8004 34:50 8786 34:51 9579 34:52 9119 34:53 7564 34:54 7568 34:55 8155 34:56 9475 34:57 10100 34:58 3066 34:59 333 35:00 811 35:01 1222 35:02 511 35:03 255 35:04 459 35:05 293 35:06 196 35:07 53 35:08 0 35:09 0 [snip] $ grep ratelimit syslog Sep 14 15:34:16 Firebat kernel: __ratelimit: 9715 callbacks suppressed Sep 14 15:34:21 Firebat kernel: __ratelimit: 9712 callbacks suppressed Sep 14 15:34:36 Firebat kernel: __ratelimit: 9303 callbacks suppressed Sep 14 15:34:41 Firebat kernel: __ratelimit: 8848 callbacks suppressed Sep 14 15:34:46 Firebat kernel: __ratelimit: 1186 callbacks suppressed Sep 14 15:34:51 Firebat kernel: __ratelimit: 1341 callbacks suppressed Sep 14 15:34:56 Firebat kernel: __ratelimit: 1246 callbacks suppressed Sep 14 15:35:01 Firebat kernel: __ratelimit: 492 callbacks suppressed Sep 14 15:35:07 Firebat kernel: __ratelimit: 50 callbacks suppressed Number of MISSED_BEACONS_NOTIFICATION is increased after 34:29 but still we have some of them when driver works at high speed. [stasiu@dhcp-lab-109 504316-iwlagn-speed-drops]$ grep MISSED_BEACONS syslog Sep 14 15:34:17 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 75 i 74 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:19 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 133 i 132 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:21 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 136 i 135 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:23 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 199 i 198 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:26 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 200 i 199 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:32 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 72 i 71 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:34 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 31 i 30 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:38 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 11 i 10 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:40 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 223 i 222 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:42 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 140 i 139 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:44 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 81 i 80 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:46 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 36 i 35 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:48 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 27 i 26 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:50 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 44 i 43 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:52 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 64 i 63 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:54 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 14 i 13 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:56 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 20 i 19 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:34:58 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 188 i 187 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:35:00 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 250 i 249 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:35:02 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 81 i 80 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:35:05 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 131 i 130 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Sep 14 15:35:07 Firebat kernel: ieee80211 phy0: I iwl3945_rx_handle r 174 i 173 No handler needed for MISSED_BEACONS_NOTIFICATION, 0xa2 Georgi, In current linus tree we have some new fixes related with rate scaling, could you give them a chance. You can directly check linus tree git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git or even better newest iwlwifi tree: git://git.kernel.org/pub/scm/linux/kernel/git/rchatre/iwlwifi-2.6.git Could you also try older firmware from: http://intellinuxwireless.org/?n=downloads&f=ucodes_3945 Could you provide information about your network, type of network (a,b,g), used encryption, other AP or stations in radio scope, any other thing you think is relevant. Perhaps I will try to reproduce the bug by my own. I missed that in logs: Sep 14 15:34:23 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 1, success_counter: 1, expected_tpt is not NULL Sep 14 15:34:23 Firebat kernel: ieee80211 phy0: I rs_get_rate leave: 11 We have few such messages, all earlier than 15:34:43 when adapter slow down: Sep 14 15:34:23 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 1, success_counter: 1, expected_tpt is not NULL Sep 14 15:34:23 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 5, success_counter: 4, expected_tpt is not NULL Sep 14 15:34:23 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 7, success_counter: 6, expected_tpt is not NULL Sep 14 15:34:23 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 9, success_counter: 7, expected_tpt is not NULL Sep 14 15:34:28 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 1, success_counter: 1, expected_tpt is not NULL Sep 14 15:34:28 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 4, success_counter: 4, expected_tpt is not NULL Sep 14 15:34:28 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 8, success_counter: 7, expected_tpt is not NULL Sep 14 15:34:41 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 1, success_counter: 1, expected_tpt is not NULL Sep 14 15:34:41 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 2, success_counter: 2, expected_tpt is not NULL Sep 14 15:34:42 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 3, success_counter: 3, expected_tpt is not NULL Sep 14 15:34:42 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 4, success_counter: 4, expected_tpt is not NULL Sep 14 15:34:42 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 5, success_counter: 5, expected_tpt is not NULL Sep 14 15:34:42 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 6, success_counter: 6, expected_tpt is not NULL Sep 14 15:34:42 Firebat kernel: ieee80211 phy0: I rs_get_rate Invalid average_tpt on rate 11: counter: 7, success_counter: 7, expected_tpt is not NULL Hello, It's a busy week for me and I will be able to test the git trees in a few days. :\ The problem appeared in different 802.11g environments - WRT54GL AP with WPA2 and ZyXEL AP with WEP. I will probably try different configurations soon. About the log messages - I spotted them in the beginning, but thought they are not related to the issue. Can you please translate them into human language? :) Do they indicate misconfiguration, hardware problem or something else? > About the log messages - I spotted them in the beginning, but thought they are > not related to the issue. Can you please translate them into human language? :) I don't understand them too, at least not at level I would like to understand. As far as I can tell: - MISSED BEACONS notification means that adapter do not get beacon frame (http://en.wikipedia.org/wiki/Beacon_frame) from AP. As long number of lost frames are small, this should be not any problem. - "Invalid average_tpt on rate" mean driver has not enough data (don't know exactly what data) to make decision about selecting new rate, instead it use previously selected rate. > Do they indicate misconfiguration, hardware problem or something else? I thought they are suspicious but they seems to be pretty normal, hence not indicate nothing wrong. Unfortunately, the problem persists with the latest iwlwifi kernel tree (0e2dc29804cdfd84e2ffc23e8f65d32427129a6a). Additionally, I get some warnings (unaligned IP payload?!) when changing AP and I have to reload the module to use the wireless at all, but they seem irrelevant (they happen when the speed hasn't dropped aswell): Sep 26 11:21:51 Firebat kernel: ------------[ cut here ]------------ Sep 26 11:21:51 Firebat kernel: WARNING: at net/mac80211/rx.c:409 __ieee80211_rx_handle_packet+0x5fe/0x6c0 [mac80211]() Sep 26 11:21:51 Firebat kernel: Hardware name: Satellite A200 Sep 26 11:21:51 Firebat kernel: unaligned IP payload at 0xf5c94036 Sep 26 11:21:51 Firebat kernel: Modules linked in: i915 drm coretemp snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 cpufreq_ondemand speedstep_lib acpi_cpufreq freq_table ppdev parport_pc parport fuse arc4 ecb snd_hda_codec_realtek snd_hda_intel snd_hda_codec iwl3945 snd_hwdep snd_pcm iwlcore snd_timer snd mac80211 sdhci_pci sdhci soundcore mmc_core psmouse processor r8169 tifm_7xx1 intel_agp video thermal mii thermal_sys agpgart hwmon uhci_hcd cfg80211 sg tifm_core button ac i2c_i801 snd_page_alloc output serio_raw led_class battery rfkill evdev Sep 26 11:21:51 Firebat kernel: Pid: 0, comm: swapper Not tainted 2.6.31-wl-git #2 Sep 26 11:21:51 Firebat kernel: Call Trace: Sep 26 11:21:51 Firebat kernel: [<f8493dde>] ? __ieee80211_rx_handle_packet+0x5fe/0x6c0 [mac80211] Sep 26 11:21:51 Firebat kernel: [<f8493dde>] ? __ieee80211_rx_handle_packet+0x5fe/0x6c0 [mac80211] Sep 26 11:21:51 Firebat kernel: [<c102ce26>] ? warn_slowpath_common+0x76/0xd0 Sep 26 11:21:51 Firebat kernel: [<f8493dde>] ? __ieee80211_rx_handle_packet+0x5fe/0x6c0 [mac80211] Sep 26 11:21:51 Firebat kernel: [<c102cecb>] ? warn_slowpath_fmt+0x2b/0x30 Sep 26 11:21:51 Firebat kernel: [<f8493dde>] ? __ieee80211_rx_handle_packet+0x5fe/0x6c0 [mac80211] Sep 26 11:21:51 Firebat kernel: [<f85e94be>] ? iwl3945_irq_tasklet+0x4ae/0x15a0 [iwl3945] Sep 26 11:21:51 Firebat kernel: [<f8482e17>] ? ieee80211_tasklet_handler+0xe7/0x110 [mac80211] Sep 26 11:21:51 Firebat kernel: [<c1035ed5>] ? run_timer_softirq+0x35/0x1c0 Sep 26 11:21:51 Firebat kernel: [<c1031608>] ? tasklet_action+0x48/0xa0 Sep 26 11:21:51 Firebat kernel: [<c1031cdb>] ? __do_softirq+0x8b/0x110 Sep 26 11:21:51 Firebat kernel: [<c105abcd>] ? handle_IRQ_event+0x2d/0xc0 Sep 26 11:21:51 Firebat kernel: [<c105d261>] ? move_native_irq+0x11/0x50 Sep 26 11:21:51 Firebat kernel: [<c1031d8d>] ? do_softirq+0x2d/0x40 Sep 26 11:21:51 Firebat kernel: [<c1004dc0>] ? do_IRQ+0x50/0xc0 Sep 26 11:21:51 Firebat kernel: [<c10034e9>] ? common_interrupt+0x29/0x30 Sep 26 11:21:51 Firebat kernel: [<f828812e>] ? acpi_idle_enter_bm+0x270/0x2a5 [processor] Sep 26 11:21:51 Firebat kernel: [<c130860f>] ? cpuidle_idle_call+0x6f/0xc0 Sep 26 11:21:51 Firebat kernel: [<c1001f04>] ? cpu_idle+0x54/0x70 Sep 26 11:21:51 Firebat kernel: [<c1515865>] ? start_kernel+0x2d1/0x334 Sep 26 11:21:51 Firebat kernel: [<c151535d>] ? unknown_bootoption+0x0/0x1e9 Sep 26 11:21:51 Firebat kernel: ---[ end trace c9245dfa3a0496aa ]--- (In reply to comment #18) > Additionally, I get some warnings > (unaligned IP payload?!) when changing AP and I have to reload the module to > use the wireless at all, but they seem irrelevant (they happen when the speed > hasn't dropped aswell): > > Sep 26 11:21:51 Firebat kernel: ------------[ cut here ]------------ > Sep 26 11:21:51 Firebat kernel: WARNING: at net/mac80211/rx.c:409 > __ieee80211_rx_handle_packet+0x5fe/0x6c0 [mac80211]() > Sep 26 11:21:51 Firebat kernel: Hardware name: Satellite A200 > Sep 26 11:21:51 Firebat kernel: unaligned IP payload at 0xf5c94036 There is no need to set CONFIG_MAC80211_DEBUG_PACKET_ALIGNMENT since you are running on platform that can handle unaligned payloads. Yes, I've already found that (through the kernel bugzilla). Seems that I have accidentally switched on this option and this problem is unrelated to the rate scaling issues. Good news is I'm able to reproduce the issue, bad is I can not figure out what is going wrong as far. The way I can reproduce is: downloading big file with speed 20Mbit/s, then go away from AP - signal strength become very week and speed decrease to something like 1.2Mbit/s . Then come back near AP, signal strength become high, but speed not. After dissociate and associate again with AP effective download rate come back to 20Mbit/s. Note I have to go quite far from AP to make signal very poor, otherwise after coming back speed become high automatically. Created attachment 363428 [details]
I'm using this patch debug frames traffic
Log with applied patch when dowlad file and walking with laptop, looks like this: *) At the begining when signal is high: Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Tx sta id: 0, rate: 3 (plcp) 54 (ieee), flags: 0x 8 cck/ofdm mask: 0xf/0xff Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=49, tim=19734808 usec, phy=0x11, chnl=1 rate 54 Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=49, tim=19735122 usec, phy=0x11, chnl=1 rate 54 Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Tx sta id: 0, rate: 3 (plcp) 54 (ieee), flags: 0x 8 cck/ofdm mask: 0xf/0xff Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=49, tim=19735436 usec, phy=0x11, chnl=1 rate 54 Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=49, tim=19735678 usec, phy=0x11, chnl=1 rate 54 Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Tx sta id: 0, rate: 3 (plcp) 54 (ieee), flags: 0x 8 cck/ofdm mask: 0xf/0xff Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=39, tim=19736444 usec, phy=0x11, chnl=1 rate 54 Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=50, tim=19736758 usec, phy=0x11, chnl=1 rate 54 Oct 2 08:53:29 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=49, tim=19737072 usec, phy=0x11, chnl=1 rate 54 *) Then when signal become low: Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Tx sta id: 0, rate: 110 (plcp) 11 (ieee), flags: 0x 8 cck/ofdm mask: 0xf/0xff Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=19, tim=4419464 usec, phy=0x13, chnl=1 rate 2 Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=19, tim=4426260 usec, phy=0x13, chnl=1 rate 2 Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Tx sta id: 0, rate: 110 (plcp) 11 (ieee), flags: 0x 8 cck/ofdm mask: 0xf/0xff Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=19, tim=4447594 usec, phy=0x13, chnl=1 rate 2 Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=19, tim=4456286 usec, phy=0x13, chnl=1 rate 2 Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Tx sta id: 0, rate: 110 (plcp) 11 (ieee), flags: 0x 8 cck/ofdm mask: 0xf/0xff Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=19, tim=4463010 usec, phy=0x13, chnl=1 rate 2 Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=19, tim=4480436 usec, phy=0x13, chnl=1 rate 2 Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Tx sta id: 0, rate: 110 (plcp) 11 (ieee), flags: 0x 8 cck/ofdm mask: 0xf/0xff Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=18, tim=4487205 usec, phy=0x13, chnl=1 rate 2 Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Beacon: 0x0080, dst=0xff, src=0x3e, rssi=19, tim=4502989 usec, phy=0x13, chnl=1 rate 1 Oct 2 08:53:56 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=19, tim=4515701 usec, phy=0x13, chnl=1 rate 2 *) Then when signal again become high: Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Tx sta id: 0, rate: 3 (plcp) 54 (ieee), flags: 0x 8 cck/ofdm mask: 0xf/0xff Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=67, tim=35143592 usec, phy=0x17, chnl=1 rate 11 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=67, tim=35145399 usec, phy=0x17, chnl=1 rate 11 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=67, tim=35151952 usec, phy=0x17, chnl=1 rate 2 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=67, tim=35158280 usec, phy=0x17, chnl=1 rate 2 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=67, tim=35164805 usec, phy=0x17, chnl=1 rate 2 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=68, tim=35172467 usec, phy=0x17, chnl=1 rate 2 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=67, tim=35174208 usec, phy=0x17, chnl=1 rate 11 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=68, tim=35175859 usec, phy=0x17, chnl=1 rate 11 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=67, tim=35177897 usec, phy=0x17, chnl=1 rate 11 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=68, tim=35184930 usec, phy=0x17, chnl=1 rate 2 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=68, tim=35192088 usec, phy=0x17, chnl=1 rate 2 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=68, tim=35199220 usec, phy=0x17, chnl=1 rate 2 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Retry: 0x4a88, dst=0xcc, src=0x3c, rssi=68, tim=35206280 usec, phy=0x17, chnl=1 rate 2 Oct 2 08:54:27 dhcp-lab-161 kernel: iwl3945 0000:03:00.0: Frame: 0x4288, dst=0xcc, src=0x3c, rssi=68, tim=35208183 usec, phy=0x17, chnl=1 rate 11 As we can se we have high signal RSSI=68 but lots of replays. We are sending data at 54 Mbit/s but AP to us at 2 or 11 Mbit/s. Taking into account that AP send to Station at low rate, may suggest this can be problem with AP. However Georgi wrote that when he use Windows everything was fine. In my case I do test (walk far away from AP) with wifi usb dongle (Ralink chipset) and everything was fine to - after coming back near AP transfer become high again. Reinette, John, Do You have any hints where the bug could be or how to debug this? As far I'm sniffing over radio using wireshark, I'm comparing Ralink and Intel transfers, but can not found anything than can cause this issue. Note: problem is not fixed upstream, I reproduced it on iwlwifi Linus and fedora kernels trees. I'm glad that you have managed to reproduce the issue :) It can't be a problem with the AP - as you have drawn attention to, everything is OK under Windows. Moreover I have tested this with 3 different APs from different vendors. By the way, I can reproduce the problem without moving my notebook. Maybe the interference from other networks on closer channels (the environment is the centre of Sofia) is helping the rate scaling issue. Created attachment 363883 [details]
iwl3945-rs-use-default.patch
Dunno if this will even work, but you might try using Minstrel instead of iwl3945's custom rate control algorithm...?
Created attachment 363954 [details]
iwl3945-use-default-rs.patch
We have to remove explicit calls to iwl3945_rate_scale_init() to use "minstrel" rate scaling algorithm.
However patch does not help. Bug must be somewhere else. Note, some rate scaling related functions (iwl3945_hw_build_tx_cmd_rate, iwl3945_init_hw_rate_table) still are used after applied this patch.
This information may be relevant. My wiress card is Broadcom Corporation BCM4322 802.11a/b/g/n Wireless LAN Controller (rev 01). My Ap is trednet TEW-632BRP. The driver is from here:http://www.broadcom.com/support/802.11/linux_sta.php. Kernel is 2.6.30.8-64.fc11.x86_64. My wireless network becomes slow after a while. Then I can not connect to my wireless AP after 5-6 hour. 802.11 G AP seems ok. At the same time, I observed this in /var/log/messages: Oct 29 16:34:56 f11 kernel: __ratelimit: 5 callbacks suppressed Oct 29 16:34:56 f11 kernel: hpet1: lost 1 rtc interrupts Oct 29 16:35:03 f11 kernel: hpet1: lost 1 rtc interrupts Oct 29 16:35:04 f11 kernel: hpet1: lost 1 rtc interrupts Oct 29 16:35:15 f11 kernel: hpet1: lost 1 rtc interrupts Oct 29 16:35:21 f11 kernel: hpet1: lost 1 rtc interrupts If I disable wireless, I do not have those messages. -jason Jason, that driver is not supported by Fedora. Feel free to report problems to Broadcom, but please don't pollute this bug with reports about that driver. I have the exact same problem. My setup is as follow: - Thinkpad T61 with iwl3945 wifi - The Wifi router is Linksys WRT54GSV4 - Currently using latest 2.6.31 kernel I've seen this problem for a couple years now, and it sometimes go away with kernel/firmware updates, and eventually comes back after a distro upgrade. The symptoms are the same: the link is working great, and then after a heavy transfer, the speed drops to ~100KB/s. I'm realtively far away from the router (two stories UP, with couple concrete walls in between, but I can reproduce the bug getting closer to the wifi router. Can't seem to reproduce when in the same room as wifi router. -- Marc Hi, sorry for delay I was busy working on RHEL. I updated AP firmware (I have WRT61ON) and problem disappears for me. I can still reproduce with other router (WRT160NL with original firmware), but not in the same way as before, speed after getting back good signal increase to a half of previous. Now I think this is AP bug, that is not encountered on windows driver, only happens with linux stations. Georgi, Marc, could you try upgrade firmware in your AP and see if that help? If you are using newest firmware could you try to install OpenWRT or dd-wrt? I installed OpenWRT on WRT160NL and speeds drops gone. I have the most up2date Firmware on my router, and I was still getting the problem. There must have been a recent change in the kernel (or drivers for the iw3945 card) because I have not seen the problem for a while. As I said in my previous post, the problem has come and gone in the past as well... so this might as well come back with another update (kernel or drivers). Thanks. It's seems that the problem is now fixed - I haven't experienced this issue for a few months, using the newest stable kernels. It seems that the problem is now fixed - I haven't experienced this issue for a few months, using the newest stable kernels. |