Created attachment 590412 [details] Log of ping to wifi gateway Description of problem: So I've spent some time figuring out what is wrong with my VOIP usage, when I'm connected via Wifi. Once per ~minute (not sure about exact timing) I've been getting big drops in VOIP quality. After some more experimenting - plain ping to my router has shown there is something wrong going on (ping attached). Then I've tried other connected machine using different OS (Ubuntu) and having exact same physical position and signal quality and no problems - I'd to drop the idea of having Wifi signal problems. Next experiment was to use Wifi WITHOUT NetworkManager - after some time spent with configuring all those WPA passwords - I've finally got undisturbed pings. So this leaved NM as primary suspect. My next step was to run 'iwlist wlan0 scan' - and results on ping times looked very similar to ping delays as to what default NM is doing on my desktop. Then I've tried to figure out if there is some NM option to disable auto scanning - and haven't found out one?? And so I've opened BZ to have the option to disable Wifi scanning when connected over wifi - since it makes VOIP basically unusable. If there is option to disable autoscanning - it's very well hidden. If there is not - it should be added (seems like Debian is using it??) My hw - T61 C2D, iwl3945, kernel 3.4 x86_64 Version-Release number of selected component (if applicable): NetworkManager-0.9.4-5.git20120521.fc18.x86_64 How reproducible: Steps to Reproduce: 1. ping to gateway router 2. 3. Actual results: Expected results: Additional info:
This problem seems to be 3945 specific.
Debugging of wpa_supplicant: https://live.gnome.org/NetworkManager/Debugging#wpa_supplicant07 revealed that if no BSSID is given the scan is being made in upto 2 minutes interval. As a workaround it's currently possible to setup BSSID to the AP address which disables periodic wlan scans.
Ok - so I've test BSSID solution - and it works for VOIP quite well. However this should be somehow intelligently managed by NM itself. It should possibly detect the hw scan takes a lot of time and delays ping and user should have some option according to his preference - whether scan for surrouding AP or VOIP latency is for him more important - otherwise he will get lags and it's not so trivial to detect real reason for packet delays. I've also tried an iwl3945 option disable_hw_scan=0 - this option does some help - it seems to make the number of delayed packet smaller - instead of 5 it's usually just 2 of them with latency upto 100ms, but it also makes NM unusable after suspend/resume cycle since no new HW scan happens and new AP are not discovered after resume.
It's also probably worth to note here, that even machines like T520 with i7 CPUs and iwlan WiFi drivers are influenced by those scans in default f17 - though the pings are not reaching 200ms - they are in 70ms range - but it still quite noticeable ping delay, thought it's probably not such a big issue for VOIP. So NM should really give user an option to enable/disable wifi scanning, and also and option for scan on request. NM cannot pretend it's smarter than user, because it simple cannot know all the conditions. So I guess this BZ may turn into RFE for adding 2 new NM applet options - Disable autoscan WiFi scan now And as mentioned in comment 3 - it would be nice, if the tool would be able to detect latencies like 200ms upon periodic scan and give user a warning explaining him these latencies may have serious impact on low latency apps like VOIP, so user may choose what does he prefer.
I discussed disabling autoscan from NM with Dan some time ago. The point was that we can not do this if we want to have wireless roaming working smoothly. We have NM BSSID: option, which disable autoscan and connect only with one AP. "Trigger scan now" button is not that Dan would like to see from user experience perspective, he prefer to have things just works. To make scan faster NM use now wpa_supplicant "background scan" feature, but I'm not quite sure if it is really faster option, or if it is, maybe not for all drivers. Anyway the thing to do here is make scan work fast and do not influence (much) latency and throughput. I.e for 3945 the goal would be make sw_scan as fast as hw_scan.
(In reply to comment #5) > I discussed disabling autoscan from NM with Dan some time ago. The point was > that we can not do this if we want to have wireless roaming working > smoothly. We have NM BSSID: option, which disable autoscan and connect only > with one AP. Well I'm just wondering how the average user 'Joe' will be able to setup BSSID properly - if there would be some proper support from nm-applet and also user would KNOW what he should do - then I do not see a big problem. But in all other case simple button 'Disable scan' is fairly simple to understand and easy to implement - I do not see any way how to make it better with current state of drivers - it's pretty 'long run task' for kernel drivers to make the scan 'unnoticeable' - meanwhile all users will suffer ??? That's IMHO not fair for user who are not able to setup things - and they are even not aware where is the problem... > > Anyway the thing to do here is make scan work fast and do not influence > (much) latency and throughput. I.e for 3945 the goal would be make sw_scan > as fast as hw_scan. Well even hw_scan makes 100ms delays with 3945 - so not an option at all... (And doesn't work after resume)
As a workaround try: $ sudo wpa_cli -iwlan0 set_network <ESSID HERE> bgscan \"simple:4000:-200:8000\" You will probably find out NetworkManager resets those values on multi-AP networks, but maybe on single AP ones it doesn't.
Unsure what has changed were but now with kernel 3.7-rc8 and NetworkManager-0.9.7.0-9.git20121004.fc19.x86_64 it's actually impossible to disable scan at all. It doesn't matter if I set BSSID or not I'm getting now this - and it's completely unsuable for VOIP over wifi with iwl3945! 168.192.1) 56(84) bytes of data. 64 bytes from 192.168.192.1: icmp_seq=1 ttl=64 time=0.941 ms 64 bytes from 192.168.192.1: icmp_seq=2 ttl=64 time=1.39 ms 64 bytes from 192.168.192.1: icmp_seq=3 ttl=64 time=0.963 ms 64 bytes from 192.168.192.1: icmp_seq=4 ttl=64 time=0.973 ms 64 bytes from 192.168.192.1: icmp_seq=5 ttl=64 time=20.5 ms 64 bytes from 192.168.192.1: icmp_seq=6 ttl=64 time=1.98 ms 64 bytes from 192.168.192.1: icmp_seq=7 ttl=64 time=0.927 ms 64 bytes from 192.168.192.1: icmp_seq=8 ttl=64 time=0.955 ms 64 bytes from 192.168.192.1: icmp_seq=9 ttl=64 time=0.981 ms 64 bytes from 192.168.192.1: icmp_seq=10 ttl=64 time=0.858 ms 64 bytes from 192.168.192.1: icmp_seq=11 ttl=64 time=1.01 ms 64 bytes from 192.168.192.1: icmp_seq=12 ttl=64 time=0.927 ms 64 bytes from 192.168.192.1: icmp_seq=13 ttl=64 time=0.851 ms 64 bytes from 192.168.192.1: icmp_seq=14 ttl=64 time=1.45 ms 64 bytes from 192.168.192.1: icmp_seq=15 ttl=64 time=1.51 ms 64 bytes from 192.168.192.1: icmp_seq=16 ttl=64 time=2.20 ms 64 bytes from 192.168.192.1: icmp_seq=17 ttl=64 time=0.846 ms 64 bytes from 192.168.192.1: icmp_seq=18 ttl=64 time=1.79 ms 64 bytes from 192.168.192.1: icmp_seq=19 ttl=64 time=0.997 ms 64 bytes from 192.168.192.1: icmp_seq=20 ttl=64 time=0.988 ms 64 bytes from 192.168.192.1: icmp_seq=21 ttl=64 time=1.82 ms 64 bytes from 192.168.192.1: icmp_seq=22 ttl=64 time=206 ms 64 bytes from 192.168.192.1: icmp_seq=23 ttl=64 time=754 ms 64 bytes from 192.168.192.1: icmp_seq=24 ttl=64 time=1295 ms 64 bytes from 192.168.192.1: icmp_seq=25 ttl=64 time=1416 ms 64 bytes from 192.168.192.1: icmp_seq=26 ttl=64 time=437 ms 64 bytes from 192.168.192.1: icmp_seq=27 ttl=64 time=710 ms 64 bytes from 192.168.192.1: icmp_seq=28 ttl=64 time=586 ms 64 bytes from 192.168.192.1: icmp_seq=29 ttl=64 time=0.976 ms 64 bytes from 192.168.192.1: icmp_seq=30 ttl=64 time=0.935 ms 64 bytes from 192.168.192.1: icmp_seq=31 ttl=64 time=0.982 ms 64 bytes from 192.168.192.1: icmp_seq=32 ttl=64 time=0.888 ms 64 bytes from 192.168.192.1: icmp_seq=33 ttl=64 time=1.06 ms 64 bytes from 192.168.192.1: icmp_seq=34 ttl=64 time=0.848 ms 64 bytes from 192.168.192.1: icmp_seq=35 ttl=64 time=0.749 ms 64 bytes from 192.168.192.1: icmp_seq=36 ttl=64 time=0.844 ms 64 bytes from 192.168.192.1: icmp_seq=37 ttl=64 time=3.39 ms 64 bytes from 192.168.192.1: icmp_seq=38 ttl=64 time=0.961 ms 64 bytes from 192.168.192.1: icmp_seq=39 ttl=64 time=0.960 ms 64 bytes from 192.168.192.1: icmp_seq=40 ttl=64 time=3.57 ms 64 bytes from 192.168.192.1: icmp_seq=41 ttl=64 time=0.949 ms 64 bytes from 192.168.192.1: icmp_seq=42 ttl=64 time=0.976 ms 64 bytes from 192.168.192.1: icmp_seq=43 ttl=64 time=0.843 ms 64 bytes from 192.168.192.1: icmp_seq=44 ttl=64 time=0.969 ms 64 bytes from 192.168.192.1: icmp_seq=45 ttl=64 time=1.23 ms 64 bytes from 192.168.192.1: icmp_seq=46 ttl=64 time=0.873 ms 64 bytes from 192.168.192.1: icmp_seq=47 ttl=64 time=0.964 ms 64 bytes from 192.168.192.1: icmp_seq=48 ttl=64 time=140 ms 64 bytes from 192.168.192.1: icmp_seq=49 ttl=64 time=499 ms 64 bytes from 192.168.192.1: icmp_seq=50 ttl=64 time=349 ms 64 bytes from 192.168.192.1: icmp_seq=51 ttl=64 time=438 ms 64 bytes from 192.168.192.1: icmp_seq=52 ttl=64 time=388 ms 64 bytes from 192.168.192.1: icmp_seq=53 ttl=64 time=408 ms 64 bytes from 192.168.192.1: icmp_seq=54 ttl=64 time=384 ms 64 bytes from 192.168.192.1: icmp_seq=55 ttl=64 time=738 ms 64 bytes from 192.168.192.1: icmp_seq=56 ttl=64 time=503 ms 64 bytes from 192.168.192.1: icmp_seq=57 ttl=64 time=174 ms
Looks like BSSID does not any longer prevent NM from requesting periodical scans - IOW seems that is NM regression. Zdenek, try to downgrade NetworkManager. Check if scanning request are triggered can be tracked by "iwevent" tool . When scan finish there is "Scan request completed" message reported by iwevent. If this is NetworkManager regression, please open a new bug for that. In this bugzilla, the goal is to improve mac80211 scans to do cause network traffic latency, or more realistically to make latency as small as possible.
> 64 bytes from 192.168.192.1: icmp_seq=47 ttl=64 time=0.964 ms > 64 bytes from 192.168.192.1: icmp_seq=48 ttl=64 time=140 ms > 64 bytes from 192.168.192.1: icmp_seq=49 ttl=64 time=499 ms > 64 bytes from 192.168.192.1: icmp_seq=50 ttl=64 time=349 ms > 64 bytes from 192.168.192.1: icmp_seq=51 ttl=64 time=438 ms > 64 bytes from 192.168.192.1: icmp_seq=52 ttl=64 time=388 ms > 64 bytes from 192.168.192.1: icmp_seq=53 ttl=64 time=408 ms > 64 bytes from 192.168.192.1: icmp_seq=54 ttl=64 time=384 ms > 64 bytes from 192.168.192.1: icmp_seq=55 ttl=64 time=738 ms > 64 bytes from 192.168.192.1: icmp_seq=56 ttl=64 time=503 ms > 64 bytes from 192.168.192.1: icmp_seq=57 ttl=64 time=174 ms This actually looks more like iwl3945 or mac80211 bug ...
Created attachment 664880 [details] scan_latency_optimalizations_draft.patch This draft patch include various iwl3945 and mac80211 changes, which optimize network traffic latency while scanning. Tested with: while true; do iw dev wlan0 scan; sleep 3; done > /dev/null & ping -i0.2 -c 1000 192.168.1.1 Unpatched kernel: --- 192.168.1.1 ping statistics --- 1000 packets transmitted, 991 received, 0% packet loss, time 200988ms rtt min/avg/max/mdev = 0.618/42.983/241.669/62.745 ms, pipe 2 Patched kernel: --- 192.168.1.1 ping statistics --- 1000 packets transmitted, 998 received, 0% packet loss, time 200462ms rtt min/avg/max/mdev = 0.637/3.764/69.229/8.517 ms Patch has to be reworked (spit into individual changes) and posted upstream. Some of those changes could not be accepted, because they could make scan itself too slow or not detecting all APs. We will see ...
(In reply to comment #11) > Patched kernel: > --- 192.168.1.1 ping statistics --- > 1000 packets transmitted, 998 received, 0% packet loss, time 200462ms > rtt min/avg/max/mdev = 0.637/3.764/69.229/8.517 ms The final result will be worse than that, on my tests I have: rtt min/avg/max/mdev = 0.928/19.678/150.845/33.130 ms This is because on draft patch I removed scanning of 5GHz band, which consist mostly of PASSIVE channels (scan time 125 ms, btw: ACTIVE channel scan time is 60 ms). On a bit different test, ping from remote machine to periodically scanning machine, final patches show nice improvement: Unpatched: rtt min/avg/max/mdev = 1.637/120.683/709.139/164.337 ms Patched: rtt min/avg/max/mdev = 1.807/26.893/201.435/40.284 ms
Josh, please apply those two -next patches as fix for the issue (they are queued to 3.9): http://git.kernel.org/?p=linux/kernel/git/jberg/mac80211-next.git;a=commit;h=3f892b61a8cf6a85bc389c8d13209d686a4f6403 http://git.kernel.org/?p=linux/kernel/git/linville/wireless-next.git;a=commitdiff;h=70277f47b58b174a6b0b891dcd06ae5125afb73b This also need below patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=aacde9ee45225f7e0b90960f479aef83c66bfdc0 but this one is a bug fix and is already committed to Linus tree.
I've applied them to rawhide. Did you want them in F17/F18? If so, aacde9ee45225f7e0b90960f479aef83c66bfdc0 is already queued for 3.7.6, so that is easy but 3f892b61a8cf6a85bc389c8d13209d686a4f6403 gets a reject in hunk #4 and probably needs a rework.
This is improvement, but not a critical bug fix, so let do not apply them to F17/F18. Closing with RAWHIDE resolution.