Description of problem:
The laptop is able to connect to a WiFi access point, but the connection is unstable when the signal is weak. One would often blame factors that affect signal strength and say 'tough luck', but using Windows on the same laptop provides a stable connection. I.e. there seems to be nothing wrong with the laptop or router hardware, and the signal is plenty strong enough.

Using 'mtr' to diagnose the problem shows pings are answered for a short period after wireless association, but then the answers stop. Using Wireshark one can see multiple ARP transmissions asking 'who has <gateway IP>?'. No response is seen. Interestingly, trying to manually adjust the ARP table has no real effect. Association with the wireless router is never lost, but packets do not flow - just ARP requests.

The laptop in question is an IBM X220 with a '1x1 11b/g/n' card. This is using the rtl8192ce module. I have tried disabling power management with the module options but this has no effect. Once the problem occurs, one must disconnect/reconnect by turning either the software or hardware wireless switch off and then back on again (causing association to re-occur).

Version-Release number of selected component (if applicable):
3.5.x (Latest Fedora Kernel as of 9 August 2012).

How reproducible:
Seems to occur when connecting to most wireless routers when signal strength is low. I know 'low' is a poor definition, so for reference, Windows reports signal strength as 32% in the current location, which has the problem but in which Windows networking is pretty much perfect. Packets (aside from ARP requests) will often stop flowing within a minute or two.

Steps to Reproduce:
1. Take a machine equipped with Realtek hardware using the rtl8192ce module.
2. Connect to WiFi with low signal strength.
3. Try to use the network.

Actual results:
Network association occurs but packets stop flowing after a short period of time. As described above, ARP requests are seen to repeat and go unanswered.

Expected results:
Stable network, as can be achieved using the same hardware and conditions under the Windows 7 O/S.

Additional info:
I have marked this as a high priority bug because without a network one is pretty much done for, and this bug means I have to reboot into Windows most of the time :(
Have you tried running with power save off? Use "modprobe -v rtl8192ce ips=0" for that.
Sorry, I didn't specify the kernel version before, but it's 3.5.0-2.fc17.x86_64.

But yes, I tried to turn power save off. I have the following...

options rtl8192ce ips=0 swlps=0 fwlps=0 debug=2

In this file...

/etc/modprobe.d/rtl8192ce.conf

`dmesg | grep rtl` produces this, which I think shows the above options are working...

[ 4.331460] rtl8192ce: rtl8192ce: Power Save off (module option)
[ 4.332358] rtl8192ce: rtl8192ce: FW Power Save off (module option)
[ 4.333151] rtl8192ce: Using firmware rtlwifi/rtl8192cfw.bin
[ 4.425254] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
[ 4.425463] rtlwifi: wireless switch is on
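As a cross-check on the dmesg lines above, the values the kernel actually holds for a loaded module can usually be read from /sys/module/<module>/parameters/. The following is only a sketch: it assumes the driver exports its parameters there (a parameter declared without read permission will not appear), and the helper name read_module_params is made up for this illustration.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {parameter: value} for a loaded kernel module.

    Assumes the module exports its parameters under
    <sysfs_root>/<module>/parameters/. Returns an empty dict when the
    module is not loaded or exports no parameters.
    """
    param_dir = Path(sysfs_root) / module / "parameters"
    if not param_dir.is_dir():
        return {}
    params = {}
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some entries may not be readable by an unprivileged user.
            params[entry.name] = None
    return params


if __name__ == "__main__":
    # Print whatever the running rtl8192ce module reports, if anything.
    for name, value in read_module_params("rtl8192ce").items():
        print(f"{name} = {value}")
```

Boolean module parameters are normally reported as Y or N, so a power-save option that took effect would be expected to read N here. If a parameter such as debug is missing or does not show the configured value, the option line is probably not reaching the driver.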
Unfortunately, your reported signal strength for Windows is of no use in trying to reproduce this issue as I do not have any Windows setup. Please report the results of 'iwlist scan' for this setup, or iwconfig after the connection is established.
Created attachment 604576 Output of 'iwlist scan;iwconfig' when the connection is in a bad state (e.g., pings not answered, constant ARP being sent).
Created attachment 604577 Output of 'iwlist scan;iwconfig' when the connection is in a good state (working perfectly).
Thanks for the info. The device works OK with a signal strength of -45 dBm, and fails at -58 dBm. I forgot to ask one other question yesterday. As there are two different chip families that both use rtl8192ce, the output of '/sbin/lspci -nn | grep Realtek' would be useful to see if you have an RTL8188CE, or an RTL8192CE. I'm not sure if it makes a difference.
Yes, but under Windows the device works perfectly in both cases (with both strong and weak signals). The lspci output shows:

03:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter [10ec:8176] (rev 01)
Is it not perhaps similar to this bug? https://bugzilla.redhat.com/show_bug.cgi?id=770207 Although that bug is marked as solved, I'm still having the same problem on kernel 3.5.1. I find my connection slows after a minute or so. I found this bug because I also continually just get ARP requests in Wireshark. I'm using a Netgear RTL8188CE PCI-E card.
Not really. Bug 770207 says the speed just slows. What I see is no packets flowing at all :(
I don't know if this helps, but the location I'm in now seems to be right on the cusp of where this problem occurs. That is, Linux is able to connect to the access point and work only for a very short time (literally seconds) before packets begin to be lost, whereas Windows will still work fine. Output of 'iwlist scanning' shows quality/level as "Quality=68/70 Signal level=-42 dBm".

More investigation... Recall I said one can see lots of ARP requests being sent, so I used 'ip neigh list' to have a look at that table directly after WiFi association. The H/W address of the access point here is 00:0B:3B:74:6D:CA. When first connected the list shows...

192.168.1.1 dev wlan0 lladdr 00:02:44:a0:d0:a4 REACHABLE
192.168.1.3 dev wlan0 lladdr 00:0b:3b:74:6d:ca REACHABLE

[ Side note: a bit interesting as the default gateway is not the access point itself. Doesn't really matter though, huh? ]

Anyhow - so I had a ping going, and when this stopped working and the ARP requests started being continually sent, I modified the ARP table manually to add in the above so it looked like this...

192.168.1.1 dev wlan0 lladdr 00:02:44:a0:d0:a4 PERMANENT
192.168.1.3 dev wlan0 lladdr 00:0b:3b:74:6d:ca PERMANENT

Sure enough, Wireshark showed ARP requests were no longer broadcast and that the ping packets were being sent. Only no reply came back. Trying to do other stuff like DNS lookups, etc. showed packets were going out but again, no reply.

I don't know much about the physical WiFi layer, but surely an access point should ACK every packet it receives, and only then does the sender mark it as truly sent (show up in Wireshark)? Or can it be that this laptop is spewing out packets that it thinks (hopes?) are being heard, but maybe the transmit power is too low and they aren't? In which case, how come the association with this access point is not dropped? Sorry if that's a noob question but I cannot find any obvious details on this.
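For anyone repeating the neighbour-table check above, a small parser can make it easier to log the table at intervals and spot when the gateway entry leaves the REACHABLE state. This is a rough sketch, assuming the simple IPv4 line layout quoted above ('<ip> dev <iface> lladdr <mac> <STATE>'); the function names are invented for this example.

```python
def parse_ip_neigh(text):
    """Parse `ip neigh` output into a list of dicts.

    Assumes each line looks like
    '<ip> dev <iface> lladdr <mac> <STATE>'. Entries without a resolved
    MAC address (for example FAILED or INCOMPLETE) get lladdr=None.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        entry = {"ip": fields[0], "dev": None, "lladdr": None,
                 "state": fields[-1]}
        middle = fields[1:-1]
        for key in ("dev", "lladdr"):
            if key in middle:
                idx = middle.index(key)
                if idx + 1 < len(middle):
                    entry[key] = middle[idx + 1]
        entries.append(entry)
    return entries


def unresolved(entries):
    """Return the IPs whose link-layer address is not currently resolved."""
    bad_states = {"FAILED", "INCOMPLETE"}
    return [e["ip"] for e in entries
            if e["state"] in bad_states or e["lladdr"] is None]


SAMPLE = """\
192.168.1.1 dev wlan0 lladdr 00:02:44:a0:d0:a4 REACHABLE
192.168.1.3 dev wlan0 lladdr 00:0b:3b:74:6d:ca REACHABLE
"""

if __name__ == "__main__":
    table = parse_ip_neigh(SAMPLE)
    for row in table:
        print(row["ip"], row["lladdr"], row["state"])
    print("unresolved:", unresolved(table))
```

Note that a resolved or PERMANENT entry only says what the local stack believes; it does not show whether frames are actually leaving the radio.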
If you are running Wireshark on the laptop, you are only seeing the traffic between the network stack and the driver. To actually see what is on the air, you need a different computer running in promiscuous mode so that it sees every packet. I do not know why the association has not been dropped.

I just reviewed this thread. I see you have a Netgear router. What is the maximum speed allowed by it? When I was connected to a Netgear router running in 802.11g mode (54 Mbps max) with the external antennas on the RTL8188CE disconnected, and the router buried under a computer, I got the following for iwlist:

Cell 03 - Address: 00:18:4D:7F:48:7E
          Channel:11
          Frequency:2.462 GHz (Channel 11)
          Quality=55/70 Signal level=-55 dBm
          Encryption key:on
          ESSID:"lwfdjf-g"
          Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 11 Mb/s; 6 Mb/s
                    12 Mb/s; 24 Mb/s; 36 Mb/s
          Bit Rates:9 Mb/s; 18 Mb/s; 48 Mb/s; 54 Mb/s

This signal is considerably weaker than your value of -42 dBm, where you are reporting failure. Not only was I able to connect, but the connection was stable and reasonably fast:

--- 192.168.4.1 ping statistics ---
60 packets transmitted, 60 received, 0% packet loss, time 59087ms
rtt min/avg/max/mdev = 1.827/5.134/29.100/5.239 ms

At this signal, the rate is set at 18 Mbps, which accounts for the long average rtt.

The above was accomplished with kernel 3.6-rc3 from wireless testing. There should not be any significant changes between my kernel and the one you are using; however, installing compat-wireless for 3.6-rc3 will include them in your system.
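Since the comparison in this thread rests on signal levels and ping loss figures taken at different places and times, it may help to pull both numbers out of saved command output in a uniform way. The sketch below assumes the text layouts quoted above for iwlist/iwconfig and for the ping summary line; the function names are illustrative only.

```python
import re

# Matches both 'Quality=55/70 Signal level=-55 dBm' (iwlist) and
# 'Link Quality=69/70 Signal level=-41 dBm' (iwconfig).
SIGNAL_RE = re.compile(r"Quality=(\d+)/(\d+)\s+Signal level=(-?\d+) dBm")
PING_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received, ([\d.]+)% packet loss")


def parse_signal(text):
    """Return the first quality/signal reading found in text, or None."""
    match = SIGNAL_RE.search(text)
    if match is None:
        return None
    quality, scale, dbm = (int(g) for g in match.groups())
    return {"quality": quality, "scale": scale, "dbm": dbm}


def parse_ping_summary(text):
    """Return (transmitted, received, loss_percent) from ping output, or None."""
    match = PING_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


if __name__ == "__main__":
    scan = "Quality=55/70 Signal level=-55 dBm"
    ping = "60 packets transmitted, 60 received, 0% packet loss, time 59087ms"
    print(parse_signal(scan))
    print(parse_ping_summary(ping))
```

Logging these two values side by side for the working and failing locations would make it clearer whether the failure really tracks signal level.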
Thanks for the clarification on the Wireshark usage. That makes sense though. The machine _thinks_ it's sending this stuff but I don't think it can be. Packets getting lost in the driver though? That sounds a bit unlikely, eh? Sadly I don't have another machine to check what's 'in the air', but I do have an Android device so will try and find a sniffer for that. I'm not sure about the specifics of the routers as I am travelling with a laptop and am never party to that. The first iwlist outputs were at a completely different place to where I am now (where I said the signal was -42 dBm). I think you are saying I should try a 3.6 kernel or wait for one to be released for FC17. If you know of an RPM of this I can D/L then I will gladly give it a try.
So... I installed a Wireshark-like utility on my Android handset and tested it worked by sniffing ping packets between the laptop and gateway on a stable connection. These were seen no problem. I then ran Wireshark on my laptop and the sniffer on the Android phone when the connection was in the failed state. Indeed, Wireshark on the laptop showed ARP requests going out every second, but these were not picked up on the Android sniffer. This would seem to confirm that although the Linux network stack thinks packets are being sent, they are in fact not being sent.
As I have been unable to duplicate the problem, I have no idea what steps to take next. I am not a Fedora user, but a quick Google search shows that http://people.redhat.com/sgruszka/compat_wireless.html has the method for installing compat-wireless-next, which is what you want. That will get the driver that I am using. If that still fails, please give as much detail as possible regarding the router brand/model and its setup. I realize that this will be difficult if you do not own the router, but try to ask the person that maintains the setup.
I'd like to try this compat-wireless-next thing but sadly following those instructions does not work. kmod-debug-compat-wireless-next-2012_07_03-0.fc17.1.x86_64 package Requires: kernel-uname-r = 3.5.1-1.fc17.x86_64.debug The debug version of the FC17 kernel I have installed is kernel-debug-3.5.2-3.fc17.x86_64 though. Or have I missed something?
Ah, sorry... don't worry about that. I will just install the non-debug version (yum install kmod-compat-wireless-next.x86_64) and see how that goes.
Had to go back to a slightly earlier kernel (3.5.1-1.fc17.x86_64) but now have:

# lsmod | grep compat
compat 13168 8 bnep,cfg80211,btusb,mac80211,rtlwifi,rfcomm,bluetooth,rtl8192ce

Sad to say the problem persists though :(
That was not unexpected. I still would like to get enough info about the AP and the setup for me to duplicate the problem here. In the meantime, unload the module with 'sudo /sbin/modprobe -rv rtl8192ce' and reload it with 'sudo /sbin/modprobe -v rtl8192ce debug=1'. When the problem happens, please post that section of the dmesg output. The number after the debug can go up to 5, but that is a lot of info, and I would like to start small.
Created attachment 609308 Output of dmesg with debug=2 as configured in modprobe.d file

This is the dmesg output from connecting the WiFi adapter (turning on the radio switch on the X220 laptop). As can be seen, association took a few attempts but then did work. I then ran a ping and moved away to a location where it was not answered (the ARP requests started) but WiFi association was not lost. However, no further output occurred.
Something tells me this 'debug=2' option isn't doing much. I ramped it up to 5 for curiosity's sake and saw no more output. Is the format I mention in reply 2 above correct?
Yes, that is correct. If you get no debug output with "debug=5", then that is messed up in your kernel. Using 5 is like trying to drink from a fire hose. I once sent the output from debug=5 to my Realtek contact, and he requested a rerun with 4 as there was too much output. The 5 output should look like:

[ 229.229328] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.229354] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[ 229.229363] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[ 229.252603] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.252628] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[ 229.252638] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[ 229.259551] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.259579] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[ 229.259589] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[ 229.266873] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.266898] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[ 229.266907] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[ 229.275185] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.275213] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[ 229.275222] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[ 229.276343] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.276357] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[ 229.276365] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[ 229.283673] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.283694] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[ 229.283703] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[ 229.285952] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.331745] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[ 229.331771] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[ 229.331780] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004

As you can see, this is the output for only 0.103 seconds.
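To get a feel for how much a given debug level produces before posting a log, the time span and line rate of a dmesg excerpt can be estimated from its timestamps. This is a minimal sketch, assuming the bracketed seconds-since-boot prefix shown above; the function name log_span is made up for the example, and the sample uses only three lines from the excerpt.

```python
import re

# dmesg lines start with '[ <seconds>.<microseconds>]'.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def log_span(lines):
    """Return (line_count, seconds_covered, lines_per_second).

    Lines without a leading dmesg timestamp are ignored. The rate is
    None when fewer than two timestamps are found or the span is zero.
    """
    stamps = []
    for line in lines:
        match = TIMESTAMP_RE.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))
    if len(stamps) < 2:
        return len(stamps), 0.0, None
    span = max(stamps) - min(stamps)
    rate = len(stamps) / span if span > 0 else None
    return len(stamps), span, rate


SAMPLE = [
    "[ 229.229328] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!",
    "[ 229.285952] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!",
    "[ 229.331780] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> "
    "BBR MASK=0x200 Addr[0x824]=0x80390004",
]

if __name__ == "__main__":
    count, span, rate = log_span(SAMPLE)
    print(f"{count} lines over {span:.3f} s")
```

Feeding it the full output of dmesg (one line per list element) gives a quick idea of whether the debug option is producing anything at all.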
Well, I'm now on this kernel...

Linux x220 3.6.3-1.fc17.x86_64 #1 SMP Mon Oct 22 15:32:35 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

And the problem still persists :( It seems the 'debug=2' I have in the modprobe config file is still ignored. Is there anything else I can try to help find this fault? It's incredibly frustrating :(
I'm also having this problem. My lspci -v output:

03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8191SEvB Wireless LAN Controller (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. Device e020
Flags: bus master, fast devsel, latency 0, IRQ 17
I/O ports at 3000 [size=256]
Memory at f2000000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Legacy Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 88-55-22-fe-ff-4c-e0-00
Kernel driver in use: rtl8192se

iwconfig output with a working link. The SSID and MAC address are masked with X.

wlan0 IEEE 802.11bgn ESSID:"XXXXXXXXXXX"
Mode:Managed Frequency:2.437 GHz Access Point: XX:XX:XX:XX:XX:XX
Bit Rate=72.2 Mb/s Tx-Power=20 dBm
Retry long limit:7 RTS thr=2347 B Fragment thr:off
Encryption key:off
Power Management:off
Link Quality=69/70 Signal level=-41 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:1 Missed beacon:0

iwconfig output when no longer working. At this point I'm connected to a different access point, so it has successfully roamed at least once.

wlan0 IEEE 802.11bgn ESSID:"XXXXXXXXXXXXX"
Mode:Managed Frequency:2.462 GHz Access Point: XX:XX:XX:XX:XX:XX
Bit Rate=18 Mb/s Tx-Power=20 dBm
Retry long limit:7 RTS thr=2347 B Fragment thr:off
Encryption key:off
Power Management:off
Link Quality=70/70 Signal level=-38 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0

I'm also having trouble getting debug output from the rtl8192se module. I load it with 'modprobe -v rtl8192se debug=2' and I see it putting the debug option on the insmod line, but I don't see any debug info in my kern.log / dmesg output.
I reported the problem against a prior Fedora kernel, and there is some more information in that bug report. It is now closed, but I attached a debug log of the rtl8192se to it: https://bugzilla.redhat.com/show_bug.cgi?id=811054
Is this still a problem with 3.8.2 in updates-testing?
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.