Bug 847875 - rtl8192ce continually sending ARP requests and nothing else with weak router signal
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-08-13 17:38 EDT by Robin Rainton
Modified: 2015-08-12 09:13 EDT (History)
Doc Type: Bug Fix
Last Closed: 2013-03-28 09:56:08 EDT
Type: Bug

Attachments
Output of 'iwlist scan;iwconfig' when the connection is in a bad state (eg, pings not answered, constant ARP being sent). (2.49 KB, text/plain)
2012-08-15 07:29 EDT, Robin Rainton
Output of 'iwlist scan;iwconfig' when the connection is in a good state (working perfectly). (2.50 KB, text/plain)
2012-08-15 07:30 EDT, Robin Rainton
Output of dmesg with debug=2 as configured in modprobe.d file (12.36 KB, text/x-log)
2012-09-03 05:43 EDT, Robin Rainton

Description Robin Rainton 2012-08-13 17:38:24 EDT
Description of problem:

Laptop is able to connect to WiFi access point but is unstable when signal is weak.

One would often blame factors that affect signal strength and say 'tough luck', but Windows on the same laptop provides a stable connection. That is, there seems to be nothing wrong with the laptop or router hardware, and the signal is strong enough.

Using 'mtr' to diagnose problem shows pings are answered for a short period after wireless association but then answers stop.

Using Wireshark one can see multiple ARP transmissions asking for 'who has <gateway IP>?'. No response is seen. Interestingly trying to manually adjust ARP table has no real effect.

Association with the wireless router is never lost, but packets do not flow - just ARP requests.

The laptop in question is an IBM X220 with '1x1 11b/g/n' card. This is using the rtl8192ce module. I have tried disabling power management with the module options but this has no effect.

Once the problem occurs one must disconnect/reconnect by using either software or hardware wireless switch off then back on again (causing association to re-occur).

Version-Release number of selected component (if applicable):

3.5.x (Latest Fedora Kernel as of 9 August 2012).

How reproducible:

Seems to occur when connecting to most wireless routers when signal strength is low. I know 'low' is a poor definition so for reference, Windows reports signal strength as 32% in current location which has the problem but in which Windows networking is pretty much perfect.

Packets (aside from ARP requests) will often stop flowing within a minute or two.

Steps to Reproduce:
1. Take a machine equipped with Realtek hardware using the rtl8192ce module. 
2. Connect to WiFi with low signal strength.
3. Try to use network.
  
Actual results:

Network association occurs but packets stop flowing after a short period of time. As described above, ARP requests are seen to repeat and go unanswered.

Expected results:

Stable network as can be achieved using the same hardware and conditions under Windows 7 O/S.

Additional info:

I have marked this as a high priority bug because without a network one is pretty much done for, and this bug means I have to reboot into Windows most of the time :(
Comment 1 Larry Finger 2012-08-13 18:59:01 EDT
Have you tried running with power save off? Use "modprobe -v rtl8192ce ips=0" for that.
Comment 2 Robin Rainton 2012-08-14 03:18:33 EDT
Sorry, I didn't specify kernel version before but it's 3.5.0-2.fc17.x86_64

But yes, I tried to turn power save off. I have the following...

options rtl8192ce ips=0 swlps=0 fwlps=0 debug=2

In this file...

/etc/modprobe.d/rtl8192ce.conf

`dmesg | grep rtl` produces this which I think shows the above options are working...

[    4.331460] rtl8192ce: rtl8192ce: Power Save off (module option)
[    4.332358] rtl8192ce: rtl8192ce: FW Power Save off (module option)
[    4.333151] rtl8192ce: Using firmware rtlwifi/rtl8192cfw.bin
[    4.425254] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
[    4.425463] rtlwifi: wireless switch is on
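
One way to double-check that the options from the modprobe.d file reached the loaded module is to read them back from sysfs. The following is only a minimal sketch: it assumes the driver exports its parameters as readable files under /sys/module/rtl8192ce/parameters, which depends on how the driver declares them. The function name and the `base` argument are illustrative, not part of any existing tool.

```python
from pathlib import Path


def read_module_params(module, base="/sys/module"):
    """Return {parameter: value} for a loaded kernel module.

    Returns an empty dict if the module is not loaded or exports no
    readable parameters.
    """
    param_dir = Path(base) / module / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may be unreadable without root; skip them.
            continue
    return params


if __name__ == "__main__":
    for name, value in read_module_params("rtl8192ce").items():
        print(f"{name} = {value}")
```

If a value such as `debug` does not match what the modprobe.d file requests, the option was probably not applied when the module was loaded.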
Comment 3 Larry Finger 2012-08-14 11:00:03 EDT
Unfortunately, your reported signal strength for Windows is of no use in trying to reproduce this issue as I do not have any Windows setup.

Please report the results of 'iwlist scan' for this setup, or iwconfig after the connection is established.
Comment 4 Robin Rainton 2012-08-15 07:29:37 EDT
Created attachment 604576 [details]
Output of 'iwlist scan;iwconfig' when the connection is in a bad state (eg, pings not answered, constant ARP being sent).
Comment 5 Robin Rainton 2012-08-15 07:30:12 EDT
Created attachment 604577 [details]
Output of 'iwlist scan;iwconfig' when the connection is in a good state (working perfectly).
Comment 6 Larry Finger 2012-08-15 08:40:15 EDT
Thanks for the info. The device works OK with a signal strength of -45 dBm, and fails at -58 dBm.
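
For a sense of scale, dBm is a logarithmic unit, so the 13 dB gap between the working (-45 dBm) and failing (-58 dBm) readings corresponds to roughly a factor of 20 in received power. A small sketch of that arithmetic, with illustrative function names:

```python
def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


def power_ratio(strong_dbm, weak_dbm):
    """Return how many times more power the stronger signal carries."""
    return dbm_to_mw(strong_dbm) / dbm_to_mw(weak_dbm)


if __name__ == "__main__":
    # -45 dBm (working) versus -58 dBm (failing), as reported above.
    print(f"{power_ratio(-45, -58):.1f}x")
```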

I forgot to ask one other question yesterday. As there are two different chip families that both use rtl8192ce, the output of '/sbin/lspci -nn | grep Realtek' would be useful to see if you have an RTL8188CE, or an RTL8192CE. I'm not sure if it makes a difference.
Comment 7 Robin Rainton 2012-08-15 11:48:07 EDT
Yes, but under windows the device works perfectly in both cases (with both strong and weak signals).

The lspci output shows:

03:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter [10ec:8176] (rev 01)
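
The `[vendor:device]` pair at the end of an 'lspci -nn' line is what identifies the chip. A minimal sketch of pulling it out of a line like the one above; the helper name is made up for illustration:

```python
import re

# Matches the "[vendor:device]" pair printed by 'lspci -nn'.
# The class code (e.g. "[0280]") has no colon, so it is not matched.
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def pci_ids(lspci_line):
    """Return (vendor_id, device_id) from one 'lspci -nn' line, or None."""
    match = PCI_ID.search(lspci_line)
    return match.groups() if match else None


if __name__ == "__main__":
    line = ("03:00.0 Network controller [0280]: Realtek Semiconductor Co., "
            "Ltd. RTL8188CE 802.11b/g/n WiFi Adapter [10ec:8176] (rev 01)")
    print(pci_ids(line))
```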
Comment 8 Daniel Williams 2012-08-20 08:02:32 EDT
Is it not perhaps similar to this bug? 

https://bugzilla.redhat.com/show_bug.cgi?id=770207

Although that bug is marked as solved, I'm still having the same problem on kernel 3.5.1. My connection slows after a minute or so, and I found this bug because I also continually get just ARP requests in Wireshark. I am using a Netgear RTL8188CE PCI-e card.
Comment 9 Robin Rainton 2012-08-21 13:35:38 EDT
Not really. Bug 770207 says the speed just slows. What I see is no packets flowing at all :(
Comment 10 Robin Rainton 2012-08-30 14:27:22 EDT
I don't know if this helps, but the location I'm in now seems to be right on the cusp of when this problem occurs. That is, Linux is able to connect to the access point and work only for a very short time (literally seconds) before packets begin to be lost whereas Windows will still work fine.

Output of 'iwlist scanning' shows quality/level as "Quality=68/70  Signal level=-42 dBm"

More investigation... recall I said one can see lots of ARP requests being sent, so I used 'ip neigh list' to have a look at that table directly after WiFi association. The H/W address of the access point here is 00:0B:3B:74:6D:CA. When first connected the list shows...

192.168.1.1 dev wlan0 lladdr 00:02:44:a0:d0:a4 REACHABLE
192.168.1.3 dev wlan0 lladdr 00:0b:3b:74:6d:ca REACHABLE

[ Side note: a bit interesting as the default gateway is not the access point itself. Doesn't really matter though, huh? ]

Anyhow - so I had a ping going and when this stopped working and the ARP requests started being continually sent I modified the ARP table manually to add in the above so it looked like this...

192.168.1.1 dev wlan0 lladdr 00:02:44:a0:d0:a4 PERMANENT
192.168.1.3 dev wlan0 lladdr 00:0b:3b:74:6d:ca PERMANENT

Sure enough, Wireshark showed ARP request no longer broadcast and that the ping packets were being sent. Only no reply came back. Trying to do other stuff like DNS lookup, etc showed packets were going out but again, no reply.
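
The comment does not say which command was used to pin the entries. With iproute2, one way to do it is `ip neigh replace ... nud permanent`. The sketch below only builds and prints those commands for the two addresses listed above (running them requires root); the helper name is hypothetical.

```python
def permanent_neigh_cmd(ip, mac, dev):
    """Build the 'ip neigh' argv that pins ip -> mac on dev."""
    return ["ip", "neigh", "replace", ip, "lladdr", mac,
            "nud", "permanent", "dev", dev]


if __name__ == "__main__":
    entries = [
        ("192.168.1.1", "00:02:44:a0:d0:a4"),
        ("192.168.1.3", "00:0b:3b:74:6d:ca"),
    ]
    for ip, mac in entries:
        # Printed rather than executed; running them needs root.
        print(" ".join(permanent_neigh_cmd(ip, mac, "wlan0")))
```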

I don't know much about physical WiFi layer, but surely an access point should ACK every packet it receives and only then does the sender mark it as truly sent (show up in Wireshark)? Or can it be that this laptop is spewing out packets that it thinks (hopes?) are being heard but maybe the transmit power is too low and they aren't? In which case how come the association with this access point is not dropped? Sorry if that's a noob question but cannot find any obvious details on this.
Comment 11 Larry Finger 2012-08-30 15:55:19 EDT
If you are running Wireshark on the laptop, you are only seeing the traffic between the network stack and the driver. To actually see what is on the air, you need a different computer running in promiscuous mode so that it sees every packet.

I do not know why the association has not been dropped.

I just reviewed this thread. I see you have a Netgear router. What is the maximum speed allowed by it?

When I was connected to a Netgear router running in 802.11g mode (54 Mbps max) with the external antennas on the RTL8188CE disconnected, and the router buried under a computer, I got the following for iwlist:

          Cell 03 - Address: 00:18:4D:7F:48:7E
                    Channel:11
                    Frequency:2.462 GHz (Channel 11)
                    Quality=55/70  Signal level=-55 dBm  
                    Encryption key:on
                    ESSID:"lwfdjf-g"
                    Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 11 Mb/s; 6 Mb/s
                              12 Mb/s; 24 Mb/s; 36 Mb/s
                    Bit Rates:9 Mb/s; 18 Mb/s; 48 Mb/s; 54 Mb/s

This signal is considerably weaker than your value of -42 dBm where you are reporting failure. Not only was I able to connect, but the connection was stable and reasonably fast:

--- 192.168.4.1 ping statistics ---
60 packets transmitted, 60 received, 0% packet loss, time 59087ms
rtt min/avg/max/mdev = 1.827/5.134/29.100/5.239 ms

At this signal, the rate is set at 18 Mbps, which accounts for the long average rtt.
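
To compare runs like this across locations, the ping summary can be reduced to a few numbers. A rough sketch, assuming the summary format shown above (iputils ping); the function and key names are illustrative:

```python
import re

SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received, ([\d.]+)% packet loss")
RTT = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping(text):
    """Extract packet counts, loss and rtt figures from ping's summary."""
    result = {}
    m = SUMMARY.search(text)
    if m:
        result["sent"] = int(m.group(1))
        result["received"] = int(m.group(2))
        result["loss_pct"] = float(m.group(3))
    m = RTT.search(text)
    if m:
        keys = ("rtt_min", "rtt_avg", "rtt_max", "rtt_mdev")
        result.update(zip(keys, map(float, m.groups())))
    return result
```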

The above was accomplished with kernel 3.6-rc3 from wireless testing. There should not be any significant changes between my kernel and the one you are using; however, implementing compat-wireless for 3.6-rc3 will include them in your system.
Comment 12 Robin Rainton 2012-08-31 04:30:55 EDT
Thanks for the clarification on the Wireshark usage. That makes sense though. The machine _thinks_ it's sending this stuff but I don't think it can be. Packets getting lost in the driver though? That sounds a bit unlikely, eh?

Sadly I don't have another machine to check what's 'in the air' but do have an android device so will try and find a sniffer for that.

I'm not sure about specifics of routers as am travelling with a laptop and am never party to that. The first iwlist outputs were at a completely different place to where I am now (where I said signal was -42dBm).

I think you are saying I should try a 3.6 kernel or wait for one to be released for FC17. If you know of an RPM of this that I can download, I will gladly give it a try.
Comment 13 Robin Rainton 2012-08-31 10:47:55 EDT
So... I installed a Wireshark-like utility on my Android handset and tested it worked by sniffing ping packets between the laptop and gateway on a stable connection. These were seen no problem.

I then ran Wireshark on my laptop and the sniffer on the Android phone when the connection was in the failed state. Indeed, Wireshark on the laptop showed ARP requests going out every second, but these were not picked up on the Android sniffer.

This would seem to confirm that although the Linux network stack thinks packets are being sent, they are in fact not being sent.
Comment 14 Larry Finger 2012-08-31 11:45:09 EDT
As I have been unable to duplicate the problem, I have no idea what steps to take next.

I am not a Fedora user, but a quick Google search shows that  http://people.redhat.com/sgruszka/compat_wireless.html has the method for installing compat-wireless-next, which is what you want. That will get the driver that I am using.

If that still fails, please give as much detail as possible regarding the router brand/model and its setup. I realize that this will be difficult if you do not own the router, but try to ask the person that maintains the setup.
Comment 15 Robin Rainton 2012-08-31 12:34:21 EDT
I'd like to try this compat-wireless-next thing but sadly following those instructions does not work.

kmod-debug-compat-wireless-next-2012_07_03-0.fc17.1.x86_64 package Requires: kernel-uname-r = 3.5.1-1.fc17.x86_64.debug

The debug version of the FC17 kernel I have installed is kernel-debug-3.5.2-3.fc17.x86_64 though. Or have I missed something?
Comment 16 Robin Rainton 2012-08-31 12:45:28 EDT
Ah, sorry... don't worry about that. I will just install the non-debug version with 'yum install kmod-compat-wireless-next.x86_64' and see how that goes.
Comment 17 Robin Rainton 2012-08-31 14:40:11 EDT
Had to go back to a slightly earlier kernel (3.5.1-1.fc17.x86_64) but now have:

# lsmod | grep compat
compat                 13168  8 bnep,cfg80211,btusb,mac80211,rtlwifi,rfcomm,bluetooth,rtl8192ce

Sad to say the problem persists though :(
Comment 18 Larry Finger 2012-08-31 16:00:51 EDT
That was not unexpected. I still would like to get enough info about the AP and the setup for me to duplicate the problem here.

In the meantime, unload the module with 'sudo /sbin/modprobe -rv rtl8192ce' and reload it with 'sudo /sbin/modprobe -v rtl8192ce debug=1'. When the problem happens, please post that section of the dmesg output. The number after the debug can go up to 5, but that is a lot of info, and I would like to start small.
Comment 19 Robin Rainton 2012-09-03 05:43:04 EDT
Created attachment 609308 [details]
Output of dmesg with debug=2 as configured in modprobe.d file

This is the dmesg output from connecting the WiFi adapter (turning on the radio switch on the X220 laptop).

As can be seen, association took a few attempts but then did work.

I then ran a ping and moved away to a location where it was not answered (the ARP requests started) but WiFi association was not lost. However, no further output occurred.
Comment 20 Robin Rainton 2012-09-03 12:31:13 EDT
Something tells me this 'debug=2' option isn't doing much. I ramped it up to 5 for curiosity's sake and saw no more output. Is the format I mention in reply 2 above correct?
Comment 21 Larry Finger 2012-09-03 14:08:41 EDT
Yes, that is correct. If you get no debug output with "debug=5", then that is messed up in your kernel. Using 5 is like trying to drink from a fire hose. I once sent the output from debug=5 to my Realtek contact, and he requested a rerun with 4 as there was too much output.

The 5 output should look like:

[  229.229328] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.229354] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[  229.229363] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[  229.252603] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.252628] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[  229.252638] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[  229.259551] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.259579] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[  229.259589] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[  229.266873] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.266898] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[  229.266907] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[  229.275185] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.275213] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[  229.275222] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[  229.276343] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.276357] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[  229.276365] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[  229.283673] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.283694] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[  229.283703] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004
[  229.285952] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.331745] rtlwifi:_rtl_pci_interrupt():<10000-1> Rx ok interrupt!
[  229.331771] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> regaddr(0x824), bitmask(0x200)
[  229.331780] rtl8192c_common:rtl92c_phy_query_bb_reg():<10000-1> BBR MASK=0x200 Addr[0x824]=0x80390004

As you can see, this is the output for only 0.103 seconds.
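
The time covered by an excerpt like this can be checked from the kernel timestamps at the start of each line. A minimal sketch, assuming the default `[seconds.microseconds]` dmesg prefix; the function name is illustrative:

```python
import re

# Kernel timestamp prefix, e.g. "[  229.229328]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def log_span(lines):
    """Return (message_count, seconds between first and last timestamp)."""
    stamps = []
    for line in lines:
        m = TIMESTAMP.match(line)
        if m:
            stamps.append(float(m.group(1)))
    if not stamps:
        return 0, 0.0
    return len(stamps), stamps[-1] - stamps[0]
```

Fed the excerpt above, this reports 25 messages spanning about a tenth of a second, which is why debug=5 is hard to use for anything but very short captures.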
Comment 22 Robin Rainton 2012-11-01 08:51:26 EDT
Well I'm now on this kernel...

Linux x220 3.6.3-1.fc17.x86_64 #1 SMP Mon Oct 22 15:32:35 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

And the problem still persists :(

It seems the 'debug=2' I have in modprobe config file is still ignored.

Is there anything else I can try to help find this fault? It's incredibly frustrating :(
Comment 23 johnny.westerlund 2012-11-29 02:40:28 EST
I'm also having this problem.

My lspci -v output
03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8191SEvB Wireless LAN Controller (rev 10)
	Subsystem: Realtek Semiconductor Co., Ltd. Device e020
	Flags: bus master, fast devsel, latency 0, IRQ 17
	I/O ports at 3000 [size=256]
	Memory at f2000000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 88-55-22-fe-ff-4c-e0-00
	Kernel driver in use: rtl8192se

iwconfig output with a working link. The SSID and MAC address are masked with XX.

wlan0     IEEE 802.11bgn  ESSID:"XXXXXXXXXXX"  
          Mode:Managed  Frequency:2.437 GHz  Access Point: XX:XX:XX:XX:XX:XX   
          Bit Rate=72.2 Mb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr=2347 B   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=69/70  Signal level=-41 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:1   Missed beacon:0


iwconfig output when no longer working. At this point I'm connected to a different access point, so it has successfully roamed at least once.

wlan0     IEEE 802.11bgn  ESSID:"XXXXXXXXXXXXX"  
          Mode:Managed  Frequency:2.462 GHz  Access Point: XX:XX:XX:XX:XX:XX   
          Bit Rate=18 Mb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr=2347 B   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=70/70  Signal level=-38 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0
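
Comparing the two iwconfig captures field by field makes the difference easier to see: in the failing capture the signal level is slightly stronger (-38 dBm against -41 dBm) while the bit rate has dropped from 72.2 Mb/s to 18 Mb/s. A rough sketch of extracting those fields, assuming the wireless-tools output format shown above; the function and key names are illustrative:

```python
import re

FIELDS = {
    "bit_rate_mbps": re.compile(r"Bit Rate[=:]([\d.]+) Mb/s"),
    "link_quality": re.compile(r"Link Quality=(\d+)/(\d+)"),
    "signal_dbm": re.compile(r"Signal level=(-?\d+) dBm"),
}


def parse_iwconfig(text):
    """Extract bit rate, link quality (0..1) and signal level from iwconfig."""
    out = {}
    for name, pattern in FIELDS.items():
        m = pattern.search(text)
        if not m:
            continue
        if name == "link_quality":
            out[name] = int(m.group(1)) / int(m.group(2))
        elif name == "signal_dbm":
            out[name] = int(m.group(1))
        else:
            out[name] = float(m.group(1))
    return out
```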

I'm also having trouble getting debug output from the rtl8192se module.
I load it with 'modprobe -v rtl8192se debug=2' and I can see the debug option being passed on the insmod line, but I don't see any debug info in my kern.log / dmesg output.
Comment 24 johnny.westerlund 2012-11-29 06:18:00 EST
I reported the problem against a prior Fedora kernel, and there is some more information in that bug report. It is now closed though.

In that bug report I attached a debug log of the rtl8192se:

https://bugzilla.redhat.com/show_bug.cgi?id=811054
Comment 25 Josh Boyer 2013-03-14 15:27:09 EDT
Is this still a problem with 3.8.2 in updates-testing?
Comment 26 Josh Boyer 2013-03-28 09:56:08 EDT
This bug is being closed with INSUFFICIENT_DATA as there has not been a
response in 2 weeks.  If you are still experiencing this issue,
please reopen and attach the relevant data from the latest kernel you are
running and any data that might have been requested previously.
