Bug 1252893 - rtl8192ce is unstable when used with 802.11n router configured with 20Mhz/40Mhz channel width
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 23
Hardware: i686
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-08-12 13:09 UTC by Robin Rainton
Modified: 2023-09-14 03:03 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-26 16:58:35 UTC
Type: Bug
Embargoed:


Attachments:
tcpdump from Openwrt (5.35 KB, application/octet-stream), 2016-02-26 19:24 UTC, Robin Rainton
Wireshark capture from laptop (5.84 KB, application/octet-stream), 2016-02-26 19:28 UTC, Robin Rainton


Links:
System | ID | Private | Priority | Status | Summary | Last Updated
OpenWRT | 21939 | 0 | None | None | None | 2016-02-28 21:34:04 UTC
Red Hat Bugzilla | 847875 | 0 | unspecified | CLOSED | rtl8192ce continually sending ARP requests and nothing else with weak router signal | 2021-02-22 00:41:40 UTC

Internal Links: 847875

Description Robin Rainton 2015-08-12 13:09:41 UTC
Description of problem:

After establishing a WiFi connection to the access point, connectivity is lost. Note that I do not mean the connection to the access point is lost; that remains active. Packets simply stop flowing.

Use of Wireshark shows ARP packets being sent (or are they?) but no replies to these.

Version-Release number of selected component (if applicable):

Problem occurs on kernel 4.1.3-201.fc22.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Connect to WiFi network.
2. Wait a few minutes. Monitor ping to local gateway (access point will do).

Actual results:

Packets stop flowing and Wireshark shows ARP requests being sent but not acknowledged.

Expected results:

Connectivity to the next hop on a local LAN should never be lost. I am in the same room as the router so signal strength or interference is not the cause.


Additional info:

This is very similar in symptom to #847875. However, in that case weak signal triggered the fault. In this case signal strength is 100%. Moreover, my network setup was working flawlessly for months before upgrading the kernel.

I'm using a laptop, an IBM X220 with '1x1 11b/g/n' card. This is using the rtl8192ce module. 

ARP issues seen on Wireshark are:

- The local router asking, "Who has IP address of <laptop IP>". Wireshark shows the laptop replies, but clearly the reply isn't getting to the router as it continues to ask. Setting a static ARP entry on my router (it's a Routerboard) makes this issue go away.

- The laptop asking "Who has IP address of <router IP>". Wireshark shows this packet but no reply so the router doesn't seem to be getting it.

It seems as if ARP packets, although logged by Wireshark, are not actually being sent for some reason.

Comment 1 Robin Rainton 2015-08-29 09:44:32 UTC
I cannot be sure, but think this problem may be resolved with...

Linux x220 4.1.5-200.fc22.x86_64 #1 SMP Mon Aug 10 23:38:23 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Comment 2 Robin Rainton 2015-08-29 19:48:00 UTC
(In reply to Robin Rainton from comment #1)
> I cannot be sure, but think this problem may be resolved with...
> 
> Linux x220 4.1.5-200.fc22.x86_64 #1 SMP Mon Aug 10 23:38:23 UTC 2015 x86_64
> x86_64 x86_64 GNU/Linux

Nope, sorry... still persists with that kernel too.

Comment 3 Justin M. Forbes 2015-10-20 19:42:40 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 4 Robin Rainton 2015-11-02 17:37:55 UTC
I can confirm this problem persists with:

Linux x220 4.2.3-200.fc22.x86_64 #1 SMP Thu Oct 8 03:23:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Comment 5 Robin Rainton 2016-01-09 17:46:33 UTC
I've recently updated to FC23 but this problem persists with this kernel:

Linux x220 4.2.8-300.fc23.x86_64 #1 SMP Tue Dec 15 16:49:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I honestly cannot understand how there have been no other reports of this. Is there anything I can do to debug the network stack and try to find the fault?

I know there is nothing wrong with the hardware of this laptop or the network as when running Windows on this laptop there is no issue. Several Android devices use this same WiFi network without issue.

Comment 6 Robin Rainton 2016-01-09 18:43:34 UTC
Here's a test I tried:

- In one terminal, run tcpdump looking for ARP packets (tcpdump -ennqti wlan0 arp)
- In another terminal, ping the WiFi router the laptop is connected to.

The tcpdump output has many of these pairs:

08:57:00:4c:ef:ee > ff:ff:ff:ff:ff:ff, ARP, length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.109.10 tell 10.1.109.2, length 28
ec:55:f9:c5:f4:12 > 08:57:00:4c:ef:ee, ARP, length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.1.109.10 is-at ec:55:f9:c5:f4:12, length 28
08:57:00:4c:ef:ee > ff:ff:ff:ff:ff:ff, ARP, length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.109.10 tell 10.1.109.2, length 28
ec:55:f9:c5:f4:12 > 08:57:00:4c:ef:ee, ARP, length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.1.109.10 is-at ec:55:f9:c5:f4:12, length 28

You can see the router (10.1.109.2) must have received a packet from ping. It wants to reply but doesn't know where to send this reply. It asks, "who-has 10.1.109.10?". The laptop has that IP and replies.

Only... for whatever reason the router clearly isn't getting this reply because it asks again and again and the 'ping' command reports 100% packet loss.

Notice that in this test the problem is that outbound ARP packets seem to be getting lost or not sent.
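For anyone repeating this test, counting the pairs by hand gets tedious. Below is a minimal Python sketch that tallies ARP requests and replies from saved tcpdump text. It assumes the line format shown above (as printed by tcpdump -ennqti); the function names and the SAMPLE list are illustrative only and not part of any tool.

```python
import re
from collections import Counter

# Matches ARP lines in the text format shown above (tcpdump -ennqti <iface> arp).
ARP_LINE = re.compile(
    r"^(?P<src>[0-9a-f:]{17}) > (?P<dst>[0-9a-f:]{17}), ARP, .*?"
    r"(?:Request who-has (?P<target>\S+) tell (?P<asker>[^\s,]+)"
    r"|Reply (?P<ip>\S+) is-at (?P<mac>[0-9a-f:]{17}))"
)


def summarize_arp(lines):
    """Count requests per (asker, target) pair and replies per announced IP."""
    requests, replies = Counter(), Counter()
    for line in lines:
        m = ARP_LINE.match(line.strip())
        if not m:
            continue  # not an ARP line in the expected format
        if m.group("target"):
            requests[(m.group("asker"), m.group("target"))] += 1
        else:
            replies[m.group("ip")] += 1
    return requests, replies


def repeated_despite_reply(requests, replies):
    """Targets asked for more than once even though a reply was logged locally."""
    return sorted(
        target
        for (asker, target), count in requests.items()
        if count > 1 and replies[target] > 0
    )


# The four lines quoted in this comment.
SAMPLE = [
    "08:57:00:4c:ef:ee > ff:ff:ff:ff:ff:ff, ARP, length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.109.10 tell 10.1.109.2, length 28",
    "ec:55:f9:c5:f4:12 > 08:57:00:4c:ef:ee, ARP, length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.1.109.10 is-at ec:55:f9:c5:f4:12, length 28",
    "08:57:00:4c:ef:ee > ff:ff:ff:ff:ff:ff, ARP, length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.109.10 tell 10.1.109.2, length 28",
    "ec:55:f9:c5:f4:12 > 08:57:00:4c:ef:ee, ARP, length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.1.109.10 is-at ec:55:f9:c5:f4:12, length 28",
]

requests, replies = summarize_arp(SAMPLE)
print(requests)
print(replies)
print(repeated_despite_reply(requests, replies))
```

An address that keeps being requested even though the local capture shows a reply for it is consistent with the reply never making it onto the air, which is the symptom described here.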

I have another router on the network that is the gateway to the outside world. I have hardcoded the address of the laptop in that router's ARP table, so that router never asks "who-has 10.1.109.10?". However, sometimes the laptop forgets, or cannot find out the address of this other router and packets to it are lost. I fix that particular problem by manually adding a static ARP address on the laptop.
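As a rough illustration of the static ARP workaround on the laptop, the sketch below builds the iproute2 command for a permanent neighbour entry. The IP and MAC shown are documentation placeholders (the second router's real addresses are not given here), wlan0 is the interface name used in the tcpdump command above, and static_arp_command is a made-up helper name.

```python
import shlex


def static_arp_command(ip, mac, dev):
    """Build the `ip neigh` command that pins ip -> mac on interface dev."""
    return ["ip", "neigh", "replace", ip, "lladdr", mac,
            "nud", "permanent", "dev", dev]


# Placeholder addresses; substitute the real router IP and MAC.
cmd = static_arp_command("192.0.2.1", "00:00:5e:00:53:01", "wlan0")
print(shlex.join(cmd))
```

The printed command has to be run as root, and the entry does not survive a reboot. It only hides the symptom for that one peer.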

Again - all this is for Linux only. Same network, same hardware works perfectly under Windows (I hate to admit it).

Comment 7 Robin Rainton 2016-02-26 19:24:15 UTC
Created attachment 1130892 [details]
tcpdump from Openwrt

Comment 8 Robin Rainton 2016-02-26 19:28:24 UTC
Created attachment 1130893 [details]
Wireshark capture from laptop

Comment 9 Robin Rainton 2016-02-26 19:45:44 UTC
I have managed to get tcpdump working on my OpenWRT router and created two packet captures that demonstrate this problem.

See attachments:

https://bugzilla.redhat.com/attachment.cgi?id=1130892 - This is from tcpdump on an OpenWrt router. IP 10.1.109.129, MAC 08:57:00:4c:ef:ee.

https://bugzilla.redhat.com/attachment.cgi?id=1130893 - Same approximate period captured in Wireshark on Linux laptop (4.3.5-300.fc23.x86_64). IP 10.1.109.222, MAC ec:55:f9:c5:f4:12.

The first of these is longer because the laptop took around 5 minutes to begin to misbehave and didn't send or receive any ARP packets in this time. I believe packet 77 in the first capture matches packet 51 in the second (Who has 10.1.109.222?  Tell 10.1.109.129).

Things to note:

- The laptop was associated with the OpenWRT router. The OpenWRT router had a few Android clients connected at the same time, as can be seen from its capture. ARP was working just fine on those devices. All devices were in very close (less than 5 metres) proximity.

- Prior to the packet I mention, the laptop lost communication with the router.

- It can be seen that the laptop thinks it spat out 50 'Who has 10.1.109.129' requests without reply. The first 3 went to the direct MAC of the router, then broadcast. None of these are seen in the tcpdump on the router.

- Something then causes the router to ask, 'Who has 10.1.109.222?  Tell 10.1.109.129'. Note the laptop capture shows this request, and an immediate response. However, looking at the router, that response is never received. All that is seen from the router is multiple requests 'Who has 10.1.109.222?  Tell 10.1.109.129'

- Finally (packet 90 on the router, 76 on the laptop) the response is received and traffic is re-established.
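The comparison between the two captures can be sketched in Python as a multiset difference. This assumes the ARP requests have already been extracted from each capture as (asker IP, target IP) tuples; the function name and the example lists are illustrative, with the counts taken from the notes above (50 requests logged on the laptop, none of them seen on the router).

```python
from collections import Counter


def missing_on_router(logged_by_laptop, seen_by_router):
    """Return ARP requests the laptop logged that the router never captured.

    Both arguments are iterables of (asker_ip, target_ip) tuples.
    """
    return Counter(logged_by_laptop) - Counter(seen_by_router)


# Laptop capture: 50 requests for the router's address.
laptop_requests = [("10.1.109.222", "10.1.109.129")] * 50
# Router capture: none of those requests appear.
router_requests = []

for (asker, target), count in missing_on_router(laptop_requests, router_requests).items():
    print(f"{count} requests from {asker} for {target} never reached the router")
```

Counter subtraction drops entries whose count falls to zero or below, so only requests that are genuinely absent from the router's capture remain in the result.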

Comment 10 Robin Rainton 2016-02-27 11:15:59 UTC
I am beginning to form the opinion that this is a specific compatibility problem with this chipset/driver only when connecting to a HostAP router.

Right now the X220 laptop I have is connected to an OpenWRT router. It's rock solid. But... that's because I'm running Windows.

If I boot into Linux then this setup is very unstable. It will barely last a handful of minutes before packet loss occurs (this ARP problem) and in some cases the WiFi association is actually dropped.

I have access to another router though, and this laptop will connect to that and remain stable.

I have other devices (such as Android tablet & smartphone) that will connect to this OpenWRT router and remain stable.

It is only the combination of the X220 laptop (rtl8192ce) and OpenWRT router (TP-LINK TL-WR703N v1 with Atheros AR9330) that seems to cause this issue.

Comment 11 Robin Rainton 2016-02-27 17:27:39 UTC
Ahhhhhhh-ha! Solved!

It turns out that this problem is caused by setting the OpenWRT wireless option 'htmode' to 'HT20' in the router - this causes all the problems I have described above.

With this option at 'NONE' or 'HT40' the problems disappear and the network runs smoothly!

So it appears that the rtl8192ce driver or the X220 hardware is unable to work with an 802.11n channel width of 20 MHz. I would imagine there should be some way for it to tell the router this is the case and fall back to a setting equivalent to 'NONE' but... well... it doesn't appear that happens.

For the record 'lspci' shows the hardware is:

03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter (rev 01)

Comment 12 Robin Rainton 2016-02-27 18:13:55 UTC
Scratch what I said about HT40 being stable. Sorry, it's not.

Comment 13 Laura Abbott 2016-09-23 19:52:10 UTC
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.

Comment 14 Laura Abbott 2016-10-26 16:58:35 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 15 Red Hat Bugzilla 2023-09-14 03:03:35 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

