Bug 1435793 - Wireless intermittently disconnects and reconnects and finally completely fails
Summary: Wireless intermittently disconnects and reconnects and finally completely fails
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 26
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-24 20:01 UTC by Peter Gückel
Modified: 2017-04-30 01:34 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-30 01:34:17 UTC
Type: Bug
Embargoed:


Attachments
An excerpt of var-log-messages (18.33 KB, text/plain)
2017-03-26 18:52 UTC, Peter Gückel

Description Peter Gückel 2017-03-24 20:01:58 UTC
Description of problem:
The wireless network constantly drops out, gets reconnected, drops out again, reconnects, ad infinitum. The connection stays active anywhere from a few seconds to a few minutes before dropping out again. After a while of this, perhaps an hour or so, the connection no longer reestablishes. I have tried restarting NetworkManager, logging out and in again, but nothing will get the wireless network connected again, except for a reboot, at which point the entire cycle begins anew.

Version-Release number of selected component (if applicable):
4.10.4-200.fc25.x86_64

I am using a laptop with Realtek RTL8821AE 802.11ac PCIe Wireless Network Adapter.

How reproducible:
Use wireless.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
This problem began about a week ago. I use the updates-testing repo. It began either with the first of the 4.10 series kernels or perhaps already with the last or second-last of the 4.9 kernels.

I had a repairman from the telephone company here this morning. We verified that my blu-ray player, my cell phone and his computer work just fine (didn't have them on for very long, though). Nevertheless, it appears to only affect my laptop (the only device running Fedora). He took my old gateway/router and brought a new one that even supports 5G, so now I can connect to both 2.4G and to 5G, but the dropouts have continued on the laptop.

I searched google and there appears to be a longstanding history of exactly this behaviour, although I only first experienced it about 5-7 days ago.

One time, after the numerous dropout and reconnection cycles, when NetworkManager would no longer reconnect, I noticed a message appear stating that "authorization supplicant times out." A google search on this message turns up a lot of information, some of it suggesting that it is related to the wireless kernel module.

Comment 1 Peter Gückel 2017-03-24 20:23:12 UTC
Also:

The repairman verified that the 2.4G channels in my area were very congested. Still, things worked just fine until 5-7 days ago.

When he set up the alternate 5G network, he showed me on his laptop that now my laptop was on a channel all by itself and that there were only 2 other 5G users on another channel, thus no interference at all... but still both my 2.4G and 5G wireless networks are plagued by this dropout problem.

Comment 2 Peter Gückel 2017-03-26 18:52:56 UTC
Created attachment 1266517
An excerpt of var-log-messages

This should give you a clear idea...

Comment 3 Peter Gückel 2017-04-10 00:32:49 UTC
Despite keeping up with all of the updates (I have disabled updates-testing and the released updates have now caught up), the problem persists.

There has been no change: network randomly drops out, restarts, drops again and finally is unstartable, requiring a reboot.

Comment 4 Peter Gückel 2017-04-10 00:43:54 UTC
I even installed the WiFi Analyzer app (VREM Software) on my phone and there is nobody on my channel. I am using 802.11ac in 5G with a 40MHz bandwidth.

Comment 5 Peter Gückel 2017-04-10 14:43:57 UTC
New developments:

1. I changed the version to 26! I installed 26 alpha to a separate partition, preserving 25, but when I tried to boot into 25, it was unable to mount /boot/efi (why? I have always done this and never previously had problems!). As a result, I am now using Fedora 26 and the problem persists, worse than ever, since it is now not even restarting the wifi network: it just drops the signal and refuses to reconnect.

2. I just installed the testing update:

wpa_supplicant.x86_64 1:2.6-5.fc26

I have suspected that it might be an issue with the 'supplicant'. I will give this a try for a while.

In case you want to know: this computer came preinstalled with some version or other of Windows. I preserved the Windows restore and the special MSR partitions, but I have never run Windows on this computer and I do not have it installed on any partition.

So, we shall see what the new supplicant can do...

Comment 6 Peter Gückel 2017-04-10 14:58:06 UTC
That was fast! :-(

I rebooted the system after the upgrade of wpa_supplicant and was hopeful.

The problem persists in Fedora 26. NetworkManager.service manages to restart the dropped wifi connection, at least this one first time. It is looking like the same problem is continuing...

Comment 7 Peter Gückel 2017-04-13 03:13:30 UTC
I have been reading and have discovered what is likely the problem.

There is an extensive writeup here that describes the regression that corresponds to the beginning of the problem:

https://github.com/lwfinger/rtlwifi_new/issues/203

A few minutes ago, my wifi was completely unusable, as described in earlier posts, so, in the spirit of the conversation in the above link and without having understood or processed it in full, I simply ran, as root:

rmmod rtl8821ae
modprobe rtl8821ae ips=1

Voilà! -My wifi came back like a snap and I haven't had a dropout for 5 minutes!-

While this is not proof this is the (whole) solution, I am definitely on the right track. This and possibly other changes need to be made permanent and safe from updates.
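
A minimal sketch of one way to make the parameter persistent, assuming the standard modprobe.d mechanism (an "options" line in a .conf file that modprobe reads whenever it loads the module). The helper name write_module_option is hypothetical and only for illustration, and whether ips=1 is the right value is still unproven, as noted above.

```shell
#!/usr/bin/env bash
# Sketch: persist a kernel module option across reboots via modprobe.d.
# write_module_option is a hypothetical helper, not part of any package.

# Usage: write_module_option <conf_dir> <module> <option>...
write_module_option() {
    local conf_dir=$1 module=$2
    shift 2
    # modprobe reads "options <module> <param>=<value>" lines from *.conf files.
    printf 'options %s %s\n' "$module" "$*" > "$conf_dir/$module.conf"
}

# Example (as root), mirroring the manual reload above:
#   write_module_option /etc/modprobe.d rtl8821ae ips=1
#   rmmod rtl8821ae && modprobe rtl8821ae
```

After a reload, reading /sys/module/rtl8821ae/parameters/ips should show the value the driver actually picked up, and "modinfo -p rtl8821ae" lists the parameters the module accepts.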

While "dnf provides /lib/firmware/rtlwifi/rtl8821aefw.bin" was not able to tell me anything (Error: No Matches found), I think this is a kernel modules problem!!!

Comment 8 Peter Gückel 2017-04-13 03:47:46 UTC
Well, it only lasted about 25 minutes in all, but compared to before the ips=1 parameter, when I was only getting from about 5 seconds to 2 minutes of wifi, this is a big improvement.

So, are you guys there!?

Comment 9 Peter Gückel 2017-04-13 16:22:44 UTC
The new kernel-4.11.0-0.rc6.git0.1.fc26.x86_64 has not solved the problem.

Comment 10 Peter Gückel 2017-04-13 23:42:24 UTC
Maybe I spoke too soon? My wifi is definitely much more stable with this kernel! I have had some connectivity losses (I NEVER had any problem with this in years past, not until this all began about a month ago). I unloaded and reloaded the module with the parameter I indicated above. Did I even make it worse by changing the setting upon reconnecting? I am going to cease making any changes and cease reloading the module for a day or three so that I can determine what this new kernel has changed, if anything.

Comment 11 Peter Gückel 2017-04-14 17:00:36 UTC
I have experimented steadfastly since last night and have not added any of the conflicting and useless parameter changes when doing so. I merely dropped and reloaded the kernel module to avoid having to reboot the computer. This has allowed me to at least be able to use the network, albeit with frustration.

Anyway, the current kernel has not solved the problem.

I looked at the github site for rtlwifi_new and found a lot of discussion and commits dating less than a week ago with reports that the problem is solved.

We need a kernel that is up-to-date in this regard to re-enable stable wifi.

Comment 12 Peter Gückel 2017-04-27 23:29:02 UTC
So, to recap...

(In reply to Thomas Haller from comment #33 of now closed bug 1382741)
> In general, if you still have issues with the wifi.scan-rand-mac-address=no
> setting, your issue is not the same as this bug.
> If you still have issues with 1.8.0-0.2.rc3.fc26, it is also likely not this
> bug because this bug is considered closed.
> 
> The logfile shows that after a while supplicant fails with
> 
>   <warn>  [1493313695.1804] sup-iface[0x55e426f49010,wlp3s0]: connection
> disconnected (reason -4)
> 
> which then leads to 
> 
>   <info>  [1493313710.6812] device (wlp3s0): state change: activated ->
> failed (reason 'ssid-not-found') [100 120 53]
> 
> 
> I am not sure what this means. Looks like a supplicant or driver issue.
> Getting logfiles for the supplicant would be helpful. See
> https://wiki.gnome.org/Projects/NetworkManager/
> Debugging#Debugging_wpa_supplicant_0.7_and_later

I ran the 2 commands indicated for wpa_supplicant 0.7+.

The first command returned:

method return time=1493335064.746597 sender=:1.17 -> destination=:1.211 serial=12520 reply_serial=2

The second command returned:

method return time=1493335099.693185 sender=:1.17 -> destination=:1.213 serial=12523 reply_serial=2

Do I have to rerun these commands every time I reboot the system or log out/in or suspend/resume? Or has logging been set until I disable it?

Again, I remind you that /etc/NetworkManager/NetworkManager.conf is still modified, with the addition of this section:

[device]
wifi.scan-rand-mac-address=no

Should I leave this or restore the file to the original state, since you feel that the bug I am battling is not the one that this parameter applied to?
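
To put a number on the dropouts while testing settings like the one above, here is a rough sketch that counts the "activated -> failed" state changes in a saved NetworkManager log. The pattern is copied from the log line quoted earlier in this comment; the helper name count_wifi_failures is hypothetical, and the default interface wlp3s0 is the one from that log.

```shell
#!/usr/bin/env bash
# Sketch: count Wi-Fi drop events in a saved NetworkManager log.
# count_wifi_failures is a hypothetical helper; the pattern comes from
# the failure line quoted in this comment.

# Usage: count_wifi_failures <logfile> [interface]
count_wifi_failures() {
    local logfile=$1 iface=${2:-wlp3s0}
    # -F matches the text literally; -c prints the number of matching lines.
    grep -F -c "device ($iface): state change: activated -> failed" "$logfile"
}

# Example:
#   journalctl -u NetworkManager -b > nm.log
#   count_wifi_failures nm.log
```

Comparing the count for one boot with the setting and one boot without it would be a more objective test than watching the applet.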

Comment 13 Peter Gückel 2017-04-27 23:46:17 UTC
And, should I leave the NM level=TRACE set, too?

Now, what really does confound me is why this should be so difficult, since we already know that everything was working just perfectly in kernel 4.9. Obviously, someone made a change from the last of the 4.9 series to the 4.10 series that introduced this problem. It should be a simple matter to dump the bad code and put the good code back in!

And there is still the issue of the rtlwifi_new project on github. There are others who experienced EXACTLY the same problem I have, as far as I am able to determine from the reports.

Once again, I refer you to my bug report there and the other comments for your perusal. This, too, if one does not, for some reason, wish to dump bad code and replace it with good, should be a further clue as to what is happening. There was some discussion of a new Realtek firmware and the introduction of a regression, etc. that has apparently been resolved! So, why is it not in Fedora 26?

https://github.com/lwfinger/rtlwifi_new/issues

Comment 14 Peter Gückel 2017-04-29 05:32:25 UTC
HEADS UP!

Through extensive experimentation, I have discovered when the bug entered the kernels.

In Fedora 25, all the kernels up to and including 4.9.13-201 are good. All of the kernels from 4.9.14-200 and onward are BAD.

In Fedora 26, all of the kernels from the release of Fedora 26 alpha are BAD!

The WiFi works perfectly in Fedora 25 with the 4.9.13-20X kernels and earlier. The bug is present in all other kernels and wifi is barely useable.
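
The boundary above can be sketched as a small check of a Fedora 25 kernel release string against the last good build (4.9.13-201). This is only a version comparison under the findings reported here, not a diagnosis; the helper name is_known_good_kernel is hypothetical, and it assumes GNU sort with the -V (version sort) option.

```shell
#!/usr/bin/env bash
# Sketch: classify a Fedora 25 kernel release against the last known-good
# build reported above. is_known_good_kernel is a hypothetical helper and
# relies on GNU sort's -V (version sort).

LAST_GOOD="4.9.13-201"

# Usage: is_known_good_kernel <release>, e.g. "$(uname -r)"
# Returns 0 if the release is at or below LAST_GOOD, 1 otherwise.
is_known_good_kernel() {
    # Drop the ".fcNN.arch" suffix so only version-release is compared.
    local rel=${1%%.fc*}
    local newest
    newest=$(printf '%s\n' "$rel" "$LAST_GOOD" | sort -V | tail -n 1)
    [ "$newest" = "$LAST_GOOD" ]
}

# Example:
#   if is_known_good_kernel "$(uname -r)"; then echo good; else echo BAD; fi
```

Running it against each installed kernel shows at a glance which boot entries fall on the good side of the 4.9.13 / 4.9.14 boundary.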

Comment 15 Peter Gückel 2017-04-30 01:34:17 UTC
I am closing this bug, because all indications are that it is a driver issue (and nobody has even bothered to respond anyway). There have been some developments that make this problem less severe and the hoped-for driver rewrite (?) might solve this.

