Created attachment 852485 [details]
dmesg full - until WIFI fail

Description of problem:

I'm seeing WIFI drops on a Lenovo T440s with Intel 7260 WIFI hardware. They occur during regular use of the device (e.g. downloading a larger file can trigger the problem). Other devices on the network are still happily connected to the AP. Sometimes no reconnection is possible and I have to reboot the system.

Version-Release number of selected component (if applicable):

kernel-3.12.7-300.fc20.x86_64

How reproducible:

Download the Fedora installation ISO on the mentioned hardware.

Steps to Reproduce:
1. Open Firefox and point it to the Fedora Project website.
2. Download the Fedora ISO.

Actual results:

The download hangs and NetworkManager shows a broken network icon. The currently selected AP is no longer in the list of available networks. Sometimes it recovers after a couple of minutes. Sometimes you have to reboot to get WIFI working again.

Expected results:

The download completes.

Additional info:

I've attached dmesg and lspci output. I found a bug on the Kernel Bugzilla (see external bug) and tried the patch against the Fedora kernel version mentioned above. Results:

Short version - it does not fix the problem.

Long version - the behaviour changes. You can see firmware errors in the dmesg output noting "requesting hardware reset". The only difference with the patch is that the WIFI connection always recovers after some minutes. No need to reboot.

Let me know if I can provide more information.
Created attachment 852486 [details] lspci -vv
Looks like the firmware version shipped with Fedora was too old. I've manually tried the latest version (22.1.7.0) and it seems to work.
An update after some usage: the firmware update alone doesn't seem to help for long. I tried to load something bigger and the connections started to drop again.

[13498.457001] cfg80211: Calling CRDA to update world regulatory domain
[13498.462917] cfg80211: World regulatory domain updated:
[13498.462921] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[13498.462923] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13498.462924] cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13498.462926] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[13498.462927] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13498.462928] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13502.250815] wlp3s0: authenticate with c0:25:06:66:d0:fe
[13502.252844] wlp3s0: send auth to c0:25:06:66:d0:fe (try 1/3)
[13502.256125] wlp3s0: authenticated
[13502.257122] wlp3s0: associate with c0:25:06:66:d0:fe (try 1/3)
[13502.261240] wlp3s0: RX AssocResp from c0:25:06:66:d0:fe (capab=0x431 status=0 aid=4)
[13502.266195] wlp3s0: associated
[13776.224576] cfg80211: Calling CRDA to update world regulatory domain
[13776.227398] cfg80211: World regulatory domain updated:
[13776.227401] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[13776.227402] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13776.227403] cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13776.227404] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[13776.227405] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13776.227406] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13803.346459] icmp6_send: no reply to icmp error
[13807.904505] icmp6_send: no reply to icmp error
...
Moving back to kernel for comment #3. The firmware will be updated in fedora under bug 1046935
There is a patch on the kernel.org bug, but I'm not sure if you are hitting the same issue. What does "lspci -vxxx" show when the problem occurs?
Created attachment 857963 [details] lspci -vxxx directly after boot
Created attachment 857964 [details] lspci -vxxx after wifi dropped
You mean the #64541 from bugzilla.kernel.org that I referenced initially? I tried the stock Fedora kernel with the patch applied, but I can't confirm that it makes things rock solid. They suggested putting

options iwlmvm power_scheme=1

into /etc/modprobe.d/iwlwifi.conf, but that didn't help either.
(In reply to Cajus Pollmeier from comment #8)
> You mean the #64541 from bugzilla.kernel.org that I referenced initially?

Yes, but you have a different problem with iwlwifi.
Ok. For the record - I had the lspci stuff attached. If you want me to go to the iwlwifi list or try something, let me know.
The dmesg output you sent shows that you use an old firmware. Can you please send the dmesg output once you have updated the firmware?
BTW - this bug has nothing to do with http://bugzilla.kernel.org/show_bug.cgi?id=64541.
> Additional info:
>
> I've attached dmesg and lspci output. I found a bug on the Kernel Bugzilla
> (see external bug) and tried the patch against the Fedora kernel version
> mentioned above. Results:
>
> Short version - it does not fix the problem.
>
> Long version - the behaviour changes. You can see Firmware errors in the
> dmesg output noting "requesting hardware reset". The only difference with
> the patch is, that the WIFI connection always recovers after some minutes.
> No need to reboot.
>

In the dmesg attached, there is no firmware error, and you can grep "requesting hardware reset" but you won't get anything...
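For anyone repeating the check above on a saved dmesg file, here is a minimal Python sketch of the same search. The marker string is the one quoted in this report; the function name is hypothetical, and the sample lines are copied from the dmesg excerpt posted earlier in this report.

```python
def find_marker_lines(dmesg_text, marker="requesting hardware reset"):
    """Return every line of dmesg_text that contains marker (case-insensitive)."""
    needle = marker.lower()
    return [line for line in dmesg_text.splitlines() if needle in line.lower()]


# Two lines taken from the excerpt earlier in this report; neither has the marker.
sample = (
    "[13502.256125] wlp3s0: authenticated\n"
    "[13502.266195] wlp3s0: associated"
)

matches = find_marker_lines(sample)
print(len(matches), "matching line(s)")
```

An empty result only says that the marker text is absent from that particular log. It says nothing about other firmware error messages the driver might print.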
Created attachment 859294 [details] dmesg with new firmware
(In reply to Emmanuel Grumbach from comment #13)
> > Additional info:
> >
> > I've attached dmesg and lspci output. I found a bug on the Kernel Bugzilla
> > (see external bug) and tried the patch against the Fedora kernel version
> > mentioned above. Results:
> >
> > Short version - it does not fix the problem.
> >
> > Long version - the behaviour changes. You can see Firmware errors in the
> > dmesg output noting "requesting hardware reset". The only difference with
> > the patch is, that the WIFI connection always recovers after some minutes.
> > No need to reboot.
> >
>
> in the dmesg attached, there is no firmware error, and you can grep
> "requesting hardware reset" but you won't get anything...

The text says that I *tried* with the patch. The log is against stock Fedora kernels - as is the dmesg. Sorry if this was unclear. Just attached the dmesg against the recent firmware.
Created attachment 859306 [details] dmesg full - until WIFI fail - new firmware
From what I see here, you get a beacon from an AP that hints something to you, and then CRDA kicks in. On what channel is your AP? Do you have the ability to use tracing? I am pretty sure this is not a bug in our device. Stanislaw, do you know the regulatory code? I can dive in, but....
The purpose of the tracing is to check which beacons you get.
My WLAN is on channel 1 and it's provided by a FritzBox. It looks like someone has configured it to choose the channel automatically. There are three additional devices attached to the WLAN and they work fine. Tracing: what kind of packets/protocols do you want me to look for?
Created attachment 859361 [details] trace-cmd record -e mac80211 -e cfg80211
(In reply to Cajus Pollmeier from comment #19)
> My WLAN in on channel 1 and it's provided by a FritzBox. It looks like
> someone has configured it to choose the channel automatically. There are
> three additional devices attached to the WLAN and they work fine.
>

Ok - so you are saying that we have frequent channel switches. I see. Need to check then.
I changed the configuration to be on a fixed channel (6). The connection drops with this setup, too. The attached log is for channel 1 (FritzBox auto mode). I can attach one for the fixed channel setup too if you want me to.
yes please. I'd like to see what happens on fixed channel setup. Thanks.
Created attachment 859385 [details] trace-cmd record -e mac80211 -e cfg80211 - fixed channel
I don't think this is related to channel switches. Automatic channel selection usually just means that when the AP is started, it does a scan to see which is the least crowded channel. Most of the consumer-grade APs I know do not perform any checks after starting to switch to a better channel.
Seems to be related to scanning?

1422.695949: api_scan_completed: phy0 aborted:0
1422.695981: drv_hw_scan: phy0 vif:wlp3s0(2)
1422.723949: drv_return_int: phy0 - 0
1423.206107: api_beacon_loss: vif:wlp3s0(2)
1423.308604: api_beacon_loss: vif:wlp3s0(2)
1423.411117: api_beacon_loss: vif:wlp3s0(2)
[...]

Apparently a scan completes and we immediately start a new one (the drv_hw_scan), then we get lots of beacon loss events. After a while we seem to disconnect. Both traces show the same thing.
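The pattern described above (a drv_hw_scan followed by a run of api_beacon_loss events) can be counted mechanically. Below is a rough Python sketch, assuming the trace has been turned into text lines of the shape quoted in this comment (timestamp, colon, event name, colon). The function names are hypothetical and the sample is exactly the six lines quoted above.

```python
import re

# Matches "<seconds>.<fraction>: <event_name>:" anywhere in a line.
EVENT_RE = re.compile(r"(\d+\.\d+):\s+(\w+):")


def parse_events(text):
    """Turn trace text into a list of (timestamp, event_name) tuples."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def beacon_losses_per_scan(events):
    """Map each drv_hw_scan timestamp to the number of api_beacon_loss
    events seen before the next api_scan_completed."""
    losses = {}
    current_scan = None
    for timestamp, name in events:
        if name == "drv_hw_scan":
            current_scan = timestamp
            losses[current_scan] = 0
        elif name == "api_scan_completed":
            current_scan = None
        elif name == "api_beacon_loss" and current_scan is not None:
            losses[current_scan] += 1
    return losses


sample = """1422.695949: api_scan_completed: phy0 aborted:0
1422.695981: drv_hw_scan: phy0 vif:wlp3s0(2)
1422.723949: drv_return_int: phy0 - 0
1423.206107: api_beacon_loss: vif:wlp3s0(2)
1423.308604: api_beacon_loss: vif:wlp3s0(2)
1423.411117: api_beacon_loss: vif:wlp3s0(2)"""

print(beacon_losses_per_scan(parse_events(sample)))
```

On the excerpt this attributes all three beacon losses to the scan started at 1422.695981. The full traces contain many more events, so the counts there will differ.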
Just to be sure (even though Emmanuel says the bug doesn't have anything to do with http://bugzilla.kernel.org/show_bug.cgi?id=64541), could you try both the patch to fix that bug *and* the new firmware? From your comments it doesn't seem you tried both together. Also, could you please provide both dmesg and the traces from the same run? Then I can see them in sync.
I didn't try both together, because I noticed the new firmware too late. So you're right - I tried the stock Fedora 20 kernel with the stock Fedora firmware and the patched kernel with the stock Fedora firmware. Somehow I lost the logs for the last run and just left the vague comments in my initial report. I will rebuild and test with the patch and the current firmware in ~10 hours. I have no access to the machine at the moment.
If it's too much trouble to add the patch, it's okay, you can leave it out. But please send the dmesg and trace-cmd from the same run. And also add iwlwifi events to trace-cmd:

trace-cmd record -e mac80211 -e cfg80211 -e iwlwifi -e iwlwifi_msg
I tried to add -e iwlwifi and -e iwlwifi_msg yesterday, but it was not possible. I don't remember the exact wording, but it was basically: there's no foobar hook available in iwlwifi. Maybe the Fedora kernel is not compiled with IWLWIFI_DEVICE_TRACING. I will check that when rebuilding the kernel with the patch and attach the logs when I'm back home.
(In reply to Cajus Pollmeier from comment #30)
> there's no foobar hook available in iwlwifi. Maybe the Fedora kernel is not
> compiled with IWLWIFI_DEVICE_TRACING.

That option is enabled only in the Fedora kernel-debug variant.
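A quick way to confirm this on a given machine is to look for CONFIG_IWLWIFI_DEVICE_TRACING in the installed kernel's config file. The following is a small Python sketch, assuming the config is available as text under /boot/config-<release> (an assumption about the local install, not something stated in this report). Both function names are hypothetical.

```python
import platform


def kconfig_option_enabled(config_text, option):
    """Return True if option is set to y or m in the given kernel config text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1] in ("y", "m")
    # Disabled options appear as "# CONFIG_X is not set" or not at all.
    return False


def running_kernel_has_option(option="CONFIG_IWLWIFI_DEVICE_TRACING"):
    """Check the running kernel's config file; the path is an assumption."""
    path = "/boot/config-" + platform.release()
    with open(path) as config_file:
        return kconfig_option_enabled(config_file.read(), option)
```

If running_kernel_has_option() returns False, the iwlwifi and iwlwifi_msg trace events are not expected to be available on that kernel.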
(In reply to Stanislaw Gruszka from comment #31)
> (In reply to Cajus Pollmeier from comment #30)
> > there's no foobar hook available in iwlwifi. Maybe the Fedora kernel is not
> > compiled with IWLWIFI_DEVICE_TRACING.
> That option is enabled only on fedora kernel-debug variant.

Okay, it makes sense. Could you try with either the kernel-debug variant or compile the kernel yourself with that option?
Created attachment 859881 [details] dmesg full, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch
Created attachment 859882 [details] lspci -vxxx after boot, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch
Created attachment 859883 [details] lspci -vxxx after fail, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch
The trace is - well - big. I can't upload it to the bugtracker (~450M). You can find it bzip2'ed here: http://ferdi.naasa.net/trace.dat.bz2

After that drop I was only able to reconnect to the FritzBox after pulling the power plug (of the FritzBox). I tried to download a bigger file after that and there was no drop, at least while transferring some 100M. Maybe it's just a problem on the AP side? Hmm. Will check tomorrow.
Thanks for all the logs!

This really seems to be some problem with the scanning. Most of the time, when we scan, it completes after <1 sec (for 2.4GHz) and <4 secs (for 5GHz). But sometimes the 5GHz scan takes ~14 secs and we miss lots of beacons during that period. This causes the connection to drop.

I have asked the firmware team for help and they promised to provide a firmware with debugging information to try to figure out what is going on. But this may take a while...

We have never seen this bug before, so there's probably something that your AP is doing that is causing the scans to get stuck. Maybe it's a bug in the AP that triggers a bug in the firmware. I don't know.

One thing that sometimes helps is to disable powersave by loading the iwlmvm module with power_scheme=1. You can do that either when loading the module or by adding this to the modprobe configuration (in /etc/modprobe.d/iwlmvm or something):

options iwlmvm power_scheme=1

You could also try to see if the problem goes away if you use the backport project to install the latest version of the wireless subsystem and use the newer firmware:

https://backports.wiki.kernel.org/index.php/Main_Page
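The scan timing comparison above can be reproduced from the mac80211 events by pairing each drv_hw_scan with the next api_scan_completed. Here is a small Python sketch, assuming the events are already available as (timestamp, event_name) tuples. The function names, the 4 second limit (taken from the "<4 secs" figure above) and the sample timestamps are all made up for illustration and are not taken from the attached traces.

```python
def scan_durations(events):
    """Pair each drv_hw_scan with the next api_scan_completed and
    return a list of (start_timestamp, duration_seconds)."""
    durations = []
    start = None
    for timestamp, name in events:
        if name == "drv_hw_scan":
            start = timestamp
        elif name == "api_scan_completed" and start is not None:
            durations.append((start, timestamp - start))
            start = None
    return durations


def slow_scans(events, limit_s=4.0):
    """Return only the scans that took longer than limit_s seconds."""
    return [(start, d) for start, d in scan_durations(events) if d > limit_s]


# Synthetic events, for illustration only.
sample_events = [
    (100.0, "drv_hw_scan"),
    (100.5, "api_scan_completed"),
    (200.0, "drv_hw_scan"),
    (214.0, "api_scan_completed"),
]

print(slow_scans(sample_events))
```

These two events do not say which band was being scanned, so this only flags a hardware scan as slow overall. Splitting 2.4GHz from 5GHz would need the iwlwifi events as well.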
I had the power_scheme=1 option enabled before I did the last logs, so it didn't seem to help. I'll see if I can do something with the backport project, but I'll check some AP-related things first: I will take a second laptop from work in order to see if the AP is doing something nasty when the drops occur. It's always bad if there's only one real computer available for debugging...
It looks like I'm facing two different problems: internet communication WLAN <-> DSL and LAN communication WLAN <-> LAN.

The first seems to be resolved with the new firmware. That's why I noted at the very beginning of this thread that the firmware solves the problem. When I later claimed that it was still not fixed, it seems to have been the second flavor: communication between the 7260 <-> AP <-> NAS.

After monitoring the AP with a second computer from the LAN port, I noticed that the LAN connection was gone, too. Doh. The AP is completely offline. After digging for exactly this problem, I found that I'm not the only one having problems with WLAN <-> LAN transfers on the FritzBox: the AP is just doing a reset because it's overloaded (maybe heat, power consumption).

I'm really sorry for all the traffic that happened after comment #2. I guess it can be closed. Sometimes google foo doesn't help if you don't search for the right stuff :-(

Thanks for your help!
Ok, good to know that you found out what the problem is. And even better, that it's not a problem with our driver. :)

I had a similar problem with another AP/ADSL combo (a Zyxel, IIRC). The AP would get stuck on heavy load, especially when too many connections were open (i.e. downloading torrents). At some point I figured out that changing the encryption mode helped. You may want to experiment with that, if buying another AP is not an option for you. ;)

Stanislaw, I think this bug can be closed, since the firmware update is already handled on a different bug report.
Ok, closing now, thanks for working on it!