Bug 1055249 - Intel Wireless 7260 drop connection
Summary: Intel Wireless 7260 drop connection
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: fedora-kernel-wireless-iwl
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-19 21:12 UTC by Cajus Pollmeier
Modified: 2014-02-07 12:32 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-02-07 12:32:10 UTC
Type: Bug
Embargoed:


Attachments
dmesg full - until WIFI fail (72.81 KB, text/plain)
2014-01-19 21:12 UTC, Cajus Pollmeier
lspci -vv (21.39 KB, text/plain)
2014-01-19 21:13 UTC, Cajus Pollmeier
lspci -vxxx directly after boot (20.07 KB, text/plain)
2014-01-31 20:08 UTC, Cajus Pollmeier
lspci -vxxx after wifi dropped (20.07 KB, text/plain)
2014-01-31 20:09 UTC, Cajus Pollmeier
dmesg with new firmware (72.13 KB, text/plain)
2014-02-04 18:59 UTC, Cajus Pollmeier
dmesg full - until WIFI fail - new firmware (75.95 KB, text/plain)
2014-02-04 19:13 UTC, Cajus Pollmeier
trace-cmd record -e mac80211 -e cfg80211 (4.11 MB, application/octet-stream)
2014-02-04 20:53 UTC, Cajus Pollmeier
trace-cmd record -e mac80211 -e cfg80211 - fixed channel (4.26 MB, application/octet-stream)
2014-02-04 21:19 UTC, Cajus Pollmeier
dmesg full, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch (75.33 KB, text/plain)
2014-02-05 21:20 UTC, Cajus Pollmeier
lspci -vxxx after boot, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch (20.07 KB, text/plain)
2014-02-05 21:21 UTC, Cajus Pollmeier
lspci -vxxx after fail, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch (20.07 KB, text/plain)
2014-02-05 21:21 UTC, Cajus Pollmeier

Description Cajus Pollmeier 2014-01-19 21:12:35 UTC
Created attachment 852485 [details]
dmesg full - until WIFI fail

Description of problem:

I'm seeing WIFI drops on a Lenovo T440s with Intel 7260 WIFI hardware. They occur during regular use of the device (e.g. downloading a larger file can trigger the problem). Other devices on the network remain connected to the AP.

Sometimes no reconnection is possible and I have to reboot the system.


Version-Release number of selected component (if applicable):

kernel-3.12.7-300.fc20.x86_64


How reproducible:

Download the Fedora installation ISO on the mentioned hardware.


Steps to Reproduce:
1. Open Firefox and point it to the fedora project website
2. Download Fedora ISO.

Actual results:

The download hangs and NetworkManager shows a broken network icon. The currently selected AP is no longer in the list of available networks. Sometimes it recovers after a couple of minutes. Sometimes you have to reboot to get WIFI working again.


Expected results:

The download completes.


Additional info:

I've attached dmesg and lspci output. I found a bug on the Kernel Bugzilla (see external bug) and tried the patch against the Fedora kernel version mentioned above. Results:

Short version - it does not fix the problem.

Long version - the behaviour changes. You can see Firmware errors in the dmesg output noting "requesting hardware reset". The only difference with the patch is, that the WIFI connection always recovers after some minutes. No need to reboot.

Let me know if I can provide more information.

Comment 1 Cajus Pollmeier 2014-01-19 21:13:22 UTC
Created attachment 852486 [details]
lspci -vv

Comment 2 Cajus Pollmeier 2014-01-19 21:39:37 UTC
Looks like the firmware version shipped with Fedora was too old. I've manually tried the latest version (22.1.7.0) and it seems to work.

Comment 3 Cajus Pollmeier 2014-01-25 06:54:48 UTC
An update after some usage:

The firmware update alone doesn't seem to help for long. I tried to load something bigger and the connection started to drop again.

[13498.457001] cfg80211: Calling CRDA to update world regulatory domain
[13498.462917] cfg80211: World regulatory domain updated:
[13498.462921] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[13498.462923] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13498.462924] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13498.462926] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[13498.462927] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13498.462928] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13502.250815] wlp3s0: authenticate with c0:25:06:66:d0:fe
[13502.252844] wlp3s0: send auth to c0:25:06:66:d0:fe (try 1/3)
[13502.256125] wlp3s0: authenticated
[13502.257122] wlp3s0: associate with c0:25:06:66:d0:fe (try 1/3)
[13502.261240] wlp3s0: RX AssocResp from c0:25:06:66:d0:fe (capab=0x431 status=0 aid=4)
[13502.266195] wlp3s0: associated
[13776.224576] cfg80211: Calling CRDA to update world regulatory domain
[13776.227398] cfg80211: World regulatory domain updated:
[13776.227401] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[13776.227402] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13776.227403] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13776.227404] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[13776.227405] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13776.227406] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[13803.346459] icmp6_send: no reply to icmp error
[13807.904505] icmp6_send: no reply to icmp error
...

Comment 4 Josh Boyer 2014-01-31 12:07:10 UTC
Moving back to kernel for comment #3.  The firmware will be updated in fedora under bug 1046935

Comment 5 Stanislaw Gruszka 2014-01-31 13:45:51 UTC
There is a patch on the kernel.org bug, but I'm not sure if you are hitting the same issue. What does "lspci -vxxx" show when the problem occurs?

Comment 6 Cajus Pollmeier 2014-01-31 20:08:26 UTC
Created attachment 857963 [details]
lspci -vxxx directly after boot

Comment 7 Cajus Pollmeier 2014-01-31 20:09:08 UTC
Created attachment 857964 [details]
lspci -vxxx after wifi dropped

Comment 8 Cajus Pollmeier 2014-01-31 20:18:31 UTC
You mean the #64541 from bugzilla.kernel.org that I referenced initially? I tried the stock Fedora kernel with the patch applied, but I can't confirm that this makes things rock solid.

They suggested putting

options iwlmvm power_scheme=1

into /etc/modprobe.d/iwlwifi.conf but that didn't help either.
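
To double-check that the option line is written the way modprobe expects, the file contents can be parsed with a few lines of Python. This is only a minimal sketch: the helper name `parse_modprobe_options` is made up for illustration, it handles plain `options` lines only, and it does not confirm that the loaded module actually picked the value up.

```python
def parse_modprobe_options(text):
    """Return {module: {param: value}} for every 'options' line in the text."""
    result = {}
    for raw in text.splitlines():
        fields = raw.strip().split()
        # Skip blank lines, comments, and other directives (alias, blacklist, ...).
        if len(fields) < 2 or fields[0] != "options":
            continue
        params = result.setdefault(fields[1], {})
        for item in fields[2:]:
            name, _, value = item.partition("=")
            params[name] = value
    return result


# The line quoted above, as it would appear in the config file.
conf = "options iwlmvm power_scheme=1\n"
options = parse_modprobe_options(conf)
print(options["iwlmvm"]["power_scheme"])  # prints 1
```

In practice the text would be read from the config file instead of the inline string.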

Comment 9 Stanislaw Gruszka 2014-02-04 12:13:30 UTC
(In reply to Cajus Pollmeier from comment #8)
> You mean the #64541 from bugzilla.kernel.org that I referenced initially? 
Yes, but you have a different problem with iwlwifi.

Comment 10 Cajus Pollmeier 2014-02-04 12:17:45 UTC
Ok. For the record - I had the lspci stuff attached. If you want me to go to the iwlwifi list or try something, let me know.

Comment 11 Emmanuel Grumbach 2014-02-04 12:35:41 UTC
The dmesg output you sent shows that you are using an old firmware.
Can you please send the dmesg output once you have updated the firmware?

Comment 12 Emmanuel Grumbach 2014-02-04 14:05:35 UTC
BTW - this bug has nothing to do with http://bugzilla.kernel.org/show_bug.cgi?id=64541.

Comment 13 Emmanuel Grumbach 2014-02-04 18:02:49 UTC
> Additional info:
> 
> I've attached dmesg and lspci output. I found a bug on the Kernel Bugzilla
> (see external bug) and tried the patch against the Fedora kernel version
> mentioned above. Results:
> 
> Short version - it does not fix the problem.
> 
> Long version - the behaviour changes. You can see Firmware errors in the
> dmesg output noting "requesting hardware reset". The only difference with
> the patch is, that the WIFI connection always recovers after some minutes.
> No need to reboot.
> 

in the dmesg attached, there is no firmware error, and you can grep "requesting hardware reset" but you won't get anything...
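
A case-insensitive search over a saved dmesg file is easy to script. The following is a minimal Python sketch; `find_lines` is a hypothetical helper, and the sample text simply reuses two lines quoted in comment 3 rather than the attached log.

```python
def find_lines(log_text, needle):
    """Return (line_number, line) pairs for lines containing needle (case-insensitive)."""
    needle = needle.lower()
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if needle in line.lower()
    ]


sample = """[13502.266195] wlp3s0: associated
[13776.224576] cfg80211: Calling CRDA to update world regulatory domain
"""

# No firmware reset marker in this excerpt, so the result is an empty list.
print(find_lines(sample, "requesting hardware reset"))  # prints []
```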

Comment 14 Cajus Pollmeier 2014-02-04 18:59:37 UTC
Created attachment 859294 [details]
dmesg with new firmware

Comment 15 Cajus Pollmeier 2014-02-04 19:01:50 UTC
(In reply to Emmanuel Grumbach from comment #13)
> > Additional info:
> > 
> > I've attached dmesg and lspci output. I found a bug on the Kernel Bugzilla
> > (see external bug) and tried the patch against the Fedora kernel version
> > mentioned above. Results:
> > 
> > Short version - it does not fix the problem.
> > 
> > Long version - the behaviour changes. You can see Firmware errors in the
> > dmesg output noting "requesting hardware reset". The only difference with
> > the patch is, that the WIFI connection always recovers after some minutes.
> > No need to reboot.
> > 
> 
> in the dmesg attached, there is no firmware error, and you can grep
> "requesting hardware reset" but you won't get anything...

The text says that I *tried* with the patch. The log is against stock Fedora Kernels - as is the dmesg. Sorry if this was unclear.

Just attached the dmesg against the recent firmware.

Comment 16 Cajus Pollmeier 2014-02-04 19:13:19 UTC
Created attachment 859306 [details]
dmesg full - until WIFI fail - new firmware

Comment 17 Emmanuel Grumbach 2014-02-04 19:24:22 UTC
From what I see here, you get a beacon from an AP that gives you some hint, and CRDA kicks in.

On what channel is your AP?
Do you have the ability to use tracing?
I am pretty sure this is not a bug in our device.
Stanislaw, do you know the regulatory code? I can dive in, but....

Comment 18 Emmanuel Grumbach 2014-02-04 19:26:05 UTC
The purpose of tracing is to check what beacons you get.

Comment 19 Cajus Pollmeier 2014-02-04 20:37:04 UTC
My WLAN is on channel 1 and it's provided by a FritzBox. It looks like someone has configured it to choose the channel automatically. There are three additional devices attached to the WLAN and they work fine.

Tracing: what kind of packets/protocols do you want me to look for?

Comment 20 Cajus Pollmeier 2014-02-04 20:53:41 UTC
Created attachment 859361 [details]
trace-cmd record -e mac80211 -e cfg80211

Comment 21 Emmanuel Grumbach 2014-02-04 20:55:32 UTC
(In reply to Cajus Pollmeier from comment #19)
> My WLAN is on channel 1 and it's provided by a FritzBox. It looks like
> someone has configured it to choose the channel automatically. There are
> three additional devices attached to the WLAN and they work fine.
> 

Ok - so you are saying that we have frequent channel switches.. I see..
Need to check then.

Comment 22 Cajus Pollmeier 2014-02-04 21:13:43 UTC
I changed the configuration to be on a fixed channel (6). The connection drops with this setup, too.

The attached log is for channel 1 (FritzBox auto mode). I can attach one for the fixed channel setup too if you want me to.

Comment 23 Emmanuel Grumbach 2014-02-04 21:15:52 UTC
yes please.
I'd like to see what happens on fixed channel setup.
Thanks.

Comment 24 Cajus Pollmeier 2014-02-04 21:19:08 UTC
Created attachment 859385 [details]
trace-cmd record -e mac80211 -e cfg80211 - fixed channel

Comment 25 Luca Coelho 2014-02-04 22:01:06 UTC
I don't think this is related to channel switches.  Automatic channel selection usually just means that when the AP is started, it does a scan to see which is the least crowded channel.  Most of the consumer-grade APs I know do not perform any checks after starting to switch to a better channel.

Comment 26 Luca Coelho 2014-02-04 22:23:49 UTC
Seems to be related to scanning?

1422.695949: api_scan_completed:   phy0 aborted:0
1422.695981: drv_hw_scan:          phy0 vif:wlp3s0(2)
1422.723949: drv_return_int:       phy0 - 0
1423.206107: api_beacon_loss:       vif:wlp3s0(2)
1423.308604: api_beacon_loss:       vif:wlp3s0(2)
1423.411117: api_beacon_loss:       vif:wlp3s0(2)
[...]

Apparently a scan completes and we immediately start a new one (the drv_hw_scan), then we get lots of beacon loss events.  After a while we seem to disconnect.

Both traces show the same thing.
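
The pattern can also be checked mechanically. Below is a minimal Python sketch that counts beacon-loss events following each scan start. It assumes the `timestamp: event:` layout of the excerpt above (a full trace-cmd report line carries additional leading fields, which is why the regex searches instead of anchoring at the line start), and the helper names are hypothetical.

```python
import re

# Matches "<seconds>.<fraction>: <event_name>:" anywhere in a line.
EVENT_RE = re.compile(r"(\d+\.\d+):\s+(\w+):")


def parse_events(trace_text):
    """Return a list of (timestamp, event_name) tuples found in the text."""
    events = []
    for line in trace_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def beacon_losses_per_scan(events):
    """Map each drv_hw_scan timestamp to the number of api_beacon_loss
    events seen before the next api_scan_completed."""
    counts = {}
    current = None
    for timestamp, name in events:
        if name == "drv_hw_scan":
            current = timestamp
            counts[current] = 0
        elif name == "api_beacon_loss" and current is not None:
            counts[current] += 1
        elif name == "api_scan_completed":
            current = None
    return counts


SAMPLE = """1422.695949: api_scan_completed:   phy0 aborted:0
1422.695981: drv_hw_scan:          phy0 vif:wlp3s0(2)
1422.723949: drv_return_int:       phy0 - 0
1423.206107: api_beacon_loss:       vif:wlp3s0(2)
1423.308604: api_beacon_loss:       vif:wlp3s0(2)
1423.411117: api_beacon_loss:       vif:wlp3s0(2)
"""

print(beacon_losses_per_scan(parse_events(SAMPLE)))  # prints {1422.695981: 3}
```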

Comment 27 Luca Coelho 2014-02-05 07:56:35 UTC
Just to be sure (even though Emmanuel says the bug doesn't have anything to do with http://bugzilla.kernel.org/show_bug.cgi?id=64541), could you try both the patch to fix that bug *and* the new firmware? From your comments it doesn't seem you tried both together.

Also, could you please provide both dmesg and the traces from the same run? Then I can see them in sync.

Comment 28 Cajus Pollmeier 2014-02-05 08:09:48 UTC
I didn't try both together, because I noticed the new firmware too late. So you're right - I tried the stock Fedora 20 kernel with the stock Fedora firmware and the patched kernel with the stock Fedora firmware. Somehow I lost the logs for the last run and just left the vague comments in my initial report.

Will rebuild and test with the patch and the current firmware in ~10 hours. I have no access to the machine at the moment.

Comment 29 Luca Coelho 2014-02-05 11:30:15 UTC
If it's too much trouble to add the patch, it's okay, you can leave it out.

But please send the dmesg and trace-cmd from the same run.  And also add iwlwifi events to trace-cmd:

trace-cmd record -e mac80211 -e cfg80211 -e iwlwifi -e iwlwifi_msg

Comment 30 Cajus Pollmeier 2014-02-05 11:34:59 UTC
I tried to add -e iwlwifi and -e iwlwifi_msg yesterday, but it was not possible. I don't remember the exact wording, but it was basically: there's no foobar hook available in iwlwifi. Maybe the Fedora kernel is not compiled with IWLWIFI_DEVICE_TRACING.

Will check that when rebuilding the kernel with the patch and attach the logs when I'm back home.

Comment 31 Stanislaw Gruszka 2014-02-05 12:30:45 UTC
(In reply to Cajus Pollmeier from comment #30)
> there's no foobar hook available in iwlwifi. Maybe the Fedora kernel is not
> compiled with IWLWIFI_DEVICE_TRACING.
That option is enabled only on fedora kernel-debug variant.

Comment 32 Luca Coelho 2014-02-05 13:19:22 UTC
(In reply to Stanislaw Gruszka from comment #31)
> (In reply to Cajus Pollmeier from comment #30)
> > there's no foobar hook available in iwlwifi. Maybe the Fedora kernel is not
> > compiled with IWLWIFI_DEVICE_TRACING.
> That option is enabled only on fedora kernel-debug variant.

Okay, it makes sense.

Could you try with either the kernel-debug variant or compile the kernel yourself with that option?

Comment 33 Cajus Pollmeier 2014-02-05 21:20:18 UTC
Created attachment 859881 [details]
dmesg full, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch

Comment 34 Cajus Pollmeier 2014-02-05 21:21:14 UTC
Created attachment 859882 [details]
lspci -vxxx after boot, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch

Comment 35 Cajus Pollmeier 2014-02-05 21:21:51 UTC
Created attachment 859883 [details]
lspci -vxxx after fail, new firmware 3.12.9-301.fc20.x86_64+debug+osc_clk.patch

Comment 36 Cajus Pollmeier 2014-02-05 22:29:41 UTC
The trace is - well - big. I can't upload it to the bugtracker (~450M). You can find it bzip2'ed here:

http://ferdi.naasa.net/trace.dat.bz2

After that drop I was only able to reconnect to the FritzBox after pulling the power plug (of the FritzBox). I tried to download a bigger file after that and there was no drop, at least not while transferring some 100M. Maybe it's just a problem on the AP side? Hmm. Will check tomorrow.

Comment 37 Luca Coelho 2014-02-06 09:15:07 UTC
Thanks for all the logs!

This really seems to be some problem with the scanning.  Most of the time, when we scan, it completes after <1 sec (for 2.4GHz) and <4 secs (for 5GHz).  But sometimes the 5GHz scan takes ~14 secs and we miss lots of beacons during that period.  This causes the connection to drop.
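
Spotting such slow scans in a long trace can be automated. The following minimal Python sketch pairs each `drv_hw_scan` with the next `api_scan_completed` and reports scans that exceed a threshold. The helper names are hypothetical, the 4 second default is an assumption taken from the 5GHz figure above, and the timestamps are invented for illustration, not taken from the attached traces.

```python
def scan_durations(events):
    """Pair each drv_hw_scan with the next api_scan_completed.

    events is an iterable of (timestamp, event_name) tuples in trace order.
    Returns a list of (start_timestamp, duration_seconds).
    """
    durations = []
    started = None
    for timestamp, name in events:
        if name == "drv_hw_scan":
            started = timestamp
        elif name == "api_scan_completed" and started is not None:
            durations.append((started, timestamp - started))
            started = None
    return durations


def slow_scans(events, threshold_s=4.0):
    """Return only the scans that took longer than threshold_s seconds."""
    return [(start, d) for start, d in scan_durations(events) if d > threshold_s]


# Illustrative timestamps only: one quick scan and one that takes 14 seconds.
events = [
    (100.0, "drv_hw_scan"),
    (100.5, "api_scan_completed"),
    (200.0, "drv_hw_scan"),
    (214.0, "api_scan_completed"),
]

print(slow_scans(events))  # prints [(200.0, 14.0)]
```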

I have asked the firmware team for help and they promised to provide a firmware with debugging information to try to figure out what is going on.  But this may take a while...

We have never seen this bug before, so there's probably something that your AP is doing that is causing the scans to get stuck.  Maybe it's a bug in the AP that triggers a bug in the firmware.  I don't know.

One thing that sometimes helps is to disable powersave by loading the iwlmvm module with power_scheme=1.  You can do that either when loading the module or by adding this to the modprobe configuration (in /etc/modprobe.d/iwlmvm.conf or similar; the file name needs to end in .conf):

options iwlmvm power_scheme=1

You could also try to see if the problem goes away if you use the backport project to install the latest version of the wireless subsystem and use the newer firmware: https://backports.wiki.kernel.org/index.php/Main_Page

Comment 38 Cajus Pollmeier 2014-02-06 09:32:27 UTC
I had the power_scheme=1 option enabled before I did the last logs, so it didn't seem to help.

I'll see if I can do something with the backport project, but I'll check some AP related things first: I will take a second laptop from work in order to see if the AP is doing something nasty when the drops occur. It's always bad if there's only one real computer available for debugging...

Comment 39 Cajus Pollmeier 2014-02-06 21:41:04 UTC
It looks like I'm facing two different problems: internet communication WLAN <-> DSL and LAN communication WLAN <-> LAN.

The first seems to be resolved with the new firmware. That's why I reported at the very beginning of this thread that the firmware solves the problem.

When I later claimed that it was still not fixed, I was apparently seeing the second flavor: communication between the 7260 <-> AP <-> NAS. After monitoring the AP with a second computer from the LAN port, I noticed that the LAN connection was gone, too. Doh. The AP is completely offline.

After digging for exactly this problem, I found that I'm not the only one having problems with WLAN <-> LAN transfers on the FritzBox: the AP is just doing a reset because it's overloaded (maybe heat, power consumption).

I'm really sorry for all the traffic that happened after comment #2. I guess it can be closed. Sometimes Google-fu doesn't help if you don't search for the right stuff :-(

Thanks for your help!

Comment 40 Luca Coelho 2014-02-07 08:12:58 UTC
Ok, good to know that you found out what the problem is.  And even better, that it's not a problem with our driver. :)

I had a similar problem with another AP/ADSL combo (a Zyxel, IIRC).  The AP would get stuck under heavy load, especially when too many connections were open (e.g. downloading torrents).  At some point I figured out that changing the encryption mode helped.  You may want to experiment with that, if buying another AP is not an option for you. ;)

Stanislaw, I think this bug can be closed, since the firmware update is already handled on a different bug report.

Comment 41 Stanislaw Gruszka 2014-02-07 12:32:10 UTC
Ok, closing now, thanks for working on it!

