Bug 743061

Summary: intel 4965 wireless drops connections
Product: [Fedora] Fedora
Reporter: Mike Iglesias <iglesias>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 14
CC: gansalmon, itamar, jonathan, kernel-maint, linville, madhu.chinakonda
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-10-24 11:41:02 UTC
Attachments:
  /var/log/messages for intel 4965 failure (flags: none)
  /var/log/messages entries showing wireless failure (flags: none)
  /var/log/messages entries with full debug flags set (flags: none)

Description Mike Iglesias 2011-10-03 17:56:29 UTC
Description of problem:

The Intel 4965 wireless interface on my Dell XPS M1330 drops off the network intermittently.


Version-Release number of selected component (if applicable):

2.6.35.14-96.fc14.i686


The Intel 4965 wireless interface on my Dell XPS M1330 drops off the network intermittently, and needs either a reboot or an ifdown/ifup to recover the connection.

I've tried the following to solve this issue:

Replaced the 4965 with a new one

Tried various options in a modprobe config file:

  swcrypto=1
  11n_disable50=1
  11n_disable=1
  disable_hw_scan=1
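
For reference, module options like the ones listed above are normally set with an "options" line in a file under /etc/modprobe.d/. The sketch below is an illustration only, not the reporter's actual file: the helper name and the file name are hypothetical, and the module name iwlagn is taken from later comments in this report.

```shell
# Hypothetical helper: write a single "options" line for the iwlagn module.
# $1 is the config file to write; the remaining arguments are name=value pairs.
write_iwl_options() {
    conf="$1"
    shift
    # "$*" joins the remaining arguments with spaces.
    printf 'options iwlagn %s\n' "$*" > "$conf"
}

# Example (as root; the file name is an assumption):
# write_iwl_options /etc/modprobe.d/iwl4965-test.conf swcrypto=1 debug=0x43fff
```

Changed options only take effect once the module is reloaded or the system is rebooted.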

Tried the compat-wireless package.  It will not start the interface on reboot, and if I try to start it manually via modprobe, I get this:

# /sbin/modprobe iwl4965
WARNING: Error inserting rfkill (/lib/modules/2.6.35.14-96.fc14.i686/kernel/net/rfkill/rfkill.ko): Invalid module format
WARNING: Error inserting cfg80211 (/lib/modules/2.6.35.14-96.fc14.i686/updates/net/wireless/cfg80211.ko): Invalid module format
WARNING: Error inserting mac80211 (/lib/modules/2.6.35.14-96.fc14.i686/updates/net/mac80211/mac80211.ko): Invalid module format
WARNING: Error inserting iwl_legacy (/lib/modules/2.6.35.14-96.fc14.i686/updates/drivers/net/wireless/iwlegacy/iwl-legacy.ko): Invalid module format
FATAL: Error inserting iwl4965 (/lib/modules/2.6.35.14-96.fc14.i686/updates/drivers/net/wireless/iwlegacy/iwl4965.ko): Invalid module format

The system was installed with the i386/i686 version of Fedora 14, so I don't think this is an architecture incompatibility.

I've attached an extract from /var/log/messages starting at the point where dhclient has renewed the lease, and within 5 minutes the interface drops off the network.  I had debug set to 0x43fff (via the modprobe config file mentioned above), so hopefully there's something useful there that may help with this.

Comment 1 John W. Linville 2011-10-03 18:39:32 UTC
The compat-wireless package needs some patching to work with the current F14 kernels.  Lucky for you, I've been working on a related project... :-)

First, you will need to edit /etc/depmod.d/dist.conf to replace the last line with a line like this:

   search updates extra backports built-in weak-updates

Then, you need to install _both_ the kernel and kernel-backports packages available from here:

   http://koji.fedoraproject.org/koji/taskinfo?taskID=3395348

That will let you test the compat-wireless version of the iwl4965 driver.

Comment 2 Mike Iglesias 2011-10-03 20:01:51 UTC
Created attachment 526118 [details]
/var/log/messages for intel 4965 failure

Comment 3 Mike Iglesias 2011-10-03 21:05:04 UTC
About 25 minutes after booting the new kernel+backports, it successfully renewed the lease from the dhcp server.  About 4 minutes later, it noted that it was calling CRDA to update world regulatory domain, updated the domain to US, and about a minute after that it was no longer able to talk to the dhcp server.  Maybe that has something to do with the problem?

Comment 4 Stanislaw Gruszka 2011-10-04 07:26:59 UTC
Are you in a country other than the US? If so, setting the proper regdomain may help; see https://bugzilla.redhat.com/show_bug.cgi?id=709803#c13

Comment 5 Mike Iglesias 2011-10-04 14:46:42 UTC
I'm from the US.  I looked thru the messages files on the laptop, and while it didn't happen every time it updated the regulatory domain, there are several occurrences where the wireless stopped working shortly after that took place.

Comment 6 Stanislaw Gruszka 2011-10-11 14:39:52 UTC
I've updated the compat-wireless packages; they should work now.

Mike, you tested the latest code and 2.6.35. Did you also test any other version (older, or somewhere between 2.6.35 and the latest)? Is there any version on which the driver works for you?

Comment 7 Mike Iglesias 2011-10-12 17:15:54 UTC
My feeling is that it's been a problem since I upgraded from Fedora 13 to Fedora 14.  I'm pretty sure 13 was much better than 14 has been.

I did some testing this morning and it appears that something tries to set the domain to US about 30 minutes after the system is booted, and at that point the wireless stops working.  I set up a script that did a "ping -c 5 ip-addr" and then slept for 60 seconds, and repeated that.  The set of pings just before the domain was set to US worked, and the pings after, as well as the dhcp lease renewal, did not go through.  I did this testing with the iwlagn debug flags set to 0x43fff, and I have extracted the information from /var/log/messages from about 2 minutes before the domain setting to some time after the wireless was not working.  I will upload that shortly in case you want to look at it.
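
A loop like the one described above could look roughly as follows. This is a reconstruction for illustration, not the script that was actually run: the function name, the PING_CMD override, and the 192.0.2.1 placeholder address are all assumptions.

```shell
# Hypothetical reconstruction of the monitoring loop described above.
# PING_CMD can be overridden (for example, for a dry run without a network).
PING_CMD="${PING_CMD:-ping}"

# watch_link TARGET INTERVAL ROUNDS
# Sends 5 pings to TARGET, logs the result with a timestamp, sleeps
# INTERVAL seconds, and repeats ROUNDS times.
watch_link() {
    target="$1"
    interval="$2"
    rounds="$3"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        if "$PING_CMD" -c 5 "$target" >/dev/null 2>&1; then
            echo "$(date '+%b %d %H:%M:%S') $target ok"
        else
            echo "$(date '+%b %d %H:%M:%S') $target FAILED"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example: check once a minute for two hours (the address is a placeholder).
# watch_link 192.0.2.1 60 120
```

The timestamp format matches the one syslog uses, which makes it easier to line the ping results up against entries in /var/log/messages.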

Why would it need to set the domain to US after the system is up? It did so at least once while it was booting, so it seems to me it shouldn't need to do it again.

If it makes any difference, I'm not using NetworkManager.

Comment 8 Mike Iglesias 2011-10-12 17:18:08 UTC
Created attachment 527749 [details]
/var/log/messages entries showing wireless failure

Comment 9 Stanislaw Gruszka 2011-10-14 12:12:57 UTC
(In reply to comment #7)
> My feeling is that it's been a problem since I upgraded from Fedora 13 to
> Fedora 14.  I'm pretty sure 13 was much better than 14 has been.
That probably means a regression was introduced between 2.6.34 and 2.6.35. 2.6.35 is well known for various iwlwifi regressions; some of them are fixed now, some are not. Some time ago I prepared 5 patches that downgrade the 2.6.35 iwlwifi driver to 2.6.34, to help narrow down the regression. They are attached to a bug report on bugzilla.kernel.org, but unfortunately that service is down, and I'm not sure when it will be restarted. So for now we can only debug this problem based on logs.

> set to 0x43fff, and I have extracted the information from /var/log/messages
> from about 2 minutes before the domain setting to some time after the wireless
> was not working.  I will upload that shortly in case you want to look at it.

I'm not sure why, but there is a station removal
> Oct 12 10:02:57 dhcp-v041-206 kernel: [ 1835.887404] ieee80211 phy0: U iwl_mac_sta_remove received request to remove station 00:1e:79:d6:7d:02
before the US regulatory domain is set. That is most likely caused by a user space application which, for some reason, wants to disconnect. However, user space may want to disconnect because of an earlier driver malfunction.
 
> If it makes any difference, I'm not using NetworkManager.
But you still use wpa_supplicant? ;-) Please provide logs from wpa_supplicant and the kernel as described in "Configure syslog to debug kernel and wpa_supplicant"
at https://fedoraproject.org/wiki/DebugWireless

For the iwlagn module use debug=0x7fffffff, which shows much more detailed information. If the log file is big, compress it. Thanks.

Comment 10 Mike Iglesias 2011-10-14 19:44:15 UTC
There were no user-space applications running other than me being logged in on the laptop and the ping script I had running, so I don't know what might have triggered the station removal.

I'm not using wpa_supplicant, so there's no log for that.

I will try the iwlagn debugging setting next week when I'm back in the office.

Comment 11 Stanislaw Gruszka 2011-10-17 12:04:10 UTC
How do you configure the connection? Could you show that? I'll check whether I can reproduce the problem locally.

Comment 12 Mike Iglesias 2011-10-17 16:07:16 UTC
I'm just using a standard ifcfg script in /etc/sysconfig/network-scripts.

DEVICE=wlan1
TYPE=Wireless
USERCTL=yes
IPV6INIT=no
BOOTPROTO=dhcp
ONBOOT=yes
ESSID="UCInet Mobile Access"

I'm going to upload an extract from /var/log/messages with the debug flags set to 0x7fffffff as you requested.

Comment 13 Mike Iglesias 2011-10-17 16:08:07 UTC
Created attachment 528579 [details]
/var/log/messages entries with full debug flags set

Comment 14 Stanislaw Gruszka 2011-10-18 11:30:02 UTC
It really looks like the request to disconnect comes from user space. Is NetworkManager not installed at all? If it is installed, could you check whether adding
NM_CONTROLLED="no" to the ifcfg file in network-scripts helps, and also try this command:
"chkconfig NetworkManager off"

Comment 15 Stanislaw Gruszka 2011-10-18 11:38:16 UTC
Also "chkconfig wpa_supplicant off"

Is ESSID="UCInet Mobile Access" an unprotected network, or is a password provided somehow? Sorry for the stupid question, but this method of configuring a wireless interface is totally unknown to me.

Comment 16 Mike Iglesias 2011-10-18 22:27:22 UTC
NetworkManager and wpa_supplicant are not running, and adding NM_CONTROLLED="no" did not help.

I booted the laptop and did not log in, and almost exactly 30 minutes after it booted, it lost the wireless connection.  That would rule out anything in user space that is started when I log in as being the culprit.  I found this in the dmesg output:

[ 1836.069733] wlan1: deauthenticated from 00:1e:79:d6:7d:02 (Reason: 1)

What does a reason code of 1 mean?

One other interesting thing is that when the wireless connection dies, if I do

/sbin/ifdown wlan1
/sbin/ifup wlan1

it dies again 30 minutes after the /sbin/ifup, so it appears to be something related to starting the wireless connection.  The only thing I can think of that is started is dhclient.  I changed the way dhclient started so it was doing verbose logging (-v option) but that didn't reveal anything either.

Is there a way to determine what user space process might be doing this?

Comment 17 Stanislaw Gruszka 2011-10-19 08:36:15 UTC
(In reply to comment #16)
> [ 1836.069733] wlan1: deauthenticated from 00:1e:79:d6:7d:02 (Reason: 1)
> 
> What does a reason code of 1 mean?
It means UNSPECIFIED :-( So the disconnect comes from the AP rather than from user space, but we don't know why.
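
As an illustration of how the reason code in such a log line can be decoded, here is a small sketch. The function names are hypothetical, and only the first few reason codes from the IEEE 802.11 table are listed, with the descriptions paraphrased.

```shell
# Map a few IEEE 802.11 reason codes to short descriptions (paraphrased).
reason_text() {
    case "$1" in
        1) echo "unspecified reason" ;;
        2) echo "previous authentication no longer valid" ;;
        3) echo "deauthenticated: sending station is leaving" ;;
        4) echo "disassociated due to inactivity" ;;
        *) echo "other (see the IEEE 802.11 reason code table)" ;;
    esac
}

# Read kernel log lines on stdin and print the numeric reason code of
# any "deauthenticated from ... (Reason: N)" message.
extract_reason() {
    sed -n 's/.*deauthenticated from .* (Reason: \([0-9][0-9]*\)).*/\1/p'
}

line='[ 1836.069733] wlan1: deauthenticated from 00:1e:79:d6:7d:02 (Reason: 1)'
code=$(printf '%s\n' "$line" | extract_reason)
echo "reason $code: $(reason_text "$code")"
```

Code 1 carries no detail about why the AP sent the deauthentication frame, so the cause has to be found on the AP side.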

> it dies again 30 minutes after the /sbin/ifup, so it appears to be something
> related to starting the wireless connection. 

I'm not sure what could cause such behaviour. Perhaps we do something wrong that makes the AP want to deauthenticate us, or maybe this is the configured behaviour of the AP: a default deauth after 30 minutes (if so, using wpa_supplicant could help with that issue, as it should automatically reauthenticate after a deauthentication).

What encryption is used on that wireless network?

Comment 18 Mike Iglesias 2011-10-19 15:00:52 UTC
(In reply to comment #17)
> It means UNSPECIFIED :-( So the disconnect comes from the AP rather than from
> user space, but we don't know why.

I'll talk to our network engineer about this and see if he has any idea why this might be happening.

> I'm not sure what could cause such behaviour. Perhaps we do something wrong
> that makes the AP want to deauthenticate us, or maybe this is the configured
> behaviour of the AP: a default deauth after 30 minutes (if so, using
> wpa_supplicant could help with that issue, as it should automatically
> reauthenticate after a deauthentication).
> 
> What encryption is used on that wireless network?

We use MAC address authorization (people need to register the wireless MAC address to be allowed on the network), and we're not using encryption at this time.

Comment 19 Mike Iglesias 2011-10-19 16:27:37 UTC
According to our network engineer, the APs are set up with a session timeout that requires the clients to reauthenticate every 30 minutes.  My laptop has Vista on it and it seems to work ok when Vista is running.  The network engineer did tell me that some Mac OS X systems seem to have trouble with this too, but for the most part clients are working ok with that setting on the APs.

Comment 20 Stanislaw Gruszka 2011-10-24 11:41:02 UTC
OK, I see. We should handle this situation in user space. I think wpa_supplicant is able to reassociate automatically once the AP disassociates. For an unencrypted network it can be configured like this:

ctrl_interface=/var/run/wpa_supplicant
network={
  ssid="UCInet Mobile Access"
  key_mgmt=NONE
}

Otherwise NetworkManager should handle that. If that does not work, reopen the bug and change it to the proper component.
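
A minimal sketch of putting the configuration above to use follows. The helper name, the config file path, and the choice of the wext driver are assumptions, not something confirmed in this report; the interface name wlan1 comes from the ifcfg file shown earlier.

```shell
# Hypothetical helper: write the open-network configuration shown above.
# $1 is the config file to write, $2 is the SSID.
write_open_network_conf() {
    conf="$1"
    ssid="$2"
    cat > "$conf" <<EOF
ctrl_interface=/var/run/wpa_supplicant
network={
  ssid="$ssid"
  key_mgmt=NONE
}
EOF
}

# Example (as root; path and driver are assumptions):
# write_open_network_conf /etc/wpa_supplicant/wpa_supplicant.conf "UCInet Mobile Access"
# wpa_supplicant -B -i wlan1 -D wext -c /etc/wpa_supplicant/wpa_supplicant.conf
```

Here -B runs wpa_supplicant in the background, -i names the interface, -D selects the driver backend, and -c points at the config file. How this interacts with the ESSID setting in the existing ifcfg script was not tested in this report.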