Description of problem:
1) Montevina frequently fails to associate with keyed (WEP64/128, WPA/WPA2) or un-keyed APs through KNetworkManager; we captured logs of the failures for your reference.
2) We can get the Montevina wireless card associated with keyed (WEP64/128, WPA/WPA2) or un-keyed APs successfully by commands (such as iwconfig/wpa_supplicant/dhclient), but we soon find that the eth0 connection is shut down after the wireless association succeeds.

Version-Release number of selected component (if applicable):
OS: Red Hat Enterprise Linux Client release 5.3 Beta (Tikanga)
Kernel version: Linux osve-mv 2.6.18-118.el5 #1 SMP Sat Oct 4 00:21:30 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
HW Platform: Desktop with Montevina Wireless Card
Firmware: iwlwifi-5000-1.ucode
NetworkManager Version:
NetworkManager-gnome-0.7.0-0.11.svn4133.el5
NetworkManager-0.7.0-0.11.svn4133.el5
NetworkManager-glib-0.7.0-0.11.svn4133.el5
NetworkManager-glib-0.7.0-0.11.svn4133.el5
NetworkManager-0.7.0-0.11.svn4133.el5
As far as we can remember, the previous NetworkManager version 0.6.4-8 works with RHEL 5.2.

How reproducible: We can reproduce this issue frequently.

Steps to Reproduce:
1) Un-keyed AP
1.1) Start the desktop with the Montevina wireless card.
1.2) Launch NetworkManager to associate with a nearby un-keyed AP (Network Mode: Wireless-G only); after some time it still has not associated with the AP.
1.3) Log from /var/log/messages:
Oct 29 03:05:17 osve-mv NetworkManager: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) starting...
Oct 29 03:05:17 osve-mv NetworkManager: <info> (wlan0): device state change: 4 -> 5
Oct 29 03:05:17 osve-mv NetworkManager: <info> Activation (wlan0/wireless): connection 'Auto jane-linksys' requires no security. No secrets needed.
Oct 29 03:05:17 osve-mv NetworkManager: <info> Config: added 'ssid' value 'jane-linksys' Oct 29 03:05:17 osve-mv NetworkManager: <info> Config: added 'scan_ssid' value '1' Oct 29 03:05:17 osve-mv NetworkManager: <info> Config: added 'key_mgmt' value 'NONE' Oct 29 03:05:17 osve-mv NetworkManager: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) complete. Oct 29 03:05:17 osve-mv NetworkManager: <info> Config: set interface ap_scan to 1 Oct 29 03:05:17 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 03:05:17 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 03:05:32 osve-mv NetworkManager: <info> wlan0: link timed out. Oct 29 03:05:42 osve-mv NetworkManager: <info> Activation (wlan0/wireless): association took too long, failing activation. Oct 29 03:05:42 osve-mv NetworkManager: <info> (wlan0): device state change: 5 -> 9 Oct 29 03:05:42 osve-mv NetworkManager: <info> Activation (wlan0) failed for access point (jane-linksys) Oct 29 03:05:42 osve-mv NetworkManager: <info> Marking connection 'Auto jane-linksys' invalid. Oct 29 03:05:42 osve-mv NetworkManager: <info> Activation (wlan0) failed. Oct 29 03:05:42 osve-mv NetworkManager: <info> (wlan0): device state change: 9 -> 3 Oct 29 03:05:42 osve-mv NetworkManager: <info> (wlan0): deactivating device. 2) Keyed WEP128 2.1) start Desktop with Montevina wireless card 2.2) launch NetworkManager to get associated with nearby WEP128 keyed AP (Network Mode: Wireless-G only),after sometime it does not get associated with AP. 2.3)log info is as follows from /var/log/messages: Oct 29 01:47:44 osve-mv NetworkManager: <info> Activation (wlan0/wireless): connection 'jane-linksys' has security, and secrets exist. No new secrets needed. 
Oct 29 01:47:44 osve-mv NetworkManager: <info> Config: added 'ssid' value 'jane-linksys' Oct 29 01:47:44 osve-mv NetworkManager: <info> Config: added 'scan_ssid' value '1' Oct 29 01:47:44 osve-mv NetworkManager: <info> Config: added 'key_mgmt' value 'NONE' Oct 29 01:47:44 osve-mv NetworkManager: <info> Config: added 'auth_alg' value 'OPEN' Oct 29 01:47:44 osve-mv NetworkManager: <info> Config: added 'wep_key0' value '<omitted>' Oct 29 01:47:44 osve-mv NetworkManager: <info> Config: added 'wep_tx_keyidx' value '0' Oct 29 01:47:44 osve-mv NetworkManager: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) complete. Oct 29 01:47:44 osve-mv NetworkManager: <info> Config: set interface ap_scan to 1 Oct 29 01:47:44 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 01:47:44 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 01:47:59 osve-mv NetworkManager: <info> wlan0: link timed out. Oct 29 01:48:09 osve-mv NetworkManager: <info> Activation (wlan0/wireless): association took too long. Oct 29 01:48:09 osve-mv NetworkManager: <info> (wlan0): device state change: 5 -> 6 Oct 29 01:48:09 osve-mv NetworkManager: <info> Activation (wlan0/wireless): asking for new secrets Oct 29 01:48:19 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 01:48:19 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 01:48:34 osve-mv NetworkManager: <info> wlan0: link timed out. Oct 29 01:48:45 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 01:48:46 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 01:49:01 osve-mv NetworkManager: <info> wlan0: link timed out. 
3) WPA Keyed 3.1) start Desktop with Montevina wireless card 3.2) launch NetworkManager to get associated with nearby WPA keyed AP (Network Mode: Wireless-G only),after sometime it does not get associated with AP. 3.3)log info is as follows from /var/log/messages: Oct 29 03:22:04 osve-mv NetworkManager: <info> (wlan0): device state change: 4 -> 5 Oct 29 03:22:04 osve-mv NetworkManager: <info> Activation (wlan0/wireless): connection 'Auto jane-linksys' has security, and secrets exist. No new secrets needed. Oct 29 03:22:04 osve-mv NetworkManager: <info> Config: added 'ssid' value 'jane-linksys' Oct 29 03:22:04 osve-mv NetworkManager: <info> Config: added 'scan_ssid' value '1' Oct 29 03:22:04 osve-mv NetworkManager: <info> Config: added 'key_mgmt' value 'WPA-PSK' Oct 29 03:22:04 osve-mv NetworkManager: <info> Config: added 'psk' value '<omitted>' Oct 29 03:22:04 osve-mv NetworkManager: <info> Config: added 'proto' value 'WPA RSN' Oct 29 03:22:04 osve-mv NetworkManager: <info> Config: added 'pairwise' value 'TKIP CCMP' Oct 29 03:22:04 osve-mv NetworkManager: <info> Config: added 'group' value 'WEP40 WEP104 TKIP CCMP' Oct 29 03:22:04 osve-mv NetworkManager: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) complete. Oct 29 03:22:04 osve-mv NetworkManager: <info> Config: set interface ap_scan to 1 Oct 29 03:22:04 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 03:22:05 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 03:22:20 osve-mv NetworkManager: <info> wlan0: link timed out. Oct 29 03:22:29 osve-mv NetworkManager: <info> Activation (wlan0/wireless): association took too long. 
Oct 29 03:22:29 osve-mv NetworkManager: <info> (wlan0): device state change: 5 -> 6 Oct 29 03:22:29 osve-mv NetworkManager: <info> Activation (wlan0/wireless): asking for new secrets Oct 29 03:22:39 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 03:22:39 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 03:22:43 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 03:22:43 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 03:22:49 osve-mv NetworkManager: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) scheduled... Oct 29 03:22:49 osve-mv NetworkManager: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) started... Oct 29 03:22:49 osve-mv NetworkManager: <info> (wlan0): device state change: 6 -> 4 Oct 29 03:22:49 osve-mv NetworkManager: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) scheduled... Oct 29 03:22:49 osve-mv NetworkManager: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) complete. Oct 29 03:22:49 osve-mv NetworkManager: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) starting... Oct 29 03:22:49 osve-mv NetworkManager: <info> (wlan0): device state change: 4 -> 5 Oct 29 03:22:49 osve-mv NetworkManager: <info> Activation (wlan0/wireless): connection 'Auto jane-linksys' has security, and secrets exist. No new secrets needed. 
Oct 29 03:22:49 osve-mv NetworkManager: <info> Config: added 'ssid' value 'jane-linksys' Oct 29 03:22:49 osve-mv NetworkManager: <info> Config: added 'scan_ssid' value '1' Oct 29 03:22:49 osve-mv NetworkManager: <info> Config: added 'key_mgmt' value 'WPA-PSK' Oct 29 03:22:49 osve-mv NetworkManager: <info> Config: added 'psk' value '<omitted>' Oct 29 03:22:49 osve-mv NetworkManager: <info> Config: added 'proto' value 'WPA RSN' Oct 29 03:22:49 osve-mv NetworkManager: <info> Config: added 'pairwise' value 'TKIP CCMP' Oct 29 03:22:49 osve-mv NetworkManager: <info> Config: added 'group' value 'WEP40 WEP104 TKIP CCMP' Oct 29 03:22:49 osve-mv NetworkManager: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) complete. Oct 29 03:22:49 osve-mv NetworkManager: <info> Config: set interface ap_scan to 1 Oct 29 03:22:49 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 03:22:49 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 03:23:04 osve-mv NetworkManager: <info> wlan0: link timed out. Oct 29 03:23:14 osve-mv NetworkManager: <info> Activation (wlan0/wireless): association took too long. Oct 29 03:23:14 osve-mv NetworkManager: <info> (wlan0): device state change: 5 -> 6 Oct 29 03:23:14 osve-mv NetworkManager: <info> Activation (wlan0/wireless): asking for new secrets Oct 29 03:23:24 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 03:23:24 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Oct 29 03:23:39 osve-mv NetworkManager: <info> wlan0: link timed out. Oct 29 03:23:59 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2 Oct 29 03:23:59 osve-mv NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 0 Actual results: Fails to get associated with AP Expected results: successfully gets associated with AP Additional info:
"2) We can get Montevina wireless card associated with (WEP64/128,WPA/WPA2) or un-keye APs successfully by commands (such as iwconfig/wpa_supplicant/dhclient),while we soon find that eth0 connection is shutdown after wireless successful association." Based on that, this doesn't immediately seem like a kernel problem to me. Also, have you updated your selinux-policy package?
My feedback is as follows:
1) We have not updated SELinux.
2) The current SELinux package versions are:
libselinux-utils-1.33.4-5.1.el5
selinux-policy-2.4.6-162.el5
libselinux-1.33.4-5.1.el5
libselinux-python-1.33.4-5.1.el5
selinux-policy-targeted-2.4.6-162.el5
libselinux-1.33.4-5.1.el5
selinux-policy-devel-2.4.6-162.el5
libselinux-devel-1.33.4-5.1.el5
libselinux-devel-1.33.4-5.1.el5
3) We disabled the default SELinux enforcing mode.
I hope this information helps.
The logs indicate that the supplicant continues to scan looking for the AP. Is the AP not broadcasting its SSID?
The AP is broadcasting its SSID. In addition, we can also reproduce this defect with a hidden AP.
Which exact wifi card are you using, 5100, 5150, 5300, or 5350?
And BTW, you reference KNetworkManager in the bug title... are you really using KNetworkManager, or are you using nm-applet from the NetworkManager-gnome package?
We have run the NetworkManager test above in both the KDE and GNOME desktop environments, and both show the connection failure. I hope this feedback helps.
Sorry, it was a typo. I have changed KNetworkManager to NetworkManager.
(In reply to comment #9) > Which exact wifi card are you using, 5100, 5150, 5300, or 5350? It's 5100.
The logs indicate that the supplicant is simply not finding the AP, it's alternating between 'disconnected' and 'scanning'. Could you do a manual wpa_supplicant connection (with NM disabled) using the "-dddt" options and attach the log output to this bug? You may wish to mark the attachment private. Also, we were able to get NM to easily connect using a 5100 on RHEL 5.3 beta yesterday to a WPA/WPA2 PSK access point, so we know it works in at least some configurations.
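As a reading aid for the NetworkManager log lines quoted in this bug, the numeric supplicant states can be mapped back to names. This is only a sketch: it assumes the numbers follow the order of wpa_supplicant's state enumeration. Only 0 (disconnected) and 2 (scanning) are confirmed by the description above; the other names are an assumption and should be checked against the NetworkManager source. `decode_supplicant_states` is a hypothetical helper, not part of NetworkManager.

```python
import re

# Assumed mapping, following the order of wpa_supplicant's state enum.
# Only 0 (disconnected) and 2 (scanning) are confirmed by this bug thread.
SUPPLICANT_STATES = {
    0: "disconnected",
    1: "inactive",
    2: "scanning",
    3: "associating",
    4: "associated",
    5: "4-way handshake",
    6: "group handshake",
    7: "completed",
}

STATE_CHANGE = re.compile(r"supplicant connection state change: (\d+) -> (\d+)")


def decode_supplicant_states(log_text):
    """Return (old_name, new_name) for every supplicant state change in log_text."""
    transitions = []
    for match in STATE_CHANGE.finditer(log_text):
        old, new = int(match.group(1)), int(match.group(2))
        transitions.append((SUPPLICANT_STATES.get(old, "unknown"),
                            SUPPLICANT_STATES.get(new, "unknown")))
    return transitions


if __name__ == "__main__":
    sample = (
        "Oct 29 03:05:17 osve-mv NetworkManager: <info> (wlan0): "
        "supplicant connection state change: 0 -> 2\n"
        "Oct 29 03:05:17 osve-mv NetworkManager: <info> (wlan0): "
        "supplicant connection state change: 2 -> 0\n"
    )
    for old, new in decode_supplicant_states(sample):
        print(f"{old} -> {new}")
```

Run against /var/log/messages, a log that never gets past "scanning" would point at the scan results rather than at key negotiation.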
We used a wireless Linksys AP and followed these command-line steps:
1) Create a file named wpa-config under /root with the following contents:
ctrl_interface_group=0
eapol_version=1
ap_scan=1
network={
    ssid="jane-linksys"
    scan_ssid=1
    proto=WPA
    key_mgmt=WPA-PSK
    pairwise=TKIP
    group=TKIP WEP104 WEP40
    psk="1234567890"
}
2) ifconfig wlan0 up
3) wpa_supplicant -Dwext -i wlan0 -c /root/wpa-config -dddt > /root/wpa-log 2>&1
4) It fails to associate with the AP; I am attaching wpa-log for your reference.
Created attachment 322258 [details] wpa-log log info
From the wpa_supplicant log, it seems there are two problems: 1) The card/driver aren't finding jane-linksys very quickly in the scan results, despite probe-scanning 2) After association, the 4-way handshake doesn't start. The ENOENT error returned from ioctl[SIOCGIWENCODEEXT] is somewhat worrying and could be affecting the 4-way WPA handshake. John, have you been able to associate with TKIP on 5000-series? I'll give it a shot here with a 5350 card and see what I can do too.
We have run the command-line steps for un-keyed and keyed (WEP64/128, WPA2) association, and all of them work. As for WPA association, we can readily reproduce a kernel panic, which has already been reported as a driver issue in Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=467831. So we think there may be something wrong with NetworkManager?
Let me clarify the NetworkManager issue we would like to focus on in this bug. NetworkManager 0.7.0 does not work well with the iwlagn driver. During testing of RHEL5.3-Beta, we found that NetworkManager could associate with an AP successfully the first time after the system booted. After that, it kept trying to associate with this AP from time to time and failed to associate with any other AP (whether keyed or not). If we stop the NetworkManager service and use the wpa_supplicant command line to associate with other APs, it succeeds. I think it's a regression in NetworkManager 0.7.0. On our side, this issue can be reproduced on both the iwl4965 card and the iwl5100 card.
Dan/John, I'm trying to apply some internal patches to NetworkManager, but ./configure fails on RHEL5.3. I used NetworkManager-0.7.0-0.11.svn4186.el5.src.rpm shipped on the source DVD of RHEL5.3-Beta. I installed the rpm and ran "rpmbuild -bp NetworkManager.spec", which put the source in /usr/src/redhat/BUILD/NetworkManger/. When I ran ./configure, the script reported package version errors, such as libnl >= 1.0-pre8 and polkit-dbus being needed, and many other errors. I cannot even find the required packages on the RHEL5.3-Beta installation ISO to fix those errors and let the compilation continue. Is anything wrong with my steps for building NetworkManager? Could you please give me a hint? Thanks.
(In reply to comment #18) > Dan/John, > > I'm trying to apply some internal patches to NetworkManger, but failed to do > ./configure on RHEL5.3. > > I use NetworkManager-0.7.0-0.11.svn4186.el5.src.rpm shipped in source DVD of > RHEL5.3-Beta. I installed the rpm and run "rpmbuild -bp NetworkManager.spec". > Then I got the source in /usr/src/redhat/BUILD/NetworkManger/. When I do > ./configure, I was prompted by the configure script about package version > error, such as libnl >= 1.0-pre8, polkit-dbus is needed and much more other > errors. I even cannot find the required packages from RHEL5.3-Beta > installation ISO to fix those errors to let compilation going on. > > Anything wrong with my steps of building NetworkManager? Could you please give > me some hint? Thanks. Try 'rpmbuild -ba --target x86_64 NetworkManager.spec' and install all the dependencies it lists. There will be a lot of -devel RPMs you'll need to install, but once you can 'rpmbuild -ba ...' you'll be good to go. I'd recommend adding the patches to the specfile and just doing the whole thing with rpmbuild.
Thanks, Dan. 'rpmbuild -ba' works in my case. Dan/John, any update on the NetworkManager issue with the iwlagn driver? This bug almost blocks users from using Intel wireless cards, as most users rely on NetworkManager rather than command-line 'iwconfig' to connect to wireless networks.
I ran the following tests on the Montevina platform:
1) On one wireless AP, change its key mode, then try to associate Montevina with it.
2) Try to associate Montevina with different APs (keyed or un-keyed).

Part 1) Switching between key modes (WEP64/WEP128/WPA/WPA2/unkeyed) on one AP:
1) Sometimes (about 30%) we can associate Montevina with the keyed (WPA/WPA2/WEP64/WEP128) or unkeyed wireless AP through NetworkManager.
2) Most of the time (70%) we fail to associate with the wireless AP through NetworkManager. The log is as follows (for more details, see messages-unpatched-networkmanager):
Nov 11 18:23:37 osve-sony NetworkManager: <info> (wlan0): device state change: 4 -> 5
Nov 11 18:23:37 osve-sony NetworkManager: <info> Activation (wlan0/wireless): connection 'Auto jane-linksys' has security, and secrets exist. No new secrets needed.
Nov 11 18:23:37 osve-sony NetworkManager: <info> Config: added 'ssid' value 'jane-linksys'
Nov 11 18:23:37 osve-sony NetworkManager: <info> Config: added 'scan_ssid' value '1'
Nov 11 18:23:37 osve-sony NetworkManager: <info> Config: added 'key_mgmt' value 'WPA-PSK'
Nov 11 18:23:37 osve-sony NetworkManager: <info> Config: added 'psk' value '<omitted>'
Nov 11 18:23:37 osve-sony NetworkManager: <info> Config: added 'proto' value 'WPA RSN'
Nov 11 18:23:37 osve-sony NetworkManager: <info> Config: added 'pairwise' value 'TKIP CCMP'
Nov 11 18:23:37 osve-sony NetworkManager: <info> Config: added 'group' value 'WEP40 WEP104 TKIP CCMP'
Nov 11 18:23:37 osve-sony NetworkManager: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) complete.
Nov 11 18:23:37 osve-sony NetworkManager: <info> Config: set interface ap_scan to 1
Nov 11 18:23:37 osve-sony NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 4
Nov 11 18:23:37 osve-sony NetworkManager: <info> (wlan0): supplicant connection state change: 4 -> 0
Nov 11 18:23:37 osve-sony NetworkManager: <info> (wlan0): supplicant connection state change: 0 -> 2
Nov 11 18:23:38 osve-sony NetworkManager: <info> (wlan0): supplicant connection state change: 2 -> 4
Nov 11 18:23:38 osve-sony NetworkManager: <info> (wlan0): supplicant connection state change: 4 -> 0
Nov 11 18:23:39 osve-sony kernel: printk: 4 messages suppressed.

Part 2) Switching between different wireless APs:
1) When we switch from one keyed AP to another keyed or unkeyed wireless AP, it always fails, and NetworkManager then automatically associates with one specific AP.

So I think NetworkManager gives unstable results, according to my test on RHEL 5.3 SP1.
Created attachment 323299 [details] log message during NetworkManager trying associated with APs
After applying the patches listed below, the current status is:
1) We can now associate Montevina with the keyed (WEP64/128, WPA/WPA2) and un-keyed jane-linksys wireless AP.
2) We still fail to associate Montevina with some APs through NetworkManager, as it always automatically binds to some other AP.

The patches added to NetworkManager.spec are:
Patch17: 012-IFUPDOWN-unmanage-autoconnect-only.patch
Patch19: add_probe_for_v250_modems.patch
Patch20: 02-dbus_access_network_manager.patch
Patch21: 0_NULL_info_linux_driver.patch
Patch22: 008-BACKEND-debian-fallback-to-generic-loopback.loom.patch
Patch24: 70_lp145653_no_sigaction_for_crashes.patch
Patch25: lp191889_always_offline_with_unmanaged_devices.patch

I hope these patches are of some help to you.
Created attachment 323749 [details] NetworkManager patches are included in NetworkManager.zip
(In reply to comment #25)
> Created an attachment (id=323749) [details]
> NetworkManager patches are included in NetworkManager.zip

Hmm, most of the patches there wouldn't affect this issue, or are for Debian, not RHEL... Still need more testing with iwl5xxx on our end.
Using a few iwl5000 laptops, we've gotten these results. The test setup includes:

Cisco AP, 104-bit WEP
DLink AP, WPA2-PSK

Results for each laptop follow. All tests were conducted with _wpa_supplicant alone_, with NM disabled to isolate whether the issue is a driver issue or an NM issue. The exact same wpa_supplicant config file was copied to each machine.

Fujitsu Lifebook P7230
-----------------------------------
Intel 5350 AGN Echo Peak WiFi/WiMAX [8086:423a] (subsystem [8086:1001])
kernel 2.6.18-122.el5 (i386)
wpa_supplicant 0.5.10-6.el5

After making a connection to the WEP AP, connecting to the WPA AP succeeds (CONNECTED seen in supplicant output), but after DHCP receives a NAK when attempting to rebind the previous lease, the iwl5350 disassociates (as seen with 'iwconfig' reporting "not associated" in the Access-Point field, which was the AP's BSSID before DHCP was started). The driver does _NOT_ notify the supplicant that a disassociation event has occurred, and thus the supplicant is not aware of the disassociation. This is clearly a driver bug.

Lenovo ThinkPad T500
------------------------------------
Intel 5100 AGN Shiloh [8086:4237] (subsystem [8086:
kernel 2.6.18-120.el5 (x86-64)
wpa_supplicant 0.5.10-6.el5

After making a connection to the WPA AP, connections to the WEP AP succeed (CONNECTED seen in the supplicant output) but DHCP fails to obtain an IP address from the DHCP server. This is clearly a driver bug. Periodically kernel panics with flashing capslock light.

Sony Vaio VGN-Z540
------------------------------------
Intel 5100 [8086:4232] (subsystem [8086:1301])
kernel 2.6.18-123.el5
wpa_supplicant 0.5.10-6.el5

Same behavior as the Fujitsu Lifebook; during DHCP NAK the driver drops association but does not notify the supplicant.

So, based on this testing, back to kernel... NM has nothing to do with the issues I'm seeing with 3 different iwl5000 laptops, but these issues will definitely affect NM's performance and behavior.
John; we need to ensure that everywhere the driver disassociates, it sends the WEXT disassoc event to the supplicant. Might be that this has changed in mac80211 and the supplicant isn't getting the message, but the supplicant is certainly seeing other disassociation events from the driver so I'd suspect the stack/driver for this particular issue. This issue would certainly prevent NM from connecting, since the supplicant doesn't know that the connection has dropped in the driver, and the driver thinks it has... dhclient is known to be somewhat funky about IFF_UP, but given that non-iwl5000 cards don't have issues with dhclient that we can determine, it's likely the driver that's got the badness.
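The mismatch described above (supplicant reports CONNECTED while iwconfig shows no access point) can be checked mechanically. The following is a minimal sketch, assuming iwconfig prints either a BSSID or the literal "Not-Associated" after "Access Point:"; `associated_bssid` and `silently_disassociated` are hypothetical helper names, and the supplicant's view is passed in as a plain boolean rather than queried from wpa_supplicant.

```python
import re

# Matches the "Access Point:" field in iwconfig output. The value is expected
# to be a BSSID or the literal "Not-Associated" (assumption about the format).
ACCESS_POINT = re.compile(r"Access Point:\s*(\S+)")
BSSID = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def associated_bssid(iwconfig_text):
    """Return the BSSID from iwconfig output, or None if not associated."""
    match = ACCESS_POINT.search(iwconfig_text)
    if match is None:
        return None
    value = match.group(1)
    if BSSID.fullmatch(value):
        return value.upper()
    return None


def silently_disassociated(supplicant_connected, iwconfig_text):
    """True when the supplicant believes it is connected but the driver reports no AP."""
    return supplicant_connected and associated_bssid(iwconfig_text) is None
```

Feeding it the text of `iwconfig wlan0` captured before and after dhclient starts would show the point at which the driver drops the association without telling the supplicant.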
Also, when setting the device down (like exiting wpa_supplicant or 'iwconfig wlan0 down'), the driver hangs the machine with interrupts disabled for about 3 seconds, which isn't great.
Dan, thanks for the analysis. I'm not really sure how a DHCP failure is 'clearly a driver bug', but I'll definitely look into the disassociation bits.
(In reply to comment #31)
> Dan, thanks for the analysis. I'm not really sure how a DHCP failure is
> 'clearly a driver bug', but I'll definitely look into the disassociation bits.

Clearly a driver bug, because it's either mangling the WEP key (or the WPA key) or otherwise unable to communicate with the AP when directed to do so by the supplicant. For the Static WEP case I was using, there's obviously only one path for the key, from supplicant -> driver. If the driver is unable to set the key correctly, and therefore unable to tx/rx frames with the correct key that the supplicant is sending down, then it seems to me there's an issue with the driver... But the actual issue here, I think, isn't that the WEP key is wrong, but that the driver doesn't think it's associated to the AP, and thus it's just not doing _any_ rx/tx of the DHCP traffic.
John and Dan: I have a suggestion. To look into the iwlagn driver more closely for RH, maybe we need to enable "full debugging output in iwlagn driver"? I used kernel-2.6.18-120.el5.src.rpm and enabled "full debugging output in iwlagn driver", but "make" of the kernel fails with:

In file included from drivers/net/wireless/iwlwifi/iwl-agn.c:52:
drivers/net/wireless/iwlwifi/iwl-io.h: In function ‘__iwl_write32’:
drivers/net/wireless/iwlwifi/iwl-io.h:70: error: ‘struct wiphy’ has no member named ‘dev’
drivers/net/wireless/iwlwifi/iwl-io.h:70: error: ‘struct wiphy’ has no member named ‘dev’
drivers/net/wireless/iwlwifi/iwl-io.h: In function ‘__iwl_read32’:
drivers/net/wireless/iwlwifi/iwl-io.h:83: error: ‘struct wiphy’ has no member named ‘dev’
drivers/net/wireless/iwlwifi/iwl-io.h:83: error: ‘struct wiphy’ has no member named ‘dev’
drivers/net/wireless/iwlwifi/iwl-io.h: In function ‘__iwl_poll_bit’:
drivers/net/wireless/iwlwifi/iwl-io.h:111: error: ‘struct wiphy’ has no member named ‘dev’
drivers/net/wireless/iwlwifi/iwl-io.h:111: error: ‘struct wiphy’ has no member named ‘dev’
....................
drivers/net/wireless/iwlwifi/iwl-agn.c:4309: error: ‘struct wiphy’ has no member named ‘dev’
make[4]: *** [drivers/net/wireless/iwlwifi/iwl-agn.o] Error 1
make[3]: *** [drivers/net/wireless/iwlwifi] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
[root@osve-mv linux-2.6.18.x86_64] #

Will you please help me?
Test kernels w/ a backported patch for disassociation before changing BSSes are available here: http://people.redhat.com/linville/kernels/rhel5 Please give them a try and post the results here...thanks! P.S. They also have a patch to enable the use of the debug option mentioned in comment 33 (if you build your own kernel and change the configuration appropriately).
Updates:

Issue 1) In my test using NetworkManager (unpatched), there is no obvious improvement with 2.6.18-124.el5.jwltest.69; it fails most of the time.

Issue 2) We can associate Montevina with a WEP64-keyed wireless AP successfully from the command line:
1) ifconfig wlan0 up
2) iwconfig wlan0 mode managed key 1234567890
3) iwconfig wlan0 essid "jane-linksys"
4) dhclient wlan0
5) ping 192.168.50.6

Issue 3) We fail to get an IP address from a WPA-keyed AP, although iwconfig shows that the card really is associated with the AP:
wlan0 IEEE 802.11 ESSID:"jane-linksys"
Mode:Managed Frequency:2.437 GHz Access Point: 00:18:39:79:D0:72
Bit Rate=60 Mb/s Tx-Power=15 dBm
Retry min limit:7 RTS thr:off Fragment thr=2352 B
Encryption key:7935-A6E5-7194-B509-0F7A-154A-8B14-63F4-5907-65DC-20E9-CDD5-4ADC-B749-8C77-FFC1 [2]
Link Quality=62/100 Signal level=-58 dBm Noise level=-83 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0
So we think Montevina has set up the link with the AP at the MAC layer but fails at the IP layer. The steps are as follows:
1) Create a file named wpa-config under /root with the following contents:
ctrl_interface_group=0
eapol_version=1
ap_scan=1
network={
    ssid="jane-linksys"
    scan_ssid=1
    proto=WPA
    key_mgmt=WPA-PSK
    pairwise=TKIP
    group=TKIP WEP104 WEP40
    psk="1234567890"
}
2) ifconfig wlan0 up
3) wpa_supplicant -Dwext -i wlan0 -c /root/wpa-config

Issue 4) After building kernel-2.6.18-124.el5.jwltest.69.src.rpm with "full debugging output in iwlagn driver" enabled on the Sony laptop, we reloaded the iwlagn module with "modprobe -r iwlagn; modprobe iwlagn debug=0x8240b", but we cannot see any iwlagn log messages in /var/log/messages or /var/log/dmesg. Can you help me? Thanks for your help!
Created attachment 324074 [details] /var/log/messages The -124.el5.jwltest.69 kernel doesn't seem to fix the problem on an X200 with an iwl5100. This log contains several attempts to connect to WPA and WEP APs using that kernel and the -123.el5 kernel.
Clearing needinfo since the information was provided in comment #29
Error still occurs with test kernel .69... First association happens successfully, and the last lines in the logs are:

wlan0: association
wlan0: switched to short barker preamble (BSSID=00:1e:f7:xx:xx:xx)
wlan0: association frame received from 00:1e:f7:xx:xx:xx, but not in associated state - ignored
<start dhclient here>
ACPI: PCI interrupt for device 0000:03:00.0 disabled
PCI: Enabling device 0000:03:00.0 (0000 -> 0002)
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 17 (level, low) -> IRQ 66
PM: Writing back config space on device 0000:03:00.0 at offset 1 (was 100002, writing 100006)
ADDRCONF(NETDEV_UP): wlan0: link is not ready

The bits after dhclient starts indicate that the dev->close() got called on the driver somewhere, but auditing dhclient code, it only calls SIOCGIFFLAGS and doesn't appear to set them. No disassociation event is sent via WEXT as a result of the device being closed, thus the supplicant doesn't know that association has been lost. At this point, the driver reports "not associated" in iwconfig output.
wpa_supplicant does use SIOCSIFFLAGS, but only at the start of association, to comply with mac80211's requirement that the device be down when a mode switch occurs. It shouldn't be called at all when in COMPLETED state.
(In reply to comment #38) > The bits after dhclient starts indicate that the dev->close() got called on the > driver somewhere, but auditing dhclient code, it only calls SIOCGIFFLAGS and > doesn't appear to set them. It may actually not imply dev->close(), since that seems to usually hang the machine for 3 - 4 seconds (just 'ifconfig wlan0 down' will hang a bit and all ifconfig down does is set !IFF_UP on the device, which calls dev->close()). Need more investigation in the driver...
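One way to see whether IFF_UP is actually being cleared while dhclient runs is to sample the interface flags directly. The sketch below assumes a kernel that exposes the flags word as a hex string in /sys/class/net/<iface>/flags; the bit values are the standard ones from <linux/if.h>, and `decode_flags` and `read_flags` are hypothetical helper names.

```python
# Interface flag bits from <linux/if.h> (subset relevant to this bug).
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x40: "RUNNING",
    0x1000: "MULTICAST",
}


def decode_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def read_flags(iface):
    """Read the flags word for iface from sysfs (assumes the attribute exists)."""
    with open(f"/sys/class/net/{iface}/flags") as handle:
        return int(handle.read().strip(), 16)
```

Calling `decode_flags(read_flags("wlan0"))` just before and just after starting dhclient would show whether UP disappears, which would support or rule out the dev->close() theory.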
OK, I just booted that jwltest.69 kernel and connected via NetworkManager to my WPA-PSK AP immediately with no apparent problems. Is this supposed to be difficult to recreate?
Try switching back and forth between, for example, your WPA AP and a WEP AP.
WEP AP...
WPA AP...
WPA2 AP...
Back to WPA AP...
This box is a CentOS...errr...RHEL 5.2 userland with the jwltest.69 kernel. I'm wondering if we have a situation where the new kernels work with the older userland and perhaps the old kernels work with the new userland but the new kernels don't work with the new userland??? (FWIW, I did just get a seemingly random disconnect while typing the above. But, NM reconnected immediately...)
Replicated wpa_supplicant config from comment 35, connected immediately and dhclient returned w/ IP address immediately. What am I missing?
Hmmm...this is an iwl4965 device. Does this only occur on iwl5000?
I've seen it on iwl4965 too. Personally I'm using x86_64 5.3 installs, whereas you're using CentOS 5.2... what arch are you using?
[root@linville-m4300 ~]# uname -a Linux linville-m4300.local 2.6.18-124.el5.jwltest.69 #1 SMP Tue Nov 18 14:14:41 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
(In reply to comment #48)
> Replicated wpa_supplicant config from comment 35, connected immediately and
> dhclient returned w/ IP address immediately. What am I missing?

Without NetworkManager running, I can readily reproduce this IP failure on the Sony+Montevina+RHEL5.3 SP2+124/122-kernel platform, while I can get an IP address successfully after changing the same AP to WEP-64 keyed mode. Details are as follows:

Detail 1)
[root@osve-sony ~]# wpa_supplicant -D wext -i wlan0 -c /root/wpa-config
ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 4 value 0x0 - Trying to associate with 00:18:39:79:d0:72 (SSID='jane-linksys' freq=2437 MHz)
Associated with 00:18:39:79:d0:72
WPA: Key negotiation completed with 00:18:39:79:d0:72 [PTK=TKIP GTK=TKIP]
CTRL-EVENT-CONNECTED - Connection to 00:18:39:79:d0:72 completed (auth) [id=0 id_str=]

Detail 2) From dmesg, we get:
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
eth0: 10/100 speed: disabling TSO
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
PCI: Enabling device 0000:06:00.0 (0000 -> 0002)
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 17 (level, low) -> IRQ 177
PM: Writing back config space on device 0000:06:00.0 at offset 1 (was 100002, writing 100006)
ADDRCONF(NETDEV_UP): wlan0: link is not ready
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1d:70:93:91:70
wlan0: RX authentication from 00:1d:70:93:91:70 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1d:70:93:91:70
wlan0: RX AssocResp from 00:1d:70:93:91:70 (capab=0x421 status=0 aid=1)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1d:70:93:91:70)
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: RX deauthentication from 00:1d:70:93:91:70 (reason=1)
wlan0: deauthenticated
wlan0: authenticate with AP 00:1d:70:93:91:70
wlan0: RX authentication from 00:1d:70:93:91:70 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1d:70:93:91:70
wlan0: RX ReassocResp from 00:1d:70:93:91:70 (capab=0x421 status=0 aid=1)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1d:70:93:91:70)
wlan0: RX deauthentication from 00:1d:70:93:91:70 (reason=1)
wlan0: deauthenticated
wlan0: authenticate with AP 00:1d:70:93:91:70
wlan0: authenticate with AP 00:1d:70:93:91:70
wlan0: authenticate with AP 00:1d:70:93:91:70
wlan0: authentication with AP 00:1d:70:93:91:70 timed out
eth0: no IPv6 routers present
wlan0: no IPv6 routers present
(Note: we can associate with the WEP64 AP and get an IP address.)
Bluetooth: L2CAP ver 2.8
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
Bluetooth: HIDP (Human Interface Emulation) ver 1.1
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1d:70:93:91:70
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:18:39:79:d0:72
wlan0: RX authentication from 00:18:39:79:d0:72 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:18:39:79:d0:72
wlan0: RX AssocResp from 00:18:39:79:d0:72 (capab=0x411 status=0 aid=1)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:18:39:79:d0:72)
wlan0 (WE) : Wireless Event too big (342)
(Question: I often see this message at the console; what is it for?)

Detail 3) After the wpa_supplicant step above, I find that the wlan0 interface is down.

All of the above seems strange to us, so to get deeper into the root cause, maybe we need iwlagn debug info now? As I said in comment 35, I can compile a debug kernel, but I cannot find debug info in /var/log/messages, /var/log/dmesg, or /sys/class/net/wlan0. Can you help me? Thanks a lot.
In addition, I see a new issue on the desktop PC with the Montevina card:

1) Initially, after the OS boots, no wlan0 interface is up:
[root@osve-mv ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1B:21:12:28:29
          inet addr:10.239.48.114  Bcast:10.239.48.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe12:2829/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:456 errors:0 dropped:0 overruns:0 frame:0
          TX packets:221 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:53925 (52.6 KiB)  TX bytes:27197 (26.5 KiB)
          Memory:51940000-51960000
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:896 (896.0 b)  TX bytes:896 (896.0 b)

2) I bring wlan0 up:
[root@osve-mv ~]# ifconfig wlan0 up
[root@osve-mv ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1B:21:12:28:29
          inet addr:10.239.48.114  Bcast:10.239.48.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe12:2829/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:491 errors:0 dropped:0 overruns:0 frame:0
          TX packets:243 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:57107 (55.7 KiB)  TX bytes:30809 (30.0 KiB)
          Memory:51940000-51960000
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:896 (896.0 b)  TX bytes:896 (896.0 b)
wlan0     Link encap:Ethernet  HWaddr 00:16:EA:01:9D:5A
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:491 errors:0 dropped:0 overruns:0 frame:0
          TX packets:245 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:57107 (55.7 KiB)  TX bytes:31133 (30.4 KiB)
wmaster0  Link encap:UNSPEC  HWaddr 00-16-EA-01-9D-5A-98-64-00-00-00-00-00-00-00-00
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:492 errors:0 dropped:0 overruns:0 frame:0
          TX packets:245 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:57167 (55.8 KiB)  TX bytes:31133 (30.4 KiB)

3) Then I try to associate it with a WPA-keyed AP:
[root@osve-mv ~]# wpa_supplicant -Dwext -i wlan0 -c /root/wpa-config
ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 4 value 0x0 - CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
ioctl[SIOCSIWENCODEEXT]: No such file or directory
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
ioctl[SIOCSIWENCODEEXT]: No such file or directory
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
ioctl[SIOCSIWENCODEEXT]: No such file or directory
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
ioctl[SIOCSIWENCODEEXT]: No such file or directory
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
ioctl[SIOCSIWENCODEEXT]: No such file or directory
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
ioctl[SIOCSIWENCODEEXT]: No such file or directory
CTRL-EVENT-TERMINATING - signal 2 received
ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 4 value 0x0 - [root@osve-mv ~]#

4) It seems wlan0 was brought down by the wpa_supplicant command above:
[root@osve-mv ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1B:21:12:28:29
          inet addr:10.239.48.114  Bcast:10.239.48.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe12:2829/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:557 errors:0 dropped:0 overruns:0 frame:0
          TX packets:283 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:62409 (60.9 KiB)  TX bytes:38049 (37.1 KiB)
          Memory:51940000-51960000
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:896 (896.0 b)  TX bytes:896 (896.0 b)

5) Yet we do have the firmware installed:
[root@osve-mv ~]# ll /lib/firmware/
total 344
-rw-r--r-- 1 root root 345008 Jun 3 00:37 iwlwifi-5000-1.ucode
(In reply to comment #48) > Replicated wpa_supplicant config from comment 35, connected immediately and > dhclient returned w/ IP address immediately. What am I missing? My issues (interface is disassociated) occur only when associating with a second AP. I have two networks in my supplicant config (one with disabled=1 and other with disabled=0), one static WEP, the other WPA2-PSK. I can usually connect to the first network (doesn't matter which one) just fine, but then I Ctl+C the supplicant, swap the disabled=X items, and start the supplicant again, wait for a connection, then 'dhclient -1 -d wlan0'. The 'interface is disassociated' issue usually occurs on the second supplicant run, sometimes the 3rd, but always after trying a second AP.
Just a quick update...I am able to recreate the issue on a ipw5100-equipped lenovo t400 using Dan's comments. No idea what is actually happening at the moment...
Just to make things weirder...if I configure a static IP address (i.e. with ifconfig or ip) then the connection seems to stay up and work fine. So, one has to wonder about two things: a) why does the DHCP server NAK the request? and, b) what happens when the NAK is received that causes the disassociation? Also, why does wpa_supplicant not hear about the association even after my patch from comment 34? Lots to wonder... :-(
FWIW, after the association and static IP configuration I executed 'ifconfig wlan0 down; ifconfig wlan0 up' and wpa_supplicant reacted similarly to how it acted after the DHCPNAK was received. Without auditing the dhclient code, I'm guessing that it does the down/up after receiving a NAK. Of course, I still don't know why it is getting the DHCPNAK...
Often, you'll be connected to network A and dhclient will get a lease from DHCP server A. When you connect to network B, dhclient tries to use that same lease, which of course is rejected by DHCP server B, because the IP range may be different, or whatever. Thus the NAK. But dhclient will recover just fine from the NAK, and then go back to doing DISCOVER and getting a valid address from DHCP server B. Except that it can't, because the driver has lost association and isn't doing any tx of the packets. If dhclient _is_ bouncing the interface, that's simply wrong, and dhclient needs to be fixed. However, the driver should certainly be emitting the SIOCGIWAP (null BSSID) event when it loses association. The long-range fix for the NAK is for NetworkManager to name lease files according to the wifi network being associated with, so that dhclient will use the same lease file for the same network, and not get them mixed up. This helps association speed too, because dhclient isn't spending time in REQUEST for a lease that can't possibly work.
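The per-network lease file idea can be sketched as follows. This is only an illustration in Python, not NetworkManager code: the directory `/var/lib/dhclient`, the file-name scheme, and the helper names `lease_file_for` and `dhclient_argv` are all assumptions made for the example. The only real interface relied on is dhclient's `-lf <lease-file>` option, alongside the `-1 -d` flags already used in this bug.

```python
import hashlib
from pathlib import PurePosixPath
from typing import List

# Hypothetical lease directory; the thread does not say where
# NetworkManager would keep per-network lease files.
LEASE_DIR = PurePosixPath("/var/lib/dhclient")


def lease_file_for(iface: str, ssid: bytes) -> PurePosixPath:
    """Return a lease-file path that is stable per (interface, SSID) pair.

    SSIDs are arbitrary byte strings of up to 32 bytes, so the SSID is
    hashed instead of being embedded in the file name directly.
    """
    if not 0 < len(ssid) <= 32:
        raise ValueError("SSID must be 1..32 bytes long")
    digest = hashlib.sha1(ssid).hexdigest()[:16]
    return LEASE_DIR / f"dhclient-{iface}-{digest}.lease"


def dhclient_argv(iface: str, ssid: bytes) -> List[str]:
    """Build a dhclient command line that uses the per-network lease file."""
    lease = str(lease_file_for(iface, ssid))
    return ["dhclient", "-1", "-d", "-lf", lease, iface]
```

With a scheme like this, reconnecting to network A would reuse A's lease, while a first connection to network B would start with no lease on file and go straight to DISCOVER instead of sending a REQUEST that server B has to NAK.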
Good info, thanks! FWIW, using that info I seem to be able to recreate it on the 4965 box as well.
Alright, still experimenting to get a handle on how things are supposed to be working... I recreated this scenario on my ipw2200-equipped box, including the DHCPNAK and the subsequent down/up. wpa_supplicant's behavior was identical (including the "l2_packet_receive - recvfrom: Network is down") and dhclient seemed to be essentially the same (including the "send_packet: Network is down") except that dhclient succeeded in eventually getting a lease. What is interesting is that the iwevent output also looked the same -- no "New Access Point/Cell address:Not-Associated" message at all. So I'm not entirely sure that the missing SIOCGIWAP is really the problem. FWIW, I don't see anything in the ifdown patch for mac80211 that differs significantly from upstream either. I'll have to check if the ipw2200 tries to maintain an association across a down/up cycle...?
For the record, I just recreated this on a late wireless-testing kernel using the rtl8187 driver...
Created attachment 324443 [details] mac80211-deauth-on-down.patch Taking Dan's advice, I devised this patch to deauth when taking the device down (which also sends the SIOCGIWAP event). This seems to have resolved the issue for me, at least when using the manual wpa_supplicant configurations.
Test kernels with the above patch are available here: http://people.redhat.com/linville/kernels/rhel5 Please give them a try -- do they improve the situation for you?
This version seems a great improvement in comparison with past versions. When RHEL boots, it always tries to associate with an un-keyed wireless AP. After installing this kernel (kernel-2.6.18-24.el5.jwltest.70.i686.rpm) on the Montevina Sony laptop, the behavior falls into the following cases: 1) If it associates with an un-keyed AP successfully at boot, we can then associate with WPA/WPA2/WEP64/WEP128 keyed or un-keyed APs successfully. 2) If it fails to associate with an un-keyed AP at boot, the user sees no nearby APs in NetworkManager, which may not be acceptable to the final user, although the user can force it to associate with some APs in NetworkManager. 3) I find it always fails when switching between 2 keyed APs.
(In reply to comment #61) > Created an attachment (id=324443) [details] > mac80211-deauth-on-down.patch > > Taking Dan's advice, I devised this patch to deauth when taking the device down > (which also sends the SIOCGIWAP event). This seems to have resolved the issue > for me, at least when using the manual wpa_supplicant configurations. Thinking about it over the weekend, the missing IWAP event may not be caused by a down but by something else in the stack that fails to send the event on disassociation. Here's why... Whenever the iwl5000 goes "down", the box hangs for about 3 or 4 seconds with interrupts disabled, then comes back to life. This happens with a simple "ifconfig wlan0 up" -> "ifconfig wlan0 down" cycle without NM or the supplicant involved. So it seems that when taking the interface down, the driver hangs. However, when seeing this disassociation during DHCP, the box does _not_ hang, yet 'iwconfig' reports state as unassociated. This leads me to believe that something in the driver or the stack is transitioning to unassociated state but not sending out the IWAP event. But if something was really taking the device "down", I'd expect the same 3 - 4 second interrupts-disabled hang that I'd get from 'ifconfig wlan0 down'. But I don't see that. So that leads me to believe it's something else. I think we basically need to instrument the stack with BUG_ON or WARN_ON (to get a backtrace in the logs) everywhere that the stack or driver exits ASSOCIATED status. That should allow us to see the exact functions that might be causing the silent disassociation. However, the debug options for the driver don't cover that AFAIK, even 0xffffffff.
If it isn't the down, then why does my patch cause the IWAP events to show up? I've already looked at the code for where mac80211 exits associated state -- AFAICT there is no path that won't send the event. The most likely thing is that firmware in the iwlagn device is a little bit too smart and is tracking association state with a state machine that doesn't quite match mac80211. The patch above remedies that (and is probably the right thing to do anyway). Regarding the delay/hang/whatever -- I don't know what is causing it, and the Intel team has shown little interest in it. FWIW, I see it with the jwltest.70 kernels after the DHCPNAK. Regardless, I cannot recreate the problem from comment 58 (or thereabouts) with the patch applied. From my perspective, that puts us back to a situation where wpa_supplicant/dhclient works and NM apparently does not.
Ok, I'll grab and install .70 and do more testing. Thanks!
What role might the hang/delay/whatever play in confusing NetworkManager about the status of the device?
(In reply to comment #67) > What role might the hang/delay/whatever play in confusing NetworkManager about > the status of the device? It shouldn't play any role, but it's obviously bad behavior... Anyway, I can confirm that your patch "fixes" the DHCP disassociation problem, though it's still a mystery what causes the !IFF_UP in the first place, which should be the location of the real fix. Since we can't guarantee that the config will be preserved across up/down for wireless devices (by design really), stuff should not be bouncing the device. I've heard DHCP is to blame for this sort of thing in the past (from the WiMAX guys like Marcel) and thus that was my first guess... But the next issue is a supplicant bug, because it stops scanning for the new AP when told to associate with the new AP, but the iwl5000 driver keeps sending assoc/disassoc events for the old AP, which obviously is no longer in the supplicant config because the user doesn't want to connect to it any more. So back to me. Thanks! You might hear from me again :)
So...should I be submitting those patches for the RHEL5 kernel?
Digging a little deeper into the supplicant issue, it seems that the driver is trying to connect too aggressively to the old AP, and that it doesn't honor explicit scan requests which allow wpa_supplicant to find the AP it wants to associate with. When the second network is chosen from the menu, the supplicant tells the driver to disassociate, but the driver simply keeps retrying the association to the original AP. Of course, since the supplicant doesn't want to connect to the original AP, it tells the driver to disassociate from that AP, but the driver simply tries to associate again, repeat ad nauseam. And the scan results never come through.
Just curious...does it change the behavior if you use the "disable_hw_scan=1" option for iwlagn?
Request to associate to second AP starts at 1227552479.751186. Scan requested at 1227552479.840644. Driver reconnects to old AP at 1227552479.957806. Supplicant requests disconnection from the erroneous association at 1227552479.958285. <repeat forever> etc... I need to verify when the supplicant terminates the scan requests, which it might do if it got an association event from the driver and moves from SCANNING -> ASSOCIATED. But the root cause of this problem is that the driver keeps trying to reconnect to the original AP (probably because with WEXT there isn't a way to say "stop until I tell you what to do"), and secondly that the driver doesn't honor scan requests, probably because it's too busy associating. I patched the supplicant just now to set the SSID to 32 bytes of garbage when telling the driver to disassoc, and that works somewhat better, but the driver still isn't honoring scan requests from the supplicant. A well-timed 'iwlist wlan0 scan' right after the supplicant tries usually makes things happier. Trying to rebuild iwlagn and figure out where things are refusing to scan.
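To put numbers on that timeline, here is a small Python sketch that computes the gaps between the four quoted timestamps. The event labels are paraphrased from the comment above, and `deltas_ms` is just a helper name chosen for this example. `Decimal` is used because a binary float cannot hold microsecond precision at this epoch magnitude.

```python
from decimal import Decimal

# Timestamps (seconds since the epoch) quoted in the comment above.
EVENTS = [
    ("associate to second AP requested", Decimal("1227552479.751186")),
    ("scan requested", Decimal("1227552479.840644")),
    ("driver reconnects to old AP", Decimal("1227552479.957806")),
    ("supplicant requests disconnection", Decimal("1227552479.958285")),
]


def deltas_ms(events):
    """Return (label, milliseconds since the previous event) pairs."""
    out = []
    for (_, prev), (label, cur) in zip(events, events[1:]):
        out.append((label, (cur - prev) * 1000))
    return out


if __name__ == "__main__":
    for label, ms in deltas_ms(EVENTS):
        print(f"{ms} ms after previous event: {label}")
```

The gaps work out to about 89.5 ms, 117.2 ms and 0.5 ms: the driver re-associates to the old AP roughly 117 ms after the scan request, and the supplicant asks for disconnection well under a millisecond later.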
(In reply to comment #72) > Just curious...does it change the behavior if you use the "disable_hw_scan=1" > option for iwlagn? Yeah, I thought of that too before doing the supplicant garbage-SSID-on-disassoc thing, and it didn't help much. Will retry now. Nah, doesn't work any better.
John: iwl_mac_hw_scan scan rejected: within the next scan period seems to be the culprit; wpa_supplicant expects that the scan either returns an error, or that if it supports the SIOCGIWSCAN event, that it will eventually return that event from a scan request. I think it's an error on the part of the driver to not either reject the scan request with -EAGAIN, or to just send a scan event on failure. What it should never do is silently reject the scan and let the supplicant sit around and wait. Commenting out the two blocks in that function that abort the scan request if it came in too early make things work quite a bit more happily with NetworkManager and wpa_supplicant. We shouldn't just remove those two, but I should narrow down the conditions that are in-force here (ie, ensure that iwl_is_associated() == FALSE before we comment out that block, which it should be, because this only happens when we are disconnected). If either of those two conditions still stand after that analysis, then we must ensure that when the scan is rejected, that -EAGAIN is returned to userspace, or a SIOCGIWSCAN event is emitted to wake up the supplicant.
So the reason that EAGAIN isn't getting returned to the supplicant is that ieee80211_sta_req_scan() does the scan from a workqueue and immediately returns 0 (ie success) from the SIOCSIWSCAN handler. Thus, when the iwlwifi code decides it can't scan because it already scanned too recently, the EAGAIN is completely lost because the SIOCSIWSCAN handler has already returned success.
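The way the error gets lost can be modelled in a few lines of user-space Python. This is a toy model, not mac80211 or iwlwifi code: `ToyStack`, `scan_too_soon` and the `notify_on_failure` switch are names invented for the illustration, and the "event" is just a string appended to a list.

```python
import errno
from collections import deque


class ToyStack:
    """User-space model of a scan request that is deferred to a work queue."""

    def __init__(self, driver_scan, notify_on_failure=False):
        self.driver_scan = driver_scan  # callable returning 0 or -errno
        self.notify_on_failure = notify_on_failure
        self.work = deque()
        self.events = []  # events delivered to "userspace"

    def siwscan(self):
        """Model of the SIOCSIWSCAN handler: queue the work, report success."""
        self.work.append(self._scan_work)
        return 0  # returned before the driver has been asked anything

    def _scan_work(self):
        rc = self.driver_scan()
        if rc == 0:
            self.events.append("SIOCGIWSCAN")  # scan results are ready
        elif self.notify_on_failure:
            # Assumed alternative: wake the caller with a scan event
            # even though the scan never ran.
            self.events.append("SIOCGIWSCAN")
        # Otherwise rc is dropped here: the handler already returned 0.

    def run_pending(self):
        """Run queued work, as the work queue would do later."""
        while self.work:
            self.work.popleft()()


def scan_too_soon():
    """Stand-in for a driver that rejects a scan issued too early."""
    return -errno.EAGAIN
```

In the default mode, `siwscan()` returns 0, the deferred work gets `-EAGAIN`, and no event is ever delivered, so a caller that waits for `SIOCGIWSCAN` waits forever. With `notify_on_failure=True` the caller is at least woken up, although it still cannot tell why the scan produced nothing new.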
So there's a few problems in total (including ones patched in jwltest.70): 1) iwlagn thinks it can decide when scans happen and when they don't; I think that policy should be in the stack, not the drivers. There was probably a reason it was added to iwlagn in the first place, maybe to ensure that the MAC wasn't DoS-ed with scan requests, but it still seems wrong to have that logic there unless the hardware simply cannot handle back-to-back scan requests. 2) mac80211 schedules scan requests from a workqueue, thereby losing any errors that may result from requesting a scan 3) wpa_supplicant trusts drivers (too much) to reliably emit a scan results event for every scan request 4) mac80211 doesn't notify userspace that a disassociation has occurred when the device is closed/goes down 5) wpa_supplicant doesn't adequately handle disassociations during association by re-scheduling a scan (which is bug #468441 which we fixed only for ipw3945 which triggered the same behavior, but for different reasons)
6) The supplicant isn't adequately telling the driver to disconnect, it is simply sending the SIOCSIWMLME command for disassociation, but that behavior is ambiguous with WEXT, and the driver is free to continue trying to associate with the SSID or BSSID that was previously locked, which is what happens to me (reassociation to the old AP while attempting to scan for the new one). Talking over this with Jouni, he thinks the supplicant should be setting a garbage SSID and clearing the BSSID, thus forcing the driver to stop trying to connect to the old AP. In the future, for cfg80211, we really need a "stop connecting" command.
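As a rough illustration of the "garbage SSID plus cleared BSSID" idea, the values involved could be produced like this. It is a Python sketch of the data only, under stated assumptions: the helper names are hypothetical, the all-zero BSSID is assumed to stand for "no BSSID locked", and nothing here talks to a driver or reflects the actual supplicant patch.

```python
import os

ESSID_MAX = 32            # maximum SSID length in bytes
CLEARED_BSSID = bytes(6)  # all-zero BSSID, assumed to mean "no BSSID locked"


def disconnect_settings():
    """Return the (ssid, bssid) pair to push to the driver on disconnect.

    A random 32-byte SSID is extremely unlikely to match any real
    network, so a driver that keeps retrying has nothing to find.
    """
    return os.urandom(ESSID_MAX), CLEARED_BSSID


def format_bssid(bssid: bytes) -> str:
    """Render a 6-byte BSSID in the usual colon-separated hex form."""
    return ":".join(f"{b:02x}" for b in bssid)
```

The point of the design is that WEXT has no explicit "stop connecting" request, so the supplicant replaces the locked SSID/BSSID with values that cannot match the old AP.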
Oddly enough, disable_hw_scan=1 didn't seem to make too much of a difference. I'll have to investigate that more tomorrow and see what it's doing. It would still be scheduled from the work queue though, and thus still be subject to the loss of error information if for some reason the iwl hardware still didn't want to scan at that point.
It seems like there are a lot of opportunities for the scan request to bail-out in iwlagn. I don't know how we are going to fix this properly...
Created attachment 324665 [details] jwltest-iwlwifi-scan-failed.patch Attempt to notify userland when iwlwifi fails to initiate scan.
John; why not:

diff -up ./mlme.c.scan ./mlme.c
--- ./mlme.c.scan 2008-11-25 17:51:32.000000000 -0500
+++ ./mlme.c 2008-11-25 17:53:23.000000000 -0500
@@ -3317,10 +3317,16 @@ void ieee80211_sta_work(void *ptr)
 	if (ifsta->state != IEEE80211_AUTHENTICATE &&
 	    ifsta->state != IEEE80211_ASSOCIATE &&
 	    test_and_clear_bit(IEEE80211_STA_REQ_SCAN, &ifsta->request)) {
+		int rc;
+
 		if (ifsta->scan_ssid_len)
-			ieee80211_sta_start_scan(dev, ifsta->scan_ssid, ifsta->scan_ssid_len);
+			rc = ieee80211_sta_start_scan(dev, ifsta->scan_ssid, ifsta->scan_ssid_len);
 		else
-			ieee80211_sta_start_scan(dev, NULL, 0);
+			rc = ieee80211_sta_start_scan(dev, NULL, 0);
+
+		if (rc)
+			ieee80211_scan_completed(local_to_hw(local));
+
 		return;
 	}

We'll have the same issues, though not as often, with plain mac80211 drivers. In the long run I'd like to add data to SIOCGIWSCAN events that indicate a reason code. All apps out there now will ignore this, but if it's passed the supplicant could use the code to figure out what the error was.
obviously this doesn't cover errors during the scan, but it may be a much less invasive change for this late in the cycle. we can revisit this upstream after 5.3 and perhaps get a better fix upstream that can be backported later if we run into some other manifestation of this problem.
Test kernels w/ the patch from comment 81 are available here: http://people.redhat.com/linville/kernels/rhel5 Comment 82 seems reasonable, I'll look into it more in the morning...FWIW, iwlwifi uses its own work queues....
Can Intel please test the patch in comment #81?
Don't test the patch in comment #84 (i.e., mine) since it doesn't actually work. If you get a chance to test the patch in comment #81 though, that would be good. John: my patch was compile-tested only, and it actually panics the box; I guess it's not quite safe to call ieee80211_scan_completed() at that point. I'll figure it out over the holiday and come up with something.
Crud, I rebuilt and published jwltest.72 w/ Dan's patch... If you managed to get jwltest.71, please test that.
Test results for kernel-2.6.18-124.el5.jwltest.72.i686.rpm on the Montevina Sony laptop are as follows: 1) It can switch between keyed APs now, though sometimes it seems stable (e.g. switching from A to B is OK, while switching from B to A sometimes fails). 2) It still tries to associate with a wireless AP at OS boot, so booting takes somewhat longer. 3) With about 30% probability, after the OS boots we can see wlan0 by issuing ifconfig, but we cannot see any nearby APs in NetworkManager, and the user cannot even find a known AP by entering its ESSID in NetworkManager.
sorry a typo: 1) it can switch between keyed APs now, sometimes it seems unstable (such as switch from A to B OK, while sometimes fails to switch from B to A)
Hmmm...so jwltest.72 doesn't have problems like Dan mentions in comment 86? Dan, is there an updated wpa_supplicant that Dongpo should be using?
(In reply to comment #90) > Hmmm...so jwltest.72 doesn't have problems like Dan mentions in comment 86? > > Dan, is there an updated wpa_supplicant that Dongpo should be using? Yeah, there's wpa_supplicant updates that are needed too.
Dan, any further info on this one? Did you work-out the issue you mentioned in comment 86?
Dan, do you need to update bug #455789? Any updates on comment #86?
jwltest.72 tests out fine for me. Let's go with those patches.
I just spent some time testing this on a Thinkpad X200 with an iwl5x00. jwltest.72 and wpa_supplicant-0.5.10-8.el5. What I'm doing for testing is switching between a WEP and a WPA (not sure if it is using v1 or v2) AP using NM, meanwhile in the background I'm pinging each AP constantly. On a clean boot this appears to work fine - I switched back and forth several times, and traffic was passing properly. Then, while I was connected to the WEP AP I suspended the laptop to RAM. After resume, I was unable to connect back to the WEP AP. It would associate, but DHCP would time out. I was, however, able to connect to the WPA AP manually after trying WEP. Still unable to connect to the WEP AP after resume, I suspended to RAM again. After resume, I was automatically connected to the WPA AP. So from here, it looks like the only thing we need to fix is whatever is going on with WEP.
Updated supplicant builds are here: http://people.redhat.com/dcbw/wpa_supplicant/RHEL-5/ please test them in combination with the jwltest.73 kernel: http://people.redhat.com/linville/kernels/rhel5/ and see how things work.
(In reply to comment #88) > Per test to kernel-2.6.18-124.el5.jwltest.72.i686.rpm at Montevina Sony Laptop, > updates is as follows: > 1) it can switch between keyed APs now, sometimes it seems stable (such as > switch from A to B OK, while sometimes fails to switch from B to A) Considerable improvement in comparison with kernel-2.6.18-124.el5.jwltest.72.i686.rpm: fewer failures when switching between AP A and AP B. > 2) it still tries to get assciated with wireless AP on OS boot up, so it will > take somewhat longer time for OS booting up. Yes, this issue can still be reproduced with this kernel version. > 3) by about 30% probability, after OS boots up, we can get wlan0 by issuing > ifconfig, but we can not see any NEARBY APs in NetworkManager, and user even > fails to find a known AP by inputting its essid in NetworkManager. Yes, with this kernel version this issue seems to be fixed. These updates are for kernel-2.6.18-125.el5.jwltest.73.i686.rpm on the Montevina Sony laptop. I think this version is acceptable. In addition, I fail to reboot the Sony laptop by issuing the "reboot" command, while this issue cannot be reproduced on RHEL 5.3 SP4.
In addition, I notice a conflict over /etc/resolv.conf between eth0 and wlan0 at OS boot. Both eth0 and wlan0 try to get an IP address while the OS boots; if each gets one, /etc/resolv.conf is written twice, and the last copy totally replaces the first one.
I've reproduced the WEP/WPA issue described in comment #96 without NetworkManager, and I can trigger it on iwl5350 just by using wpa_cli to switch between networks without downing the device or stopping wpa_supplicant.
Just triggered the WEP failure again on the same machine without suspending. What I did was use NM to connect to the WEP AP, then reconnect to it simply by choosing it from the NM menu. After 4 reconnects, WEP simply stopped working.
in kernel-2.6.18-126.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
After installing kernel-2.6.18-126.el5.x86_64.rpm on the Montevina Sony laptop with RHEL 5.3 SP5, the results are as follows: 1) It tries to associate with a wireless AP at OS boot, which makes booting take longer. 2) After OS boot, sometimes we can see the wlan0 interface by issuing "ifconfig", but we cannot find any wireless AP(s) in NetworkManager. 3) After OS boot, sometimes we can find nearby wireless AP(s) in NetworkManager and associate with some WPA/WEP keyed AP(s), but we always fail to switch between keyed APs (such as WEP64<->WPA). 4) After OS boot, we can find nearby wireless AP(s) in NetworkManager and associate with some WPA/WEP keyed AP(s); if we then suspend and resume the OS, we sometimes fail to associate the wireless card with keyed wireless AP(s). 5) One piece of good news: we can restart the OS by issuing the "reboot" command with this kernel version. Given the above, I would prefer kernel-2.6.18-125.el5.jwltest.73.i686.rpm over this kernel-2.6.18-126.el5.x86_64.rpm. John: which snapshot will integrate the patches that are in kernel-2.6.18-125.el5.jwltest.73.i686.rpm?
As I understand it, -126.el5 should have all the patches from -125.el5.jwltest.73...
(In reply to comment #108) > after installing kernel-2.6.18-126.el5.x86_64.rpm into Montevina Sony Laptop > with RHEL 5.3 SP5 installed is as follows: > 1) it tries to get associated with a wireless AP on OS boot up, this will take > longer time to get OS boot-up. /sbin/chkconfig network off This is the Network service trying to bring up the connection, which is redundant since you're using NetworkManager. Alternatively, you may be hitting bug #472441. > 2) On OS bootup, sometimes we can find wlan0 interface by issuing "ifconfig", > but we can not find any wirelss AP(s) in NetworkManager The first scan or two may have failed because the Intel WiFi firmware and hardware have some odd assumptions about when scans can happen and when they can't. Wait a bit and the APs should show up in the menu. > 3) On OS bootup, sometimes we can find nearby wirelss AP(s) in NetworkManager > and get associated some WPA/WEP keyed AP(s), but we always fail to switch > between keyed APs (such as WEP64<->WPA) There is a bug in the Intel driver's use of hardware encryption acceleration. Please use "swcrypto50=1" or "swcrypto=1" when loading the 'iwlagn' module as a workaround for now, by adding those options to /etc/modprobe.conf like so: alias wlan0 iwlagn swcrypto50=1 swcrypto=1 > 4) On OS bootup, we can find nearby wirelss AP(s) in NetworkManager and get > associated some WPA/WEP keyed AP(s), then we get OS suspend and wake up, then > sometimes we fail to get wireless card associated with keyed wireless AP(s). This may also be due to the hardware encryption acceleration issues. Please try these module parameters and see if you can reproduce the issue reliably. Also, please be sure you have installed the wpa_supplicant-0.5.10-8.el5 (bug #468441) update as well from RHEL 5.3 snapshot 5, as that is necessary to ensure reliable connections.
(In reply to comment #110) > (In reply to comment #108) > > after installing kernel-2.6.18-126.el5.x86_64.rpm into Montevina Sony Laptop > > with RHEL 5.3 SP5 installed is as follows: > > 1) it tries to get associated with a wireless AP on OS boot up, this will take > > longer time to get OS boot-up. > /sbin/chkconfig network off > this is the Network service trying to bring up the connection, which is > redundant since you're using NetworkManager. Alternatively, you may be > hitting bug #472441. If we run "/sbin/chkconfig network off", the eth(x) interface is not enabled at OS boot, which seems inconvenient for our final user. So I prefer the suggestion from bug #472441: change "ONBOOT=yes" to "ONBOOT=no" in /etc/sysconfig/network-scripts/ifcfg-wlan0? > > 2) On OS bootup, sometimes we can find wlan0 interface by issuing "ifconfig", > > but we can not find any wirelss AP(s) in NetworkManager > The first scan or two may have failed becuase the Intel WiFi firmware and > hardware have some odd assumptions about when scans can happen and when they > can't. Wait a bit and the APs should show up in the menu. > > 3) On OS bootup, sometimes we can find nearby wirelss AP(s) in NetworkManager > > and get associated some WPA/WEP keyed AP(s), but we always fail to switch > > between keyed APs (such as WEP64<->WPA) > There is a bug in the Intel driver's use of hardware encryption acceleration. > Please use "swcrypto50=1" or "swcrypto=1" when loading the 'iwlagn' module as a > workaround for now, by adding those options to /etc/modprobe.conf like so: > alias wlan0 iwlagn swcrypto50=1 swcrypto=1 I applied "alias wlan0 iwlagn swcrypto50=1 swcrypto=1", but the AP switch failure can still be reproduced (e.g. it fails to switch from WPA keyed mode to WEP64 keyed mode). 
> > 4) On OS bootup, we can find nearby wireless AP(s) in NetworkManager and get
> > associated some WPA/WEP keyed AP(s), then we get OS suspend and wake up, then
> > sometimes we fail to get wireless card associated with keyed wireless AP(s).
> This may also be due to the hardware encryption acceleration issues. Please
> try these module parameters and see if you can reproduce the issue reliably.
> Also, please be sure you have installed the wpa_supplicant-0.5.10-8.el5 (bug
> #468441) update as well from RHEL 5.3 snapshot 5, as that is necessary to
> ensure reliable connections.

If I run "rpm -qil wpa_supplicant" on a freshly installed RHEL 5.3 SP5, it reports:

Name        : wpa_supplicant               Relocations: (not relocatable)
Version     : 0.5.10                       Vendor: Red Hat, Inc.
Release     : 8.el5                        Build Date: Thu 04 Dec 2008 04:16:18 AM CST
Install Date: Tue 09 Dec 2008 01:47:05 PM CST   Build Host: hs20-bc1-5.build.redhat.com
Group       : System Environment/Base      Source RPM: wpa_supplicant-0.5.10-8.el5.src.rpm
Size        : 606077                       License: BSD
Signature   : DSA/SHA1, Thu 04 Dec 2008 07:02:49 AM CST, Key ID fd372689897da07a
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://w1.fi/wpa_supplicant/
Summary     : WPA/WPA2/IEEE 802.1X Supplicant

So may I assume that "wpa_supplicant-0.5.10-8.el5.x86_64.rpm" is already included in the RHEL 5.3 SP5 release?

I then tested on 2 Montevina PCs (on one of them I force-installed the wpa_supplicant-0.5.10-8.el5 rpm from your web URL). On both Montevina PCs I see almost the same result.

Please correct me if I am wrong. Thx
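The ONBOOT change proposed in this comment could be applied along the following lines. This is only a minimal sketch: `set_onboot` is a hypothetical helper name, and the demo runs on a scratch file with assumed contents (the real ifcfg-wlan0 is not shown in this comment), not on /etc/sysconfig/network-scripts/ifcfg-wlan0 itself.

```sh
#!/bin/sh
# Hypothetical helper: set ONBOOT=<value> in an ifcfg-style file.
# Rewrites an existing ONBOOT= line, or appends one if none is present.
set_onboot() {
    file=$1
    value=$2
    if grep -q '^ONBOOT=' "$file"; then
        # Go through a temp file instead of relying on "sed -i".
        tmp=$(mktemp) || return 1
        sed "s/^ONBOOT=.*/ONBOOT=$value/" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf 'ONBOOT=%s\n' "$value" >> "$file"
    fi
}

# Demo on a scratch file; these contents are an assumption for illustration.
demo=$(mktemp)
printf 'DEVICE=wlan0\nBOOTPROTO=dhcp\nONBOOT=yes\n' > "$demo"
set_onboot "$demo" no
cat "$demo"
rm -f "$demo"
```

On a real system the same function would be pointed at the ifcfg file as root, ideally after taking a backup copy.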
(In reply to comment #111)
> (In reply to comment #110)
> > (In reply to comment #108)
> > > after installing kernel-2.6.18-126.el5.x86_64.rpm into Montevina Sony Laptop
> > > with RHEL 5.3 SP5 installed is as follows:
> > > 1) it tries to get associated with a wireless AP on OS boot up, this will take
> > > longer time to get OS boot-up.
> > /sbin/chkconfig network off
> > this is the Network service trying to bring up the connection, which is
> > redundant since you're using NetworkManager. Alternatively, you may be
> > hitting bug #472441.
> If we run "/sbin/chkconfig network off", the eth(x) interface is not enable on
> OS bootup. it seems inconvenient to our final user?
> So I prefer bug #472441 suggestion: modify "ONBOOT=yes" into "ONBOOT=no" in
> /etc/sysconfig/network-scripts/ifcfg-wlan0?

NM should be bringing up a valid ifcfg file on startup, regardless of ONBOOT, to preserve 5.2 behavior. Can you attach your wlan0 ifcfg file here? Also please attach the output of /var/log/messages when the problem occurs, and please ensure you are using at least NetworkManager-0.7.0-0.12.svn4326.el5, which should be in snapshots after 2008-11-21.

> > > 2) On OS bootup, sometimes we can find wlan0 interface by issuing "ifconfig",
> > > but we can not find any wireless AP(s) in NetworkManager
> > The first scan or two may have failed because the Intel WiFi firmware and
> > hardware have some odd assumptions about when scans can happen and when they
> > can't. Wait a bit and the APs should show up in the menu.
>
> > > 3) On OS bootup, sometimes we can find nearby wireless AP(s) in NetworkManager
> > > and get associated some WPA/WEP keyed AP(s), but we always fail to switch
> > > between keyed APs (such as WEP64<->WPA)
> > There is a bug in the Intel driver's use of hardware encryption acceleration.
> > Please use "swcrypto50=1" or "swcrypto=1" when loading the 'iwlagn' module as a
> > workaround for now, by adding those options to /etc/modprobe.conf like so:
> > alias wlan0 iwlagn swcrypto50=1 swcrypto=1
> I applied "alias wlan0 iwlagn swcrypto50=1 swcrypto=1", while the AP switch
> failure issue can still be reproduced(such as fail to switch from WPA keyed mode
> to WEP64 keyed mode).
>
> > > 4) On OS bootup, we can find nearby wireless AP(s) in NetworkManager and get
> > > associated some WPA/WEP keyed AP(s), then we get OS suspend and wake up, then
> > > sometimes we fail to get wireless card associated with keyed wireless AP(s).
> > This may also be due to the hardware encryption acceleration issues. Please
> > try these module parameters and see if you can reproduce the issue reliably.
> > Also, please be sure you have installed the wpa_supplicant-0.5.10-8.el5 (bug
> > #468441) update as well from RHEL 5.3 snapshot 5, as that is necessary to
> > ensure reliable connections.
>
> if I run "rpm -qil wpa_supplicant" at fresh installed RHEL 5.3 SP5, it will
> say:
> Name        : wpa_supplicant               Relocations: (not relocatable)
> Version     : 0.5.10                       Vendor: Red Hat, Inc.
> Release     : 8.el5                        Build Date: Thu 04 Dec 2008 04:16:18 AM CST
> Install Date: Tue 09 Dec 2008 01:47:05 PM CST   Build Host: hs20-bc1-5.build.redhat.com
> Group       : System Environment/Base      Source RPM: wpa_supplicant-0.5.10-8.el5.src.rpm
> Size        : 606077                       License: BSD
> Signature   : DSA/SHA1, Thu 04 Dec 2008 07:02:49 AM CST, Key ID fd372689897da07a
> Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
> URL         : http://w1.fi/wpa_supplicant/
> Summary     : WPA/WPA2/IEEE 802.1X Supplicant
>
> so may I think the "wpa_supplicant-0.5.10-8.el5.x86_64.rpm" is ready in RHEL
> 5.3 SP5 release?
>
> Then I got 2 Montevina PCs (I installed the wpa_supplicant-0.5.10-8.el5 rpm
> from you web url by force into One Montevina PC) to do test.
> At both Montevina PCs, I can see almost the same result.
>
> Please correct me in case of error. Thx

OK, can you add "-dddt" to the end of the "Exec=" line in /usr/share/dbus-1/system-services/fi.epitest.hostap.WPASupplicant.service, then reboot and reproduce the suspend/resume issue? Then attach both /var/log/wpa_supplicant.log and /var/log/messages to this bug report.
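Appending "-dddt" to the "Exec=" line can be scripted roughly as follows. This is a sketch under assumptions: `add_supplicant_debug` is a hypothetical helper name, and the demo works on a scratch file whose contents mirror the service file quoted later in this report with the flag removed.

```sh
#!/bin/sh
# Hypothetical helper: append " -dddt" to the Exec= line of a D-Bus
# service file, unless the flag is already present.
add_supplicant_debug() {
    file=$1
    if grep -Eq '^Exec=.*[[:space:]]-dddt([[:space:]]|$)' "$file"; then
        return 0    # already enabled, nothing to do
    fi
    tmp=$(mktemp) || return 1
    # Only the Exec= line is touched; trailing whitespace is replaced
    # by the new flag so no double spaces are left behind.
    sed '/^Exec=/ s/[[:space:]]*$/ -dddt/' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch copy (assumed pre-change contents).
demo=$(mktemp)
cat > "$demo" <<'EOF'
[D-BUS Service]
Name=fi.epitest.hostap.WPASupplicant
Exec=/usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -u -f /var/log/wpa_supplicant.log
User=root
EOF
add_supplicant_debug "$demo"
grep '^Exec=' "$demo"
rm -f "$demo"
```

Running the function a second time leaves the file unchanged, so it is safe to repeat after a package update has rewritten the service file.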
Did you try this?

alias wlan0 iwlagn
options iwlagn swcrypto50=1 swcrypto=1

I think that is the proper modprobe.conf syntax...
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Due to outstanding driver issues with hardware encryption acceleration, users of Intel WiFi Link 4965, 5100, 5150, 5300, and 5350 wireless cards are advised to disable hardware accelerated encryption via module parameters. Failure to do so may result in the inability to connect to WEP (Wired Equivalent Privacy) protected wireless networks after connecting to WPA (WiFi Protected Access) protected wireless networks.

To do so, add the following options to /etc/modprobe.conf:

alias wlan0 iwlagn
options iwlagn swcrypto50=1 swcrypto=1

(where wlan0 is the default interface name of the first Intel WiFi Link device; replace or amend as appropriate for local hardware configuration)
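The release-note change can be applied idempotently with something like the sketch below. `fix_iwlagn_conf` is a hypothetical helper name, the scratch file stands in for /etc/modprobe.conf, and the function deliberately does not try to merge with an "options iwlagn" line that already carries other parameters.

```sh
#!/bin/sh
# Hypothetical helper: make a modprobe.conf-style file contain
#   alias wlan0 iwlagn
#   options iwlagn swcrypto50=1 swcrypto=1
# It also repairs the earlier one-line form, where the parameters were
# appended to the alias line.
fix_iwlagn_conf() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Drop anything that follows "alias wlan0 iwlagn" on the alias line.
    sed 's/^\(alias[[:space:]]\{1,\}wlan0[[:space:]]\{1,\}iwlagn\)[[:space:]].*$/\1/' \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
    grep -Eq '^alias[[:space:]]+wlan0[[:space:]]+iwlagn[[:space:]]*$' "$conf" ||
        printf 'alias wlan0 iwlagn\n' >> "$conf"
    grep -Eq '^options[[:space:]]+iwlagn[[:space:]].*swcrypto50=1' "$conf" ||
        printf 'options iwlagn swcrypto50=1 swcrypto=1\n' >> "$conf"
}

# Demo on a scratch file that starts with the one-line form.
demo=$(mktemp)
printf 'alias wlan0 iwlagn swcrypto50=1 swcrypto=1\n' > "$demo"
fix_iwlagn_conf "$demo"
cat "$demo"
rm -f "$demo"
```

The new options only take effect the next time the iwlagn module is loaded, for example after a reboot.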
Created attachment 327075 [details] wlan0 configuration file
Created attachment 327077 [details] /var/log/messages info with wlan0 configured (onboot=off)
Created attachment 327079 [details] wpa supplicant log info
Created attachment 327080 [details] /var/log/messages for WEP/WPA suspend test
(In reply to comment #112)
> (In reply to comment #111)
> > (In reply to comment #110)
> > > (In reply to comment #108)
> > > > after installing kernel-2.6.18-126.el5.x86_64.rpm into Montevina Sony Laptop
> > > > with RHEL 5.3 SP5 installed is as follows:
> > > > 1) it tries to get associated with a wireless AP on OS boot up, this will take
> > > > longer time to get OS boot-up.
> > > /sbin/chkconfig network off
> > > this is the Network service trying to bring up the connection, which is
> > > redundant since you're using NetworkManager. Alternatively, you may be
> > > hitting bug #472441.
> > If we run "/sbin/chkconfig network off", the eth(x) interface is not enable on
> > OS bootup. it seems inconvenient to our final user?
> > So I prefer bug #472441 suggestion: modify "ONBOOT=yes" into "ONBOOT=no" in
> > /etc/sysconfig/network-scripts/ifcfg-wlan0?
> NM should be bringing up a valid ifcfg file on startup, irregardless of ONBOOT
> to preserve 5.2 behavior. Can you attach your wlan0 ifcfg file here? Also
> please attach the output of /var/log/messages when the problem occurs, and
> please ensure you are using at least NetworkManager-0.7.0-0.12.svn4326.el5,
> which should be in snapshots after 2008-11-21.

I attached ifcfg-wlan0, which is configured with ONBOOT=no; this stops wlan0 from being brought up automatically on OS boot-up. Please help check it.

As for NetworkManager, it automatically scans and brings up all network interfaces when the user starts it in the GUI, so even if we set wlan0 to ONBOOT=no, the GUI user experience should not be affected?

Also, RHEL 5.3 SP5 seems to come with NetworkManager-0.7.0-1.el5.src.rpm, not NetworkManager-0.7.0-0.12.svn4326.el5? See below:

[root@osve-mv ~]# rpm -qi NetworkManager
Name        : NetworkManager               Relocations: (not relocatable)
Version     : 0.7.0                        Vendor: Red Hat, Inc.
Release     : 1.el5                        Build Date: Thu 04 Dec 2008 06:05:53 AM CST
Install Date: Tue 09 Dec 2008 01:47:07 PM CST   Build Host: hs20-bc1-5.build.redhat.com
Group       : System Environment/Base      Source RPM: NetworkManager-0.7.0-1.el5.src.rpm
Size        : 3483522                      License: GPLv2+
Signature   : DSA/SHA1, Thu 04 Dec 2008 07:05:03 AM CST, Key ID fd372689897da07a
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://www.gnome.org/projects/NetworkManager/
Summary     : Network connection manager and user applications
Description :
NetworkManager attempts to keep an active network connection available at all
times. It is intended only for the desktop use-case, and is not intended for
usage on servers. The point of NetworkManager is to make networking
configuration and setup as painless and automatic as possible. If using DHCP,
NetworkManager is _intended_ to replace default routes, obtain IP addresses
from a DHCP server, and change nameservers whenever it sees fit.

Please help make sure that NetworkManager-0.7.0-0.12.svn4326.el5 is integrated into the coming RHEL 5.3 SPX release. Thx

> > > > 2) On OS bootup, sometimes we can find wlan0 interface by issuing "ifconfig",
> > > > but we can not find any wireless AP(s) in NetworkManager
> > > The first scan or two may have failed because the Intel WiFi firmware and
> > > hardware have some odd assumptions about when scans can happen and when they
> > > can't. Wait a bit and the APs should show up in the menu.
> >
> > > > 3) On OS bootup, sometimes we can find nearby wireless AP(s) in NetworkManager
> > > > and get associated some WPA/WEP keyed AP(s), but we always fail to switch
> > > > between keyed APs (such as WEP64<->WPA)
> > > There is a bug in the Intel driver's use of hardware encryption acceleration.
> > > Please use "swcrypto50=1" or "swcrypto=1" when loading the 'iwlagn' module as a
> > > workaround for now, by adding those options to /etc/modprobe.conf like so:
> > > alias wlan0 iwlagn swcrypto50=1 swcrypto=1
> > I applied "alias wlan0 iwlagn swcrypto50=1 swcrypto=1", while the AP switch
> > failure issue can still be reproduced(such as fail to switch from WPA keyed mode
> > to WEP64 keyed mode).
> >

I added the following two lines to /etc/modprobe.conf:

alias wlan0 iwlagn
options iwlagn swcrypto50=1 swcrypto=1

(instead of "alias wlan0 iwlagn swcrypto50=1 swcrypto=1"), and the issue above can no longer be reproduced; we can now switch between WEP and WPA APs successfully.

> > > > 4) On OS bootup, we can find nearby wireless AP(s) in NetworkManager and get
> > > > associated some WPA/WEP keyed AP(s), then we get OS suspend and wake up, then
> > > > sometimes we fail to get wireless card associated with keyed wireless AP(s).
> > > This may also be due to the hardware encryption acceleration issues. Please
> > > try these module parameters and see if you can reproduce the issue reliably.
> > > Also, please be sure you have installed the wpa_supplicant-0.5.10-8.el5 (bug
> > > #468441) update as well from RHEL 5.3 snapshot 5, as that is necessary to
> > > ensure reliable connections.
> >
> > if I run "rpm -qil wpa_supplicant" at fresh installed RHEL 5.3 SP5, it will
> > say:
> > Name        : wpa_supplicant               Relocations: (not relocatable)
> > Version     : 0.5.10                       Vendor: Red Hat, Inc.
> > Release     : 8.el5                        Build Date: Thu 04 Dec 2008 04:16:18 AM CST
> > Install Date: Tue 09 Dec 2008 01:47:05 PM CST   Build Host: hs20-bc1-5.build.redhat.com
> > Group       : System Environment/Base      Source RPM: wpa_supplicant-0.5.10-8.el5.src.rpm
> > Size        : 606077                       License: BSD
> > Signature   : DSA/SHA1, Thu 04 Dec 2008 07:02:49 AM CST, Key ID fd372689897da07a
> > Packager    : Red Hat, Inc.
<http://bugzilla.redhat.com/bugzilla>
> > URL         : http://w1.fi/wpa_supplicant/
> > Summary     : WPA/WPA2/IEEE 802.1X Supplicant
> >
> > so may I think the "wpa_supplicant-0.5.10-8.el5.x86_64.rpm" is ready in RHEL
> > 5.3 SP5 release?
> >
> > Then I got 2 Montevina PCs (I installed the wpa_supplicant-0.5.10-8.el5 rpm
> > from you web url by force into One Montevina PC) to do test.
> > At both Montevina PCs, I can see almost the same result.
> >
> > Please correct me in case of error. Thx
> Ok, can you add "-dddt" to the end of the "Exec=" line in
> /usr/share/dbus-1/system-services/fi.epitest.hostap.WPASupplicant.service then
> reboot, and reproduce the suspend/resume issue? Then attach both
> /var/log/wpa_supplicant.log and /var/log/messages to this bug report.

Update: the contents of /usr/share/dbus-1/system-services/fi.epitest.hostap.WPASupplicant.service are now as follows:

[D-BUS Service]
Name=fi.epitest.hostap.WPASupplicant
Exec=/usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -u -f /var/log/wpa_supplicant.log -dddt
User=root

Issue 1) I hit a fatal problem when suspending. Detailed steps:
1) Get the Montevina card associated with a WEP64 AP via NetworkManager (succeeds)
2) Switch to a WPA AP via NetworkManager (succeeds)
3) Click KLaptop to suspend the Sony Montevina laptop
4) The user then sees the following messages:

Disabling non-boot CPUs
iwlagn: No space for Tx
iwlagn: Error sending REPLY_STATISTICS_CMD: enqueue_hcmd failed: -28

Suspend then seems to stop; all the user can do is press the power button to force the OS to boot again.

Issue 2) Sometimes, on OS boot-up, we fail to get the Montevina card associated with a WPA keyed AP, whereas we can rarely reproduce this with a WEP keyed AP.
Issue 3) When the Montevina card is associated with a WPA AP and I then suspend, I sometimes get the following errors:

Dec 16 02:09:36 osve-sony kernel: CPU1 is down
Dec 16 02:09:36 osve-sony kernel: Stopping tasks: =====================================================<7>wlan0: deauthenticate(reason=3)
Dec 16 02:09:36 osve-sony kernel: =============================================<3>iwlagn: Error: Response NULL in 'REPLY_ADD_STA'
Dec 16 02:09:36 osve-sony kernel: ACPI: PCI interrupt for device 0000:06:00.0 disabled.

In this case the laptop can still suspend and wake up.

All of the above was done with the 2.6.18-126.el5 kernel on the Sony Montevina laptop.

I also noticed some iwlagn errors in the attached /var/log/messages, such as:

iwlagn: Microcode SW error detected. Restarting 0x82000000.
iwlagn: Error setting new RXON (-5)
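When collecting logs for issues like these, the driver errors can be pulled out of a syslog file with a filter along these lines. This is a rough sketch: `iwlagn_errors` is a hypothetical helper name, and the pattern covers only the iwlagn messages quoted in this report, so other driver errors would be missed.

```sh
#!/bin/sh
# Hypothetical helper: print iwlagn driver error lines from a
# syslog-style file. Patterns are limited to messages seen in this report.
iwlagn_errors() {
    grep -E 'iwlagn: (Error|No space for Tx|Microcode SW error)' "$1"
}

# Demo on a scratch file with lines taken from this report
# (the "=====" separator in the second line is shortened).
demo=$(mktemp)
cat > "$demo" <<'EOF'
Dec 16 02:09:36 osve-sony kernel: CPU1 is down
Dec 16 02:09:36 osve-sony kernel: =====<3>iwlagn: Error: Response NULL in 'REPLY_ADD_STA'
iwlagn: Microcode SW error detected. Restarting 0x82000000.
iwlagn: Error setting new RXON (-5)
EOF
iwlagn_errors "$demo"
rm -f "$demo"
```

On the affected machine the function would be run against /var/log/messages; grep exits with status 1 when no matching line is found.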
I have logged a separate bug, #476777, for the suspend issue above; let us track the suspend issue in that bug from now on.
(In reply to comment #114)
> Release note added. If any revisions are required, please set the
> "requires_release_notes" flag to "?" and edit the "Release Notes" field
> accordingly.
> All revisions will be proofread by the Engineering Content Services team.
>
> New Contents:
> Due to outstanding driver issues with hardware encryption acceleration, users
> of Intel WiFi Link 4965, 5100, 5150, 5300, and 5350 wireless cards are advised
> to disable hardware accelerated encryption via module parameters. Failure to
> do so may result in the inability to connect to WEP (Wired Equivalent Privacy)
> protected wireless networks after connecting to WPA (WiFi Protected Access)
> protected wireless networks.
>
> To do so, add the following options to /etc/modprobe.conf:
>
> alias wlan0 iwlagn
> options iwlagn swcrypto50=1 swcrypto=1
>
> (where wlan0 is the default interface name of the first Intel WiFi Link device;
> replace or amend as appropriate for local hardware configuration)

Dan/John,

Is this hardware encryption acceleration issue a known upstream issue in the Intel wireless driver? If so, is there a bug I can refer to, so that I can work with the Intel developers to fix it?

Thanks.
I cannot find "options iwlagn swcrypto50=1 swcrypto=1" in /etc/modprobe.conf in the RHEL SP6 release. Will you add this line in a future release? In addition, the NetworkManager version is now 0.7.0-2.el5.
(In reply to comment #113)
> Did you try this?
>
> alias wlan0 iwlagn
> options iwlagn swcrypto50=1 swcrypto=1
>
> I think that is the proper modprobe.conf syntax...

^^ I can switch back and forth between WPA and WEP without any problems after adding these options to my modprobe.conf.

Versions:
kernel-2.6.18-128.el5.x86_64
NetworkManager-0.7.0-2.el5.i386
NetworkManager-0.7.0-2.el5.x86_64
wpa_supplicant-0.5.10-8.el5.x86_64
iwl4965-firmware-228.57.2.21-2.noarch
~~ Snapshot 6 is out ~~ Partners, please test and let us know if this bug has been fixed. Add PartnerVerified keyword if everything works as expected. For any new issues encountered, CLONE this bug and report the issues in the new bug.
Most issues are fixed except for the suspend failure (I have logged bug #476777 for it), so I will close this defect.
No need to close it. Once you add the PartnerVerified keyword, I'll review the test results. If they are fine, it will move to VERIFIED, and will eventually be closed automatically when 5.3 releases.
OK, I see.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html