Bug 1441418
| Summary: | [NMCI] WPA2 PEAP connection sometimes weirdly fails | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Vladimir Benes <vbenes> |
| Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> |
| Status: | CLOSED DUPLICATE | QA Contact: | Desktop QE <desktop-qa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.4 | CC: | aloughla, atragler, bgalvani, fgiudici, lrintel, rkhan, sukulkar, thaller, vbenes |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-06-06 16:05:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
Created attachment 1270957 [details]
journal trace log
Created attachment 1270958 [details]
log from successful run
I can see a bunch of warnings too: (process:13115): GLib-GIO-WARNING **: gdbusobjectmanagerclient.c:1585: Processing InterfaceRemoved signal for path /org/freedesktop/NetworkManager/IP6Config/35 but no object proxy exists (process:13115): libnm-WARNING **: no object known for /org/freedesktop/NetworkManager/AccessPoint/610 the glib warnings are a separate issue. In the attached logfile (and in 3 similar cases from beaker tests), I always see a timeout with: "device (wlan0): Activation: (wifi) association took too long" message. »···»···/* Connection failed; either driver problems, the encryption key is »···»··· * wrong, or the passwords or certificates were wrong. »···»··· */ »···»···_LOGW (LOGD_DEVICE | LOGD_WIFI, »···»··· "Activation: (wifi) association took too long"); so, we ask for new secrets (which then fails). But the issue is that the first association attempt fails. Unknown reason so far. I noticed something similar, sometimes the wifi activation gets stuck. Looking at wpa_supplicant logs, it seems that the randomization of the scan MAC address causes problems with wpa_supplicant, that loses the authentication event and makes the connection time out. On a successful run: nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlp4s0 nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlp4s0(02:7a:1f:ab:00:cd) A1=02:7a:1f:ab:00:cd A2=f4:f2:6d:1a:59:96 nl80211: Authenticate event Instead on failure: nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlp4s0 nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlp4s0(b2:bf:62:b9:fa:88) A1=02:7a:1f:ab:00:cd A2=f4:f2:6d:1a:59:96 nl80211: wlp4s0: Ignore MLME frame event for foreign address b2:bf:62:b9:fa:88 is the scan MAC address set by NM: device (wlp4s0): set-hw-addr: set MAC address to B2:BF:62:B9:FA:88 (scanning) I'm further investigating this. when I added set-hw-addr, I noticed that older versions of supplicants failed to properly reload when the MAC address changes externally. Newer versions seem fixed. The version in rhel-7 back then seemed not entirely correct to me at that time, but later I was unable to reproduce the issue. Maybe it is the same issue. Which supplicant version were you using? See https://bugzilla.redhat.com/show_bug.cgi?id=1374756#c7 both wpa_supplicant-2.0-21.el7_3.x86_64 wpa_supplicant-2.6-3.el7.x86_64 I'm using wpa_supplicant compiled from git master. wpa_supplicant should detect a change in the MAC address after this commit: https://w1.fi/cgit/hostap/commit/?id=3e0272ca00ce1df35b45e7d739dd7e935f13fd84 However, there is a race condition here: https://w1.fi/cgit/hostap/tree/src/drivers/driver_nl80211.c?id=8befe8a993be661492eae73e2bc76b62c7a9d9f6#n999 When wpa_s receives the down event, it is possible that we have already brought the interface up again and so the down-up cycle is ignored. Created attachment 1273634 [details]
[PATCH] nl80211: Fix race condition in detecting MAC change
A patch to fix the race in wpa_supplicant.
(In reply to Beniamino Galvani from comment #10) > Created attachment 1273634 [details] > [PATCH] nl80211: Fix race condition in detecting MAC change > > A patch to fix the race in wpa_supplicant. lgtm Created attachment 1273645 [details]
[PATCH] device: delay bringing wifi interfaces up after MAC change
Until wpa_supplicant gets patched, this is the only workaround I found for NM.
(In reply to Beniamino Galvani from comment #12) > Created attachment 1273645 [details] > [PATCH] device: delay bringing wifi interfaces up after MAC change > > Until wpa_supplicant gets patched, this is the only workaround I found for > NM. lgtm It turns out that the the output is buffered, and so timestamps are not reliable. I captured a dump of the traffic and there were missing EAPOL frames from the AP which caused an authentication timeout. But now I'm not able to reproduce the issue anymore. The root cause seems the same of bug 1449301, closing as duplicate. *** This bug has been marked as a duplicate of bug 1449301 *** |
Description of problem: I see occasional fail in test that is creating WPA2 PEAP profile via editor and connecting it. Version-Release number of selected component (if applicable): How reproducible: It's quite random but reproduceable (at least once in 20 attempts) Steps to Reproduce: Scenario: nmcli - wifi-sec - configure and connect WPA2-PEAP profile * Add a new connection of type "wifi" and options "ifname wlan0 con-name qe-wpa2-enterprise autoconnect off ssid qe-wpa2-enterprise" * Check ifcfg-name file created for connection "qe-wpa2-enterprise" * Open editor for connection "qe-wpa2-enterprise" * Set a property named "802-11-wireless-security.key-mgmt" to "wpa-eap" in editor * Set a property named "802-1x.eap" to "peap" in editor * Set a property named "802-1x.identity" to "Bill Smith" in editor * Set a property named "802-1x.ca-cert" to "file:///tmp/certs/eaptest_ca_cert.pem" in editor * Set a property named "802-1x.phase2-auth" to "gtc" in editor * Set a property named "802-1x.password" to "testing123" in editor * Save in editor * No error appeared in editor * Check value saved message showed in editor * Quit editor * Bring up connection "qe-wpa2-enterprise" Then "qe-wpa2-enterprise" is visible with command "iw dev wlan0 link" Then "\*\s+qe-wpa2-enterprise" is visible with command "nmcli -f IN-USE,SSID device wifi list" Then "inet 10." is visible with command "ip a s wlan0" Actual results: fail sometimes Expected results: no fail Additional info: Bigger timeouts between steps are not helpful. Not really sure if this isn't caused by infrastructure. line 536: dub 11 17:41:36 wlan-r2s10.wlan.rhts.eng.bos.redhat.com NetworkManager[2966]: <debug> [1491946896.6142] device[0x55fbb63c9180] (wlan0): wifi-scan: scanning requested but not allowed at this time This is weird. I don't think it's present when it passes