RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1441418 - [NMCI] WPA2 PEAP connection sometimes weirdly fails
Summary: [NMCI] WPA2 PEAP connection sometimes weirdly fails
Keywords:
Status: CLOSED DUPLICATE of bug 1449301
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.4
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Beniamino Galvani
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-04-11 21:51 UTC by Vladimir Benes
Modified: 2017-06-06 16:05 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-06 16:05:29 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
journal trace log (208.42 KB, text/plain)
2017-04-11 21:51 UTC, Vladimir Benes
no flags Details
log from successful run (363.37 KB, text/plain)
2017-04-11 21:54 UTC, Vladimir Benes
no flags Details
[PATCH] nl80211: Fix race condition in detecting MAC change (4.47 KB, patch)
2017-04-24 15:31 UTC, Beniamino Galvani
no flags Details | Diff
[PATCH] device: delay bringing wifi interfaces up after MAC change (1.23 KB, patch)
2017-04-24 16:18 UTC, Beniamino Galvani
no flags Details | Diff

Description Vladimir Benes 2017-04-11 21:51:20 UTC
Description of problem:
I see occasional fail in test that is creating WPA2 PEAP profile via editor and connecting it.
 

Version-Release number of selected component (if applicable):


How reproducible:
It's quite random but reproduceable (at least once in 20 attempts) 

Steps to Reproduce:
    Scenario: nmcli - wifi-sec - configure and connect WPA2-PEAP profile
    * Add a new connection of type "wifi" and options "ifname wlan0 con-name qe-wpa2-enterprise autoconnect off ssid qe-wpa2-enterprise"
    * Check ifcfg-name file created for connection "qe-wpa2-enterprise"
    * Open editor for connection "qe-wpa2-enterprise"
    * Set a property named "802-11-wireless-security.key-mgmt" to "wpa-eap" in editor
    * Set a property named "802-1x.eap" to "peap" in editor
    * Set a property named "802-1x.identity" to "Bill Smith" in editor
    * Set a property named "802-1x.ca-cert" to "file:///tmp/certs/eaptest_ca_cert.pem" in editor
    * Set a property named "802-1x.phase2-auth" to "gtc" in editor
    * Set a property named "802-1x.password" to "testing123" in editor
    * Save in editor
    * No error appeared in editor
    * Check value saved message showed in editor
    * Quit editor
    * Bring up connection "qe-wpa2-enterprise"
    Then "qe-wpa2-enterprise" is visible with command "iw dev wlan0 link"
    Then "\*\s+qe-wpa2-enterprise" is visible with command "nmcli -f IN-USE,SSID device wifi list"
    Then "inet 10." is visible with command "ip a s wlan0"


Actual results:
fail sometimes

Expected results:
no fail

Additional info:
Bigger timeouts between steps are not helpful. 
Not really sure if this isn't caused by infrastructure.

line 536:
dub 11 17:41:36 wlan-r2s10.wlan.rhts.eng.bos.redhat.com NetworkManager[2966]: <debug> [1491946896.6142] device[0x55fbb63c9180] (wlan0): wifi-scan: scanning requested but not allowed at this time

This is weird. I don't think it's present when it passes

Comment 1 Vladimir Benes 2017-04-11 21:51:51 UTC
Created attachment 1270957 [details]
journal trace log

Comment 2 Vladimir Benes 2017-04-11 21:54:47 UTC
Created attachment 1270958 [details]
log from successful run

Comment 3 Vladimir Benes 2017-04-12 08:27:21 UTC
I can see a bunch of warnings too:
(process:13115): GLib-GIO-WARNING **: gdbusobjectmanagerclient.c:1585: Processing InterfaceRemoved signal for path /org/freedesktop/NetworkManager/IP6Config/35 but no object proxy exists

(process:13115): libnm-WARNING **: no object known for /org/freedesktop/NetworkManager/AccessPoint/610

Comment 4 Thomas Haller 2017-04-13 13:07:41 UTC
the glib warnings are a separate issue.


In the attached logfile (and in 3 similar cases from beaker tests), I always see a timeout with: "device (wlan0): Activation: (wifi) association took too long" message.

»···»···/* Connection failed; either driver problems, the encryption key is
»···»··· * wrong, or the passwords or certificates were wrong.
»···»··· */
»···»···_LOGW (LOGD_DEVICE | LOGD_WIFI,
»···»···       "Activation: (wifi) association took too long");

so, we ask for new secrets (which then fails).


But the issue is that the first association attempt fails. Unknown reason so far.

Comment 5 Beniamino Galvani 2017-04-21 09:13:54 UTC
I noticed something similar, sometimes the wifi activation gets
stuck. Looking at wpa_supplicant logs, it seems that the randomization
of the scan MAC address causes problems with wpa_supplicant, that
loses the authentication event and makes the connection time out.

On a successful run:

 nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlp4s0
 nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlp4s0(02:7a:1f:ab:00:cd) A1=02:7a:1f:ab:00:cd A2=f4:f2:6d:1a:59:96
 nl80211: Authenticate event

Instead on failure:

 nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlp4s0
 nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlp4s0(b2:bf:62:b9:fa:88) A1=02:7a:1f:ab:00:cd A2=f4:f2:6d:1a:59:96
 nl80211: wlp4s0: Ignore MLME frame event for foreign address

b2:bf:62:b9:fa:88 is the scan MAC address set by NM:

 device (wlp4s0): set-hw-addr: set MAC address to B2:BF:62:B9:FA:88 (scanning)

I'm further investigating this.

Comment 6 Thomas Haller 2017-04-21 11:34:35 UTC
when I added set-hw-addr, I noticed that older versions of supplicants failed to properly reload when the MAC address changes externally.

Newer versions seem fixed. The version in rhel-7 back then seemed not entirely correct to me at that time, but later I was unable to reproduce the issue.
Maybe it is the same issue. Which supplicant version were you using? See https://bugzilla.redhat.com/show_bug.cgi?id=1374756#c7

Comment 7 Vladimir Benes 2017-04-21 11:48:42 UTC
both
wpa_supplicant-2.0-21.el7_3.x86_64
wpa_supplicant-2.6-3.el7.x86_64

Comment 8 Beniamino Galvani 2017-04-21 11:55:59 UTC
I'm using wpa_supplicant compiled from git master. wpa_supplicant
should detect a change in the MAC address after this commit:

https://w1.fi/cgit/hostap/commit/?id=3e0272ca00ce1df35b45e7d739dd7e935f13fd84

However, there is a race condition here:

https://w1.fi/cgit/hostap/tree/src/drivers/driver_nl80211.c?id=8befe8a993be661492eae73e2bc76b62c7a9d9f6#n999

When wpa_s receives the down event, it is possible that we have
already brought the interface up again and so the down-up cycle is
ignored.

Comment 10 Beniamino Galvani 2017-04-24 15:31:10 UTC
Created attachment 1273634 [details]
[PATCH] nl80211: Fix race condition in detecting MAC change

A patch to fix the race in wpa_supplicant.

Comment 11 Thomas Haller 2017-04-24 15:37:52 UTC
(In reply to Beniamino Galvani from comment #10)
> Created attachment 1273634 [details]
> [PATCH] nl80211: Fix race condition in detecting MAC change
> 
> A patch to fix the race in wpa_supplicant.

lgtm

Comment 12 Beniamino Galvani 2017-04-24 16:18:59 UTC
Created attachment 1273645 [details]
[PATCH] device: delay bringing wifi interfaces up after MAC change

Until wpa_supplicant gets patched, this is the only workaround I found for NM.

Comment 13 Thomas Haller 2017-04-24 16:33:29 UTC
(In reply to Beniamino Galvani from comment #12)
> Created attachment 1273645 [details]
> [PATCH] device: delay bringing wifi interfaces up after MAC change
> 
> Until wpa_supplicant gets patched, this is the only workaround I found for
> NM.

lgtm

Comment 17 Beniamino Galvani 2017-05-11 08:55:22 UTC
It turns out that the the output is buffered, and so timestamps are not reliable. I captured a dump of the traffic and there were missing EAPOL frames from the AP which caused an authentication timeout. But now I'm not able to reproduce the issue anymore.

Comment 18 Beniamino Galvani 2017-06-06 16:05:29 UTC
The root cause seems the same of bug 1449301, closing as duplicate.

*** This bug has been marked as a duplicate of bug 1449301 ***


Note You need to log in before you can comment on or make changes to this bug.