Bug 2082536

Summary: [NMCI] NM should set ethernet layer up before calling wpa_supplicant to perform EAPOL login (8021x_hostapd_freeradius_doc_procedure)
Product: Red Hat Enterprise Linux 8
Reporter: David Jaša <djasa>
Component: NetworkManager
Assignee: NetworkManager Development Team <nm-team>
Status: CLOSED MIGRATED
QA Contact: Desktop QE <desktop-qa-list>
Severity: low
Docs Contact:
Priority: unspecified
Version: 8.6
CC: bgalvani, dcaratti, lrintel, rkhan, sfaye, sukulkar, till, vbenes
Target Milestone: rc
Keywords: MigratedToJIRA, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-08-17 12:18:51 UTC
Type: Bug
Attachments:
Description Flags
behave report none

Description David Jaša 2022-05-06 11:17:55 UTC
Description of problem:
This was a mysterious issue in nmci: 8021x_hostapd_freeradius_doc_procedure failed quite consistently on el8. wpa_supplicant called from the shell successfully authenticated against RADIUS most of the time, but NM then failed to bring up the connection because wpa_supplicant the systemd service timed out waiting for any EAPOL reply (with NM then erroring out with an unhelpful "no secrets available" error). The network topology is:

                      no NS | vethsetup NS
       +----------------+
       |        br0     |   |
test1 -+- test1b   eth4 +---|--- (uplink)
       +----------------+
  |             |
  |             +-- hostapd listens on br0
  |
  +-- wpa_supplicant connects to test1

The statuses of the relevant interfaces before calling wpa_supplicant and 'nmcli c up ...' are:
68: test1@test1b: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 1e:fa:8e:06:df:81 brd ff:ff:ff:ff:ff:ff
67: test1b@test1: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 1a:87:1f:1d:5e:e9 brd ff:ff:ff:ff:ff:ff
66: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 1a:87:1f:1d:5e:e9 brd ff:ff:ff:ff:ff:ff
38: eth4@if37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether 86:56:b1:74:c0:fc brd ff:ff:ff:ff:ff:ff link-netns vethsetup

When the test1 interface is brought up using 'ip l set test1 up', the test consistently passes. So the likely explanation is that NM instructs wpa_supplicant.service to perform EAPOL login on an interface whose link is down, and wpa_supplicant the systemd service then fails. IMO NM shouldn't leave bringing up the link layer to wpa_supplicant; it should do so itself before calling wpa_supplicant.
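To make the interface states listed above easier to check in a test, here is a minimal Python sketch that parses 'ip link' header lines and reports which interfaces lack the UP flag, i.e. are administratively down. The helper names (parse_link, admin_down) and the SAMPLE list are illustrative only and are not part of nmci or NetworkManager; the sample lines are copied from the output above.

```python
import re

# Matches the first line of an `ip link` entry, e.g.
# "68: test1@test1b: <BROADCAST,MULTICAST> mtu 1500 ..."
LINK_RE = re.compile(r"^\s*(\d+):\s+([^:@\s]+)(?:@\S+)?:\s+<([^>]*)>")


def parse_link(line):
    """Return (name, flags) for an `ip link` header line, or None."""
    m = LINK_RE.match(line)
    if m is None:
        return None
    flags = set(m.group(3).split(",")) if m.group(3) else set()
    return m.group(2), flags


def admin_down(lines):
    """Names of interfaces whose flag set lacks UP (administratively down)."""
    result = []
    for line in lines:
        parsed = parse_link(line)
        if parsed and "UP" not in parsed[1]:
            result.append(parsed[0])
    return result


# Header lines taken from the report; link/ether lines would simply be skipped.
SAMPLE = [
    "68: test1@test1b: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000",
    "67: test1b@test1: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000",
    "66: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000",
    "38: eth4@if37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000",
]

if __name__ == "__main__":
    print(admin_down(SAMPLE))
```

In this listing only test1 lacks the UP flag; test1b and br0 are administratively up but report NO-CARRIER, which is consistent with their peer being down.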


Version-Release number of selected component (if applicable):
main, 1.38, 1.36, 1.34 on el8 (el9 seems unaffected)

Comment 1 David Jaša 2022-05-06 11:28:18 UTC
Created attachment 1877548 [details]
behave report

The other point of view may be that test1 is also down before and after running 'wpa_supplicant -c ...': wpa_supplicant brings the interface up when logging in and down when exiting, as you can see in the attached Behave report; otherwise it wouldn't be able to authenticate against RADIUS. Is this inconsistency between wpa_supplicant the systemd/dbus service and wpa_supplicant the shell command intentional, Davide?

Comment 2 David Jaša 2022-05-06 11:43:02 UTC
With a somewhat more diverse set of test machines, it seems that this is not the main issue, but it still seems worthwhile to me to reach some conclusion on this one.

Comment 3 Beniamino Galvani 2022-05-10 07:55:06 UTC
In "Status After Scenario" I see:

 70: test1b@test1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
     link/ether 62:30:f7:2e:fe:f4 brd ff:ff:ff:ff:ff:ff
 71: test1@test1b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
     link/ether 0a:e6:e1:f4:d1:7b brd ff:ff:ff:ff:ff:ff

Also in NM logs there is:

 <info>  [1651836008.0367] device (test1): carrier: link connected
 <info>  [1651836008.0398] device (test1): state change: unavailable -> disconnected (reason 'user-requested', sys-iface-state: 'managed')
 <info>  [1651836008.0522] device (test1): Activation: starting connection 'test1-ttls' (83fa4d0b-a806-4e94-8d47-fdf84a80c184)

The "carrier: link connected" indicates that there is carrier on the interface, and that can only happen when the link is set as "up".

I think the link status is not the problem here. If you can provide an NM log at trace level, that would probably help to understand the actual problem.
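The ordering argument above (carrier is reported before activation starts) can be checked mechanically on a log excerpt. The following is a rough Python sketch, assuming log lines shaped like the three quoted in this comment; the regular expression and the function name carrier_before_activation are illustrative and not part of NetworkManager.

```python
import re

# Matches NM log lines shaped like the excerpt, e.g.
# "<info>  [1651836008.0367] device (test1): carrier: link connected"
LOG_RE = re.compile(r"<\w+>\s+\[(\d+\.\d+)\]\s+device \(([^)]+)\):\s+(.*)")


def carrier_before_activation(lines, device):
    """True if `device` logged carrier before its first activation started."""
    carrier_at = None
    for line in lines:
        m = LOG_RE.search(line)
        if m is None or m.group(2) != device:
            continue
        ts, msg = float(m.group(1)), m.group(3)
        if msg.startswith("carrier: link connected"):
            carrier_at = ts
        elif msg.startswith("Activation: starting connection"):
            # Decide at the first activation seen for this device.
            return carrier_at is not None and carrier_at <= ts
    return False


# The three lines quoted from the NM log.
LOG = [
    " <info>  [1651836008.0367] device (test1): carrier: link connected",
    " <info>  [1651836008.0398] device (test1): state change: unavailable -> disconnected (reason 'user-requested', sys-iface-state: 'managed')",
    " <info>  [1651836008.0522] device (test1): Activation: starting connection 'test1-ttls' (83fa4d0b-a806-4e94-8d47-fdf84a80c184)",
]

if __name__ == "__main__":
    print(carrier_before_activation(LOG, "test1"))
```

This only confirms the ordering of messages in an info-level log; it does not replace a trace-level log for diagnosing the EAPOL timeout itself.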