Created attachment 1623729 [details] NM trace log export for the hour during testing. Contains 2 dot1x tests then successful connections without NM handling auth Description of problem: After the upgrade to FC30 from FC28 i'm no longer able to use dot1x authentication. I only have a wired network on this device but this was working fine on FC28 prior to the upgrade. After the upgrade it appears to simply not send the auth, then time out shortly after, repeating this loop. connectivity is fine if NM isn't doing auth. I have tested this doing a manual auth using wpa_supplicant and that works fine, and I've logged in using a new user that didn't exist prior to the upgrade, and NM works fine there, so it is something related to using an existing account that was there pre-upgrade. I've tried recreating a new wired network profile, changing the password to not use the KDE wallet etc, tried removing specific targeted KDE config files, removing all the KDE config directories, and removing the entire home directory (logging out and back in each time) but none of these things have worked. I've confirmed that nothing is being sent out the network interface at all by doing wireshark captures and checking with the auth system that handles the dot1x auth, no activity is seen at all. Trace logging from NM has been enabled but doesn't seem to have much useful info in it. I'm happy to complete any additional testing as required to resolve this issue and get a workable solution for anybody else that runs into it. Version-Release number of selected component (if applicable): Was using an earlier version when upgraded to FC30, retested again now with 1.16.4-1.fc30.x86_64, issue is still present How reproducible: Every time, 100% broken on this user. Steps to Reproduce: 1. Set up dot1x wired profile on FC28, connect to confirm working 2. Upgrade to FC30 3. Try to use dot1x profile again and confirm no longer functional Actual results: Auth timeout with nothing being sent out the network interface Expected results: dot1x auth successfully sent and received, interface comes up Additional info:
NM trace logging was enabled at 17:13:01 in the provided log, with a connection attempt starting at 17:13:06 using "Dot1x test" (set to ask for password each time) The password was provided at this time and again when prompted a second time. The prompt was then cancelled at 17:14:05. A connection without auth through NM was created at 17:18:34, using wpa_supplicant and a complete connection occurs here with a DHCP assigned IPv4 address and IPv6 coming up using the static IP as configured.
Can you please increase wpa_supplicant logging level with: busctl set-property fi.w1.wpa_supplicant1 /fi/w1/wpa_supplicant1 fi.w1.wpa_supplicant1 DebugLevel s excessive (as root) and then gather NM/wpa_supplicant logs with: journalctl -u NetworkManager -u wpa_supplicant --since='5 minutes ago' ?
Created attachment 1624551 [details] NM and WPA excessive logging as requested - 2019-10-11
(In reply to Beniamino Galvani from comment #2) > Can you please increase wpa_supplicant logging level with: > > busctl set-property fi.w1.wpa_supplicant1 /fi/w1/wpa_supplicant1 > fi.w1.wpa_supplicant1 DebugLevel s excessive > > (as root) and then gather NM/wpa_supplicant logs with: > > journalctl -u NetworkManager -u wpa_supplicant --since='5 minutes ago' > > ? Thanks for looking at this Beniamino, the logging is attached as requested, let me know if you need anything else. Cheers, Ayden PS - I used the dot1x profile that is set to prompt each time for this test, though it didn't prompt for some time on the first attempt (I assume it didn't get far enough through the process due to whatever the problem is until it timed out?)
PS - My manual WPA conf which works fine looks like this: ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=wheel ap_scan=0 network={ key_mgmt=IEEE8021X eap=PEAP identity="abeeson.au" password=REDACTED phase1="peaplabel=0" phase2="auth=MSCHAPV2" }
(In reply to Ayden Beeson from comment #5) > PS - My manual WPA conf which works fine looks like this: > > ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=wheel > ap_scan=0 > network={ > key_mgmt=IEEE8021X > eap=PEAP > identity="abeeson.au" > password=REDACTED > phase1="peaplabel=0" > phase2="auth=MSCHAPV2" > } Running as root though if thats potentially relevant. When run as my user i get this: Successfully initialized wpa_supplicant socket(PF_PACKET): Operation not permitted ioctl[SIOC{ADD/DEL}MULTI]: Operation not permitted ioctl[SIOCSIFFLAGS]: Operation not permitted driver_wired_init_common: Failed to enable allmulti enp0s25: Failed to initialize driver interface Which may be completely normal, I'm not sure hence I'm adding it here. My user is in the wheel group. abeeson ~ groups abeeson wheel wireshark
I realised after the capture that I still had an old authed session on the switch as I hadn't let my manual session end. I've manually cleared that session and re-run everything, but it doesn't appear to have changed the output in any appreciable way. I have also checked counters on the interface using ifconfig and i can see occasional packets coming in with the RX counter incrementing, but TX never seems to change, so it definitely looks like nothing is being sent out at all. I'll attach the second log anyway for reference.
Created attachment 1624556 [details] NM and WPA second log with cleared auth beforehand on upstream switch
Hi, wpa_supplicant is correctly sending out an EAPOL START packet at: Oct 11 11:50:13 wpa_supplicant[1110]: TX EAPOL - hexdump(len=4): 01 01 00 00 but it gets no response and therefore the authentication times out. This really looks like an issue on the switch side... but I don't know why wpa_supplicant alone would work. Maybe, try to run it with the '-ddd' option and compare the log with NM's one.
At first glance I'd thought the same, but I'm convinced it isn't now for a few reasons. 1. Nothing has changed on the switch side at all (before, during or after this issue started) 2. It works with another user with no changes using NM 3. It works manually with no changes / obvious differences 4. I'm not seeing the TX packets increment at all when trying to do it via NM. 5. wireshark captures show nothing going out the interface at all on NM connections, but look normal for manual sessions 6. Debugs from the switch side show nothing hitting it as well (I lost the interface counters from the switch in the scrollback but I'm pretty confident those didn't change either) I've attached a couple of logs (wpa_supp with -ddd on, some CLI commands showing ifconfig at various stages and another dump of the NM logging with WPA in excessive), as well as a wireshark cap that was running during an NM test, then after during a WPA manual connection. My best guess at this point is the data isn't being actually set out the NIC for whatever reason. It seems to have bound to the right interface etc, it just fails to go out at all Any ideas on why that might be the case when it works fine manually? Everything I can come up with shouldn't be a factor.
Created attachment 1625485 [details] NM and WPA logging - 2019-10-14 w/ excessive WPA logging on
Created attachment 1625486 [details] ifconfig output with timestamps using date
Created attachment 1625487 [details] WPA output with -ddd on
Created attachment 1625488 [details] wireshark capture - filtered for ethtype 0x888e The wireshark doesn't show any data prior to the manual attempt, but it was 100% running during this time, just nothing hit the interface.
I don't know enough about how dbus works but it seems to be the major difference between the NM and direct WPA manual logging, my connection is specifying an interface directly and never talking to dbus, whereas the NM logging shows it talking through dbus (dbus: fi.w1.wpa_supplicant1.Interface.SelectNetwork (/fi/w1/wpa_supplicant1/Interfaces/33)) The odd thing is the logging later refers to the interface by name: Oct 14 17:16:55 abeeson-dt4.csumain.csu.edu.au wpa_supplicant[1110]: Remove interface enp0s25 from radio Oct 14 17:16:55 abeeson-dt4.csumain.csu.edu.au wpa_supplicant[1110]: Remove radio So far that is the only major thing I can see, but my knowledge of dbus is far to limited to know if it matters or not.
This also appears each time: dbus: fill_dict_with_properties dbus_interface=fi.w1.wpa_supplicant1.Interface.P2PDevice dbus_property=P2PDeviceConfig getter failed
Maybe it's due to firewall dropping outgoing packets; which zone is the interface in? firewall-cmd --get-zone-of-interface=enp0s25 Does it make any difference if you temporarily change the zone to 'trusted' with: firewall-cmd --zone=trusted --change-interface=enp0s25 ?
I had wondered about firewalld being the issue as well. NM was set to put all connections (dot1x or not) into a zone called "work" - Not sure if I added this or its a built in one. I changed NM for the dot1x profile to set it to trusted, and then checked and set it using the CLI while it was disconnected, the first check showed no zone so I was hopeful: abeeson ~ firewall-cmd --get-zone-of-interface=enp0s25 no zone abeeson ~ firewall-cmd --zone=trusted --change-interface=enp0s25 success abeeson ~ firewall-cmd --get-zone-of-interface=enp0s25 trusted But unfortunately it is the same outcome, no increment to the TX counter throughout, auth never received at the switch etc. I've now reconnected using hte non dot1x NM profile (with the manual WPA running before it) and it is back to work as normal: abeeson ~ firewall-cmd --get-zone-of-interface=enp0s25 work I'm about to head home for the day but send through any other checks you can think of. If i get time tomorrow I'll log out and back into the working new account, and run a log capture from there so we can compare the output.
I think the problem is the following: Oct 14 17:16:07 wpa_supplicant[1110]: TX EAPOL: dst=01:80:c2:00:00:03 Oct 14 17:16:07 wpa_supplicant[1110]: TX EAPOL - hexdump(len=4): 01 01 00 00 Oct 14 17:16:09 NetworkManager[747]: <info> [1571033769.4226] device (enp0s25): carrier: link connected wpa_supplicant sends the EAPOL Start packet at 17:16:07 but the carrier goes up only two seconds later and so the packet is dropped. It would be interesting to know why the device doesn't have carrier and why it takes so long to go up. Could you paste again logs of the connection this time with NM trace logging enabled? You can run as root 'nmcli general logging level trace domains all' to increase the logging level (the setting is not persistent). Thank you.
(In reply to Beniamino Galvani from comment #19) > Could you paste again logs of the connection this time with > NM trace logging enabled? (it would be useful to have also wpa_supplicant excessive logs in the same file to compare the TX of frames with NM logs about carrier)
Proposed fix: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/330
*** Bug 1765490 has been marked as a duplicate of this bug. ***
Thanks Beniamino (and Dennis) for your work on this, apologies for my lack of attention on it, I've been working on critical work so I hadn't had a chance to come back to it. Once the fix is implemented I'll test it, and hopefully that'll resolve it! Cheers, Ayden
Upstream merged the fix as https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/838e5b87c2dae74c4d160769524b43d9ef6b03fa into version 1.22. However, Fedora 31 is still at 1.20.10-1.fc31. Has the fix been backported into that version?
Yes, the fix is in NM 1.20.10 : https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/8e2ad6f0c3d53cbb2d70b6aaecf344debaa8dd4c
That means this bug is RESOLVED/FIXED?
"Does not work" :( Jan 28 11:34:03 localhost.localdomain kernel: e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None Jan 28 11:34:03 localhost.localdomain NetworkManager[1856]: <info> [1580207643.8275] device (enp0s31f6): carrier: link connected Jan 28 11:34:03 localhost.localdomain NetworkManager[1856]: <info> [1580207643.8277] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed') Jan 28 11:34:03 localhost.localdomain NetworkManager[1856]: <info> [1580207643.8286] policy: auto-activating connection 'CONNECTION_NAME' (CONNECTION_UUID) Jan 28 11:34:03 localhost.localdomain NetworkManager[1856]: <info> [1580207643.8294] device (enp0s31f6): Activation: starting connection 'CONNECTION_NAME' (CONNECTION_UUID) Jan 28 11:34:03 localhost.localdomain NetworkManager[1856]: <info> [1580207643.8295] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Jan 28 11:34:03 localhost.localdomain NetworkManager[1856]: <info> [1580207643.9172] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Jan 28 11:34:03 localhost.localdomain kded5[2507]: plasma-nm: Unhandled active connection state change: 1 Jan 28 11:34:03 localhost.localdomain NetworkManager[1856]: <info> [1580207643.9340] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 11:34:03 localhost.localdomain NetworkManager[1856]: <info> [1580207643.9900] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 11:34:04 localhost.localdomain NetworkManager[1856]: <info> [1580207644.1199] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 11:34:04 localhost.localdomain NetworkManager[1856]: <info> [1580207644.1948] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 11:34:04 localhost.localdomain NetworkManager[1856]: <info> [1580207644.2878] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 11:34:04 localhost.localdomain NetworkManager[1856]: <info> [1580207644.3431] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 11:34:04 localhost.localdomain NetworkManager[1856]: <info> [1580207644.4511] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 11:34:06 localhost.localdomain NetworkManager[1856]: <info> [1580207646.5551] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 11:34:09 localhost.localdomain NetworkManager[1856]: <info> [1580207649.9194] device (enp0s31f6): state change: config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed') Jan 28 11:34:10 localhost.localdomain kernel: e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None Jan 28 11:34:10 localhost.localdomain NetworkManager[1856]: <info> [1580207650.8043] device (enp0s31f6): carrier: link connected Jan 28 11:34:10 localhost.localdomain NetworkManager[1856]: <info> [1580207650.8047] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed') Jan 28 11:34:10 localhost.localdomain NetworkManager[1856]: <info> [1580207650.8060] policy: auto-activating connection 'CONNECTION_NAME' (CONNECTION_UUID) Jan 28 11:34:10 localhost.localdomain NetworkManager[1856]: <info> [1580207650.8069] device (enp0s31f6): Activation: starting connection 'CONNECTION_NAME' (CONNECTION_UUID) Jan 28 11:34:10 localhost.localdomain NetworkManager[1856]: <info> [1580207650.8071] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Jan 28 11:34:10 localhost.localdomain NetworkManager[1856]: <info> [1580207650.8941] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Jan 28 11:34:10 localhost.localdomain kded5[2507]: plasma-nm: Unhandled active connection state change: 1 Jan 28 11:34:16 localhost.localdomain NetworkManager[1856]: <info> [1580207656.8955] device (enp0s31f6): state change: config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed') Jan 28 11:34:18 localhost.localdomain kernel: e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None Jan 28 11:34:18 localhost.localdomain NetworkManager[1856]: <info> [1580207658.1002] device (enp0s31f6): carrier: link connected Jan 28 11:34:18 localhost.localdomain NetworkManager[1856]: <info> [1580207658.1005] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed') Jan 28 11:34:18 localhost.localdomain NetworkManager[1856]: <info> [1580207658.1015] policy: auto-activating connection 'CONNECTION_NAME' (CONNECTION_UUID) Jan 28 11:34:18 localhost.localdomain NetworkManager[1856]: <info> [1580207658.1022] device (enp0s31f6): Activation: starting connection 'CONNECTION_NAME' (CONNECTION_UUID) Jan 28 11:34:18 localhost.localdomain NetworkManager[1856]: <info> [1580207658.1037] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Jan 28 11:34:18 localhost.localdomain NetworkManager[1856]: <info> [1580207658.1915] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Jan 28 11:34:18 localhost.localdomain kded5[2507]: plasma-nm: Unhandled active connection state change: 1 Jan 28 11:34:24 localhost.localdomain NetworkManager[1856]: <info> [1580207664.1932] device (enp0s31f6): state change: config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed') Jan 28 11:34:25 localhost.localdomain NetworkManager[1856]: <info> [1580207665.1422] device (enp0s31f6): carrier: link connected Jan 28 11:34:25 localhost.localdomain kernel: e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None [...]
Yeah, the original bug should be fixed. It seems that the Ethernet NIC takes more than 6 seconds to establish the carrier. This doesn't seem related to the initial 802.1x problem. After 6 seconds without carrier, NM brings the device down again and restarts from scratch. Does this snippet in /etc/NetworkManager/conf.d/99-carrier.conf fixes the problem? [device-eth-timeout] match-device=type:ethernet carrier-wait-timeout=10000
(In reply to Beniamino Galvani from comment #28) > It seems that the Ethernet NIC takes more than 6 seconds to establish the > carrier. This doesn't seem related to the initial 802.1x problem. After 6 > seconds without carrier, NM brings the device down again and restarts from > scratch. Does this snippet in /etc/NetworkManager/conf.d/99-carrier.conf > fixes the problem? Yes, that works: Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.4656] device (enp0s31f6): Activation: starting connection 'CONNECTION_NAME' (CONNECTION_UUID) Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.4658] audit: op="connection-activate" uuid="CONNECTION_UUID" name="CONNECTION_NAME" pid=2572 uid=1000 result="success" Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.4659] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.5533] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Jan 28 15:17:50 localhost.localdomain kded5[2507]: plasma-nm: Unhandled active connection state change: 1 Jan 28 15:17:50 localhost.localdomain kded5[2507]: plasma-nm: Unhandled active connection state change: 1 Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.5680] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.6070] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.6390] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.6689] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 15:17:50 localhost.localdomain NetworkManager[146447]: <info> [1580221070.7009] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 15:17:57 localhost.localdomain NetworkManager[146447]: <info> [1580221077.2742] device (enp0s31f6): Activation: (ethernet) connection 'CONNECTION_NAME' has security, but secrets are required. Jan 28 15:17:57 localhost.localdomain kernel: e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None Jan 28 15:17:57 localhost.localdomain NetworkManager[146447]: <info> [1580221077.2743] device (enp0s31f6): state change: config -> need-auth (reason 'none', sys-iface-state: 'managed') Jan 28 15:17:57 localhost.localdomain NetworkManager[146447]: <info> [1580221077.2751] device (enp0s31f6): Activation: (ethernet) asking for new secrets Jan 28 15:17:57 localhost.localdomain NetworkManager[146447]: <info> [1580221077.2753] device (enp0s31f6): carrier: link connected Jan 28 15:17:57 localhost.localdomain NetworkManager[146447]: <info> [1580221077.2918] device (enp0s31f6): state change: need-auth -> prepare (reason 'none', sys-iface-state: 'managed') Jan 28 15:17:57 localhost.localdomain NetworkManager[146447]: <info> [1580221077.3777] audit: op="statistics" arg="refresh-rate-ms" pid=2572 uid=1000 result="success" Jan 28 15:17:57 localhost.localdomain NetworkManager[146447]: <info> [1580221077.3800] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') [...] Thank you!
This message is a reminder that Fedora 30 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '30'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 30 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.