Description of problem:
The customer has services configured with a dependency on network-online.target:

[Unit]
Wants=network-online.target
After=network-online.target

These services should only start once a stable connection is available (that is, once an IP address has been assigned). According to the documentation, this should be possible by waiting for network-online.target [1].

The customer has two DHCP connections with different priorities.

# cat etc/sysconfig/network-scripts/ifcfg-lan
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=no
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=lan
UUID=17a9b738-2f28-4d93-8872-2a6c0e00582c
ONBOOT=yes
AUTOCONNECT_PRIORITY=10
MULTI_CONNECT=3

# cat etc/sysconfig/network-scripts/ifcfg-lan-8021x
TYPE=Ethernet
KEY_MGMT=IEEE8021X
IEEE_8021X_EAP_METHODS=TLS
IEEE_8021X_IDENTITY=hosst01
IEEE_8021X_AUTH_TIMEOUT=10
IEEE_8021X_CA_CERT=/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
IEEE_8021X_PRIVATE_KEY_PASSWORD_FLAGS=unused
IEEE_8021X_PRIVATE_KEY=/etc/wpa_supplicant/host01.key
IEEE_8021X_CLIENT_CERT=/etc/wpa_supplicant/host01.crt
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=lan-8021x
UUID=25c5d18b-8e49-4579-b8d5-6badf4b9630e
ONBOOT=yes
AUTOCONNECT_PRIORITY=50
MULTI_CONNECT=3
AUTH_RETRIES=1

Version-Release number of selected component (if applicable):
NetworkManager-1.14.0-14.el8.x86_64

How reproducible:
Always, in the customer environment.

Steps to Reproduce:
1. Set up a system with two connections on the same interface with different priorities.

Actual results:
The "Network is Online" target is reached even though NetworkManager is not yet connected.
Oct 07 13:07:25 host01 NetworkManager[1051]: <info> [1570446445.7901] manager: NetworkManager state is now CONNECTING
Oct 07 13:07:36 host01 NetworkManager[1051]: <info> [1570446456.2388] manager: NetworkManager state is now DISCONNECTED
Oct 07 13:07:36 host01 NetworkManager[1051]: <info> [1570446456.2388] manager: startup complete
Oct 07 13:07:36 host01 systemd[1]: Started Network Manager Wait Online.
Oct 07 13:07:36 host01 NetworkManager[1051]: <info> [1570446456.2421] manager: NetworkManager state is now CONNECTING
Oct 07 13:07:36 host01 systemd[1]: Reached target Network is Online.
Oct 07 13:08:22 host01 NetworkManager[1051]: <info> [1570446502.4358] manager: NetworkManager state is now CONNECTED_LOCAL
Oct 07 13:08:22 host01 NetworkManager[1051]: <info> [1570446502.4468] manager: NetworkManager state is now CONNECTED_SITE
Oct 07 13:08:22 host01 NetworkManager[1051]: <info> [1570446502.4476] manager: NetworkManager state is now CONNECTED_GLOBAL

Expected results:
The "Network is Online" target should only be reached once a connection has been established successfully.

Additional info:
I know that network-online.target is based on NetworkManager-wait-online.service, which in turn uses the nm-online tool to check connectivity. According to the service definition and the documentation, network-online.target is reached as soon as all auto-activate connections have tried to activate.

-s | --wait-for-startup
     Wait for NetworkManager startup to complete, rather than waiting for network connectivity specifically.
---> Startup is considered complete once
---> NetworkManager has activated (or
---> attempted to activate) every
---> auto-activate connection which is
---> available given the current network
---> state.
     (This is generally only useful at boot time; after startup has completed, nm-online -s will just return immediately, regardless of the current network state.)

But how does this comply with the definition of "online" from [1]?

"""
network-online.target is a target that actively waits until the network is "up", where the definition of "up" is defined by the network management software. Usually it indicates a configured, routable IP address of some kind. Its primary purpose is to actively delay activation of services until the network is set up. It is an active target, meaning that it may be pulled in by the services requiring the network to be up, but is not pulled in by the network management service itself
"""

[1]: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
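
The ordering problem in the log excerpt from the description can be checked mechanically. The following is only a minimal sketch: the helper name `state_when_target_reached` is hypothetical, and the log lines are copied from the excerpt in this report. It returns the last NetworkManager state logged before systemd reports that the target was reached.

```python
import re

# Log excerpt copied from the description of this report.
LOG = """\
Oct 07 13:07:25 host01 NetworkManager[1051]: <info> [1570446445.7901] manager: NetworkManager state is now CONNECTING
Oct 07 13:07:36 host01 NetworkManager[1051]: <info> [1570446456.2388] manager: NetworkManager state is now DISCONNECTED
Oct 07 13:07:36 host01 NetworkManager[1051]: <info> [1570446456.2388] manager: startup complete
Oct 07 13:07:36 host01 systemd[1]: Started Network Manager Wait Online.
Oct 07 13:07:36 host01 NetworkManager[1051]: <info> [1570446456.2421] manager: NetworkManager state is now CONNECTING
Oct 07 13:07:36 host01 systemd[1]: Reached target Network is Online.
Oct 07 13:08:22 host01 NetworkManager[1051]: <info> [1570446502.4476] manager: NetworkManager state is now CONNECTED_GLOBAL
"""

STATE_RE = re.compile(r"NetworkManager state is now (\w+)")
TARGET_MSG = "Reached target Network is Online."


def state_when_target_reached(log_text):
    """Return the last NetworkManager state logged before the target was reached.

    Returns None if the target message does not appear in the log, or if
    no state change was logged before it.
    """
    last_state = None
    for line in log_text.splitlines():
        match = STATE_RE.search(line)
        if match:
            last_state = match.group(1)
        elif TARGET_MSG in line:
            return last_state
    return None


if __name__ == "__main__":
    print(state_when_target_reached(LOG))
```

For this excerpt the helper reports CONNECTING, which matches the complaint: the target is reached while the second profile is still being activated.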
I'm also aware of Connectivity-section of NetworkManager.conf, but this does not have any impact on nm-online state. Is this correct? What is the intention behind the connectivity check if it only reports 'limited' connectivity when I ask for it, while all other services ignore it?

[root@shy-lake ~]# cat /etc/NetworkManager/conf.d/00-custom-connectivity.conf
[connectivity]
uri=http://chunk.crazy.lab:8000/static.txt
response=OK
interval=60

[root@shy-lake ~]# grep -i online /var/log/messages
Oct 9 09:15:25 shy-lake systemd[1]: Starting Network Manager Wait Online...
Oct 9 09:15:28 shy-lake systemd[1]: Started Network Manager Wait Online.
Oct 9 09:15:28 shy-lake systemd[1]: Reached target Network is Online.

[root@shy-lake ~]# nmcli -t -f connectivity g
limited
> I'm also aware of Connectivity-section of NetworkManager.conf, but this does not have any impact on nm-online state. Is this correct?

Correct. Totally unrelated.

---

In a nutshell:

NetworkManager-wait-online.service waits for `nm-online -s` to quit.

The `nm-online -s` option basically waits until NetworkManager logs "startup complete". (btw. `nm-online`'s other options are not really useful. The main reason why this tool exists is for `NetworkManager-wait-online.service` with the "-s" option).

"startup complete" means that the state in NetworkManager is "settled". That basically means all devices are either in unmanaged/unavailable/disconnected/activated state, and it's not expected that something else is going to happen. For example, when NM starts, it tries to autoactivate profiles (which keeps the device busy and delays startup complete). Eventually all profiles have either successfully autoactivated, failed to autoactivate, or were not supposed to autoactivate. Then startup is complete.

Note that a device may be considered "activated" when either the IPv4 or the IPv6 method succeeds. That is especially the case with ipv4.may-fail=yes && ipv6.may-fail=yes settings. With that configuration (the default), NM only requires one address family to succeed. Hence, if you require IPv4, you may need to either set ipv4.may-fail=no or disable IPv6 (ipv6.method=ignore).

---

Please provide a full log of NetworkManager that shows the issue, with level=TRACE logging. See the comments at [1] about logging, rate limiting and private data before collecting the log file.

[1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28
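
The "settled" rule and the may-fail behaviour described above can be written down as a toy model. This is only a sketch of the rules as stated in this comment, not NetworkManager's actual code. The names `IpResult`, `activation_succeeds` and `startup_complete` are hypothetical, and treating a disabled address family as `None` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Device states that, per the comment above, do not block "startup complete".
SETTLED_STATES = {"unmanaged", "unavailable", "disconnected", "activated"}


@dataclass
class IpResult:
    """Outcome of one address family (hypothetical type for this sketch)."""
    succeeded: bool
    may_fail: bool = True  # models ipv4.may-fail / ipv6.may-fail (default: yes)


def activation_succeeds(ipv4: Optional[IpResult], ipv6: Optional[IpResult]) -> bool:
    """Toy rule: a family with may-fail=no must succeed, and at least one
    enabled family must succeed. None models a disabled family (assumption)."""
    families = [f for f in (ipv4, ipv6) if f is not None]
    if any(not f.succeeded and not f.may_fail for f in families):
        return False
    return any(f.succeeded for f in families)


def startup_complete(device_states) -> bool:
    """Startup is complete once every device is in a settled state."""
    return all(state in SETTLED_STATES for state in device_states)


if __name__ == "__main__":
    # Default settings: IPv6 alone is enough for the device to count as activated.
    print(activation_succeeds(IpResult(False), IpResult(True)))
    # ipv4.may-fail=no: a failed IPv4 method now fails the activation.
    print(activation_succeeds(IpResult(False, may_fail=False), IpResult(True)))
    # A device that is still in "prepare" keeps startup from completing.
    print(startup_complete(["activated", "prepare"]))
```

In this model, requiring IPv4 corresponds to either passing `may_fail=False` for IPv4 or passing `None` for IPv6, mirroring the two options named above.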
Hi Thomas,

thanks for the clarification. I discussed this with the customer. Since we know that the network, and DHCP in particular, is slow, we changed the nm-online timeout to 300 seconds, but the behavior is still the same. Below are the settings used in this scenario. I will attach the log privately because it contains customer-related information.

~~~
I tried overriding nm-online timeout to 300 and added IPV4_FAILURE_FATAL=yes to the connection profiles:

# /etc/systemd/system/NetworkManager-wait-online.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/nm-online -s -q --timeout=300

# /etc/sysconfig/network-scripts/ifcfg-lan_8021x
TYPE=Ethernet
KEY_MGMT=IEEE8021X
IEEE_8021X_EAP_METHODS=TLS
IEEE_8021X_IDENTITY=fektestz8
IEEE_8021X_AUTH_TIMEOUT=10
IEEE_8021X_CA_CERT=/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
IEEE_8021X_PRIVATE_KEY_PASSWORD_FLAGS=unused
IEEE_8021X_PRIVATE_KEY=/etc/wpa_supplicant/host01.key
IEEE_8021X_CLIENT_CERT=/etc/wpa_supplicant/hosst01.crt
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME=lan_8021x
UUID=25c5d18b-8e49-4579-b8d5-6badf4b9630e
ONBOOT=yes
AUTOCONNECT_PRIORITY=50
MULTI_CONNECT=3
AUTH_RETRIES=1

# /etc/sysconfig/network-scripts/ifcfg-lan
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=no
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME=lan
UUID=17a9b738-2f28-4d93-8872-2a6c0e00582c
ONBOOT=yes
AUTOCONNECT_PRIORITY=10
MULTI_CONNECT=3

This does not seem to have changed anything. "Network is Online" is still reached before "lan" profile gets activated:

Okt 10 12:01:45 host01 systemd[1]: Started Network Manager Wait Online.
Okt 10 12:01:45 host01 NetworkManager[1065]: <info> [1570701705.2186] device (eno1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Okt 10 12:01:45 host01 systemd[1]: Reached target Network is Online.
~~~
> This does not seem to have changed anything. "Network is Online" is still reached before "lan" profile gets activated:

In the attached logfile from comment 4 there is no profile "ifcfg-lan" or "ifcfg-lan_8021x". Also, you can see that it starts autoactivating a profile named "vwlan_nac", which fails quickly (thereby unblocking startup-complete).

Which profile do you intend to wait for on boot? The one that is autoconnecting does not seem to work...

Extending the timeout for nm-online won't help if nm-online quits earlier than expected, before the original 30 seconds are over.
Yes, this is a bug and should be fixed. Indeed, the device should not be considered ready until we expect no more auto-activations to happen.
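
To illustrate the difference, here is a toy comparison of the two readiness rules. It is a sketch only and makes no claim about how the actual fix is implemented. The function names `ready_old` and `ready_fixed` and the `pending_profiles` argument are hypothetical. The profile names are taken from the description.

```python
# States in which a device is no longer busy, as listed earlier in this report.
SETTLED_STATES = {"unmanaged", "unavailable", "disconnected", "activated"}


def ready_old(state):
    """Observed behaviour: a settled device state alone unblocks startup."""
    return state in SETTLED_STATES


def ready_fixed(state, pending_profiles):
    """Intended behaviour: the device is also not ready while further
    auto-activations are still expected on it."""
    return state in SETTLED_STATES and not pending_profiles


if __name__ == "__main__":
    # "lan-8021x" (priority 50) has just failed; "lan" (priority 10) is next.
    print(ready_old("disconnected"))             # startup unblocked too early
    print(ready_fixed("disconnected", ["lan"]))  # still waiting for "lan"
    print(ready_fixed("activated", []))          # "lan" is up, nothing pending
```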
Fixed upstream on master with commit: f583aec80
New test added:

@nm_online_wait_for_second_connection
Scenario: NM - general - wait for second device
* Add a new connection of type "ethernet" and options "ifname testG con-name con_general 802-1x.eap md5 802-1x.identity user 802-1x.password password connection.autoconnect-priority 50 connection.auth-retries 1"
* Add a new connection of type "ethernet" and options "ifname testG con-name con_general2 connection.autoconnect-priority 20"
* Stop NM
* Execute "rm -rf /var/run/NetworkManager"
* Prepare simulated test "testG" device
* Execute "ip netns exec testG_ns pkill -SIGSTOP -F /tmp/testG_ns.pid"
* Start NM
* Run child "echo FAIL > /tmp/nm-online.txt && /usr/bin/nm-online -s -q --timeout=60 && echo PASS > /tmp/nm-online.txt"
When "con_general" is visible with command "nmcli con show -a" in "10" seconds
When "FAIL" is visible with command "cat /tmp/nm-online.txt"
* Execute "sleep 10"
When "con_general2" is visible with command "nmcli con show -a" in "20" seconds
When "FAIL" is visible with command "cat /tmp/nm-online.txt"
* Execute "ip netns exec testG_ns pkill -SIGCONT -F /tmp/testG_ns.pid"
Then "PASS" is visible with command "cat /tmp/nm-online.txt" in "10" seconds

The test passes with 1.22.8 and fails with 1.20.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1847