Bug 1759956 - network-online.target reached, even if there is no connectivity given from connections
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Antonio Cardace
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-09 13:34 UTC by Steffen Froemer
Modified: 2020-04-28 16:54 UTC
CC List: 11 users

Fixed In Version: NetworkManager-1.22.6-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 16:53:06 UTC
Type: Bug
Target Upstream Version:



Links
Red Hat Product Errata RHBA-2020:1847 (last updated 2020-04-28 16:53:59 UTC)

Description Steffen Froemer 2019-10-09 13:34:29 UTC
Description of problem:
The customer has services configured that depend on network-online.target:

[Unit]
Wants=network-online.target
After=network-online.target


These services should only start once a stable connection is available (that is, once an IP address has been assigned).
According to the documentation [1], ordering the services after network-online.target should achieve this.


The customer has two DHCP connection profiles with different autoconnect priorities.

# cat etc/sysconfig/network-scripts/ifcfg-lan
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=no
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=lan
UUID=17a9b738-2f28-4d93-8872-2a6c0e00582c
ONBOOT=yes
AUTOCONNECT_PRIORITY=10
MULTI_CONNECT=3


# cat etc/sysconfig/network-scripts/ifcfg-lan-8021x 
TYPE=Ethernet
KEY_MGMT=IEEE8021X
IEEE_8021X_EAP_METHODS=TLS
IEEE_8021X_IDENTITY=hosst01
IEEE_8021X_AUTH_TIMEOUT=10
IEEE_8021X_CA_CERT=/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
IEEE_8021X_PRIVATE_KEY_PASSWORD_FLAGS=unused
IEEE_8021X_PRIVATE_KEY=/etc/wpa_supplicant/host01.key
IEEE_8021X_CLIENT_CERT=/etc/wpa_supplicant/host01.crt
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=lan-8021x
UUID=25c5d18b-8e49-4579-b8d5-6badf4b9630e
ONBOOT=yes
AUTOCONNECT_PRIORITY=50
MULTI_CONNECT=3
AUTH_RETRIES=1




Version-Release number of selected component (if applicable):
NetworkManager-1.14.0-14.el8.x86_64

How reproducible:
Always, in the customer environment.

Steps to Reproduce:
1. Set up a system with two connection profiles on the same interface, with different autoconnect priorities.


Actual results:
network-online.target ("Network is Online") is reached even though NetworkManager is not yet connected:

Oct 07 13:07:25 host01 NetworkManager[1051]: <info>  [1570446445.7901] manager: NetworkManager state is now CONNECTING
Oct 07 13:07:36 host01 NetworkManager[1051]: <info>  [1570446456.2388] manager: NetworkManager state is now DISCONNECTED
Oct 07 13:07:36 host01 NetworkManager[1051]: <info>  [1570446456.2388] manager: startup complete
Oct 07 13:07:36 host01 systemd[1]: Started Network Manager Wait Online.
Oct 07 13:07:36 host01 NetworkManager[1051]: <info>  [1570446456.2421] manager: NetworkManager state is now CONNECTING
Oct 07 13:07:36 host01 systemd[1]: Reached target Network is Online.
Oct 07 13:08:22 host01 NetworkManager[1051]: <info>  [1570446502.4358] manager: NetworkManager state is now CONNECTED_LOCAL
Oct 07 13:08:22 host01 NetworkManager[1051]: <info>  [1570446502.4468] manager: NetworkManager state is now CONNECTED_SITE
Oct 07 13:08:22 host01 NetworkManager[1051]: <info>  [1570446502.4476] manager: NetworkManager state is now CONNECTED_GLOBAL
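To spot this ordering in a journal excerpt like the one above, a small script can compare the point where "Reached target Network is Online" appears with the most recent NetworkManager manager state. This is a minimal sketch, assuming the log lines are formatted exactly as shown here; the function name `online_reached_too_early` is hypothetical, and treating CONNECTED_GLOBAL as "really online" is a choice made for this illustration, not NetworkManager's own definition.

```python
import re

# Matches manager-state lines such as
# "manager: NetworkManager state is now CONNECTED_GLOBAL".
STATE_RE = re.compile(r"NetworkManager state is now (\w+)")
TARGET_MSG = "Reached target Network is Online"


def online_reached_too_early(lines):
    """Return True if network-online.target was reached while the most
    recently reported NetworkManager state was not CONNECTED_GLOBAL."""
    connected = False
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            # Track the most recent manager state seen so far.
            connected = match.group(1) == "CONNECTED_GLOBAL"
        elif TARGET_MSG in line:
            # The target fired: was NM globally connected at that moment?
            return not connected
    # The target was never reached in this excerpt.
    return False


if __name__ == "__main__":
    # Lines taken from the journal excerpt in the description.
    excerpt = [
        "Oct 07 13:07:36 host01 NetworkManager[1051]: <info>  [1570446456.2388] manager: NetworkManager state is now DISCONNECTED",
        "Oct 07 13:07:36 host01 NetworkManager[1051]: <info>  [1570446456.2388] manager: startup complete",
        "Oct 07 13:07:36 host01 systemd[1]: Reached target Network is Online.",
        "Oct 07 13:08:22 host01 NetworkManager[1051]: <info>  [1570446502.4476] manager: NetworkManager state is now CONNECTED_GLOBAL",
    ]
    print(online_reached_too_early(excerpt))  # True
```

The script only looks at line order, so it works on any excerpt (for example `journalctl -b` output) as long as the message texts match.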



Expected results:
network-online.target should only be reached once a connection has been established successfully.

Additional info:
I know that network-online.target is reached via NetworkManager-wait-online.service, which itself uses the nm-online tool to check connectivity.

Based on the service definition and the documentation, network-online.target is reached as soon as all autoactivate connections have attempted to activate:

       -s | --wait-for-startup
           Wait for NetworkManager startup to
           complete, rather than waiting for
           network connectivity specifically.
      ---> Startup is considered complete once
      ---> NetworkManager has activated (or
      ---> attempted to activate) every
      ---> auto-activate connection which is
      ---> available given the current network
      ---> state. (This is generally only useful
           at boot time; after startup has
           completed, nm-online -s will just
           return immediately, regardless of the
           current network state.)


But how does this comply with the definition of "online" from [1]?

""" network-online.target is a target that actively waits until the nework is "up", where the definition of "up" is defined by the network management software. Usually it indicates a configured, routable IP address of some kind. Its primary purpose is to actively delay activation of services until the network is set up. It is an active target, meaning that is may be pulled in by the services requiring the network to be up, but is not pulled in by the network management service itself """


[1]: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

Comment 1 Steffen Froemer 2019-10-09 13:38:31 UTC
I'm also aware of Connectivity-section of NetworkManager.conf, but this does not have any impact on nm-online state.
Is this correct? What is the purpose of the connectivity check if it only reports 'limited' connectivity when I ask for it, while all other services ignore it?


[root@shy-lake ~]# cat /etc/NetworkManager/conf.d/00-custom-connectivity.conf 
[connectivity]                                                                  
uri=http://chunk.crazy.lab:8000/static.txt                                      
response=OK                                                                     
interval=60                                                                     


[root@shy-lake ~]# grep -i online /var/log/messages     
Oct  9 09:15:25 shy-lake systemd[1]: Starting Network Manager Wait Online...
Oct  9 09:15:28 shy-lake systemd[1]: Started Network Manager Wait Online.
Oct  9 09:15:28 shy-lake systemd[1]: Reached target Network is Online.
[root@shy-lake ~]# nmcli -t -f connectivity g
limited

Comment 2 Thomas Haller 2019-10-09 13:55:35 UTC
> I'm also aware of Connectivity-section of NetworkManager.conf, but this does not have any impact on nm-online state.
Is this correct? 

Correct. Totally unrelated.

---

In a nutshell:

NetworkManager-wait-online.service waits for `nm-online -s` to quit.
The `-s` option of `nm-online` basically waits until NetworkManager logs "startup complete".

(btw. `nm-online`'s other options are not really useful. The main reason why this tool exists is for `NetworkManager-wait-online.service` with "-s" option).

"startup complete" means that the state in NetworkManager is "settled". That basically means, all devices are either in unmanaged/unavailable/disconnected/activated state, and it's not expected that something else is going to happen. For example, when NM starts, it tries to autoactivate profiles (which keeps the device busy and delays startup complete). Eventually all profiles either successfully autoactivated, failed to autoactivate or were not supposed to autoactivate. Then startup is complete.
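The "settled" condition described above can be sketched as a toy model. This is a simplified illustration, not NetworkManager's actual implementation: the `Device` class, its `autoactivation_pending` flag and the `startup_complete` function are hypothetical. Startup counts as complete once every device is in one of the four settled states and no device still has an autoactivation pending.

```python
from dataclasses import dataclass

# Device states that count as "settled" in this simplified model.
SETTLED_STATES = {"unmanaged", "unavailable", "disconnected", "activated"}


@dataclass
class Device:
    name: str
    state: str
    # Hypothetical flag: True while NM still intends to autoactivate
    # a profile on this device.
    autoactivation_pending: bool = False


def startup_complete(devices):
    """Return True once no device is busy and no autoactivation is pending."""
    return all(
        d.state in SETTLED_STATES and not d.autoactivation_pending
        for d in devices
    )


if __name__ == "__main__":
    devices = [Device("eno1", "prepare")]  # still activating a profile
    print(startup_complete(devices))  # False
    devices[0].state = "disconnected"  # activation failed, nothing left to try
    print(startup_complete(devices))  # True
```

In this model the `autoactivation_pending` check is what keeps a briefly "disconnected" device from unblocking startup while another profile is still going to be tried.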


Note that a device may be considered "activated" when either IPv4 or IPv6 method succeeds. That is especially the case with ipv4.may-fail=yes && ipv6.may-fail=yes settings. With that configuration (the default), NM only requires one address family to succeed. Hence, if you require IPv4, you may need to either set ipv4.may-fail=no or disable IPv6 (ipv6.method=ignore).
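The may-fail rule can be written as a small predicate. This is a minimal sketch under simplifying assumptions: each address family's result is reduced to "succeeded" or "failed", a family configured with ipv4.method=disabled or ipv6.method=ignore is modeled as not enabled, and at least one family is assumed to be enabled. The `Family` class and `device_activated` function are hypothetical, not NetworkManager code.

```python
from dataclasses import dataclass


@dataclass
class Family:
    enabled: bool    # False models ipv4.method=disabled / ipv6.method=ignore
    may_fail: bool   # ipv4.may-fail / ipv6.may-fail
    succeeded: bool  # whether address configuration for this family worked


def device_activated(ipv4: Family, ipv6: Family) -> bool:
    """Simplified model of when a device counts as "activated".

    Assumes at least one address family is enabled.
    """
    enabled = [f for f in (ipv4, ipv6) if f.enabled]
    if not enabled:
        raise ValueError("model assumes at least one enabled address family")
    # A family with may-fail=no must succeed, otherwise activation fails.
    if any(not f.may_fail and not f.succeeded for f in enabled):
        return False
    # Otherwise at least one enabled family has to succeed.
    return any(f.succeeded for f in enabled)


if __name__ == "__main__":
    v4_failed = Family(enabled=True, may_fail=True, succeeded=False)
    v6_ok = Family(enabled=True, may_fail=True, succeeded=True)
    # Default may-fail=yes on both: IPv6 alone is enough.
    print(device_activated(v4_failed, v6_ok))  # True
    # ipv4.may-fail=no: the IPv4 failure now fails the activation.
    v4_required = Family(enabled=True, may_fail=False, succeeded=False)
    print(device_activated(v4_required, v6_ok))  # False
```

Both suggested fixes (ipv4.may-fail=no, or ignoring IPv6 so that IPv4 is the only enabled family) make an IPv4 failure prevent the device from reaching "activated" in this model.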

---

Please provide a full NetworkManager log, with level=TRACE logging, that shows the issue. See the comments at [1] about logging, rate limiting and private data before collecting the log file.

[1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28

Comment 3 Steffen Froemer 2019-10-11 09:06:34 UTC
Hi Thomas,

thanks for the clarification. I discussed this with the customer, and since we know the network, and especially DHCP, is slow, we changed the nm-online timeout to 300 seconds. But the behavior is still the same.

Please find below the settings used in this scenario.
I will attach the log privately, since it contains customer-related information.

~~~
I tried overriding the nm-online timeout to 300 and added IPV4_FAILURE_FATAL=yes to the connection profiles:
 
# /etc/systemd/system/NetworkManager-wait-online.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/nm-online -s -q --timeout=300

# /etc/sysconfig/network-scripts/ifcfg-lan_8021x
TYPE=Ethernet
KEY_MGMT=IEEE8021X
IEEE_8021X_EAP_METHODS=TLS
IEEE_8021X_IDENTITY=fektestz8
IEEE_8021X_AUTH_TIMEOUT=10
IEEE_8021X_CA_CERT=/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
IEEE_8021X_PRIVATE_KEY_PASSWORD_FLAGS=unused
IEEE_8021X_PRIVATE_KEY=/etc/wpa_supplicant/host01.key
IEEE_8021X_CLIENT_CERT=/etc/wpa_supplicant/hosst01.crt
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME=lan_8021x
UUID=25c5d18b-8e49-4579-b8d5-6badf4b9630e
ONBOOT=yes
AUTOCONNECT_PRIORITY=50
MULTI_CONNECT=3
AUTH_RETRIES=1

# /etc/sysconfig/network-scripts/ifcfg-lan
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=no
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME=lan
UUID=17a9b738-2f28-4d93-8872-2a6c0e00582c
ONBOOT=yes
AUTOCONNECT_PRIORITY=10
MULTI_CONNECT=3

This does not seem to have changed anything. "Network is Online" is still reached before "lan" profile gets activated:

Okt 10 12:01:45 host01 systemd[1]: Started Network Manager Wait Online.
Okt 10 12:01:45 host01 NetworkManager[1065]: <info>  [1570701705.2186] device (eno1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Okt 10 12:01:45 host01 systemd[1]: Reached target Network is Online.

~~~

Comment 6 Thomas Haller 2019-10-11 10:45:40 UTC
> This does not seem to have changed anything. "Network is Online" is still reached before "lan" profile gets activated:

In the log file attached in comment 4 there is no profile "ifcfg-lan" or "ifcfg-lan_8021x".

Also, you can see that it starts autoactivating a profile named "vwlan_nac", which fails quickly (thereby unblocking startup complete).

Which profile do you intend to wait for on boot? The one that is autoconnecting does not seem to work...



Extending the nm-online timeout won't help if nm-online quits earlier than expected, before the original 30 seconds are over.

Comment 8 Thomas Haller 2019-10-14 08:47:49 UTC
Yes, this is a bug and should be fixed.

Indeed, the device should not be considered ready until no more autoactivations are expected.
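The difference between the reported behavior and the intended one can be illustrated with a toy simulation of one device holding several autoconnect profiles, tried in order of autoconnect priority (higher value first). This is a hypothetical sketch, not the upstream fix: `Profile`, `simulate_boot` and `wait_for_pending` are invented for illustration, retries and timing are ignored, and each profile's outcome is fixed in advance.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    autoconnect_priority: int  # higher value is tried first
    will_succeed: bool         # outcome assumed for this simulation


def simulate_boot(profiles, wait_for_pending=True):
    """Try profiles on one device in priority order and return the events.

    wait_for_pending=True models the intended behavior: "startup complete"
    is emitted only after a profile activated or none is left to try.
    wait_for_pending=False models the reported bug (simplified): startup
    completes as soon as the first profile fails.
    """
    events = []
    complete = False
    queue = sorted(profiles, key=lambda p: p.autoconnect_priority, reverse=True)
    for profile in queue:
        events.append(f"activating {profile.name}")
        if profile.will_succeed:
            events.append(f"activated {profile.name}")
            break
        events.append(f"failed {profile.name}")
        if not wait_for_pending and not complete:
            # The device is briefly "disconnected" here; the old behavior
            # reported startup complete although another profile was pending.
            events.append("startup complete")
            complete = True
    if not complete:
        events.append("startup complete")
    return events


if __name__ == "__main__":
    profiles = [
        Profile("lan", autoconnect_priority=10, will_succeed=True),
        Profile("lan-8021x", autoconnect_priority=50, will_succeed=False),
    ]
    print(simulate_boot(profiles, wait_for_pending=False))
    print(simulate_boot(profiles))
```

Assuming, as in the description, that the 802.1X profile (priority 50) fails and "lan" (priority 10) succeeds, the first call emits "startup complete" right after lan-8021x fails, matching the early "Network is Online" in the journal, while the second emits it only after "lan" is activated.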

Comment 12 Antonio Cardace 2020-01-28 08:54:50 UTC
Fixed upstream on master with commit: f583aec80

Comment 14 Vladimir Benes 2020-02-27 09:51:18 UTC
New test added:
    @nm_online_wait_for_second_connection
    Scenario: NM - general - wait for second device
    * Add a new connection of type "ethernet" and options "ifname testG con-name con_general 802-1x.eap md5 802-1x.identity user 802-1x.password password connection.autoconnect-priority 50 connection.auth-retries 1"
    * Add a new connection of type "ethernet" and options "ifname testG con-name con_general2 connection.autoconnect-priority 20"
    * Stop NM
    * Execute "rm -rf /var/run/NetworkManager"
    * Prepare simulated test "testG" device
    * Execute "ip netns exec testG_ns pkill -SIGSTOP -F /tmp/testG_ns.pid"
    * Start NM
    * Run child "echo FAIL > /tmp/nm-online.txt && /usr/bin/nm-online -s -q --timeout=60 && echo PASS > /tmp/nm-online.txt"
    When "con_general" is visible with command "nmcli con show -a" in "10" seconds
    When "FAIL" is visible with command "cat /tmp/nm-online.txt"
    * Execute "sleep 10"
    When "con_general2" is visible with command "nmcli con show -a" in "20" seconds
    When "FAIL" is visible with command "cat /tmp/nm-online.txt"
    * Execute "ip netns exec testG_ns pkill -SIGCONT -F /tmp/testG_ns.pid"
    Then "PASS" is visible with command "cat /tmp/nm-online.txt" in "10" seconds

It works with 1.22.8 but fails with 1.20.

Comment 16 errata-xmlrpc 2020-04-28 16:53:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1847

