Description of problem:
NetworkManager-wait-online.service does not wait for the network to be online. This is because it has the following line:

    ExecStart=/usr/bin/nm-online -s -q --timeout=30

The -s says wait for NetworkManager to be running. It does not wait for any network interface to be usable. This means that any service that needs the network, such as NFS mounts, will fail, and in my setup they are failing.

Version-Release number of selected component (if applicable):
NetworkManager-1.8.2-1.fc26.x86_64

How reproducible:
Add the following unit and see that its output shows errors:

    [Unit]
    Description=Check DNS working
    Wants=network-online.target
    After=network-online.target

    [Service]
    type=oneshot
    ExecStartPre=/usr/sbin/ip addr
    ExecStartPre=-/usr/bin/cat /etc/resolv.conf
    ExecStart=-/usr/bin/host fender

Actual results:

    $ systemctl status check-dns-working.service
    ● check-dns-working.service - Check DNS working
    Loaded: loaded (/etc/systemd/system/check-dns-working.service; enabled; vendor preset: disabled)
    Active: inactive (dead) since Sun 2017-08-27 12:55:17 BST; 36min ago
    Process: 1237 ExecStart=/usr/bin/host fender (code=exited, status=1/FAILURE)
    Process: 1236 ExecStartPre=/usr/bin/cat /etc/resolv.conf (code=exited, status=1/FAILURE)
    Process: 1231 ExecStartPre=/usr/sbin/ip addr (code=exited, status=0/SUCCESS)
    Main PID: 1237 (code=exited, status=1/FAILURE)

    Aug 27 12:55:04 varric.chelsea.private ip[1231]: link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    Aug 27 12:55:04 varric.chelsea.private ip[1231]: inet 127.0.0.1/8 scope host lo
    Aug 27 12:55:04 varric.chelsea.private ip[1231]: valid_lft forever preferred_lft forever
    Aug 27 12:55:04 varric.chelsea.private ip[1231]: inet6 ::1/128 scope host
    Aug 27 12:55:04 varric.chelsea.private ip[1231]: valid_lft forever preferred_lft forever
    Aug 27 12:55:04 varric.chelsea.private systemd[1]: Started Check DNS working.
    Aug 27 12:55:04 varric.chelsea.private ip[1231]: 2: enp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default
    Aug 27 12:55:04 varric.chelsea.private ip[1231]: link/ether 14:dd:a9:dc:52:da brd ff:ff:ff:ff:ff:ff
    Aug 27 12:55:04 varric.chelsea.private cat[1236]: /usr/bin/cat: /etc/resolv.conf: No such file or directory
    Aug 27 12:55:17 varric.chelsea.private host[1237]: ;; connection timed out; no servers could be reached

Expected results:
The ip addr and host commands work, and the NFS mount can then succeed.

Additional info:
> The -s says wait for NetworkManager to be running. It does not
> wait for any network interface to be usable.

This is not true. From `man nm-online`:

    -s | --wait-for-startup
        Wait for NetworkManager startup to complete, rather than waiting for network connectivity specifically. Startup is considered complete once NetworkManager has activated (or attempted to activate) every auto-activate connection which is available given the current network state. (This is generally only useful at boot time; after startup has completed, nm-online -s will just return immediately, regardless of the current network state.)

See https://bugzilla.redhat.com/show_bug.cgi?id=1483343#c3
We also have a serious problem with this behaviour. As you quoted: "(or attempted to activate)", which, as described in the referenced bug (https://bugzilla.redhat.com/show_bug.cgi?id=1483343#c3), means after 5s. We have some new workstations with a 5GBASE-T network adapter connected to a 1GBASE-T switch. Obviously the negotiation takes some time. Since the system boots from an M.2 NVMe drive, everything else is very fast. I have several NFS4 mounts in the fstab like:

    # admin utility mount
    adminutil:/adminutil /root/adminutil nfs4 defaults,auto,_netdev 0 0
    # home dirs
    sr-nethomes:/nethomes /rakete/home/ldap nfs4 defaults,auto,_netdev 0 0
    # tools mount
    sr-tools:/tools /rakete/tools nfs4 defaults,_netdev,auto,lookupcache=positive 0 0

What I tried to express with "auto" is: I need these mounts. Really. I would even have expected that something like NetworkManager-wait-online.service is enabled automatically, since the whole remote-fs.pre target is absolutely useless without it. Even with NetworkManager-wait-online.service activated manually, I have the problem that the workstation is useless in, let's say, one out of 4 boot attempts. But quite fast, I have to admit. Useless, but fast. I clearly see in the journal the default timeout of 5 seconds. And then the NFS mounts fail and things go down the drain:

    Mar 06 14:49:59 almach.pma.lan systemd[1]: Starting Network Manager Wait Online...
    :
    Mar 06 14:49:59 almach.pma.lan NetworkManager[1230]: <info> [1520344199.7260] manager: (eno1): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
    Mar 06 14:49:59 almach.pma.lan NetworkManager[1230]: <info> [1520344199.7269] device (eno1): state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
    Mar 06 14:49:59 almach.pma.lan kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
    Mar 06 14:49:59 almach.pma.lan kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
    :
    Mar 06 14:50:04 almach.pma.lan NetworkManager[1230]: <info> [1520344204.6456] manager: startup complete
    Mar 06 14:50:04 almach.pma.lan systemd[1]: Started Network Manager Wait Online.
    Mar 06 14:50:04 almach.pma.lan systemd[1]: Starting LSB: Bring up/down networking...
    Mar 06 14:50:04 almach.pma.lan network[1473]: Bringing up loopback interface: [ OK ]
    Mar 06 14:50:04 almach.pma.lan NetworkManager[1230]: <info> [1520344204.9976] audit: op="connection-activate" uuid="51e24d69-bace-47ba-807f-f5d3f314bd25" name="eno1" result="fail" reason="No suitable device found for this connection."
    Mar 06 14:50:04 almach.pma.lan network[1473]: Bringing up interface eno1: Error: Connection activation failed: No suitable device found for this connection.
    Mar 06 14:50:05 almach.pma.lan network[1473]: [FAILED]
    :
    Mar 06 14:50:05 almach.pma.lan mount[1646]: mount.nfs4: Failed to resolve server sr-nethomes: Name or service not known
    Mar 06 14:50:05 almach.pma.lan mount[1646]: mount.nfs4: Operation already in progress
    :
    Mar 06 14:50:05 almach.pma.lan NetworkManager[1230]: <info> [1520344205.7534] device (eno1): link connected
    Mar 06 14:50:05 almach.pma.lan kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
    Mar 06 14:50:05 almach.pma.lan kernel: warning: `NetworkManager' uses legacy ethtool link settings API, link modes are only partially reported
    Mar 06 14:50:05 almach.pma.lan NetworkManager[1230]: <info> [1520344205.7538] device (eno1): state change: unavailable -> disconnected (reason 'carrier-changed') [20 30 40]
    Mar 06 14:50:05 almach.pma.lan NetworkManager[1230]: <info> [1520344205.7546] policy: auto-activating connection 'eno1'

If I deploy my own copy of the unit file with the '-s' switch removed, everything works fine.

If you say NetworkManager-wait-online.service is working as "intended", I am a little desperate about how to force systemd to wait until I have a working network connection using the provided tools. Of course I can poke around in systemd unit files... But this unit file is called NetworkManager-wait-online.service and not NetworkManager-wait-max5s-and-then-maybe-online.service.

There are lots of reasons a system needs to wait until it has a working network connection. If this takes 6 hours, I will think about changing the hardware, the configuration, whatever. But systemd should wait, forever if necessary, before proceeding with network stuff. Especially since this service is off by default.
The interface has no carrier for more than five seconds. NetworkManager doesn't assume that something will still happen, and declares that startup is complete. When carrier comes later, it doesn't matter, because NM-w-o is already complete.

Since 1.10 you can configure the wait time for carrier per device, see https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=b595a80977193c7dd2a79ab5bd3caaa28bb88252

You can replace NetworkManager-wait-online.service with any service of your choosing to block network-online.target. For example, a shell script that polls `nmcli general status`. Or even use "nm-online" without the -s option, if that works for you.

NetworkManager-wait-online.service is a very simple hammer. It cannot be perfect, nor suitable for everybody. But the -s option is precisely there for NM-w-o. If it doesn't work, it should be fixed (for example by making the carrier-wait-timeout configurable, which was done in newer versions). But saying that the -s option is wrong for NM-w-o is not correct.

Whether NM-wait-online is enabled by default depends on the systemd presets. Most systems don't need this, which is why it's disabled by default. It's intended to be there for you to enable as a simple solution for a particular problem. But the default configuration cannot be optimal out of the box for every user.

I would close this as NEXTRELEASE, because we are not upgrading Fedora 27 or older to 1.10, and because we probably won't backport the configuration option. Opinions welcome.
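The polling script suggested above could look roughly like the following. This is only a sketch, not something NetworkManager ships: the function names `nm_state` and `wait_online` are made up for illustration, and it assumes that `nmcli -t -f STATE general status` prints the plain string `connected` once full connectivity is reached.

```shell
#!/bin/sh
# Sketch: block until NetworkManager reports the state "connected".
# nm_state and wait_online are hypothetical helper names.

# Print NetworkManager's overall state, e.g. "connected" or "connecting".
nm_state() {
    nmcli -t -f STATE general status 2>/dev/null
}

# wait_online TIMEOUT [INTERVAL]
# Poll every INTERVAL seconds (default 1) until the state is "connected".
# Returns 0 once connected, 1 if roughly TIMEOUT seconds pass first.
wait_online() {
    timeout="$1"
    interval="${2:-1}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ "$(nm_state)" = "connected" ]; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

# Only run when a timeout is given on the command line,
# e.g. "wait-online.sh 300" from a unit's ExecStart= line.
if [ "$#" -gt 0 ]; then
    wait_online "$@"
fi
```

A unit ordered before network-online.target could call this with a generous timeout. The elapsed time only counts the sleeps, so the real wait can be slightly longer than the requested timeout.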
I perfectly understand how it works and how to fix it by myself. What I do not understand is what the purpose of the NM-wait-online service is if it does not wait until NM is online. It doesn't even fail if NM is not online! So the status of NM is random after NM-wait-online. Actually, I am not interested in how NM-wait-online checks whether NM is online. But after a successful run of NM-wait-online, NM has to be online. As documented here: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

"If you use NetworkManager you can do this by enabling NetworkManager-wait-online.service: systemctl enable NetworkManager-wait-online.service"
:
:
"This will ensure that all configured network devices are up and have an IP address assigned before boot continues. This service will time out after 90s. Enabling this service might considerably delay your boot even if the timeout is not reached. Both services are disabled by default."

In the current state this does NOT work.
> What I do not understand is what the purpose of the NM-wait-online service is if it
> does not wait until NM is online. It doesn't even fail if NM is not online! So the
> status of NM is random after NM-wait-online.

The purpose of NM-wait-online is to delay network-online.target (and indirectly other units). NM-w-o completing means that startup is complete. It is a bit like `udevadm settle`. Note that `udevadm settle` doesn't guarantee that all devices are discovered. Instead, it guarantees that udev has finished processing all currently found devices.

NM-w-o completing means that NetworkManager did all initial activations to a point where no further actions are expected. Of course, this involves guessing, because NM can never know whether a second later something would happen that requires additional actions. In your case, NM determines that the cable is probably unplugged and declares startup complete. In reality, the device just takes that long to initialize. This is the problem with the 5 second timeout. A timeout cannot be perfect for the wide range of hardware. It's either too long (waiting needlessly) or too short -- like in your case.

The documentation you quote is not accurate (it's not NM documentation, fwiw). It's badly worded to claim that when NM-w-o completes, all interfaces will have an IP address (they might have failed to activate).

If you boot the machine with the cable unplugged, you obviously won't be online no matter how long you wait. But that won't delay boot any longer than it takes NM to determine that the cable is probably really unplugged and nothing is going to happen. If you boot with the cable unplugged, you don't want to wait 30 seconds. You wait 5 seconds until NM is convinced the cable is unplugged.

You ask for a different meaning of NM-w-o. You are free to implement any kind of service that suits your expectation. NM-w-o isn't doing what you ask, but it does what makes sense in a lot of cases.

In fact, your issue is with the 5 second timeout waiting for carrier. If that timeout were longer (it's configurable in 1.10+), then NM-w-o would work just fine for you. That doesn't mean something is fundamentally wrong with how NM-w-o works.
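For 1.10+, the longer carrier timeout mentioned above might be set with a drop-in file along these lines. This is a sketch under assumptions: the helper name `write_carrier_conf`, the file name, and the section name are made up for illustration, and it assumes the `carrier-wait-timeout` key is given in milliseconds inside a `[device-*]` section selected with `match-device`.

```shell
#!/bin/sh
# Sketch: write a per-device NetworkManager drop-in that raises the carrier wait time.
# write_carrier_conf is a hypothetical helper; file and section names are arbitrary.

# write_carrier_conf DIR IFACE MILLISECONDS
write_carrier_conf() {
    dir="$1"
    iface="$2"
    ms="$3"
    # carrier-wait-timeout is assumed to be in milliseconds.
    cat > "$dir/90-carrier-wait-$iface.conf" <<EOF
[device-carrier-wait-$iface]
match-device=interface-name:$iface
carrier-wait-timeout=$ms
EOF
}
```

For the workstation in this report, that would be something like `write_carrier_conf /etc/NetworkManager/conf.d eno1 30000` run as root, followed by a restart of NetworkManager or a reboot.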
Perhaps you are right and this is the correct way to do it. When you say "working as intended", what can I say... But I am quite sure that this is very confusing for a lot of admins and users. I do not see the harm in increasing the default timeout values to safe values, because it will not affect anybody in a negative way. And yes, if I manually enable NM-w-o, this means I want NM to wait so I could climb under the desk and plug in the cable.

My English may not be perfect, but obviously my understanding of "wait online" differs from yours. I am not talking about NetworkManager, I am only talking about the meaning of "wait online". I think you will need to explain your point of view to all who depend on a working network after boot time, as soon as they have trouble with it. In my personal opinion this is another point making systemd and NM even more complicated. When people read NM-wait-online.service they think they understand what it does; after reading your argumentation I am quite sure they do not. And they will only notice their misunderstanding when things break.

So, as a workaround for all people who have bought a 5GBASE-T interface and connected it to a 1GBASE-T switch port, or who just need a reliable "online status": copy the unit file with

    cp /usr/lib/systemd/system/NetworkManager-wait-online.service /etc/systemd/system/NetworkManager-wait-online.service

Then, as stated in the first post, remove the "-s" in the copied NetworkManager-wait-online.service; this works fine. Reload with

    systemctl daemon-reload

/etc/systemd/system/NetworkManager-wait-online.service can be deployed on every workstation without any boot time impact. Unless you have removed the LAN cable.
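The workaround above could be scripted roughly as follows. This is a sketch only: `strip_startup_flag` and `install_override` are hypothetical helper names, and it assumes the packaged ExecStart= line passes `-s` as a separate, space-delimited argument, as in the line quoted in the first post.

```shell
#!/bin/sh
# Sketch: create an /etc override of the unit without the "-s" flag.
# Both function names are made up for illustration.

# strip_startup_flag FILE
# Print FILE with a standalone "-s" argument removed from ExecStart= lines.
strip_startup_flag() {
    sed -E '/^ExecStart=/ s/[[:space:]]+-s([[:space:]]|$)/\1/' "$1"
}

# install_override
# Write the edited copy to /etc and reload systemd (needs root).
install_override() {
    src=/usr/lib/systemd/system/NetworkManager-wait-online.service
    dst=/etc/systemd/system/NetworkManager-wait-online.service
    strip_startup_flag "$src" > "$dst" && systemctl daemon-reload
}
```

Running `strip_startup_flag` on the packaged unit first, without installing anything, shows what the edited ExecStart= line would look like.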
What exactly is supposed to happen on this bug? Do you request that the timeout of 5 seconds be increased? On newer versions it has already been increased to 6 seconds (https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=156344b8beec88b68f335fe13c5db91d62fcb3fc), and additionally it has been made configurable (per device). I think there is nothing left to do, except that this is not backported to Fedora 26. Is that what you request?
Best is to make the timeout a config option. Second best is to increase to say 30s.
This message is a reminder that Fedora 26 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '26'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 26 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.