Bug 1119787 (network-online.target) - [tracker] Use network-online.target instead of network.target
Summary: [tracker] Use network-online.target instead of network.target
Keywords:
Status: ASSIGNED
Alias: network-online.target
Product: Fedora
Classification: Fedora
Component: distribution
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Pavel (pavlix) Šimerda
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: hadoop-hdfs_network-online dhcp_network-online 782042 1096081 postfix_network-online 1117086 1117450 1119814 1119815 1119818 1120656 1352214 1373495 sendmail_network-online httpd_network-online 1506803 1506805 postfix_network-online.target
Blocks:
 
Reported: 2014-07-15 13:55 UTC by Matthew Miller
Modified: 2017-11-09 02:08 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:



Description Matthew Miller 2014-07-15 13:55:20 UTC
From Pavel Šimerda and Corinna Vinschen (combined messages, slightly paraphrased):


The underlying problem is how the network and network-online targets are
defined.  Basically (pardon fuzzy choice of words, please),
network.target makes sure the network subsystem is started, but it does
not make sure that all boottime-enabled interfaces are up, while
network-online.target makes sure that all boottime-enabled interfaces
are up.

For this scenario we have to take into account that there are four types
of network services:

1. Network services which always listen to 0.0.0.0 and/or ::.

2. Network services which potentially listen on explicit IP addresses,
   but which support address changes via the Linux extensions like
   rtnetlink/IP_FREEBIND.

3. Network services which potentially listen on explicit IP addresses
   but which don't support these extensions.

4. Network services which *connect* to remote services (e.g. using getaddrinfo
   and connect library calls). The ntp service is an example.


Our problem here is the services of type 3.  Such services often
provide configuration files that allow specifying the listen addresses.
If such a service is started before all the boottime network addresses
are available, it will simply fail.  This is especially worrisome
for services like sshd, but in my experience more services are affected
by this, in my case at least dovecot, named, postfix, radicale, sshd.

So, bottom line is, IMHO those services which allow specifying explicit
listen addresses, and which are not capable of dealing with the
situation that certain network addresses are not up when the service
starts, must depend on network-online.target rather than just
network.target.

If services can be improved/patched (ideally upstream) to use 
rtnetlink/IP_FREEBIND, that's even better.
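
As an illustration of the IP_FREEBIND approach, here is a minimal Python sketch. It is not taken from any of the services named above. It assumes Linux and IPv4; the function name listen_freebind is hypothetical, and the numeric fallback 15 is the Linux value of IP_FREEBIND from <linux/in.h>, used only when the Python socket module does not export the constant.

```python
import socket

# Linux value of IP_FREEBIND from <linux/in.h>. Some Python versions do not
# export socket.IP_FREEBIND, so fall back to the raw number (assumption: Linux).
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def listen_freebind(address: str, port: int) -> socket.socket:
    """Return a TCP socket listening on address:port.

    With IP_FREEBIND set, bind() is allowed even if the address is not
    (yet) configured on any interface, so the service does not have to
    wait for the network to be fully up.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((address, port))
        sock.listen(16)
    except OSError:
        sock.close()
        raise
    return sock
```

A service written this way would fall under type 2 above and would not need to order itself after network-online.target just to bind its listen address.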

Basically they should pull in the network-online.target as a dependency
and start after it. Neither of the two directives should be omitted.

[Unit]
Wants=network-online.target
After=network-online.target

Comment 1 Pavel Šimerda (pavlix) 2014-07-17 14:24:44 UTC
It would be good to determine whether it's better to use Wants= or Requires= and maybe try to be consistent. Technically it doesn't seem to make a difference, as services that need to finish *before* network-online.target either use WantedBy= in the Install section, or create the symlink in the `.wants` directory directly.

For example NetworkManager (both Fedora and upstream git) now seems to contain the following symlink among packaged files:

/usr/lib/systemd/system/network-online.target.wants/NetworkManager-wait-online.service

Therefore whether a service Requires or Wants network-online.target, the network-online.target is typically available and successful. Philosophically, I think Wants is the right one, as conceptually network-online.target *might* fail and the applications just want to delay the start, not start conditionally.

Comment 2 Pavel Šimerda (pavlix) 2014-07-23 12:33:08 UTC
It might still be worth removing the network-online patch from the Fedora 20 branch of systemd to fix the regression until all units in Fedora are fixed.

Comment 3 Matthew Miller 2014-07-23 14:48:30 UTC
(In reply to Pavel Šimerda (pavlix) from comment #2)
> It might be still worth removing the network-online patch from the Fedora 20
> branch of systemd to fix the regression until all units in Fedora are fixed.

And although it's hard to believe :) we are getting close to test releases for Fedora 21, and I'd really like to ship that in a functional state.

Comment 4 Pavel Šimerda (pavlix) 2014-07-28 07:50:26 UTC
(In reply to Matthew Miller from comment #3)
> And although it's hard to believe :) we are getting close to test releases
> for Fedora 21, and I'd really like to ship that in a functional state.

I don't see this as an argument for reverting the change in F21, though, for the following reasons.

1) The services need to be fixed anyway in order to work properly with NetworkManager, which is the default network configuration solution in F21.

2) The positive part of the change allows custom software with initscripts to order itself after network-online.target which is by default represented with NetworkManager.

Therefore, in my opinion, it's critical to fix the services and it's important to keep the change for F21. It would be nice to fix the bug introduced by the change but with services fixed, it's not critical.

For the noncritical fix, I found another solution...

[Unit]
Description=good ol' network setup script
DefaultDependencies=no
After=local-fs.target
Before=sysinit.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/rc.d/init.d/network start
ExecStop=/etc/rc.d/init.d/network stop
[Install]
WantedBy=sysinit.target

It was suggested by Radek Hladík in an internet discussion. It implements the network.service that runs the network initscript but adds the necessary ordering
directives.

Comment 5 Pavel Šimerda (pavlix) 2014-07-28 07:53:03 UTC
(In reply to Pavel Šimerda (pavlix) from comment #4)
> [Unit]
> Description=good ol' network setup script
> DefaultDependencies=no
> After=local-fs.target
> Before=sysinit.target
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/etc/rc.d/init.d/network start
> ExecStop=/etc/rc.d/init.d/network stop
> [Install]
> WantedBy=sysinit.target
> 
> It was suggested by Radek Hladík in an internet discussion. It implements
> the network.service that runs the network initscript but adds the necessary
> ordering directives.

I posted the contents as it was in the discussion. It may need a bit of care like ordering before network.target instead of sysinit.target and stuff like that.

Comment 6 Lukáš Nykrýn 2014-07-29 16:29:05 UTC
I think that there is one thing broken. If you have After=network.target in a service, it should be guaranteed that init will terminate you before network is down during shutdown. But currently network.service is Before=network-online.target so the network could be put down before network.target is finished.

I have posted a patch for that to systemd upstream mailing list.
http://lists.freedesktop.org/archives/systemd-devel/2014-July/021495.html

Comment 7 Matthew Miller 2014-07-29 17:06:54 UTC
(In reply to Pavel Šimerda (pavlix) from comment #4)
> (In reply to Matthew Miller from comment #3)
> > And although it's hard to believe :) we are getting close to test releases
> > for Fedora 21, and I'd really like to ship that in a functional state.
> 
> I don't see this as an argument for reverting the change in F21, though, for
> the following reasons.
> 
> 1) The services need to be fixed anyway in order to work properly with
> NetworkManager, which is the default network configuration solution in F21.
> 
> 2) The positive part of the change allows custom software with initscripts
> to order itself after network-online.target which is by default represented
> with NetworkManager.
> 
> Therefore, in my opinion, it's critical to fix the services and it's
> important to keep the change for F21. It would be nice to fix the bug
> introduced by the change but with services fixed, it's not critical.

Yeah, works for me as long as we can identify and fix all of the services (and clearly document in the packaging guidelines how this needs to be -- probably a ticket https://fedorahosted.org/fpc/newticket)

Comment 8 Pavel Šimerda (pavlix) 2014-07-29 19:46:03 UTC
(In reply to Matthew Miller from comment #7)
> Yeah, works for me as long as we can identify and fix all of the services
> (and clearly document in the packaging guidelines how this needs to be --
> probably a ticket https://fedorahosted.org/fpc/newticket)

Not sure how much packaging guidelines should substitute for upstream documentation, which more or less exists in this case. I also believe that systemd service files should be part of upstream packages whenever possible. But feel free to start a ticket if you think it's the right way to handle it.

We indeed should identify services that still use network.target for ordering as well as any other services that don't comply.

Comment 9 Pavel Šimerda (pavlix) 2014-07-29 20:07:46 UTC
(In reply to Lukáš Nykrýn from comment #6)
> I think that there is one thing broken. If you have After=network.target in
> a service, it should be guaranteed that init will terminate you before
> network is down during shutdown.

The specific use case is itself confusing, as it's never clear whether this ordering is done by mistake or is intentional. Are there examples of services that actually need it?

> But currently network.service is
> Before=network-online.target so the network could be put down before
> network.target is finished.

So a service ordered after network-online.target is still safe, correct? Only services that need just the shutdown ordering are broken.

> I have posted a patch for that to systemd upstream mailing list.
> http://lists.freedesktop.org/archives/systemd-devel/2014-July/021495.html

What exactly does the patch do? I'm not familiar with the code. You're saying we should continue to guarantee that /etc/init.d/network stops after network.target. Shouldn't we also continue to guarantee that /etc/init.d/network starts before network.target (which is in turn started before network-online.target) and thus maintain backwards compatibility?

Comment 10 Lukáš Nykrýn 2014-07-30 07:06:25 UTC
> The specific use case is confusing itself, as it's never clear whether this
> ordering is done by mistake or is intentional. Are there examples of
> services that actually need it?
I don't know, but I can imagine that before the service is terminated it wants to do some finalization actions and it would be better if network is still up.

> 
> What exactly does the patch do? I'm not familiar with the code. You're
> saying we should continue to guarantee that /etc/init.d/network stops after
> network.target. Shouldn't we also continue to guarantee that
> /etc/init.d/network starts before network.target (which is in turn started
> before network-online.target) and thus maintain backwards compatibility?

The patch simply adds Before=network.target for services that have Provides: $network. So yes, also during startup, initscripts which provide network will be started before network.target.
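
To make the rule concrete, here is a rough Python sketch of the mapping described above. It is not the actual systemd code, and the function name ordering_for_lsb_header is hypothetical; it only assumes the usual LSB header layout in which facilities are listed on a "# Provides:" line.

```python
def ordering_for_lsb_header(header: str) -> list[str]:
    """Return extra [Unit] ordering lines for an initscript's LSB header.

    Illustrative only: an initscript that provides $network is ordered
    before network.target; anything else gets no extra ordering here.
    """
    for line in header.splitlines():
        # LSB header lines look like "# Provides: name1 name2"
        text = line.lstrip("#").strip()
        if text.startswith("Provides:"):
            facilities = text[len("Provides:"):].split()
            if "$network" in facilities:
                return ["Before=network.target"]
    return []
```

For example, a header containing "# Provides: $network" yields ["Before=network.target"], while a header that only provides some other facility yields an empty list.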

Comment 11 Pavel Šimerda (pavlix) 2014-07-30 07:20:22 UTC
(In reply to Lukáš Nykrýn from comment #10)
> I don't know, but I can imagine that before the service is terminated it
> wants to do some finalization actions and it would be better if network is
> still up.

OK, let's say any sort of communication server may want to notify clients that it's going down with some specific message, not just sockets closed by the system.

> > What exactly does the patch do? I'm not familiar with the code. You're
> > saying we should continue to guarantee that /etc/init.d/network stops after
> > network.target. Shouldn't we also continue to guarantee that
> > /etc/init.d/network starts before network.target (which is in turn started
> > before network-online.target) and thus maintain backwards compatibility?
> 
> The patch simply adds Before=network.target for services that have Provides:
> $network.

Aha, that's the mechanics. Can we have this patch added to Fedora 20+?

> So yes also during startup initscripts which provides network will
> be started before network.target.

Is there any other result of "Provides: $network"? Because I would think it is good enough to just order such a service before network.target which is ordered before network-online.target and thus provides the original boot order plus what we expected from the $network=network-online.target change. Basically $network means different things when provided and when depended upon, which is ok, as it's just a hack anyway.

Comment 12 Lukáš Nykrýn 2014-07-30 08:22:51 UTC
> Aha, that's the mechanics. Can we have this patch added to Fedora 20+?
It is not in the upstream yet. I will wait for a little bit longer and then push it myself.
> 
> > So yes also during startup initscripts which provides network will
> > be started before network.target.
> 
> Is there any other result of "Provides: $network"? Because I would think it
> is good enough to just order such a service before network.target which is
> ordered before network-online.target and thus provides the original boot
> order plus what we expected from the $network=network-online.target change.
> Basically $network means different things when provided and when depended
> upon, which is ok, as it's just a hack anyway.

Provides: $network also means before network-online.target, but that is redundant.

Comment 13 Matthew Miller 2014-07-30 13:06:24 UTC
(In reply to Pavel Šimerda (pavlix) from comment #8)
> Not sure how much should packaging guidelines substitute upstream
> documentation which more or less exists in this case. I also believe that
> systemd service files should be part of upstream packages whenever possible.

Agreed. I think it'd be good to give guidance at https://fedoraproject.org/wiki/Packaging:Systemd, though — no big production and it certainly can point to the upstream documentation. The goal is to make it as easy as possible for packagers who aren't necessarily familiar with systemd.


> But feel free to start a ticket if you think it's the right way to handle it.

Sure -- do you have some suggested wording?

Comment 14 Pavel Šimerda (pavlix) 2014-07-30 14:51:22 UTC
(In reply to Matthew Miller from comment #13)
> Sure -- do you have some suggested wording?

I'll try to give a couple of points... but feel free to fix it. Or I can start a wiki page for staging the text if that's preferable.

1) Services that need to be started after the network is fully configured should pull in network-online.target and order themselves after it.

[Unit]
Wants=network-online.target
After=network-online.target

Note that pulling in network-online.target may extend the overall boot time.

2) Services that don't need to wait for network configuration but should be stopped before the network is taken down should order themselves after network.target.

[Unit]
After=network.target

3) Services that don't have any of the requirements above should not reference those targets.
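
The three points above can be summed up as a small decision helper. This is only a sketch in Python to make the rule explicit; the function name network_ordering and its two parameters are made up for illustration.

```python
def network_ordering(needs_configured_network: bool,
                     must_stop_before_network_down: bool) -> list[str]:
    """Return the [Unit] lines suggested by points 1) to 3) above."""
    if needs_configured_network:
        # Point 1: pull in network-online.target and order after it.
        return ["Wants=network-online.target", "After=network-online.target"]
    if must_stop_before_network_down:
        # Point 2: ordering only, mainly for the reverse order at shutdown.
        return ["After=network.target"]
    # Point 3: no reference to either target.
    return []
```

Calling network_ordering(False, True) returns ["After=network.target"], which is the case that makes automatic detection of broken units hard: the same line is correct for some services and wrong for others.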

Comment 15 Pavel Šimerda (pavlix) 2014-07-30 14:55:34 UTC
Let's start with:

https://fedoraproject.org/wiki/Networking/Ideas/ServiceOrdering

Comment 16 Pavel Šimerda (pavlix) 2014-08-05 16:50:28 UTC
(In reply to Lukáš Nykrýn from comment #12)
> > Aha, that's the mechanics. Can we have this patch added to Fedora 20+?
> > It is not in the upstream yet. I will wait for a little bit longer and then
> push it myself.

Any updates?

Comment 18 Pavel Šimerda (pavlix) 2014-08-13 07:20:47 UTC
(In reply to Lukáš Nykrýn from comment #17)
> It's in upstream
> http://cgit.freedesktop.org/systemd/systemd/commit/
> ?id=805b573fad06b845502e76f3db3a0efa7583149d

Great, what about Fedora >= 20?

Comment 19 Jason Tibbitts 2014-10-14 03:46:33 UTC
Does anyone have a list of services which are still broken?  I'm having issues with autofs which I think stem from this, but I'm not sure what to do to confirm that.

Comment 20 Pavel Šimerda (pavlix) 2015-11-03 12:50:38 UTC
(In reply to Jason Tibbitts from comment #19)
> Does anyone have a list of services which are still broken?  I'm having
> issues with autofs which I think stem from this, but I'm not sure what to do
> to confirm that.

We do not have a list (apart from blocking bugs) and it is not trivial to detect wrong packages because After=network.target is valid for services that do not need to start after configured network but still benefit from being stopped before network is torn down.
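
A scan can still produce a list of candidates for manual review, even though it cannot decide which ones are actually wrong. The following Python sketch shows one way to do that; the function names are hypothetical, and the parser is deliberately simplified (it ignores backslash line continuations and drop-in files).

```python
def after_targets(unit_text: str) -> set[str]:
    """Collect every unit named on After= lines in the [Unit] section."""
    section = None
    found: set[str] = set()
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if section == "Unit" and sep and key.strip() == "After":
            # After= may appear several times; each line adds to the list.
            found.update(value.split())
    return found


def needs_manual_review(unit_text: str) -> bool:
    """True if the unit is ordered after network.target only.

    This flags a candidate, not a bug: After=network.target alone is
    valid for services that only care about shutdown ordering.
    """
    targets = after_targets(unit_text)
    return "network.target" in targets and "network-online.target" not in targets
```

Running needs_manual_review() over the text of each file in /usr/lib/systemd/system would give a starting list, which someone then has to check against how each service binds its addresses.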

Comment 21 Philip Prindeville 2017-02-02 22:26:36 UTC
(In reply to Pavel Šimerda (pavlix) from comment #20)
> We do not have a list (apart from blocking bugs) and it is not trivial to
> detect wrong packages because After=network.target is valid for services
> that do not need to start after configured network but still benefit from
> being stopped before network is torn down.

Well, sendmail.service, httpd.service, and proftpd.service interpolate the value of hostname (or hostname -f) into their configuration... so on DHCP-based hosts you'd want to wait until hostname is set to something sane.

I've noticed empirically that apcupsd.service fails if it's configured to use SNMP to poll the UPS and there's no valid route (i.e. the network is down).  That's probably a bug in apcupsd and it should be trying a lot harder, but currently it doesn't.

Comment 22 Philip Prindeville 2017-02-03 00:00:05 UTC
This bug has been open a while... can we get some forward momentum on it?

Comment 23 Nathan G. Grennan 2017-04-09 18:49:29 UTC
I just hit this with a fresh install of Fedora 25. I am running samba attached to eth1, an internal interface. Previously I was using network instead of NetworkManager.

It is crazy that this has been open since 2014. Though I suspect it would require a manager or product manager to care enough. Since it would require changing the .service files in many different packages.

Comment 24 Matthew Miller 2017-04-10 15:11:02 UTC
(In reply to Nathan G. Grennan from comment #23)
> I just hit this with a fresh install of Fedora 25. I am running samba
> attached to eth1, an internal interface. Previously I was using network
> instead of NetworkManager.
> 
> It is crazy that this has been open since 2014. Though I suspect it would
> require a manager or product manager to care enough. Since it would require
> changing the .service files in many different packages.

This is a tracker bug for issues as they are found. There isn't any specific action on this bug itself -- if you find a problem, please file a new bug and mark that as blocking this one.


However:

a) we are using unpatched upstream systemd configuration for samba. We *can* make a fix locally, but we prefer not to. 

b) it's not always, actually, a problem. Many services will start before network is online and work just fine; it's usually _better_ to fix that where possible than to change the targets.

Comment 25 Philip Prindeville 2017-10-26 18:31:46 UTC
(In reply to Jason Tibbitts from comment #19)
> Does anyone have a list of services which are still broken?  I'm having
> issues with autofs which I think stem from this, but I'm not sure what to do
> to confirm that.

Services I have to restart following a change in network parameters are:

rsyslog (hostname)
spamassassin (hostname)
cyrus-imapd (hostname)
sendmail (hostname)
mimedefang (hostname)
apcupsd (hostname)
proftpd (hostname)
httpd (hostname)

Comment 26 Jason Tibbitts 2017-10-26 19:58:52 UTC
Having to restart following a change in network parameters might, or might not, be related to use of network-online.target.  At least one of the tickets you filed was against a package which already uses After=network-online.target.  I think that at least in some cases you are seeing a different issue, where either network-online still isn't late enough or something is racing.

And note that whether a daemon can adapt to a change of IP is rather different than handling a change in hostname.  If something is binding to 0.0.0.0 or already supports IP_FREEBIND then just adding After=network-online.target isn't going to make any difference to your issue.

Comment 27 Philip Prindeville 2017-10-26 20:10:10 UTC
(In reply to Jason Tibbitts from comment #26)
> Having to restart following a change in network parameters might, or might
> not, be related to use of network-online.target.  At least one of the
> tickets you filed was against a package which already uses
> After=network-online.target.  I think that at least in some cases you are seeing a
> different issue, where either network-online still isn't late enough or
> something is racing.
> 
> And note that whether a daemon can adapt to a change of IP is rather
> different than handling a change in hostname.  If something is binding to
> 0.0.0.0 or already supports IP_FREEBIND then just adding
> After=network-online.target isn't going to make any difference to your issue.

Alas, there's no way to poll for a change in hostname asynchronously: you'd have to call gethostname() each time through your loop, which might be a little painful (even though, as system calls go, it's relatively lightweight).

But a change in your IP address (which can be notified asynchronously) often portends a change in your hostname as well.
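
The polling approach could look roughly like the following Python sketch. The class name HostnameWatcher is made up for illustration; socket.gethostname() is the standard-library call that returns the current hostname, and the getter is injectable so the logic can be tried without changing the real hostname.

```python
import socket
from typing import Callable


class HostnameWatcher:
    """Remember the last seen hostname and report when it changes."""

    def __init__(self, get_hostname: Callable[[], str] = socket.gethostname):
        self._get = get_hostname
        self._last = get_hostname()

    def changed(self) -> bool:
        """Return True once for each observed hostname change."""
        current = self._get()
        if current != self._last:
            self._last = current
            return True
        return False
```

A daemon would call changed() once per iteration of its main loop, or only after it has been told about an address change, and re-read its configuration when the result is True.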

Comment 28 Michal Ambroz 2017-11-09 01:57:18 UTC
Part of the problem is also that unit files such as /usr/lib/systemd/system/sendmail.service are usually not marked as configuration files, so even if you change one manually from network.target to network-online.target, the change will be reverted by the next package update.

Comment 29 Jason Tibbitts 2017-11-09 02:08:05 UTC
But there's never any need to edit the unit files under /lib/systemd/system....

Either copy the sendmail.service file to /etc/systemd/system and edit it as you wish, or create /etc/systemd/system/sendmail.service.d and add overrides there.  See man systemd.unit and search for "drop-in" for more info on the latter method.
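
For the drop-in method, a minimal sketch in Python follows. It assumes the drop-in directory is named after the full unit name plus ".d", as described in systemd.unit; the function name and the file name 10-network-online.conf are arbitrary choices for illustration.

```python
from pathlib import Path

# Contents of the override: pull in network-online.target and order after it.
DROPIN_TEXT = "[Unit]\nWants=network-online.target\nAfter=network-online.target\n"


def write_network_online_dropin(unit: str,
                                root: Path = Path("/etc/systemd/system")) -> Path:
    """Write a drop-in for `unit` under root/<unit>.d/ and return its path."""
    dropin_dir = root / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "10-network-online.conf"  # hypothetical file name
    path.write_text(DROPIN_TEXT)
    return path
```

After write_network_online_dropin("sendmail.service") has been run as root, "systemctl daemon-reload" is needed for systemd to pick up the change. The packaged unit file stays untouched, so a package update does not undo the override.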

