Bug 1455704 - NetworkManager-wait-online.service should not be enabled unconditionally
Summary: NetworkManager-wait-online.service should not be enabled unconditionally
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 26
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Thomas Haller
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1498943
TreeView+ depends on / blocked
 
Reported: 2017-05-25 21:31 UTC by Zbigniew Jędrzejewski-Szmek
Modified: 2017-12-12 16:53 UTC (History)
20 users (show)

Fixed In Version: NetworkManager-1.8.4-2.fc27
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1498943 (view as bug list)
Environment:
Last Closed: 2017-10-04 14:20:14 UTC


Attachments (Terms of Use)
[patch] contrib/rpm: enable "NetworkManager-wait-online.service" on package upgrade (1.83 KB, patch)
2017-09-26 18:31 UTC, Thomas Haller
no flags Details | Diff

Description Zbigniew Jędrzejewski-Szmek 2017-05-25 21:31:18 UTC
Description of problem:
With a Fedora-26 cloud image, which has NetworkManager.service disabled, network-online.target is not reached properly. Just 'systemctl start network-online.target' doesn't actually work, because NetworkManager-wait-online.service has Requisite: NetworkManager.service , so it doesn't bring up the network, it just fails because the dependency isn't satisfied.

We need to figure out how to structure network-online.target and its deps in a way that works with multiple independent network configuration implementations, each of which might be enabled/disabled/uninstalled/masked/etc.

See https://bugzilla.redhat.com/show_bug.cgi?id=1452866#c22, https://bugzilla.redhat.com/show_bug.cgi?id=1452866#c23.

I'm marking this as blocking #1452866, to make it easy to find. If #1452866 is solved independently, we still need to figure out this one.

Comment 1 Adam Williamson 2017-05-25 22:09:28 UTC
note that systemd-networkd-wait-online.service also uses Requisite, and so is subject to the same issue.

Comment 2 Zbigniew Jędrzejewski-Szmek 2017-05-30 18:44:06 UTC
I misunderstood things here completely. network-online.target works just fine, because it has Wants=NetworkManager-wait-online.service systemd-networkd-wait-online.service, and if either of those services fails, it still waits for the other one.

There's a cosmetic bug, in that the cloud image has NetworkManager-wait-online.service enabled without having NetworkManager.service enabled. This is caused by /usr/lib/systemd/system/network-online.target.wants/NetworkManager-wait-online.service being packaged by NetworkManager-1.8.0-2.fc27.x86_64, i.e. enabling NetworkManager-wait-online.service unconditionally. It should not be enabled unconditionally for two reasons:
- to avoid the ugly "dependency failed" message at startup in NM.service is not enabled
- to allow the admin to turn NM-w-o.service on and off at will.

Thus, the proper thing is for NM-w-o.service to be enabled through presets and the %post script. The %post script already calls systemctl preset for NM-w-o.service. So what remains is to add NM-w-o.service to /usr/lib/systemd/system-preset/90-default.preset, and remove the symlink.

I'll reassign this to NetworkManager for the symlink removal.

Comment 5 Pavel (pavlix) Šimerda 2017-05-30 23:25:46 UTC
Long time ago I have created an answer on UNIX/Linux Stack Exchange regarding network-online.target:

https://unix.stackexchange.com/questions/126009/cause-a-script-to-execute-after-networking-has-started/126146#126146

If there are any changes to the network-waiting logic, I would like to update that to reflect the changes.

Now to the proposed changes. I think Lennart never thought through the idea of network-online.service properly and just wanted to quickly get something to drive away the criticism. The idea, as documented and used, is that those wait online services would all be triggered directly by network-online.service instead of being enabled/disabled by anyone (including presets) so that the following is met:

1) The wait-online service *always* waits for the respective network service to get fully configured (whatever that means for that particular service) when the respective service is running or scheduled to run.

2) The wait-online service exits quicky if the respective service is not running and not scheduled to run.

I believe you that the current ways aren't perfect but *please* be careful and *please* don't do the same mistake again and consider the consequences before making any changes to avoid even more confusion that we have so far.

Comment 7 Thomas Haller 2017-06-02 13:30:05 UTC
In my opinion, the analysis in comment 2 and proposed solution in comment 3 is correct, and we should do that.

AFAIU, currently, the NetworkManager-wait-online.service is always pulled in by network-online.target. The only way to avoid it, is masking NM-w-o. I hence think the preset solution is better as it makes `systemctl disable NM-w-o` work as expected.


I would just wait to change NetworkManager until the changes to preset hit F26 stable.

(anybody please ping/bump this bug once that happened -- since it's not tracked by another bugzilla bug).

Comment 8 Thomas Haller 2017-06-02 13:42:42 UTC
> I believe you that the current ways aren't perfect but *please* be careful and 
> *please* don't do the same mistake again and consider the consequences before
> making any changes to avoid even more confusion that we have so far.

NetworkManager-wait-online should probably be implemented the same as networkd-wait-online as the reasoning is the same. IMO we should change NM-w-o here (carefully).

Comment 9 Zbigniew Jędrzejewski-Szmek 2017-06-02 22:02:26 UTC
> I would just wait to change NetworkManager until the changes to preset hit F26 stable.

The PRs have been merged, but we need to wait for a fedora-release release.

Comment 10 Zbigniew Jędrzejewski-Szmek 2017-07-04 14:29:30 UTC
fedora-release was been released, and the same change has also been done in systemd. Please update NM.

Comment 11 Thomas Haller 2017-07-17 14:10:12 UTC
Fixed upstream:

master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d61eaf2545c819cab17c29c4c13a3a3bd94b38a6


this will automatically bubble to Fedora 27 once we rebase.


Fedora 26 needs a backport (if we want to do it there too). Moving bug to MODIFIED, as it only needs a Fedora patch.

Comment 12 Zbigniew Jędrzejewski-Szmek 2017-07-17 14:12:52 UTC
I think MODIFIED is used by bodhi when an update has been submitted, and POST when we have a patch upstream.

Comment 13 Pavel (pavlix) Šimerda 2017-07-17 19:44:00 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #12)
> I think MODIFIED is used by bodhi when an update has been submitted, and
> POST when we have a patch upstream.

You're right MODIFIED means modified in Fedora, not just upstream. But I think Bodhi sets MODIFIED for a short time before the update is processed and added to updates-testing, then it sets ON_QA automatically. At least this is how it used to work. :)

Comment 14 Zbigniew Jędrzejewski-Szmek 2017-07-17 22:43:23 UTC
Indeed, it seems to change to MODIFIED and then to ON_QA almost at the same time (notifications have the same timestamp, so within 1 s). So bugs in MODIFIED state are almost non-existent. So dunno, if you think MODIFIED is more appropriate, I won't argue.

I wish we could change to a more obvious set of states like NEW / PATCH_REVIEW / FIXED_UPSTREAM / IN_UPDATES / CLOSED. We certainly don't need ASSIGNED when there's assigned-to field, and the other states are cryptic.

Comment 15 Adam Williamson 2017-07-17 23:22:14 UTC
the RHBZ states unfortunately have to cover the workflows of several different products (Fedora, RHEL, and several others), and so are not perfect for any of them. The 'canonical' definition of what the states mean for Fedora is here:

https://fedoraproject.org/wiki/BugZappers/BugStatusWorkFlow

but in practice it's not super strictly observed and there's no benefit to trying to 'enforce' it, it's more a case of guidelines.

Pavel is correct about how Bodhi uses the states, IIRC. It should go to MODIFIED as soon as an update is submitted, than ON_QA when it reaches updates-testing. The gap between those states will vary, it just depends when updates-testing pushes happen. And indeed it's fairly conventional to use POST when a fix is available in some sense (as an attachment, on a mailing list for review, known to be available for backporting upstream...) but not yet in any way applied to the Fedora package.

Comment 16 Fedora Update System 2017-09-20 12:37:05 UTC
NetworkManager-1.8.4-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-6eb0802a1c

Comment 17 Fedora Update System 2017-09-20 15:21:49 UTC
NetworkManager-1.8.4-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-6eb0802a1c

Comment 18 Thomas Haller 2017-09-26 18:31:58 UTC
Created attachment 1331219 [details]
[patch] contrib/rpm: enable "NetworkManager-wait-online.service" on package upgrade

This effectively disables NM-w-o.service on package upgrade.

Attached patch is supposed to fix that.

Due to this issue, I also unpushed https://bodhi.fedoraproject.org/updates/FEDORA-2017-6eb0802a1c


Please review.

Comment 19 Thomas Haller 2017-09-27 16:06:10 UTC
(In reply to Thomas Haller from comment #18)
> Created attachment 1331219 [details]
> [patch] contrib/rpm: enable "NetworkManager-wait-online.service" on package
> upgrade

merged upstream: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=513d0c2286c8cfe3618cf767da1511bcb939640a

Comment 20 Fedora Update System 2017-09-27 17:03:06 UTC
NetworkManager-1.8.4-2.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-6eb0802a1c

Comment 21 R P Herrold 2017-09-27 22:51:10 UTC
The prior behaviour is of course still present in CentOS / RHEL 7 current, and will not 'bubble' in by itself.  leaving a breadcrumb for those researching

Comment 22 Fedora Update System 2017-09-28 16:28:04 UTC
NetworkManager-1.8.4-2.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-6eb0802a1c

Comment 23 Lukas Slebodnik 2017-10-02 18:18:28 UTC
(In reply to Fedora Update System from comment #22)
> NetworkManager-1.8.4-2.fc27 has been pushed to the Fedora 27 testing
> repository. If problems still persist, please make note of it in this bug
> report.
> See https://fedoraproject.org/wiki/QA:Updates_Testing for
> instructions on how to install test updates.
> You can provide feedback for this update here:
> https://bodhi.fedoraproject.org/updates/FEDORA-2017-6eb0802a1c

Stephen,

IIRC from another unrelated discussion you mentioned that it is against fedora packaging guidelines to enable services in scriptlets

Could you check whether following change is correct?
https://src.fedoraproject.org/rpms/NetworkManager/c/de4dee240505a17bf877a943b74f939d488c08ac?branch=master

Comment 24 Stephen Gallagher 2017-10-02 18:32:19 UTC
OK, so as I see it, the problem is that the behavior is this:

Previously, NM was (incorrectly) dropping a symlink into the network-online.target.wants directory, which caused it to be functionally equivalent to having added Wants=NetworkManager-wait-online.service to network-online.target, but it did so by bypassing the appropriate mechanisms.

So NetworkManager-wait-online.service is now WantedBy=network-online.target, but because the RPM update is deleting that file, the script is calling `systemctl enable NetworkManager-wait-online.service` explicitly, which is against policy.

I *think* the best course of action here would actually be to just put
%ghost %dir %{systemd_dir}/network-online.target.wants
%ghost %{systemd_dir}/network-online.target.wants/NetworkManager-wait-online.service

Into the NetworkManager specfile, so if it was previously present, it doesn't get removed by the RPM upgrade (but is no longer strictly owned by it, so `systemctl disable NetworkManager-wait-online.service` will work as expected. This will avoid the need to call systemctl in the %pre scriptlet.

Comment 25 Thomas Haller 2017-10-03 07:23:34 UTC
> removed by the RPM upgrade (but is no longer strictly owned by it, so 
> `systemctl disable NetworkManager-wait-online.service` will work as expected.

I don't think it would work like expected. The problem with leaving %{systemd_dir}/network-online.target.wants/NetworkManager-wait-online.service, is that `systemctl disable NetworkManager-wait-online` has then no effect. That is because network-online.target always wants NM-w-o. In that case, the only way to prevent NM-w-o from running, is to mask it.

Comment 26 Stephen Gallagher 2017-10-03 11:49:09 UTC
Oh, right. Because you put it in /usr/lib/systemd/system instead of /etc/systemd/system where it would have belonged :(

OK, I suppose there really isn't a better solution to dealing with this than calling `systemctl enable` if the original link is present. Can we get this change backported to F26 though? That way we only need to carry this hack in the F26 and F27 releases and can drop it and behave correctly in Rawhide/F28+.

The reasoning is that upgrades are only supported from N-2 -> N, so F25 -> F27 is the longest jump we need to worry about right now, so as long as F25 to either F26 or F27 fixes this issue, we don't need to worry about upgrades to F28 and later.

Comment 27 Fedora Update System 2017-10-04 14:20:14 UTC
NetworkManager-1.8.4-2.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 28 Thomas Haller 2017-10-04 15:15:53 UTC
(In reply to Stephen Gallagher from comment #26)
> OK, I suppose there really isn't a better solution to dealing with this than
> calling `systemctl enable` if the original link is present. Can we get this
> change backported to F26 though? That way we only need to carry this hack in
> the F26 and F27 releases and can drop it and behave correctly in
> Rawhide/F28+.
> 
> The reasoning is that upgrades are only supported from N-2 -> N, so F25 ->
> F27 is the longest jump we need to worry about right now, so as long as F25
> to either F26 or F27 fixes this issue, we don't need to worry about upgrades
> to F28 and later.

The perceived ugliness of %pre containing a conditional(!) `systemctl enable` seems acceptable to me. I emphasize "conditional" here, as it is only in effect during upgrade from an older version.

IMO a better argument would be, if somebody needs the bug fixed in Fedora26. But there is a workaround (`systemctl mask NetworkManager-wait-online.service`), so there is not a strong argument for that either. The bug was present for years already, and nobody spoke up.

I'd rather not invest time into something that brings little benefit -- in my opinion.
I am not talking about my personal time. I am talking about hundreds(?) of testers and tens of thousands(?) of users.


If somebody speaks up loud enough, we can do it :)

Comment 29 Stephen Gallagher 2017-10-05 14:48:07 UTC
So, my main concern in this case was that this would have unexpected interactions with ostree-based deployments. However, I just had a conversation about it with Colin Walters and this isn't an issue with this specific implementation.

That said, I'd still like to see it planned to have this removed once there are no supported Fedora installations that have ever had this file, if only for spec file cleanliness.


Note You need to log in before you can comment on or make changes to this bug.