Red Hat Bugzilla – Bug 847452
named and dhcpd startup after NIC initialized
Last modified: 2012-08-17 21:42:16 EDT
Description of problem:
First of all, although I am reporting this as a bug, this may well be a request for enhancement.
This can be considered as related to this bug report -- https://bugzilla.redhat.com/show_bug.cgi?id=837173
The problem is that with systemd, daemons are started more or less in parallel. Specifically, NetworkManager, named and dhcpd are started at the same time (there may be others that have this problem too). After starting, NetworkManager takes some time to get all defined NICs configured and started. During this startup time, both named and dhcpd have started and are looking for their NICs.
Now with dhcpd, it does not find the interfaces at first, but a little later it looks again, finds them, completes initialization for each interface, and everything is OK.
Unfortunately, named is completely hosed until it is restarted. Looking at /var/log/messages there are a large number of error messages issued by named. In fact, there are so many that the logging system considers it a flood and stops recording all of the error messages.
Version-Release number of selected component (if applicable):
On real hardware, I have been unable to reproduce the problem [but this does not mean that someone could not have the problem on real hardware.]
On kvm/qemu/etc. with a virtual guest, I can reliably reproduce it
Steps to Reproduce:
Set up a virtual guest running Fedora 17 with at least three network interfaces. In my configuration, one was an "external" interface assigned an address by another dhcpd server. The other two network interfaces used static IPs and connected to private internal networks.
Among other things, this virtual guest system runs chronyd, named, and dhcpd to serve the networks on the two private network interfaces. Any name resolution for the virtual guest itself is done upstream via the external interface.
named is completely hosed until restarted
after bootup completes, all servers functioning properly ... other virtual guests on one of the private networks can get name resolution.
1. One possible temporary fix might be to add something to rc.local to restart named ... hopefully, this will occur after NetworkManager has all of the interfaces initialized.
2. Forget the automated stuff, manually start named and dhcpd when you login after the system is up. Maybe you even want to manually start the network interfaces. [ugh!]
3. I consider this a systemd problem since they do not take dependencies into consideration. However, looking at the bugzilla reports against systemd, they will not change. In fact, they consider that named (bind) should change ... go do something up a rope ... strong message to follow.
4. The only program that is going to "know" when the initialization of all network interfaces is complete is NetworkManager itself.
My request: add code to NetworkManager to start (restart?) named, dhcpd, and perhaps others after interface initialization is complete.
This report may belong upstream in the gnome bugzilla. I do not know. All of my experience is with Fedora 17 and the packages it has.
This is a problem in bind. What you are requesting is to let NetworkManager work around a bug in bind.
> My request: add code to NetworkManager to start (restart?) named, dhcpd, and
> perhaps others after interface initialization is complete.
NetworkManager should definitely *not* include code to start/restart systemd-managed services. But it has support for dispatcher.d scripts that allow you to add workarounds to the particular packages.
Is there anything you miss in dispatcher.d support? Otherwise I'll be closing this bug report.
I looked once again at the solution proposed in bug 837173 and it looks like it's something that always worked and the only problem was with systemd's path change. Closing.
Sorry, but the solution proposed for bug 837173 fixes nothing! Please read my comments in 837173
If there are dependencies between things here, there are two approaches:
1) make the service that depends on networking more aware of its real dependencies. That means, if named is configured to listen on some specific interface, or some specific IP address, then the named service should really be waiting for that address or interface to come up, not just "networking" to start. But that entails modifying services and their startup scripts.
2) enable the "NetworkManager-wait-online.service" systemd service, which will block startup until the earlier of (1) at least one network interface has started, and (2) 30 seconds have passed.
NetworkManager provides the networking service, but in systemd land that just means networking has started; it doesn't mean that any interface has an IP address yet. That's what NM-wait-online is for.
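Approach (1) can be illustrated with a small helper that blocks until a specific address is actually usable. This is only a sketch, not anything bind or NetworkManager ships: the function name wait_for_address, the timeout and the polling interval are made up for illustration, and it assumes IPv4 and that a failed bind with EADDRNOTAVAIL means the address is not configured yet.

```python
import errno
import socket
import time


def wait_for_address(addr, timeout=30.0, interval=0.5):
    """Return True once a UDP socket can bind to addr, False on timeout.

    Hypothetical helper for illustration; not part of bind or NetworkManager.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Port 0 lets the kernel pick a free port; only the address is tested.
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.bind((addr, 0))
            return True
        except OSError as exc:
            # EADDRNOTAVAIL: the address is not assigned to any interface yet.
            # Any other error is unexpected, so let it propagate.
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper could call wait_for_address() for each address named listens on (whatever static IPs the private interfaces use) and start the service only once every call returns True.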
And as (2) will not restart services on network configuration changes,
you have to:
2b) Put a restarter into /etc/NetworkManager/dispatcher.d
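A restarter of this kind could look roughly like the sketch below. NetworkManager runs each executable in dispatcher.d with the interface name as the first argument and the action (for example "up" or "down") as the second. Everything else here is an assumption for illustration: the interface names eth1 and eth2, the choice to restart only named.service, and the use of "systemctl try-restart" (which restarts a unit only if it is already running). systemctl is looked up on PATH rather than hard-coded to one location.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Hypothetical values for this sketch: adjust to the interfaces named
# listens on and the services that need a kick.
WATCHED_INTERFACES = {"eth1", "eth2"}
SERVICES = ["named.service"]


def should_restart(interface, action):
    """Restart only when one of the watched interfaces has come up."""
    return action == "up" and interface in WATCHED_INTERFACES


def main(argv):
    # The dispatcher passes: argv[1] = interface name, argv[2] = action.
    if len(argv) < 3:
        return 0
    interface, action = argv[1], argv[2]
    if should_restart(interface, action):
        for service in SERVICES:
            # try-restart leaves services alone if they are not running.
            subprocess.call(["systemctl", "try-restart", service])
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

The script would need to be owned by root and executable for the dispatcher to run it.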
> Sorry, but the solution proposed for bug 837173 fixes nothing! Please read
> my comments in 837173
You have opened a bug report on NetworkManager. I believe NetworkManager provides all necessary tools for other packages to very easily work around their limitations with a simple script dropped in /etc/NetworkManager/dispatcher.d.
Dan's comment describes a way to delay starting of services until NM has acquired its first connection.
The NetworkManager package won't carry a list of services that can't cope with network changes. The maintainers of these services currently bundle scripts for dispatcher.d that do exactly what you requested.
If there is anything else needed in NetworkManager to support these use cases, please let us know.
This is a useless "fight" and I accept that the problem is in bind.
Personally, I gave up on bind/named and replaced bind/dhcpd with dnsmasq, which is smart enough to monitor the interfaces and recover when their initialization is complete.
Those who argued that named should be "fixed" are correct. I guess I was looking for a "quick fix" but that just was not a reasonable thing to do.
NetworkManager should go and continue doing the good work such as support for bridged interfaces. ;)
> Those who argued that named should be "fixed" are correct. I guess I was
> looking for a "quick fix" but that just was not a reasonable thing to do.
The original bug report was about a problem with the quick fix. It didn't work because of the wrong systemctl path.
> NetworkManager should go and continue doing the good work such as support
> for bridged interfaces. ;)
On the way :).