Bug 847452 - named and dhcpd startup after NIC initialized
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 17
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Dan Williams
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-08-11 10:58 EDT by Gene Czarcinski
Modified: 2012-08-17 21:42 EDT

Doc Type: Bug Fix
Last Closed: 2012-08-13 10:07:55 EDT
Type: Bug
Description Gene Czarcinski 2012-08-11 10:58:38 EDT
Description of problem:
First of all, although I am reporting this as a bug, this may well be a request for enhancement.

This can be considered as related to this bug report -- https://bugzilla.redhat.com/show_bug.cgi?id=837173

The problem is that with systemd, daemons are started more or less in parallel.  Specifically, NetworkManager, named and dhcpd are started at the same time (there may be others that have this problem too).  After starting, NetworkManager takes some time to get all defined NICs configured and started.  During this startup time, both named and dhcpd have started and are looking for their NICs.

Now with dhcpd, it does not find the interfaces at first, but a little later it looks again, finds them, completes initialization for each interface, and everything is OK.

Unfortunately, named is completely hosed until it is restarted.  Looking at /var/log/messages there are a large number of error messages issued by named.  In fact, there are so many that the logging system considers it a flood and stops recording all of the error messages.


Version-Release number of selected component (if applicable):
Fedora 17

How reproducible:
On real hardware, I have been unable to reproduce the problem [but this does not mean that someone could not have the problem on real hardware.]

On kvm/qemu/etc. with a virtual guest, I can reliably reproduce it.

Steps to Reproduce:

Set up a virtual guest running Fedora 17 with at least three network interfaces.  In my configuration, one was an "external" interface assigned an address by another dhcpd server.  The other two network interfaces used static IPs and connected to private internal networks.

Among other things, this virtual guest system runs chronyd, named, and dhcpd to serve the networks on the two private network interfaces.  Any name resolution for the virtual guest itself is done upstream via the external interface.
  
Actual results:
named is completely hosed until restarted

Expected results:
After bootup completes, all servers are functioning properly ... other virtual guests on one of the private networks can get name resolution.

Additional info:

1. One possible temporary fix might be to add something to rc.local to restart named ... hopefully, this will occur after NetworkManager has all of the interfaces initialized.

2. Forget the automated stuff, manually start named and dhcpd when you login after the system is up.  Maybe you even want to manually start the network interfaces.  [ugh!]

3. I consider this a systemd problem since systemd does not take dependencies into consideration.  However, judging by the bugzilla reports against systemd, its developers will not change this.  In fact, they consider that named (bind) should change ... go do something up a rope ... strong message to follow.

4. The only program that is going to "know" when the initialization of all network interfaces is complete is NetworkManager itself.

My request: add code to NetworkManager to start (restart?) named, dhcpd, and perhaps others after interface initialization is complete.

This report may belong upstream in the gnome bugzilla.  I do not know.  All of my experience is with Fedora 17 and the packages it has.
Comment 1 Pavel Šimerda (pavlix) 2012-08-13 09:45:32 EDT
This is a problem in bind. What you are requesting is to let NetworkManager work around a bug in bind.

> My request: add code to NetworkManager to start (restart?) named, dhcpd, and
> perhaps others after interface initialization is complete.

NetworkManager should definitely *not* include code to start/restart systemd-managed services. But it has support for dispatcher.d scripts that allow you to add workarounds to the particular packages.

Is there anything you miss in dispatcher.d support? Otherwise I'll be closing this bug report.
Comment 2 Pavel Šimerda (pavlix) 2012-08-13 10:07:55 EDT
I looked once again at the solution proposed in bug 837173 and it looks like it's something that always worked and the only problem was with systemd's path change. Closing.
Comment 3 Gene Czarcinski 2012-08-13 13:29:43 EDT
Sorry, but the solution proposed for bug 837173 fixes nothing!  Please read my comments in 837173
Comment 4 Dan Williams 2012-08-13 13:57:35 EDT
If there are dependencies between things here, there are two approaches:

1) make the service that depends on networking more aware of its real dependencies.  That means, if named is configured to listen on some specific interface, or some specific IP address, then the named service should really be waiting for that address or interface to come up, not just "networking" to start.  But that entails modifying services and their startup scripts.

2) enable the "NetworkManager-wait-online.service" systemd service, which will block startup until the earlier of (1) at least one network interface has started, or (2) 30 seconds have passed.

NetworkManager provides the networking service, but in systemd land that just means networking has started; it doesn't mean that any interface has an IP address yet.  That's what NM-wait-online is for.
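
For reference, approach (2) could look roughly like the commands below. This is only a sketch: it assumes a root shell on a systemd-based system where the NetworkManager-wait-online.service unit is installed, and it does nothing about services that need one specific interface rather than "at least one".

```shell
# Enable the wait-online unit so that it is pulled in at boot (run as root).
systemctl enable NetworkManager-wait-online.service

# Check that the unit is now enabled.
systemctl is-enabled NetworkManager-wait-online.service
```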
Comment 5 Pavel Šimerda (pavlix) 2012-08-13 14:10:55 EDT
And as (2) will not restart services on network configuration changes,
you have to:

2b) Put a restarter into /etc/NetworkManager/dispatcher.d
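
A minimal sketch of such a restarter, for illustration only. The file name (20-restart-named) and the interface names (eth1, eth2) are assumptions and do not come from this report. NetworkManager runs dispatcher scripts with the interface name as the first argument and the action (for example "up" or "down") as the second. Calling systemctl through PATH instead of by an absolute path is also an assumption.

```shell
#!/bin/sh
# Hypothetical file: /etc/NetworkManager/dispatcher.d/20-restart-named
# NetworkManager passes the interface name as $1 and the action as $2.

# Interfaces that named listens on (assumed names; adjust to the real setup).
WATCHED_IFACES="eth1 eth2"

# Return 0 if this event means named should be restarted, 1 otherwise.
should_restart() {
    iface="$1"
    action="$2"
    # Only react when an interface has come up.
    [ "$action" = "up" ] || return 1
    for watched in $WATCHED_IFACES; do
        [ "$iface" = "$watched" ] && return 0
    done
    return 1
}

# Act only when called with both arguments, as the dispatcher does.
if [ "$#" -ge 2 ] && should_restart "$1" "$2"; then
    # try-restart restarts the unit only if it is already running.
    systemctl try-restart named.service
fi
```

The script has to be executable and owned by root for NetworkManager to run it. A similar script could cover dhcpd, although by the original description dhcpd recovers on its own.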
Comment 6 Pavel Šimerda (pavlix) 2012-08-13 14:18:13 EDT
> Sorry, but the solution proposed for bug 837173 fixes nothing!  Please read
> my comments in 837173

You have opened a bug report on NetworkManager. I believe NetworkManager provides all necessary tools for other packages to very easily work around
their limitations with a simple script dropped in /etc/NetworkManager/dispatcher.d.

Dan's comment describes a way to delay starting of services until NM has acquired its first connection.

The NetworkManager package won't carry a list of services that can't cope with network changes. The maintainers of these services currently bundle scripts for dispatcher.d that do exactly what you requested.

If there is anything else needed in NetworkManager to support these use cases, please let us know.
Comment 7 Gene Czarcinski 2012-08-17 15:33:31 EDT
This is a useless "fight" and I accept that the problem is in bind.

Personally, I gave up on bind/named and replaced bind/dhcpd with dnsmasq, which is smart enough to monitor the interfaces and recover when their initialization is complete.

Those who argued that named should be "fixed" are correct.  I guess I was looking for a "quick fix" but that just was not a reasonable thing to do.

NetworkManager should go and continue doing the good work such as support for bridged interfaces. ;)
Comment 8 Pavel Šimerda (pavlix) 2012-08-17 21:42:16 EDT
> Those who argued that named should be "fixed" are correct.  I guess I was
> looking for a "quick fix" but that just was not a reasonable thing to do.

The original bug report was about a problem with the quick fix. It didn't work because of a wrong systemctl path.

> NetworkManager should go and continue doing the good work such as support
> for bridged interfaces. ;)

On the way :).
