Bug 837173

Summary: named-chroot.service fails to listen on primary interface because of bad script /etc/NetworkManager/dispatcher.d/13-named
Product: [Fedora] Fedora
Reporter: Michael Fischer <fischer-michael>
Component: bind
Assignee: Tomáš Hozza <thozza>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 17
CC: alcol, atkac, gczarcinski, lpoetter, psimerda, thozza
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-09-23 03:28:09 UTC
Type: Bug

Description Michael Fischer 2012-07-03 03:58:43 UTC
Description of problem:
named-chroot.service often fails to listen on the primary interface right after a reboot.  This is apparently because named-chroot.service starts before the interface is enabled, and bind only listens on enabled interfaces.

The script /etc/NetworkManager/dispatcher.d/13-named is supposed to fix this problem by reloading bind-chroot at a later point in the boot sequence, but the script itself is damaged.  It attempts to invoke a non-existent command "/sbin/systemctl", whereas the intended command is "/usr/bin/systemctl".

Version-Release number of selected component (if applicable):
bind-9.9.1-2.P1.fc17.x86_64


How reproducible:
Intermittent.  It usually fails, but it is timing dependent.


Steps to Reproduce:
1. Configure bind.
2. Enable named-chroot.service.
3. Reboot.
4. Type ss -l to see which interface(s) named is listening on.
  
Actual results:
LISTEN     0      3                          ::1:domain                       :::*       
LISTEN     0      3                    127.0.0.1:domain                        *:*       

Expected results:
LISTEN     0      3                 192.168.10.7:domain                        *:*       
LISTEN     0      3                          ::1:domain                       :::*       
LISTEN     0      3                    127.0.0.1:domain                        *:*       
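
The check in step 4 can be scripted. The following is a minimal sketch, not part of the original report: `listens_on` is a hypothetical helper name, and it assumes the listener appears in the `ss -l` output as `ADDRESS:domain` (or `ADDRESS:53` when `ss` is run with `-n`).

```shell
# Hypothetical helper: succeeds if the ss output read from stdin contains a
# LISTEN line whose local address is ADDR on port 53 ("domain" without -n).
listens_on() {
    awk -v addr="$1" '
        $1 == "LISTEN" || $2 == "LISTEN" {
            for (i = 1; i <= NF; i++)
                if ($i == (addr ":domain") || $i == (addr ":53"))
                    found = 1
        }
        END { exit !found }
    '
}

# Example: ss -l | listens_on 192.168.10.7 || echo "named missed the primary interface"
```

Run against the "Actual results" above, the helper fails for 192.168.10.7; against the "Expected results", it succeeds.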


Additional info:
Fix:  Change the four occurrences of "/sbin/systemctl" in /etc/NetworkManager/dispatcher.d/13-named to "/usr/bin/systemctl".
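
As a rough sketch of that substitution (assuming GNU sed for the `-i` option; `fix_systemctl_path` is a hypothetical helper name, not something shipped in the package):

```shell
# Hypothetical helper: rewrite every "/sbin/systemctl" in FILE to
# "/usr/bin/systemctl", editing the file in place (GNU sed assumed).
# Caveat: a path such as "/usr/sbin/systemctl" would also match, so
# inspect the file first if it may contain other spellings.
fix_systemctl_path() {
    sed -i 's|/sbin/systemctl|/usr/bin/systemctl|g' "$1"
}

# Example (as root): fix_systemctl_path /etc/NetworkManager/dispatcher.d/13-named
```

Running it a second time changes nothing, because "/usr/bin/systemctl" does not contain the string "/sbin/systemctl".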

Comment 1 Alberto Colosi 2012-07-28 21:01:50 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=844047

This is not a bind/named bug, but a NetworkManager and/or systemd one.

Comment 2 Michael Fischer 2012-07-29 02:26:33 UTC
The erroneous file, /etc/NetworkManager/dispatcher.d/13-named, is packaged in the bind-9.9.1-2.P1.fc17.x86_64 RPM.  Whoever maintains that package needs to fix the bug.

Comment 3 Alberto Colosi 2012-07-29 09:02:25 UTC
Ah, OK! For years I have compiled what I need myself (ISC BIND, HTTPD and so on), so I thought this was not the case, since the bind daemon itself is not at fault.

ok, now I know

Comment 4 Tomáš Hozza 2012-07-30 14:37:11 UTC
Added fix as Michael proposed. 

Until the bind update is available, you can find a build with the fix here: http://koji.fedoraproject.org/koji/buildinfo?buildID=344959

Comment 5 Adam Tkac 2012-07-30 15:35:32 UTC
*** Bug 844047 has been marked as a duplicate of this bug. ***

Comment 6 Gene Czarcinski 2012-08-09 17:07:29 UTC
Sorry about that but I do not believe that your fix will fix the basic problem.  I went in and manually changed the /etc/NetworkManager/dispatcher.d/13-named file so that /usr/bin/systemctl is used.

I can consistently reproduce the problem ... just not on real hardware ... but as a virtualized system under qemu/kvm/etc.

The system is Fedora 17 with the current bind ... in fact, everything is current for fedora and updates.

The host is an AMD 6-core processor with 16GB memory and an SSD for root and home.

The guest is one virtual processor, 4GB virtual memory and 3 virtio NICs: one external with the IP address set by a dhcpd on a second virtual system, and two internal NICs with static addresses.  NetworkManager is used.  named, chronyd, and dhcpd are started at bootup.  named and dhcpd only serve the two internal NICs.

Consistently during the bootup, NetworkManager, dhcpd, and named are started more or less at the same time.  NetworkManager takes some time to complete initialization of all three NICs [some time is relative, the whole virtual guest bootup takes about 12 to 15 seconds].

During the bootup, dhcpd initializes but finds no interfaces.  However, after NetworkManager completes initialization, dhcpd rediscovers the NICs and everything is fine.

However, things do not fare as well with named.  Initialization of named begins and continues but finds no NICs (none at all).  So after bootup completes and I can log in, I see that named is really screwed up.  When I manually restart named (using systemctl), it restarts with no errors and everything then works as it should.

This is NOT good!!  I should be able to bring everything up "automatically".

I do not believe that this is going to be simple to fix.  A real fix is going to need upstream work.

There might be some quick fix [that is relatively quick] that could be done by Fedora folks which would involve being able to start named only after NetworkManager has actually completed all initialization.  I am not sure this can be done.
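
One way to approximate "start named only after NetworkManager has completed initialization" from a script is to block on `nm-online -s`, which waits for NetworkManager startup to finish, and only then restart the service. This is a sketch under assumptions, not something proposed in this report: `restart_after_network` is a hypothetical helper, the 30-second default timeout is arbitrary, and the `NM_ONLINE` and `SYSTEMCTL` variables exist only so the commands can be substituted.

```shell
# Commands are looked up via PATH; override the variables to substitute them.
NM_ONLINE="${NM_ONLINE:-nm-online}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper: wait up to TIMEOUT seconds for NetworkManager startup
# to complete, then restart SERVICE. Returns non-zero if the wait times out.
restart_after_network() {
    service="$1"
    timeout="${2:-30}"
    if "$NM_ONLINE" -s -q -t "$timeout"; then
        "$SYSTEMCTL" restart "$service"
    else
        echo "NetworkManager startup not complete after ${timeout}s" >&2
        return 1
    fi
}

# Example (as root): restart_after_network named-chroot.service 30
```

This only narrows the boot-time race; it does nothing for interfaces that appear or change later at runtime.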

Comment 7 Michael Fischer 2012-08-09 18:51:46 UTC
I don't disagree with Gene.  As I understand it, the file /etc/NetworkManager/dispatcher.d/13-named is a workaround to try to compensate for the synchronization issues during startup introduced by systemd.  Both dhcpd and named assume that all relevant network interfaces are already up before they start.  The systemd scripts apparently fail to enforce this constraint, resulting in the observed behaviors where one or both of these services fail because they ran too early in the startup sequence.  The job of 13-named is just to restart named in case it failed the first time around.  The "right" solution would be to avoid the failure in the first place, or perhaps to restart named every time a new interface comes up.

My "fix" is not a fix to the underlying service startup issue.  It's only a fix to the workaround that broke when systemd was moved from /sbin to /usr/bin.
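
For illustration, a dispatcher script of this kind might look roughly like the sketch below. It is not the content of the packaged 13-named. It relies on NetworkManager passing the interface name as the first argument and the action (for example "up" or "down") as the second; the `dispatch` function name, the `SERVICE` default, and the choice to reload rather than restart are assumptions. Resolving systemctl through PATH avoids depending on a hard-coded location.

```shell
#!/bin/sh
# Sketch of a NetworkManager dispatcher script (hypothetical, not 13-named).
# Resolve systemctl via PATH instead of hard-coding /sbin or /usr/bin.
SYSTEMCTL="${SYSTEMCTL:-$(command -v systemctl || echo systemctl)}"
SERVICE="${SERVICE:-named-chroot.service}"

# dispatch IFACE ACTION: reload the service when an interface goes up or
# down, but only if the service is currently running.
dispatch() {
    iface="$1"
    action="$2"
    case "$action" in
        up|down)
            if "$SYSTEMCTL" -q is-active "$SERVICE"; then
                "$SYSTEMCTL" reload "$SERVICE"
            fi
            ;;
    esac
}

# A real dispatcher script would end by calling: dispatch "$1" "$2"
```

Other actions fall through the `case` without touching the service.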

Comment 8 Michael Fischer 2012-08-09 18:54:36 UTC
Oops.  I meant systemctl moved from /sbin to /usr/bin.

Comment 9 Pavel Šimerda (pavlix) 2012-08-12 11:38:48 UTC
> I don't disagree with Gene.  As I understand it, the file
> /etc/NetworkManager/dispatcher.d/13-named is a workaround to try to
> compensate for the synchronization issues during startup introduced by
> systemd.

This can just as well be related to bind's and NM's service files that are
there to enforce such criteria. But...


> Both dhcpd and named assume that all relevant network interfaces
> are already up before they start.

In my opinion this assumption is wrong. Interfaces can come and go, be enabled and disabled. Services that are expected to work through all interfaces should
either listen on a wildcard address, or, if they need separate sockets for separate interfaces, they should probably use netlink to learn about network
configuration changes.

I don't think any service should ever rely on network configuration being constant unless they are managed by another layer that manages the configuration changes for them.

> sequence.  The job of 13-named is just to restart named in case it failed
> the first time around.  The "right" solution would be to avoid the failure
> in the first place, or perhaps to restart named every time a new interface
> comes up.

s/the “right” solution/a better workaround/

Comment 10 Alberto Colosi 2012-08-12 12:16:39 UTC
I really disagree that processes could renew LISTENING on network change

if it were so, we would not have that kind of problem or need 13-named

I'm the last one who should talk, but I think that what I have just said is logically correct

Comment 11 Gene Czarcinski 2012-08-12 19:57:36 UTC
I agree that both named and dhcpd should not depend on the interfaces being up.  In fact, as I said, dhcpd does not: although it initially does not find the interfaces, it does once their initialization has been completed by NetworkManager.

Unfortunately, the same is not true for named.  Although the "right" solution may be for named to change, I am not going to hold my breath.

I have submitted a bugzilla report against NetworkManager ... I do not know if this is a bug report or an RFE.  NetworkManager does "know" when it has completed initialization of the automatically started interfaces.  If it could then start named, that (IMHO) would be a better workaround.  Named may not be the only daemon that has this problem, and for NetworkManager to handle those situations would (IMO) be a good thing.

See https://bugzilla.redhat.com/show_bug.cgi?id=847452

One last point, something that might be "a better workaround" would be to have rc.local restart named ... this might work ... then again it might not because this is a "race condition" and those can be unpredictable.

Comment 12 Pavel Šimerda (pavlix) 2012-08-13 13:35:19 UTC
> I really disagree that processes could renew LISTENING on network change

I'm not sure I understand you. If a daemon wants to work with individual interfaces and IP addresses, it's the daemon's responsibility to follow
kernel configuration changes.

> if it were so, we would not have that kind of problem or need 13-named

Yes. If the daemon works correctly, we don't need any hacks and it can actually be started at just any convenient time no matter if it's before NM, after NM or when the network is fully configured.

Any dependency hacks will fail anyway when the configuration changes at runtime.

Comment 13 Pavel Šimerda (pavlix) 2012-08-13 13:53:44 UTC
> In fact, as I said, dhcpd does not and although it initially does not
> find the interfaces, it does once their initialization has been completed
> by NetworkManager.

Yes.

> I have submitted a bugzilla report against NetworkManager ... I do not know
> if this is a bug report or an RFE.  NetworkManager does "know" when it has
> completed initialization of the automatically started interfaces.  If it
> could then start named, that (IMHO) would be a better workaround.  Named may
> not be the only daemon that has this problem and for NetworkManager to
> handle those situation would (IMO) be a good thing.
> 
> See https://bugzilla.redhat.com/show_bug.cgi?id=847452

Replied. NetworkManager provides support for dispatcher scripts. As you are saying, there may be other daemons with problems, and NetworkManager should not work around them one by one; instead, other packages should install scripts to work around their own problems.

> One last point, something that might be "a better workaround" would be to
> have rc.local restart named ... this might work ... then again it might not
> because this is a "race condition" and those can be unpredictable.

Fixing the daemons or using dispatcher scripts to restart them is the way to go. And this is exactly how this was solved before.

It broke because of an incompatible change in the systemd package (systemctl moved from sbin to bin) that requires all packages that call systemctl by its full path in scripts to adapt. That's all.

Comment 14 Lennart Poettering 2012-08-17 02:30:45 UTC
systemctl has never been in sbin/, always has been in bin/.

Comment 15 Pavel Šimerda (pavlix) 2012-08-17 13:16:52 UTC
(In reply to comment #14)
> systemctl has never been in sbin/, always has been in bin/.

Thanks for the information and sorry for misattributing the problem.

Comment 16 Fedora Update System 2012-09-13 15:49:29 UTC
bind-9.9.1-9.P3.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/bind-9.9.1-9.P3.fc17

Comment 17 Fedora Update System 2012-09-17 17:39:08 UTC
Package bind-9.9.1-9.P3.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing bind-9.9.1-9.P3.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-14106/bind-9.9.1-9.P3.fc17
then log in and leave karma (feedback).

Comment 18 Fedora Update System 2012-09-23 03:28:09 UTC
bind-9.9.1-9.P3.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.