Bug 169969 - named fails to start
named fails to start
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: bind (Show other bugs)
rawhide
All Linux
medium Severity medium
: ---
: ---
Assigned To: Jason Vas Dias
Ben Levenson
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2005-10-05 19:19 EDT by Matthew Saltzman
Modified: 2007-11-30 17:11 EST (History)
1 user (show)

See Also:
Fixed In Version: bind-9.3.1-18
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-10-06 14:29:07 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)

  None (edit)
Description Matthew Saltzman 2005-10-05 19:19:59 EDT
Description of problem:

named does not start

Version-Release number of selected component (if applicable):
bind-9.3.1-14

How reproducible:
always

Steps to Reproduce:
1. chkconfig named on
2. service named start
3.
  
Actual results:
[FAILED]


Expected results:
[OK]

Additional info:
RPM rebuilt from source in FC4.  Also have bind-libs, bind-utils, and
caching-nameserver-7.3-3 installed.

Also fails with selinux in permissive mode.

The following appears in /var/log/messages:

Oct  5 19:15:22 vincent named[3659]: starting BIND 9.3.1 -u named -D
Oct  5 19:15:22 vincent named[3659]: found 1 CPU, using 1 worker thread
Oct  5 19:15:22 vincent named[3659]: loading configuration from '/etc/named.conf'
Oct  5 19:15:22 vincent named[3659]: listening on IPv4 interface lo, 127.0.0.1#53
Oct  5 19:15:22 vincent named[3659]: listening on IPv4 interface eth0,
130.127.131.165#53
Oct  5 19:15:22 vincent named[3659]: command channel listening on 127.0.0.1#953
Oct  5 19:15:22 vincent named[3659]: zone 0.in-addr.arpa/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: zone 0.0.127.in-addr.arpa/IN: loaded serial
1997022700
Oct  5 19:15:22 vincent named[3659]: zone 255.in-addr.arpa/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: zone
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN:
loaded serial 1997022700
Oct  5 19:15:22 vincent named[3659]: zone localdomain/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: zone localhost/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: task.c:1272: REQUIRE(task->state ==
task_state_running) failed
Oct  5 19:15:22 vincent named[3659]: exiting (due to assertion failure)
Oct  5 19:15:22 vincent NetworkManager: <WARNING>       (): Could not set
forwarders for zone '.'.  Error: 'Message did not receive a reply (timeout by
message bus)'.
Comment 1 Matthew Saltzman 2005-10-06 09:56:01 EDT
Removed *all* RPMs related to named, caching-nameserver, NetworkManager. 
Removed /var/named and all contents.  Reinstalled all of the above.  Now it's
working fine.

I'll leave this for Jason to close, as there may be a packaging bug of some sort
causing this problem on upgrades.
Comment 2 Jason Vas Dias 2005-10-06 14:29:07 EDT
Thanks for the detective work in reporting this issue, as it has exposed a bug:

Oct  5 19:15:22 vincent named[3659]: zone localhost/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: task.c:1272: REQUIRE(task->state ==
task_state_running) failed
Oct  5 19:15:22 vincent named[3659]: exiting (due to assertion failure)

These messages show that a task enters 'isc_task_beginexclusive(...)' 
BEFORE isc_app_run(...) is entered and named's tasks are properly
in the "running" state. 

Normally, you should see the messages:
    'named[xxxx]: D-BUS dhcdbd subscription enabled.'
    'named[xxxx]: D-BUS service enabled.'
    'named[xxxx]: running'
before any dbus messages are processed. Once named emits the 'running' message,
it has entered isc_app_run(...) and tasks are ready to handle events.

To handle the SetForwarders request, named must call 'isc_task_beginexclusive'
to ensure exclusive access to the configuration data for the server thread 
handling the request, which will correctly cause an Abort if the task is not
in the "task_state_running" state (the task manager event processing loop,
isc_app_run(), has not been entered).

I was able to duplicate this problem by writing a program that simply
waited for the D-BUS "NameOwnerChanged" message showing that the 
"com.redhat.named" name had been acquired by a new owner (named has
just registered the name on the bus), and IMMEDIATELY did a "SetForwarders".
(I can append the program to this bug report if desired).

The "dbus_mgr" named task which handles dbus messages was doing a 
'dbus_connection_dispatch(...)' in its initialisation routine to flush the
NameAcquired / NameOwnerChanged messages - it could also end up receiving 
and handling a {Set,Get}Forwarders message - I should have foreseen this.

Now, with bind-9.3.1-18+ , just submitted to Rawhide, named is incapable of
handling ANY D-BUS message until it gets an event from the socket manager's
select() loop - it no longer calls dbus_connection_dispatch(...) in the 
initialisation code.

So this problem is fixed only in bind-9.3.1-18+ - you may have happened to
create a situation where NetworkManager is NOT waiting on NameOwnerChanged
to send a SetForwarders message, so it appeared to work, but this situation
could occur - use bind-9.3.1-18, which will be in tomorrow's rawhide, and 
which I have also copied to:
  http://people.redhat.com/~jvdias/BIND/rawhide/9.3.1-18/
Comment 3 Matthew Saltzman 2005-10-07 16:08:00 EDT
Glad to be of assistance 8^).

bind-9.3.1-18 has been working for me so far.  Thanks.

Note You need to log in before you can comment on or make changes to this bug.