Bug 169969 - named fails to start
Summary: named fails to start
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: rawhide
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Jason Vas Dias
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2005-10-05 23:19 UTC by Matthew Saltzman
Modified: 2007-11-30 22:11 UTC (History)
1 user (show)

Fixed In Version: bind-9.3.1-18
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-06 18:29:07 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Matthew Saltzman 2005-10-05 23:19:59 UTC
Description of problem:

named does not start

Version-Release number of selected component (if applicable):
bind-9.3.1-14

How reproducible:
always

Steps to Reproduce:
1. chkconfig named on
2. service named start
3.
  
Actual results:
[FAILED]


Expected results:
[OK]

Additional info:
RPM rebuilt from source in FC4.  Also have bind-libs, bind-utils, and
caching-nameserver-7.3-3 installed.

Also fails with selinux in permissive mode.

The following appears in /var/log/messages:

Oct  5 19:15:22 vincent named[3659]: starting BIND 9.3.1 -u named -D
Oct  5 19:15:22 vincent named[3659]: found 1 CPU, using 1 worker thread
Oct  5 19:15:22 vincent named[3659]: loading configuration from '/etc/named.conf'
Oct  5 19:15:22 vincent named[3659]: listening on IPv4 interface lo, 127.0.0.1#53
Oct  5 19:15:22 vincent named[3659]: listening on IPv4 interface eth0,
130.127.131.165#53
Oct  5 19:15:22 vincent named[3659]: command channel listening on 127.0.0.1#953
Oct  5 19:15:22 vincent named[3659]: zone 0.in-addr.arpa/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: zone 0.0.127.in-addr.arpa/IN: loaded serial
1997022700
Oct  5 19:15:22 vincent named[3659]: zone 255.in-addr.arpa/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: zone
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN:
loaded serial 1997022700
Oct  5 19:15:22 vincent named[3659]: zone localdomain/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: zone localhost/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: task.c:1272: REQUIRE(task->state ==
task_state_running) failed
Oct  5 19:15:22 vincent named[3659]: exiting (due to assertion failure)
Oct  5 19:15:22 vincent NetworkManager: <WARNING>       (): Could not set
forwarders for zone '.'.  Error: 'Message did not receive a reply (timeout by
message bus)'.

Comment 1 Matthew Saltzman 2005-10-06 13:56:01 UTC
Removed *all* RPMs related to named, caching-nameserver, NetworkManager. 
Removed /var/named and all contents.  Reinstalled all of the above.  Now it's
working fine.

I'll leave this for Jason to close, as there may be a packaging bug of some sort
causing this problem on upgrades.

Comment 2 Jason Vas Dias 2005-10-06 18:29:07 UTC
Thanks for the detective work in reporting this issue, as it has exposed a bug:

Oct  5 19:15:22 vincent named[3659]: zone localhost/IN: loaded serial 42
Oct  5 19:15:22 vincent named[3659]: task.c:1272: REQUIRE(task->state ==
task_state_running) failed
Oct  5 19:15:22 vincent named[3659]: exiting (due to assertion failure)

These messages show that a task enters 'isc_task_beginexclusive(...)' 
BEFORE isc_app_run(...) is entered and named's tasks are properly
in the "running" state. 

Normally, you should see the messages:
    'named[xxxx]: D-BUS dhcdbd subscription enabled.'
    'named[xxxx]: D-BUS service enabled.'
    'named[xxxx]: running'
before any dbus messages are processed. Once named emits the 'running' message,
it has entered isc_app_run(...) and tasks are ready to handle events.

To handle the SetForwarders request, named must call 'isc_task_beginexclusive'
to ensure exclusive access to the configuration data for the server thread 
handling the request, which will correctly cause an Abort if the task is not
in the "task_state_running" state (the task manager event processing loop,
isc_app_run(), has not been entered).

I was able to duplicate this problem by writing a program that simply
waited for the D-BUS "NameOwnerChanged" message showing that the 
"com.redhat.named" name had been acquired by a new owner (named has
just registered the name on the bus), and IMMEDIATELY did a "SetForwarders".
(I can append the program to this bug report if desired).

The "dbus_mgr" named task which handles dbus messages was doing a 
'dbus_connection_dispatch(...)' in its initialisation routine to flush the
NameAcquired / NameOwnerChanged messages - it could also end up receiving 
and handling a {Set,Get}Forwarders message - I should have foreseen this.

Now, with bind-9.3.1-18+ , just submitted to Rawhide, named is incapable of
handling ANY D-BUS message until it gets an event from the socket manager's
select() loop - it no longer calls dbus_connection_dispatch(...) in the 
initialisation code.

So this problem is fixed only in bind-9.3.1-18+ - you may have happened to
create a situation where NetworkManager is NOT waiting on NameOwnerChanged
to send a SetForwarders message, so it appeared to work, but this situation
could occur - use bind-9.3.1-18, which will be in tomorrow's rawhide, and 
which I have also copied to:
  http://people.redhat.com/~jvdias/BIND/rawhide/9.3.1-18/

Comment 3 Matthew Saltzman 2005-10-07 20:08:00 UTC
Glad to be of assistance 8^).

bind-9.3.1-18 has been working for me so far.  Thanks.


Note You need to log in before you can comment on or make changes to this bug.