Bug 132866

Summary:	Bind does not always restart successfully using "restart" or dies silently
Product:	[Fedora] Fedora	Reporter:	Robert Scheck <redhat-bugzilla>
Component:	bind	Assignee:	Jason Vas Dias <jvdias>
Status:	CLOSED ERRATA	QA Contact:	Ben Levenson <benl>
Severity:	high	Docs Contact:
Priority:	medium
Version:	rawhide
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	9.2.4-1	Doc Type:	Bug Fix
Last Closed:	2004-10-06 11:50:01 UTC	Type:	---
Bug Blocks:	123268

Description Robert Scheck 2004-09-18 12:04:05 UTC

Description of problem:
Bind does not always restart successfully using "service named restart",
or (at least it seems so) bind dies completely silently some time later.
The only thing I've got in my log files is:

--- snipp ---
Sep 18 04:00:06 devel named:  succeeded
Sep 18 04:00:06 devel named[6676]: shutting down: flushing changes
Sep 18 04:00:06 devel named[6676]: stopping command channel on 127.0.0.1#953
Sep 18 04:00:06 devel named[6676]: stopping command channel on ::1#953
Sep 18 04:00:06 devel named[6676]: no longer listening on ::#53
Sep 18 04:00:06 devel named[6676]: no longer listening on 127.0.0.1#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP1]#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP2]#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP3]#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP4]#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP5]#53
Sep 18 04:00:09 devel named[6676]: exiting
--- snapp ---

What I did was put a "service named restart" in /etc/cron.daily/foo, but
named never came up this morning.

Version-Release number of selected component (if applicable):
bind-9.2.4rc7-12

How reproducible:
Sometimes, see below.

Steps to Reproduce:
1. echo "service named restart" > /etc/cron.daily/foo
2. Wait for cron.daily run
3. Check whether bind runs or not
  
Actual results / Expected results:
Successful restart of bind at any time and no possible silent deaths...
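
For step 3, a minimal sketch of a check that could be appended to the cron job. It assumes named writes its PID to ${ROOTDIR}/var/run/named.pid (this depends on the pid-file setting in named.conf) and that the job runs as root, so that "kill -0" is permitted. The function names pidfile_alive and check_named are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PID stored in the given pid file
# belongs to a live process.
pidfile_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    # Signal 0 only tests whether the process exists; nothing is delivered.
    kill -0 "$pid" 2>/dev/null
}

# Assumed pid file location; ROOTDIR is read from /etc/sysconfig/named
# and stays empty when named does not run chrooted.
check_named() {
    if [ -r /etc/sysconfig/named ]; then
        . /etc/sysconfig/named
    fi
    if pidfile_alive "${ROOTDIR:-}/var/run/named.pid"; then
        echo "named is running"
    else
        echo "named is NOT running"
        return 1
    fi
}
```

Calling check_named a few seconds after "service named restart" would leave a line in the cron mail showing whether named came back up. A stale pid file whose PID has been reused by another process would be reported as running, so this is only a rough check.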

Comment 1 Jason Vas Dias 2004-09-20 17:57:39 UTC

 This is very strange - I've never seen this before, and we have 
 BIND installations here that run 24/7.

 Presumably you 'allow-update' to some zones and want to save
 the pending .jnl updates to the master zone files ? 
 If not, what is the reason you need to periodically restart named?
 
 Please download and install the latest bind-9.2.4rc8-14 .
 
 Please append further information to this bug:
    
 1. Turn on named tracing and gather debug data during restart:

 a. If you have selinux ENABLED and ENFORCING, run the command: 
       'setenforce 0',
    or 'setsebool named_write_master_zones 1'

 b. Enable core file generation:
    Edit /etc/profile :
    Comment out this line (@ line 28):
    '
    # ulimit -S -c 0 > /dev/null 2>&1 
    '
    and add this line:
    '
    ulimit -c unlimited 
    '

 c. Change the "service named restart" in your cron job to these
    commands:
    ' . /etc/sysconfig/named
      /bin/touch ${ROOTDIR}/var/named/named.run
      /bin/chown named:named ${ROOTDIR}/var/named/named.run
      /usr/sbin/rndc trace 99
      export OPTIONS='-d 99'
      bash -xf /etc/init.d/named restart \
           >> /tmp/named.init.dbg.`/bin/date '+%s'`.log 2>&1
      echo "/usr/sbin/rndc trace 0;                          \
            /usr/bin/gzip < ${ROOTDIR}/var/named/named.run > \
                  /tmp/named.dbg.\`/bin/date '+%s'\`.log.gz; \
            rm -f ${ROOTDIR}/var/named/named.run"   |        \
                  /usr/bin/at now+20min
    '

If the debug log shows named is still running 20 mins after restart,
but it still dies, remove the last 'echo ... | ... at ...;'  command;
tracing will then still be enabled when named exits ( a very large
debug file may be generated ). 
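
Regarding step b: the restart runs from cron, which may not read /etc/profile, so it can be worth confirming the core file limit from inside the job itself. The following is a minimal sketch; the function names core_limit_allows_dumps and report_core_limit are hypothetical.

```shell
#!/bin/sh
# Succeed if the given `ulimit -c` value permits core files.
core_limit_allows_dumps() {
    case $1 in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 1 ;;   # unexpected value: treat as disabled
        *) [ "$1" -gt 0 ] ;;
    esac
}

# Report the soft core file limit seen by the current shell.
report_core_limit() {
    limit=$(ulimit -S -c)
    if core_limit_allows_dumps "$limit"; then
        echo "core files enabled (ulimit -c: $limit)"
    else
        echo "core files disabled (ulimit -c: $limit)"
    fi
}
```

Calling report_core_limit at the top of the cron job would record the limit in effect before the restart. The init script may apply its own limit when starting the daemon, so this only shows what the job's shell inherits.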

Then, the next day, if named is not running, please tar up the
resulting /tmp/named.dbg*.log.gz and any core files :
'
    . /etc/sysconfig/named
    gzip ${ROOTDIR}/var/named/core.[0-9]*
    tar -cpvf /tmp/named.dbg.tar /tmp/named.dbg*.log.gz \
         ${ROOTDIR}/var/named/core.*.gz
    rm -f ${ROOTDIR}/var/named/core.*.gz
'
and append the resulting  /tmp/named.dbg.tar to this bug.
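
If no core files were produced, the core.* patterns above match nothing and tar will complain about missing files. A minimal sketch of a wrapper that archives only the files that exist; the name collect_existing is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: archive whichever of the given files exist.
# Usage: collect_existing ARCHIVE FILE...
collect_existing() {
    archive=$1
    shift
    n=$#
    # Append the existing files to the argument list, then drop the
    # original n arguments so that only existing files remain.
    for f in "$@"; do
        if [ -e "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    shift "$n"
    if [ $# -eq 0 ]; then
        echo "nothing to collect" >&2
        return 1
    fi
    tar -cpf "$archive" "$@"
}
```

It could be called as: collect_existing /tmp/named.dbg.tar /tmp/named.dbg*.log.gz ${ROOTDIR}/var/named/core.*.gz . A pattern that matches nothing is passed through literally by the shell and is then skipped by the existence test.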

Comment 2 Robert Scheck 2004-10-06 11:50:01 UTC

I've been running bind 9.2.4 since September 24, 2004 and I'm not able to
reproduce the problem I had with 9.2.4rc7-12. I'll close this bug
report and mark 9.2.4 as the fix for it. I'll feel free to reopen it
if I'm able to reproduce it again (hopefully I won't be).

Comment 3 John Flanagan 2004-12-21 19:49:54 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-567.html