Bug 132866 - Bind does not always restart successfully using "restart", or dies silently

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	bind
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Jason Vas Dias
QA Contact:	Ben Levenson
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	FC3Target

Reported:	2004-09-18 12:04 UTC by Robert Scheck
Modified:	2016-04-20 03:53 UTC (History)
CC List:	0 users
Fixed In Version:	9.2.4-1
Clone Of:
Environment:
Last Closed:	2004-10-06 11:50:01 UTC
Type:	---
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2004:567	0	normal	SHIPPED_LIVE	Updated bind packages	2004-12-21 05:00:00 UTC

Description Robert Scheck 2004-09-18 12:04:05 UTC

Description of problem:
Bind does not always restart successfully using "service named restart",
or (at least it seems so) it dies completely silently some time later.
The only thing I have in my log files is:

--- snipp ---
Sep 18 04:00:06 devel named:  succeeded
Sep 18 04:00:06 devel named[6676]: shutting down: flushing changes
Sep 18 04:00:06 devel named[6676]: stopping command channel on 127.0.0.1#953
Sep 18 04:00:06 devel named[6676]: stopping command channel on ::1#953
Sep 18 04:00:06 devel named[6676]: no longer listening on ::#53
Sep 18 04:00:06 devel named[6676]: no longer listening on 127.0.0.1#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP1]#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP2]#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP3]#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP4]#53
Sep 18 04:00:06 devel named[6676]: no longer listening on [IP5]#53
Sep 18 04:00:09 devel named[6676]: exiting
--- snapp ---

I put a "service named restart" in /etc/cron.daily/foo, but
named never came back up this morning.

Version-Release number of selected component (if applicable):
bind-9.2.4rc7-12

How reproducible:
Sometimes, see below.

Steps to Reproduce:
1. echo "service named restart" > /etc/cron.daily/foo
2. Wait for cron.daily run
3. Check whether bind runs or not
  
Actual results / Expected results:
Expected: a successful restart of bind every time, and no silent deaths.
Actual: named sometimes does not come back up after the restart.
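
One way to make a failed restart visible rather than silent is to have the cron job check for a live named process afterwards. The following is only a minimal sketch and not part of the original report: the helper names pid_alive and restart_and_verify, the VERIFY_DELAY variable, and the pid file path /var/run/named.pid are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for a cron job that restarts named and then verifies it.

# Return 0 if the PID recorded in the given pid file belongs to a live process.
# Note: kill -0 also fails for processes the caller may not signal, so this
# is meant to run as root (as cron.daily jobs do).
pid_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# Run the restart command given after the pid file, wait briefly,
# then report whether the daemon is running.
restart_and_verify() {
    pidfile=$1
    shift
    "$@" || echo "restart command failed: $*" >&2
    sleep "${VERIFY_DELAY:-5}"
    if pid_alive "$pidfile"; then
        echo "named is running"
    else
        echo "named is NOT running after restart" >&2
        return 1
    fi
}

# Assumed usage in /etc/cron.daily/foo (pid file path is an assumption):
# restart_and_verify /var/run/named.pid service named restart
```

Because cron mails any output of a job to its owner, the message on stderr would at least leave a trace when named fails to come back.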

Comment 1 Jason Vas Dias 2004-09-20 17:57:39 UTC

 This is very strange - I've never seen this before, and we have 
 BIND installations here that run 24/7.

 Presumably you 'allow-update' to some zones and want to save
 the pending .jnl updates to the master zone files?
 If not, what is the reason you need to restart named periodically?
 
 Please download and install the latest bind-9.2.4rc8-14.
 
 Please append further information to this bug:
    
 1. Turn on named tracing and gather debug data during restart:

 a. If you have selinux ENABLED and ENFORCING, run the command: 
       'setenforce 0',
    or 'setsebool named_write_master_zones 1'

 b. Enable core file generation:
    Edit /etc/profile :
    Comment out this line (@ line 28):
    '
    # ulimit -S -c 0 > /dev/null 2>&1 
    '
    and add this line:
    '
    ulimit -c unlimited 
    '

 c. Change the "service named restart" in your cron job to these
    commands:
    ' . /etc/sysconfig/named
      /bin/touch ${ROOTDIR}/var/named/named.run
      /bin/chown named:named ${ROOTDIR}/var/named/named.run
      /usr/sbin/rndc trace 99
      export OPTIONS='-d 99'
      bash -xf /etc/init.d/named restart \
           >> /tmp/named.init.dbg.`/bin/date '+%s'`.log 2>&1
      echo "/usr/sbin/rndc trace 0;                          \
            /usr/bin/gzip < ${ROOTDIR}/var/named/named.run > \
                  /tmp/named.dbg.\`/bin/date '+%s'\`.log.gz; \
            rm -f ${ROOTDIR}/var/named/named.run"   |        \
                  /usr/bin/at now+20min;
    '
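
Before relying on steps a and b, it may help to confirm that they took effect in the shell that actually runs the restart. The following is a minimal sketch; the helper names core_dumps_enabled, selinux_mode and preflight are hypothetical and only illustrate the checks.

```shell
#!/bin/sh
# Hypothetical preflight checks for the debugging steps above.

# Return 0 if the soft core-file size limit of this shell allows core dumps.
core_dumps_enabled() {
    [ "$(ulimit -c)" != "0" ]
}

# Print the SELinux mode, or "unknown" when getenforce is not installed.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo unknown
    fi
}

# Warn on stderr about settings that would hide the debug data.
preflight() {
    core_dumps_enabled || echo "warning: core dumps are disabled (ulimit -c is 0)" >&2
    [ "$(selinux_mode)" != "Enforcing" ] || echo "warning: SELinux is enforcing" >&2
    return 0
}
```

Calling preflight at the top of the cron job would print a warning if either setting is still in its restrictive state.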

If the debug log shows named is still running 20 mins after restart,
but it still dies, remove the last 'echo ... | ... at ...;' command;
tracing will then still be enabled when named exits (a very large
debug file may be generated).
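
When capturing the 'bash -x' trace of the init script, the order of the redirections matters, because the trace is written to stderr. The sketch below uses a hypothetical emit function as a stand-in for the init script; the function names are illustrative only.

```shell
#!/bin/sh
# Stand-in for the init script: one line on stdout, one on stderr.
emit() {
    echo "stdout line"
    echo "stderr line" >&2
}

# stderr is duplicated onto the current stdout BEFORE stdout is redirected,
# so only the stdout line reaches the file.
capture_stdout_only() {
    emit 2>&1 >> "$1"
}

# stdout is redirected to the file first, then stderr follows it there.
capture_both() {
    emit >> "$1" 2>&1
}
```

For the debug log to contain the trace, the form used in capture_both is the one that is needed.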

Then, the next day, if named is not running, please tar up the
resulting /tmp/named.dbg.*.log.gz and any core files:
'
    . /etc/sysconfig/named
    gzip ${ROOTDIR}/var/named/core.[0-9]*
    tar -cpvf /tmp/named.dbg.tar /tmp/named.dbg.*.log.gz \
         ${ROOTDIR}/var/named/core.*.gz
    rm -f ${ROOTDIR}/var/named/core.*.gz
'
and append the resulting  /tmp/named.dbg.tar to this bug.
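
If no core file was written, the core.* pattern matches nothing and tar receives it literally and complains. A minimal sketch of a more tolerant collection step follows; the function name collect_debug is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: archive only those arguments that exist as files.
# Usage: collect_debug ARCHIVE FILE_OR_PATTERN...
collect_debug() {
    out=$1
    shift
    n=$#
    # Append every existing path to the argument list, then drop the originals.
    for f in "$@"; do
        if [ -e "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    shift "$n"
    if [ $# -eq 0 ]; then
        echo "nothing to collect" >&2
        return 1
    fi
    tar -cpf "$out" "$@"
}

# Assumed usage, after sourcing /etc/sysconfig/named:
# collect_debug /tmp/named.dbg.tar /tmp/named.dbg.*.log.gz \
#     "${ROOTDIR}"/var/named/core.*.gz
```

Patterns that match nothing are left unexpanded by the shell and are then filtered out by the existence check.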

Comment 2 Robert Scheck 2004-10-06 11:50:01 UTC

I have been running bind 9.2.4 since September 24, 2004, and I am not
able to reproduce the problem I had with 9.2.4rc7-12. I'll close this
bug report and mark 9.2.4 as the fix for it, but I'll feel free to
reopen it if I'm able to reproduce it again (hopefully not).

Comment 3 John Flanagan 2004-12-21 19:49:54 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-567.html
