Bug 509438

Summary: after patching NAMED it fails to automatically start after shutdown or restart
Product: Red Hat Enterprise Linux 5
Reporter: Verne Britton <verne>
Component: bind
Assignee: Adam Tkac <atkac>
Status: CLOSED NOTABUG
QA Contact: BaseOS QE <qe-baseos-auto>
Severity: urgent
Priority: low
Version: 5.3
CC: ovasik, verne
Target Milestone: rc
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-12-07 14:00:37 UTC

Description Verne Britton 2009-07-02 19:29:15 UTC

This is the second time this has happened: the automatic update sees a BIND patch, downloads it, and applies it, but the stop/start (or restart) sequence does not fully complete and named is left non-functional  :-)


Reproducible: Didn't try

Actual Results:  
I remember named being left broken by an automatic patch sometime in the last year, so today (it happened on two RHEL 5 servers here) is the second time I have seen it. By "it" I mean that the service did not restart properly after a patch was applied.



In /var/log/messages I see:

Jul  2 10:49:57 names2 named[1916]: shutting down: flushing changes
Jul  2 10:49:57 names2 named[1916]: stopping command channel on 0.0.0.0#953
Jul  2 10:49:57 names2 named[1916]: no longer listening on 127.0.0.1#53
Jul  2 10:49:57 names2 named[1916]: no longer listening on 129.71.254.5#53
Jul  2 10:49:57 names2 named[1916]: no longer listening on 192.168.122.1#53


I assume these lines come from the attempt to shut down named so that it can be restarted, either by calling the service command or by some other means.

But the named process never exits, and so it never gets restarted either  :-)

To fix today's problem I ran kill -9 on it, then service named start, to get things back to normal.
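
The manual recovery described here (named never exits after being told to stop, so it has to be force-killed before it can be started again) can be sketched as follows. This is only an illustration in Python, not what the bind init script does; the function names, the 30-second timeout, and the injectable `alive`, `send`, and `sleep` helpers are assumptions made for the example.

```python
import os
import signal
import time


def process_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_with_timeout(pid, timeout=30.0, poll=0.5,
                      alive=process_alive, send=os.kill, sleep=time.sleep):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Returns "terminated" if the process exited by itself,
    or "killed" if SIGKILL had to be sent.
    """
    send(pid, signal.SIGTERM)
    waited = 0.0
    while waited < timeout:
        if not alive(pid):
            return "terminated"
        sleep(poll)
        waited += poll
    if not alive(pid):
        return "terminated"
    send(pid, signal.SIGKILL)  # equivalent of the manual kill -9
    return "killed"
```

After a call such as `stop_with_timeout(1916)` returns, the service would still have to be started again, as was done by hand with service named start.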

When this hits my two primary public DNS servers, it hurts  :-)

Maybe I just have too many zones. Our two servers have duplicate configs: each handles the same set of about 900 zones, and both do a lot of recursive lookups.

The more I think about this, since it causes NAMED to hang, I am going to mark this as Urgent ... but I can be talked into downgrading the Severity I suppose  :-)

Comment 1 Adam Tkac 2009-07-03 08:36:05 UTC
Could you please tell me whether you have the bind-chroot package installed and whether you are running named in a chroot environment?

Comment 2 Verne Britton 2009-07-06 15:43:38 UTC
Sorry, YES we have bind installed in the chroot environment.

Comment 3 Adam Tkac 2009-07-13 14:32:22 UTC
I'm not able to reproduce this issue. Could you please attach more information?

Please try updating the bind packages and then attach your /var/log/messages file. If your log contains sensitive information, make sure you check the "private" button when you add the attachment.

If named hangs during update please do:
- yum --enablerepo rhel-debuginfo install bind-debuginfo.<x86_64|i386>
- gdb -p <named_pid>

Then, in the gdb terminal, run "t a a bt full" (short for "thread apply all bt full") and attach the output here.
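
For reference, the backtrace request above can also be run non-interactively. The following is a minimal Python sketch, assuming gdb and the debuginfo package are already installed; the helper names and the output file path are hypothetical, and the script only wraps the standard gdb options `-p`, `-batch`, and `-ex`.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that attaches to `pid`, dumps every thread's
    full backtrace, and then exits."""
    return [
        "gdb",
        "-p", str(pid),                     # attach to the running process
        "-batch",                           # exit once the commands have run
        "-ex", "thread apply all bt full",  # long form of "t a a bt full"
    ]


def capture_backtrace(pid, out_path):
    """Run gdb against `pid` and write its output to `out_path`.

    `out_path` is a hypothetical file name chosen by the caller.
    """
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_command(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=False)
```

For example, `capture_backtrace(1916, "named-backtrace.txt")` would leave a text file that can be attached to the bug.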

Thanks.

Comment 4 Adam Tkac 2009-12-07 14:00:37 UTC
It seems this problem is already fixed. If you are still able to reproduce it please reopen this bug.