Bug 768702

Summary: BIND will not shutdown or log after log rotation
Product: Red Hat Enterprise Linux 6 Reporter: Daniel McNamara <daniel>
Component: bindAssignee: Adam Tkac <atkac>
Status: CLOSED ERRATA QA Contact: qe-baseos-daemons
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.2CC: ovasik
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-01-23 10:15:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Daniel McNamara 2011-12-18 04:45:46 UTC
Description of problem:

Seems the race condition from #623122 still exists and becomes very apparent after logrotation (although I have no evidence of a causation link).

After log rotation:

RNDC shows the service is still running:

rndc status
version: 9.7.3-P3-RedHat-9.7.3-8.P3.el6
CPUs found: 16
worker threads: 16
number of zones: 27
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running

Queries will still be responded to (both recursive and locally hosted).

RNDC works - for example you can turn on querylogging

[root@machine ~]# rndc querylog
[root@machine ~]# rndc status
version: 9.7.3-P3-RedHat-9.7.3-8.P3.el6
CPUs found: 16
worker threads: 16
number of zones: 27
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is ON
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running


However the named service no longer writes to the log file. A query can run but it won't be logged.

A restart does not work:

[root@machine ~]# /etc/init.d/named restart
Stopping named: .............umount: /var/named/chroot/var/named: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
                                                           [FAILED]
Starting named: named: already running                     [  OK  ]

Although the named process is running 

[root@machine ~]# ps aux |grep named
named     6233  0.0  0.5 772756 30884 ?        Ssl  Dec12   0:05 /usr/sbin/named -u named -t /var/named/chroot

The service no longer responds to queries or rndc commands as the inet ports are no longer active.

Only way to restart the service is to force kill the named process

[root@machine ~]# kill -9 6233
[root@machine ~]# /etc/init.d/named restart
Stopping named:                                            [  OK  ]
Starting named:                                            [  OK  ]

Once done everything remounts properly and the service works properly again.

Redoing [root@machine ~]# rndc querylog

A quick dig check shows the query now logging correctly to the log file

dig @127.0.0.1 -t a www.redhat.com

From /var/log/messages:

Dec 18 04:42:33 machine named[25513]: client 127.0.0.1#36785: query: www.redhat.com IN A + (127.0.0.1)


I can only replicate this issue after the logrotations occur on Sunday morning. I'm assuming there's an issue with the chroot mounting of /var causing this lock bug.


Version-Release number of selected component (if applicable):

bind-utils-9.7.3-8.P3.el6.x86_64
bind-libs-9.7.3-8.P3.el6.x86_64
bind-9.7.3-8.P3.el6.x86_64
bind-chroot-9.7.3-8.P3.el6.x86_64
bind-devel-9.7.3-8.P3.el6.x86_64

Comment 2 Adam Tkac 2011-12-19 09:39:33 UTC
It seems this bug is mix of two issues.

-----
1. named stopped to write log messages:

I see you run named in chroot. May I ask you if you have rsyslogd configured correctly, please?

$ cat /etc/sysconfig/named
...
#       Don't forget to add "$AddUnixListenSocket /var/named/chroot/dev/log"
#       line to your /etc/rsyslog.conf file. Otherwise your logging becomes
#       broken when rsyslogd daemon is restarted (due update, for example).
...

-----
2. named hangs on shutdown

We will release update which fixes hang during shutdown soon. May I ask you to obtain backtrace when named hangs to confirm that the update fixes your issue as well, please?

1. install bind-debuginfo
2. `gdb attach <named_pid>`
3. inside gdb session run `t a a bt`
4. attach full output
-----

Thank you in advance!

Comment 3 Daniel McNamara 2011-12-19 09:51:28 UTC
Oops,

Seems I entirely missed that rsyslog configuration option. Adding that to the servers now and will check again after next rotation to make sure. For now it would indeed seem to be my misconfiguration mistake.

My apologies!

Comment 5 Adam Tkac 2011-12-20 10:02:24 UTC
We released bind update which should fix this issue yesterday:

http://rhn.redhat.com/errata/RHBA-2011-1836.html

Can you please update to bind-9.7.3-8.P3.el6_2.1 and check if named still hangs on shutdown? If yes, please attach backtrace as written in comment #2.

Thank you in advance!

Comment 6 Daniel McNamara 2011-12-20 10:14:43 UTC
The updated RPMs have been pushed to the servers. Should the hang issue reoccur within the next few days I will attach a backtrace as requested

Comment 7 Ondrej Vasik 2012-01-23 09:45:14 UTC
Any news here? Was that "bind hangs" issue fixed by the RHBA-2011-1836 update? TIA for answer.

Comment 8 Daniel McNamara 2012-01-23 09:53:02 UTC
Confirm this issue has not reappeared since RHBA-2011-1836 push

Comment 9 Adam Tkac 2012-01-23 10:15:31 UTC
(In reply to comment #8)
> Confirm this issue has not reappeared since RHBA-2011-1836 push

Thanks for feedback, closing.

This issue was fixed by the http://rhn.redhat.com/errata/RHBA-2011-1836.html update.