Bug 173961

Summary: Bind crashes with pthread_mutex_destroy failed
Product: Red Hat Enterprise Linux 4
Version: 4.0
Component: bind
Hardware: x86_64
OS: Linux
Severity: high
Priority: medium
Status: CLOSED ERRATA
Reporter: Bojan Smojver <bojan>
Assignee: Jason Vas Dias <jvdias>
QA Contact: Ben Levenson <benl>
CC: james.gardiner
Fixed In Version: RHBA-2006-0288
Doc Type: Bug Fix
Last Closed: 2006-08-10 21:14:21 UTC
Bug Blocks: 181409

Description Bojan Smojver 2005-11-23 03:57:03 UTC
Description of problem:
Under heavy load, bind crashes (happened only once so far) with:

Nov 23 14:37:45.514 rbtdb.c:407: fatal error:
Nov 23 14:37:45.515 RUNTIME_CHECK(((pthread_mutex_destroy(((&rbtdb->node_locks[i].lock))) == 0) ? 0 : 34) == 0) failed
Nov 23 14:37:45.527 exiting (due to fatal error in library)

Version-Release number of selected component (if applicable):
bind-9.2.4-10_EL4

How reproducible:
Happened once so far, after running for about a month under heavy load.

Steps to Reproduce:
1. Run bind on an SMP x86_64 box.
2. ?

Comment 1 Bojan Smojver 2005-11-23 03:59:32 UTC
C library is: glibc-2.3.4-2.13. BTW, this is a fully up-to-date RHEL4.

Comment 2 Bojan Smojver 2005-11-23 04:05:43 UTC
Oh, one more thing: the RPMs I'm running were built from the SRPMs related to
this bug:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=169416

Any point in trying 12_EL4 first? It does look like a threading library bug at
first glance...

Comment 3 Jason Vas Dias 2005-11-24 20:49:17 UTC
An upstream bug was found and fixed in the 9.2.6 and 9.3.2 versions that could
explain this problem:

1825.   [bug]           Missing UNLOCK() on out of memory error from in
                        rbtdb.c:subtractrdataset(). [RT #13519]

The code where the assertion fails:
    rbtdb.c:407: fatal error:
    RUNTIME_CHECK(((pthread_mutex_destroy(((&rbtdb->node_locks[i].lock))) == 0) ? 0 : 34) == 0) failed
shows that the lock being destroyed is in fact still locked - which should
never happen.

It could happen if named was transiently out of memory and bug 1825 resulted
in the lock not being unlocked; a later attempt to destroy the lock would then
trigger the assertion failure.
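For illustration, here is a minimal, self-contained sketch (not BIND's actual
code) of that failure mode: an error path returns with the mutex still held,
so a later pthread_mutex_destroy() returns nonzero instead of 0. The 34 in
the RUNTIME_CHECK message is the ISC result code substituted for any nonzero
return (ISC_R_UNEXPECTED).

/* Minimal sketch of the suspected failure mode; not BIND's actual code.
 * Build with: gcc -pthread demo.c */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Models upstream bug 1825: the out-of-memory error path returns
 * without releasing the lock. */
static int update_rdataset(int simulate_oom)
{
    pthread_mutex_lock(&lock);
    if (simulate_oom)
        return ENOMEM;              /* BUG: mutex is still held here */
    /* ... modify the data set ... */
    pthread_mutex_unlock(&lock);
    return 0;
}

int main(void)
{
    (void)update_rdataset(1);       /* take the buggy error path */

    /* POSIX leaves destroying a locked mutex undefined; glibc typically
     * detects it and returns EBUSY rather than 0, so an assertion that
     * the return value is 0 (as at rbtdb.c:407) fails. */
    int rc = pthread_mutex_destroy(&lock);
    printf("pthread_mutex_destroy -> %d%s\n", rc,
           rc == EBUSY ? " (EBUSY)" : "");
    return 0;
}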

And possibly this bug might have a bearing on this problem: 

1847.   [bug]           isc_ondestroy_init() is called too late in
                        dns_rbtdb_create()/dns_rbtdb64_create().
                        [RT #13661]

Anyway, I've now gone through all the upstream patches applied in the 9.2.5
and 9.2.6 ISC releases, and applied those that I can see fix genuine problems
in the source code as used on Linux, including the above two, to produce
bind-9.2.4-14_EL4, the candidate for the next RHEL-4 update release. All
upstream bugs fixed are listed in the /usr/share/bind-9.2.4/CHANGES file.

This release is now available from:
  http://people.redhat.com/~jvdias/bind/RHEL-4/9.2.4-14_EL4/

Please try it out and let me know of any issues - thanks.

It is possible that your named process is growing very large and eventually
running out of memory (named relies on an in-memory database for its zone
and cache data). If, after you've installed the new bind-9.2.4-14_EL4, the
named process runs for a long time and then exits with an 'out of memory'
error, then this is the root cause of the problem. Consider using these named
configuration options, as documented in the ARM
(/usr/share/doc/bind-9.2.4/arm/Bv9ARM.html):
 cleaning-interval
    The server will remove expired resource records from the cache every
    cleaning-interval minutes. The default is 60 minutes.
 datasize
    The maximum amount of data memory the server may use.
If all else fails, and you can't add more physical memory (or swap) to the
machine, consider a periodic cron job that runs 'rndc flush' - this will clear
the cache, freeing up a lot of memory, but will incur a severe performance hit.

NOTE: in order to help diagnose rarely occurring named crashes due to assertion
failures, it would be most helpful if you could please change this line in
/etc/init.d/named.init, around line 97:
from:
	if [ $conf_ok -eq 1 ]; then
	   daemon /usr/sbin/named -u named ${OPTIONS};
to:
	if [ $conf_ok -eq 1 ]; then
	   ulimit -c unlimited;
	   /usr/sbin/named -u named ${OPTIONS};
	   RETVAL=$?
	   if [ $RETVAL -eq 0 ]; then
	      success;
	   else
	      failure;
	   fi;

This will enable any named assertion failure or memory access violation that
may occur to generate a core dump in $ROOTDIR/var/named/core.*.
This file is likely to be very large, so please ensure you have enough disk
space on the $ROOTDIR/var/named/ partition ($ROOTDIR may be set in
/etc/sysconfig/named).
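To see why the ulimit change matters: BIND's fatal assertion path ends in
abort(), and abort() only produces a core file when the core-size resource
limit is nonzero. A rough, self-contained sketch of that behaviour (an
illustrative macro, not BIND's actual RUNTIME_CHECK):

/* Illustrative only: shows why 'ulimit -c unlimited' turns a fatal
 * assertion into a core.* file. */
#include <stdio.h>
#include <stdlib.h>

#define RUNTIME_CHECK_DEMO(cond)                                        \
    do {                                                                \
        if (!(cond)) {                                                  \
            fprintf(stderr, "fatal error: RUNTIME_CHECK(%s) failed\n",  \
                    #cond);                                             \
            abort(); /* dumps core if the core-size limit is nonzero */ \
        }                                                               \
    } while (0)

int main(void)
{
    RUNTIME_CHECK_DEMO(1 == 1);   /* passes silently */
    RUNTIME_CHECK_DEMO(1 == 2);   /* fails: prints message and aborts */
    return 0;
}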

If a $ROOTDIR/var/named/core.* file is generated, please compress it and send
it to me: jvdias - this will enable me to determine exactly what the cause is.

Comment 4 Bojan Smojver 2005-11-24 22:38:27 UTC
> It is possible that your named process is growing very large and eventually
> running out of memory (named relies on an in-memory database for its zone
> and cache data)

Maybe, I'm not sure. In normal operation, named uses around 200 MB of RAM on
this machine, in order to serve 700+ zones. The box has 2 GB of RAM and is not
doing anything but serving DNS (it is a Sun Fire V20z box, with 2 x 2.2 GHz
Opterons). We have another box doing the exact same thing on exactly the same
hardware as well, using slightly less memory. Each box serves between 600 and
2,500 requests per second, if that somehow matters.

Thank you for your ultra-quick response, the pointers to DNS configuration and
the new RPMs. I'll try them out and report back if the problem occurs again.
These two machines have been running for a significant time now and this was
the only incident we had, so the problem appears to be intermittent and
probably depends on conditions on our network (at times broken clients do
weird things). So, it may never occur again. I'm not sure if I should hope
that it does or that it does not, though ;-)

Comment 5 Bojan Smojver 2005-11-28 05:05:05 UTC
Running it now. Will report back if/when it crashes.

Comment 6 Bojan Smojver 2005-12-12 00:54:54 UTC
Now the other machine's named process crashed, with a slightly different (but
similar) error:

---------------------------------------
Dec 10 23:00:20.280 loading configuration from '/var/named/named.conf'
Dec 10 23:00:48.226 adb.c:2181: fatal error:
Dec 10 23:00:48.226 RUNTIME_CHECK(((pthread_mutex_destroy(((&adb->mplock))) == 0) ? 0 : 34) == 0) failed
Dec 10 23:00:48.242 exiting (due to fatal error in library)
---------------------------------------

Looks like it crashed on HUP, while reloading the data.

Comment 8 Bojan Smojver 2006-03-09 02:19:37 UTC
Looks like this version of bind didn't make it into U3...

Comment 10 Bojan Smojver 2006-03-15 21:10:33 UTC
Today, I had another one during the HUP:

---------------------------------------
Mar 15 23:00:41.611 adb.c:2181: fatal error:
Mar 15 23:00:41.612 RUNTIME_CHECK(((pthread_mutex_destroy(((&adb->mplock))) == 0) ? 0 : 34) == 0) failed
Mar 15 23:00:41.625 exiting (due to fatal error in library)
---------------------------------------

Comment 11 Jason Vas Dias 2006-03-15 21:40:06 UTC
Hi Bojan -

Thanks for the report. 

Please confirm:
 o Which bind version are you running on the machine on which this occurred?
   Are you running the bind-9.2.4-16.EL4 or bind-9.3.2-1.EL4 test RPMs from
   http://people.redhat.com/~jvdias/bind/RHEL-4, or the standard bind-9.2.4-10
   RPMs?

 o Does the problem only occur when you do an 'rndc reload' or
   'kill -HUP `pidof named`'? Do you do the 'rndc reload' or
   kill named manually?

 o Is SELinux enabled in Enforcing or Permissive mode?
   # getenforce

The next time you're going to do an rndc reload, please enable named debugging,
as follows:
1. If SELinux is Enabled, ensure you have this logging configuration in 
   named.conf:
   'logging { channel default_debug { file "data/named.run"; severity dynamic;} };'
   OR that you do:
   # setsebool named_write_master_zones 1; chmod g+w $ROOTDIR/var/named

2. Then, whenever you are about to do the 'rndc reload' / kill -HUP, do:
   # rndc trace 99
   # rndc reload
   If the named process crashes, then there will be a debug log in 
   $ROOTDIR/var/named/data/named.run or $ROOTDIR/var/named/named.run.
   Please send this file to me (jvdias) or append it to this bug 
   report.
   If there is no crash, then do:
   # rndc trace 0
   to turn off debugging.

With the named.run data gathered as above, it should be possible to trace
exactly why this problem is occurring - thanks.

Comment 12 Bojan Smojver 2006-03-15 22:12:20 UTC
> Which bind version are you running on the machine on which this occurred

bind-9.2.4-14_EL4

> Does the problem only occur when you do an 'rndc reload' or
> 'kill -HUP `pidof named`'

We always send HUP, so I cannot say about rndc reload. And the problem is so
intermittent that it would take months to verify. But I can do it if it'll
make a difference.

> Is SELinux enabled in Enforcing or Permissive mode?

Disabled

I'm interested in getting 9.3.x anyway, so I'll download the latest RPMs and
try that. I'm guessing these things have been tested to some extent, right?
They won't just crash on me out of the blue? :-)

Comment 13 Jason Vas Dias 2006-03-15 22:56:46 UTC
RE: > I'm guessing these things have been tested to some extent, right?
Yes, bind-9.3.2-1.EL4 has passed the complete bind test suite and our internal
regression tests - I've been using it for the past two months on RHEL-4 with
no problems so far, as have other RHEL-4 customers who've requested it. It is
not "officially supported" by Red Hat, but I, as the bind package maintainer,
do undertake to support it.

RE: > We always send HUP
I would definitely suggest using 'rndc reload' instead.
The difference is that 'rndc reload' sends a message to named, so named
processes it in its normal select() loop, while a HUP signal can interrupt
named during a system call - perhaps this is what is causing the problem.
Does the problem ONLY occur when you send the HUP?
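As a general illustration of the difference (a sketch, not named's actual
implementation): a signal handler should only record that a reload was
requested, leaving the work to a well-defined point in the select() loop,
because a handler that does the work itself runs at whatever arbitrary point
the signal happened to interrupt:

/* Sketch of serializing reload handling through a select() loop;
 * illustrative only. */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t reload_pending = 0;

static void on_sighup(int sig)
{
    (void)sig;
    reload_pending = 1;   /* async-signal-safe: just record the request */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGHUP, &sa, NULL);

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(STDIN_FILENO, &rfds);

        /* A HUP delivered here makes select() return -1/EINTR... */
        int n = select(STDIN_FILENO + 1, &rfds, NULL, NULL, NULL);

        /* ...and the reload then runs at this safe point, with no
         * operation half-finished and no locks held. */
        if (reload_pending) {
            reload_pending = 0;
            printf("reloading configuration\n");
        }
        if (n > 0) {
            /* ... service the ready descriptor(s) ... */
        }
    }
}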

Anyway, since you are not using SELinux, named can create the normal
$ROOTDIR/var/named/named.run file OK - please, when you want to reload the
named configuration, do:
'rndc trace 99;
 rndc reload;
 rndc trace 0;
'
You could even put these commands in a "reload_named" script.
If the problem occurs, then the 'rndc trace 0' will fail, since named is 
no longer running, and there will be a $ROOTDIR/var/named/named.run file
containing debug messages that might help us get to the root cause of this
problem - thanks.

Comment 14 Bojan Smojver 2006-03-15 23:03:50 UTC
> Yes, bind-9.3.2-1.EL4 has passed the complete bind test suite and our
> internal regression tests

Nice. I'll whack it on today.

> does the problem ONLY occur when you send the HUP

Yes. After you applied some backports to the RPM, the initial problems never
occurred again. But the HUP problem started happening (this may have been
there from the start, of course, but may not have been triggered).

We'll definitely change the way we reload and will use rndc.

> rndc trace

Thanks for the tip. Will implement in our reload script.

Let's see what 9.3.2 does in the meantime...

Comment 19 Bojan Smojver 2006-04-04 00:54:55 UTC
Just had a crash with 9.3.2 (bind-9.3.2-1_EL4). The only message is this:

-------------------------------------
Apr  4 10:48:28 joule kernel: named[3827]: segfault at 000000000000002c rip
0000002a958332f9 rsp 00000000413ff7f0 error 4
-------------------------------------

No idea what that's supposed to mean (except that it's a segfault, of course).
Anyhow, this was not during a reload or anything - just normal operation.

Any suggestions welcome...

Comment 20 Bojan Smojver 2006-04-04 01:38:40 UTC
And again:

-------------------------------------
Apr  4 11:19:31 joule kernel: named[5063]: segfault at 000000000000002c rip
0000002a958332f9 rsp 00000000413ff7f0 error 4
-------------------------------------

Comment 21 Jason Vas Dias 2006-04-04 19:30:39 UTC
Hi Bojan -

Many thanks for reporting this. 

Happily, the 2.6 x86_64 kernel has told us the instruction address where the
fault occurred, and I can use GDB with bind-debuginfo installed:
(gdb) break *0x2a958332f9
Breakpoint 2 at 0x2a958332f9: file resolver.c, line 3797
which is in ncache_adderesult(), the function that adds a result to the cache:
     } else if ( result == ISC_R_SUCCESS ) {
3797:   if( NXDOMAIN(ardataset)) ...
and NXDOMAIN is:
  #define NXDOMAIN(r) ((r)->attributes & DNS_RDATASETATTR_NXDOMAIN)
so it appears "ardataset" is NULL (an empty rdataset was returned for a
query that did not return an error) - this is also a legally cacheable
result, so the code should be checking for ( ardataset == NULL ) here.
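A small, self-contained sketch of the null dereference and the guard this
analysis calls for; the names mirror the report, but the attribute value and
surrounding scaffolding are illustrative, and the actual upstream patch may
differ:

/* Illustrative reconstruction; not BIND's actual resolver.c. */
#include <stdio.h>

#define DNS_RDATASETATTR_NXDOMAIN 0x0008   /* illustrative flag value */

typedef struct {
    unsigned int attributes;
} dns_rdataset_t;

/* The macro from the report: */
#define NXDOMAIN(r) ((r)->attributes & DNS_RDATASETATTR_NXDOMAIN)

static void add_result(dns_rdataset_t *ardataset)
{
    /* Without the NULL check, NXDOMAIN(ardataset) reads ->attributes
     * through a null pointer - a fault at a small near-zero address,
     * consistent with the "segfault at 000000000000002c" reported. */
    if (ardataset != NULL && NXDOMAIN(ardataset))
        printf("negative cache entry: NXDOMAIN\n");
    else
        printf("negative cache entry: NXRRSET or empty rdataset\n");
}

int main(void)
{
    add_result(NULL);                  /* safe with the guard in place */
    dns_rdataset_t r = { DNS_RDATASETATTR_NXDOMAIN };
    add_result(&r);
    return 0;
}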

This might possibly be due to one of these ISC BIND 9.3.2 bugs (for which
ISC have not yet released the fix):

1941.   [bug]           ncache_adderesult() should set eresult even if
                        no rdataset is passed to it. [RT #15642]
1939.   [bug]           The resolver could dereference a null pointer
                        after validation if all the queries have timed
                        out. [RT #15528]

I've sent my proposed patch upstream to check if this is a known issue or not.

Anyway, there appears to be a simple fix for this issue - I've applied it in
bind-9.3.2-2.EL4, available from:
   http://people.redhat.com/~jvdias/bind/RHEL-4/9.3.2-2.EL4
This version also passes all tests - please try it out and let me know of any
issues.

Regards,
Jason Vas Dias
BIND package maintainer

Comment 22 Bojan Smojver 2006-04-04 19:40:58 UTC
WOW - that was quick. Thanks heaps! I'll be putting this baby in production
today and will report back if it breaks.

Comment 24 Bojan Smojver 2006-04-04 23:07:45 UTC
In production now. Fingers crossed...

Comment 26 Bob Johnson 2006-04-11 16:39:50 UTC
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.

Comment 27 Bojan Smojver 2006-06-19 07:06:45 UTC
So far, so good with bind-9.3.2-2_EL4...

Comment 29 Red Hat Bugzilla 2006-08-10 21:14:23 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0288.html


Comment 30 Adam Tkac 2007-03-06 16:37:28 UTC
*** Bug 202393 has been marked as a duplicate of this bug. ***