Bug 1036959 - BIND dies with "exiting (due to assertion failure)"
Summary: BIND dies with "exiting (due to assertion failure)"
Keywords:
Status: CLOSED DUPLICATE of bug 1038319
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: 20
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Tomáš Hozza
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-03 00:26 UTC by Jeff Gustafson
Modified: 2013-12-05 13:38 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-12-05 13:38:21 UTC
Type: Bug
Embargoed:



Description Jeff Gustafson 2013-12-03 00:26:44 UTC
Description of problem:
BIND dies about 4-5 times a day. I'm not sure what is triggering it.

Version-Release number of selected component (if applicable):

bind-9.9.4-8.fc20.x86_64

also tried:
bind-9.9.4-9.fc20.x86_64

How reproducible:
Unknown. I can only reproduce it by leaving named running until it crashes.

Steps to Reproduce:
1.
2.
3.

Actual results:
Dec  2 16:14:00 beta named[2189]: name.c:1724: INSIST(count <= 63) failed, back trace
Dec  2 16:14:00 beta named[2189]: #0 0x7f1ab10188f0 in ??
Dec  2 16:14:00 beta named[2189]: #1 0x7f1aaf20a17a in ??
Dec  2 16:14:00 beta named[2189]: #2 0x7f1ab08585d9 in ??
Dec  2 16:14:00 beta named[2189]: #3 0x7f1ab085cc5d in ??
Dec  2 16:14:00 beta named[2189]: #4 0x7f1ab1019c03 in ??
Dec  2 16:14:00 beta named[2189]: #5 0x7f1ab101e206 in ??
Dec  2 16:14:00 beta named[2189]: #6 0x7f1ab10288ee in ??
Dec  2 16:14:00 beta named[2189]: #7 0x7f1ab102e12d in ??
Dec  2 16:14:00 beta named[2189]: #8 0x7f1ab100db73 in ??
Dec  2 16:14:00 beta named[2189]: #9 0x7f1aaf22c856 in ??
Dec  2 16:14:00 beta named[2189]: #10 0x7f1aaede0f33 in ??
Dec  2 16:14:00 beta named[2189]: #11 0x7f1aae084ead in ??
Dec  2 16:14:00 beta named[2189]: exiting (due to assertion failure)
Dec  2 16:14:00 beta systemd: named.service: main process exited, code=killed, status=6/ABRT
Dec  2 16:14:00 beta sh: Usage:
Dec  2 16:14:00 beta sh: kill [options] <pid|name> [...]
Dec  2 16:14:00 beta sh: Options:
Dec  2 16:14:00 beta sh: -a, --all              do not restrict the name-to-pid conversion to processes
Dec  2 16:14:00 beta sh: with the same uid as the present process
Dec  2 16:14:00 beta sh: -s, --signal <sig>     send specified signal
Dec  2 16:14:00 beta sh: -q, --queue <sig>      use sigqueue(2) rather than kill(2)
Dec  2 16:14:00 beta sh: -p, --pid              print pids without signaling them
Dec  2 16:14:00 beta sh: -l, --list [=<signal>] list signal names, or convert one to a name
Dec  2 16:14:00 beta sh: -L, --table            list signal names and numbers
Dec  2 16:14:00 beta sh: -h, --help     display this help and exit
Dec  2 16:14:00 beta sh: -V, --version  output version information and exit
Dec  2 16:14:00 beta sh: For more details see kill(1).
Dec  2 16:14:00 beta systemd: named.service: control process exited, code=exited status=1
Dec  2 16:14:00 beta systemd: Unit named.service entered failed state.


Expected results:
BIND should not crash.


Additional info:
This machine previously ran Fedora 17, where BIND ran without any issues. I then moved to the Fedora 20 Beta and recently ran a yum update. I also pulled the latest build of bind from Koji.

I would like to provide more information. If anyone knows how I can gather more details for this bug report, please speak up.

I am currently using systemd's ability to restart a failed process. It is not a real fix, but it works as a band-aid.

Comment 1 Tomáš Hozza 2013-12-03 07:42:18 UTC
Hi.

Please:
1. install bind-debuginfo package
2. install abrt package & enable abrtd (it should catch the crash when it happens)
3. set the debug level of BIND using 'rndc trace 99'

Once it crashes, please provide the coredump created by ABRT, and output
from /var/log/messages OR the journal. Please note that journalctl can
filter log messages related to a particular unit (named) using the -u option.

Thank you in advance.

Comment 2 Tomáš Hozza 2013-12-03 07:53:53 UTC
Can you please attach the logs from /var/named/data (the default directory if you
didn't change the logging settings)?

Comment 3 Jeff Gustafson 2013-12-03 19:48:28 UTC
(In reply to Tomas Hozza from comment #1)
> Hi.
> 
> Please:
> 1. install bind-debuginfo package
> 2. install abrt package & enable abrtd (it should catch the crash when it
> happens)
> 3. set the debug level of BIND using 'rndc trace 99'
> 
> Once it crashes, please provide the coredump created by ABRT, and output
> from /var/log/messages OR journal. Please note that journal is able to
> filter log messages related to a particular unit (named) using -u option.
> 
> Thank you in advance.

I installed abrt and the debuginfo packages as soon as I noticed that this was an ongoing problem. The problem is that when I run 'abrt list', I don't get anything back. I know that bind has stopped at least 10 times since installing the abrt and debuginfo packages, so abrt doesn't seem to be picking up on the issue, even though abrtd is running. Am I doing something wrong?

Based on your recommendation I turned on 'rndc trace 99'. I hope to have more information once it fails again.

Comment 4 Tomáš Hozza 2013-12-04 08:50:05 UTC
(In reply to Jeff Gustafson from comment #3)
> I installed abrt and the debuginfo packages as soon as I noticed that this
> was an ongoing problem. The problem is that when I do a 'abrt list' I don't
> get anything back. I know that bind has stopped at least 10 times since
> installing the abrt and debuginfo packages. abrt doesn't seem to be picking
> up on the issue. abrtd is running. Am I doing something wrong?

You also need to have the abrt-ccpp package installed, enabled and running.

Comment 5 Jeff Gustafson 2013-12-04 16:27:12 UTC
It crashed only once yesterday! I did have the abrt-ccpp package installed, but not enabled. I made sure it is enabled for next time. 

Here is the log output captured with 'rndc trace 99' enabled:

decrement_reference: delete from rbt: 0x7fb053b14748 ads.pointroll.COM
fctx 0x7fb059d80440(ns2.worldstream.nl/A): cancelquery
dispatch 0x7fb0540eaab0 response 0x7fb05a378200 192.93.0.4#53: detaching from task 0x7fb0608326d0
dispatch 0x7fb0540eaab0: detach: refcount 1
fctx 0x7fb059d80440(ns2.worldstream.nl/A): cancelqueries
dns_adb_destroyfind on find 0x7fb05a42eb50
dns_adb_destroyfind on find 0x7fb05a4305b0
dns_adb_destroyfind on find 0x7fb059035790
dns_adb_destroyfind on find 0x7fb058f13a60
dns_adb_destroyfind on find 0x7fb059b92970
dns_adb_destroyfind on find 0x7fb058f22790
dns_adb_destroyfind on find 0x7fb059033e20
dns_adb_destroyfind on find 0x7fb0590213d0
fctx 0x7fb059d80440(ns2.worldstream.nl/A): try
fctx 0x7fb059d80440(ns2.worldstream.nl/A): cancelqueries
fctx 0x7fb059d80440(ns2.worldstream.nl/A): getaddresses
fctx 0x7fb059d80440(ns2.worldstream.nl/A): query
resquery 0x7fb059d83418 (fctx 0x7fb059d80440(ns2.worldstream.nl/A)): send
socket 0x7fb058f8a700 0.0.0.0#18815: bound
dispatch 0x7fb0540eb6f0 response 0x7fb05a378200 217.23.0.121#53: attached to task 0x7fb0608326d0
socket 0x7fb058f8a700: socket_recv: event 0x7fb05a197010 -> task 0x7fb060841d90
sockmgr 0x7fb060880010: watcher got message -3 for socket 520
resquery 0x7fb059d83418 (fctx 0x7fb059d80440(ns2.worldstream.nl/A)): sent
resquery 0x7fb059d83418 (fctx 0x7fb059d80440(ns2.worldstream.nl/A)): udpconnected
resquery 0x7fb059d83418 (fctx 0x7fb059d80440(ns2.worldstream.nl/A)): senddone
sockmgr 0x7fb060880010: watcher got message -2 for socket -1
decrement_reference: delete from rbt: 0x7fb058e26eb0 216.228.228.67.zen.spamhaus.org
decrement_reference: delete from rbt: 0x7fb058d683d8 67.zen.spamhaus.org
dispatch 0x7fb0540eaab0: got packet: requests 0, buffers 5, recvs 0
sockmgr 0x7fb060880010: watcher got message -5 for socket 521
sockmgr 0x7fb060880010: watcher got message -2 for socket -1
name.c:1724: INSIST(count <= 63) failed, back trace
#0 0x7fb06090e8f0 in ??
#1 0x7fb05eb0017a in ??
#2 0x7fb06014e5d9 in ??
#3 0x7fb060152c5d in ??
#4 0x7fb06090fc03 in ??
#5 0x7fb060914206 in ??
#6 0x7fb06091e8ee in ??
#7 0x7fb06092412d in ??
#8 0x7fb060903b73 in ??
#9 0x7fb05eb22856 in ??
#10 0x7fb05e6d6f33 in ??
#11 0x7fb05d97aead in ??
exiting (due to assertion failure)

Comment 6 Jeff Gustafson 2013-12-04 21:59:44 UTC
I captured an ABRT report. It was uploaded as bug 1038319.

Comment 7 Tomáš Hozza 2013-12-05 13:38:21 UTC
Great. I'm going to close this bug as a DUPLICATE of the ABRT bug, since that one
contains more information.

*** This bug has been marked as a duplicate of bug 1038319 ***

