Description of problem:
BIND dies about 4-5 times a day. I'm not sure what is provoking it.

Version-Release number of selected component (if applicable):
bind-9.9.4-8.fc20.x86_64
also tried: bind-9.9.4-9.fc20.x86_64

How reproducible:
Not sure what is provoking it. I don't know how to reproduce it other than letting it run.

Actual results:
Dec 2 16:14:00 beta named[2189]: name.c:1724: INSIST(count <= 63) failed, back trace
Dec 2 16:14:00 beta named[2189]: #0 0x7f1ab10188f0 in ??
Dec 2 16:14:00 beta named[2189]: #1 0x7f1aaf20a17a in ??
Dec 2 16:14:00 beta named[2189]: #2 0x7f1ab08585d9 in ??
Dec 2 16:14:00 beta named[2189]: #3 0x7f1ab085cc5d in ??
Dec 2 16:14:00 beta named[2189]: #4 0x7f1ab1019c03 in ??
Dec 2 16:14:00 beta named[2189]: #5 0x7f1ab101e206 in ??
Dec 2 16:14:00 beta named[2189]: #6 0x7f1ab10288ee in ??
Dec 2 16:14:00 beta named[2189]: #7 0x7f1ab102e12d in ??
Dec 2 16:14:00 beta named[2189]: #8 0x7f1ab100db73 in ??
Dec 2 16:14:00 beta named[2189]: #9 0x7f1aaf22c856 in ??
Dec 2 16:14:00 beta named[2189]: #10 0x7f1aaede0f33 in ??
Dec 2 16:14:00 beta named[2189]: #11 0x7f1aae084ead in ??
Dec 2 16:14:00 beta named[2189]: exiting (due to assertion failure)
Dec 2 16:14:00 beta systemd: named.service: main process exited, code=killed, status=6/ABRT
Dec 2 16:14:00 beta sh: Usage:
Dec 2 16:14:00 beta sh: kill [options] <pid|name> [...]
Dec 2 16:14:00 beta sh: Options:
Dec 2 16:14:00 beta sh: -a, --all do not restrict the name-to-pid conversion to processes
Dec 2 16:14:00 beta sh: with the same uid as the present process
Dec 2 16:14:00 beta sh: -s, --signal <sig> send specified signal
Dec 2 16:14:00 beta sh: -q, --queue <sig> use sigqueue(2) rather than kill(2)
Dec 2 16:14:00 beta sh: -p, --pid print pids without signaling them
Dec 2 16:14:00 beta sh: -l, --list [=<signal>] list signal names, or convert one to a name
Dec 2 16:14:00 beta sh: -L, --table list signal names and numbers
Dec 2 16:14:00 beta sh: -h, --help display this help and exit
Dec 2 16:14:00 beta sh: -V, --version output version information and exit
Dec 2 16:14:00 beta sh: For more details see kill(1).
Dec 2 16:14:00 beta systemd: named.service: control process exited, code=exited status=1
Dec 2 16:14:00 beta systemd: Unit named.service entered failed state.

Expected results:
BIND should not crash.

Additional info:
This particular box was previously running Fedora 17, and BIND ran on it without any issues. I then switched to the Fedora 20 Beta and applied a recent yum update. I also pulled the latest build of bind from Koji.

I would like to provide more information. If anyone has an idea of how I can gather more details for this bug report, please speak up.

For now I am using systemd's ability to restart a failed process. It is not the best solution, but it makes an okay band-aid.
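To put a number on "4-5 times a day", one option is to count named's assertion-failure exits per day in /var/log/messages. The following is a minimal Python sketch, assuming the classic syslog line layout seen in the log above (`Mon D HH:MM:SS host program[pid]: message`); the function name `crashes_per_day` is hypothetical, not part of any existing tool.

```python
import re
from collections import Counter

# Matches syslog-style lines such as:
#   "Dec 2 16:14:00 beta named[2189]: exiting (due to assertion failure)"
# Assumption: "Mon D HH:MM:SS host prog[pid]: msg" layout; \s+ also covers
# the space-padded day ("Dec  2") that syslog often writes.
CRASH_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\S+\s+\S+\s+"
    r"named\[\d+\]:\s+exiting \(due to assertion failure\)"
)


def crashes_per_day(lines):
    """Count named assertion-failure exits per (month, day).

    Syslog lines carry no year, so the keys are (month, day) tuples.
    """
    counts = Counter()
    for line in lines:
        m = CRASH_RE.match(line)
        if m:
            counts[(m.group("month"), int(m.group("day")))] += 1
    return counts


sample = [
    "Dec 2 16:14:00 beta named[2189]: name.c:1724: INSIST(count <= 63) failed, back trace",
    "Dec 2 16:14:00 beta named[2189]: exiting (due to assertion failure)",
    "Dec 2 16:14:00 beta systemd: Unit named.service entered failed state.",
]
print(crashes_per_day(sample))  # Counter({('Dec', 2): 1})
```

Because the pattern only anchors at the start of a line, an open file can be passed directly, e.g. `crashes_per_day(open("/var/log/messages"))`; trailing newlines do not affect the match.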
Hi. Please:
1. install the bind-debuginfo package
2. install the abrt package and enable abrtd (it should catch the crash when it happens)
3. set the debug level of BIND using 'rndc trace 99'

Once it crashes, please provide the coredump created by ABRT and the output from /var/log/messages OR the journal. Note that the journal can filter log messages related to a particular unit (named) with the -u option.

Thank you in advance.
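The journal filtering mentioned above can also be scripted. `journalctl -u named -o json` prints one JSON object per entry, with the log text in the `MESSAGE` field and the originating unit in `_SYSTEMD_UNIT`. A minimal sketch, assuming that export format, that keeps only named's crash-related messages; the function name `named_messages` is hypothetical.

```python
import json


def named_messages(json_lines, needle="assertion failure"):
    """Yield MESSAGE texts from `journalctl -o json` output that come from
    named.service and contain `needle`."""
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        # Redundant if journalctl was already run with -u named, but keeps
        # the filter correct when fed an unfiltered export.
        if entry.get("_SYSTEMD_UNIT") != "named.service":
            continue
        msg = entry.get("MESSAGE")
        # journalctl encodes non-UTF-8 field values as arrays of byte values
        if isinstance(msg, list):
            msg = bytes(msg).decode("utf-8", "replace")
        # Very large fields may be exported as null; skip anything non-text
        if isinstance(msg, str) and needle in msg:
            yield msg
```

For example, a script that calls `named_messages(sys.stdin)` can be fed with `journalctl -u named -o json | python3 script.py`; passing `needle="INSIST"` instead picks out the assertion text itself.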
Can you please attach the logs from /var/named/data (the default directory if you didn't change the logging settings)?
(In reply to Tomas Hozza from comment #1)
> Hi.
>
> Please:
> 1. install bind-debuginfo package
> 2. install abrt package & enable abrtd (it should catch the crash when it
> happens)
> 3. set the debug level of BIND using 'rndc trace 99'
>
> Once it crashes, please provide the coredump created by ABRT, and output
> from /var/log/messages OR journal. Please note that journal is able to
> filter log messages related to a particular unit (named) using -u option.
>
> Thank you in advance.

I installed the abrt and debuginfo packages as soon as I noticed that this was an ongoing problem. The problem is that when I run 'abrt list' I don't get anything back. I know that BIND has stopped at least 10 times since I installed the abrt and debuginfo packages, so abrt doesn't seem to be picking up on the crashes. abrtd is running. Am I doing something wrong?

Based on your recommendation I turned on 'rndc trace 99'. I hope to have more information once it fails again.
(In reply to Jeff Gustafson from comment #3)
> I installed abrt and the debuginfo packages as soon as I noticed that this
> was an ongoing problem. The problem is that when I do a 'abrt list' I don't
> get anything back. I know that bind has stopped at least 10 times since
> installing the abrt and debuginfo packages. abrt doesn't seem to be picking
> up on the issue. abrtd is running. Am I doing something wrong?

You also need the abrt-ccpp package installed, enabled and running.
It crashed only once yesterday! I did have the abrt-ccpp package installed, but not enabled. I have made sure it is enabled for next time. Here is what I got from 'rndc trace 99' leading up to the crash:

decrement_reference: delete from rbt: 0x7fb053b14748 ads.pointroll.COM
fctx 0x7fb059d80440(ns2.worldstream.nl/A): cancelquery
dispatch 0x7fb0540eaab0 response 0x7fb05a378200 192.93.0.4#53: detaching from task 0x7fb0608326d0
dispatch 0x7fb0540eaab0: detach: refcount 1
fctx 0x7fb059d80440(ns2.worldstream.nl/A): cancelqueries
dns_adb_destroyfind on find 0x7fb05a42eb50
dns_adb_destroyfind on find 0x7fb05a4305b0
dns_adb_destroyfind on find 0x7fb059035790
dns_adb_destroyfind on find 0x7fb058f13a60
dns_adb_destroyfind on find 0x7fb059b92970
dns_adb_destroyfind on find 0x7fb058f22790
dns_adb_destroyfind on find 0x7fb059033e20
dns_adb_destroyfind on find 0x7fb0590213d0
fctx 0x7fb059d80440(ns2.worldstream.nl/A): try
fctx 0x7fb059d80440(ns2.worldstream.nl/A): cancelqueries
fctx 0x7fb059d80440(ns2.worldstream.nl/A): getaddresses
fctx 0x7fb059d80440(ns2.worldstream.nl/A): query
resquery 0x7fb059d83418 (fctx 0x7fb059d80440(ns2.worldstream.nl/A)): send
socket 0x7fb058f8a700 0.0.0.0#18815: bound
dispatch 0x7fb0540eb6f0 response 0x7fb05a378200 217.23.0.121#53: attached to task 0x7fb0608326d0
socket 0x7fb058f8a700: socket_recv: event 0x7fb05a197010 -> task 0x7fb060841d90
sockmgr 0x7fb060880010: watcher got message -3 for socket 520
resquery 0x7fb059d83418 (fctx 0x7fb059d80440(ns2.worldstream.nl/A)): sent
resquery 0x7fb059d83418 (fctx 0x7fb059d80440(ns2.worldstream.nl/A)): udpconnected
resquery 0x7fb059d83418 (fctx 0x7fb059d80440(ns2.worldstream.nl/A)): senddone
sockmgr 0x7fb060880010: watcher got message -2 for socket -1
decrement_reference: delete from rbt: 0x7fb058e26eb0 216.228.228.67.zen.spamhaus.org
decrement_reference: delete from rbt: 0x7fb058d683d8 67.zen.spamhaus.org
dispatch 0x7fb0540eaab0: got packet: requests 0, buffers 5, recvs 0
sockmgr 0x7fb060880010: watcher got message -5 for socket 521
sockmgr 0x7fb060880010: watcher got message -2 for socket -1
name.c:1724: INSIST(count <= 63) failed, back trace
#0 0x7fb06090e8f0 in ??
#1 0x7fb05eb0017a in ??
#2 0x7fb06014e5d9 in ??
#3 0x7fb060152c5d in ??
#4 0x7fb06090fc03 in ??
#5 0x7fb060914206 in ??
#6 0x7fb06091e8ee in ??
#7 0x7fb06092412d in ??
#8 0x7fb060903b73 in ??
#9 0x7fb05eb22856 in ??
#10 0x7fb05e6d6f33 in ??
#11 0x7fb05d97aead in ??
exiting (due to assertion failure)
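The failing check is `INSIST(count <= 63)` in name.c. 63 is the maximum length of a single DNS label (RFC 1035, section 2.3.4), so, assuming `count` here is a label-length octet, the invariant can be illustrated with a small wire-format name walker. This is a hypothetical Python sketch, not BIND's code: it ignores compression pointers, and it raises an exception where BIND's INSIST instead aborts the whole process, which is why named exits with "assertion failure".

```python
MAX_LABEL_LEN = 63  # RFC 1035, section 2.3.4: labels are 63 octets or less


def wire_labels(data):
    """Split an uncompressed wire-format DNS name (bytes) into its labels.

    Each label is a length octet followed by that many bytes; the name ends
    with a zero-length (root) label. Compression pointers (length octets with
    the top two bits set) are deliberately not handled in this sketch.
    """
    labels = []
    i = 0
    while True:
        if i >= len(data):
            raise ValueError("name is missing its terminating root label")
        count = data[i]
        if count == 0:
            return labels
        if count > MAX_LABEL_LEN:
            # The condition that INSIST(count <= 63) treats as impossible
            raise ValueError(f"label length {count} exceeds {MAX_LABEL_LEN}")
        end = i + 1 + count
        if end > len(data):
            raise ValueError("label runs past the end of the buffer")
        labels.append(data[i + 1:end])
        i = end


print(wire_labels(b"\x03www\x07example\x03com\x00"))  # [b'www', b'example', b'com']
```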
I captured an ABRT report. It was uploaded as bug #1038319.
Great. I'm going to close this bug as a DUPLICATE of the ABRT bug, since that one has more information.

*** This bug has been marked as a duplicate of bug 1038319 ***