Bug 623122 - can't stop bind
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: bind
Version: 6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Adam Tkac
QA Contact: qe-baseos-daemons
Depends On: 672514
Blocks:
Reported: 2010-08-11 07:57 EDT by Levente Farkas
Modified: 2012-07-11 03:19 EDT
CC List: 4 users

Doc Type: Bug Fix
Doc Text:
Under certain circumstances, "named" could enter a deadlock. Consequently, "named" could not be stopped using the "/etc/init.d/named stop" command. In this updated package, the deadlock no longer occurs, resolving this issue.
Clones: 643102
Last Closed: 2011-05-19 08:58:13 EDT


Attachments
Proposed patch (3.23 KB, patch)
2010-10-14 11:57 EDT, Adam Tkac


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2011:0541
Priority: normal
Status: SHIPPED_LIVE
Summary: bind bug fix and enhancement update
Last Updated: 2011-05-18 13:57:36 EDT

Description Levente Farkas 2010-08-11 07:57:25 EDT
Referring to #622785: currently it is not possible to stop named.
All I get is:
----------------------------------
# /etc/init.d/named stop
Stopping named: .............                              [FAILED]
# ps axuf|grep named
root     13140  0.0  0.0 103148   808 pts/0    S+   11:03   0:00          \_ grep named
named    12493  0.0  0.5 240372 19900 ?        Ssl  10:52   0:00 /usr/sbin/named -u named
----------------------------------
The only way to stop it is kill -9.
The gdb output is:
----------------------------------
Program received signal SIGTERM, Terminated.
0x00007f1490abbd64 in sigsuspend () from /lib64/libc.so.6
(gdb) t a a bt

Thread 5 (Thread 0x7f148f53b710 (LWP 1425)):
#0  0x00007f14915fd43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1491a3fcf1 in dispatch (uap=0x7f1492fc4010) at task.c:961
#2  run (uap=0x7f1492fc4010) at task.c:1158
#3  0x00007f14915f97e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1490b6a51d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f148eb3a710 (LWP 1426)):
#0  0x00007f14915fd43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1491a3fcf1 in dispatch (uap=0x7f1492fc4010) at task.c:961
#2  run (uap=0x7f1492fc4010) at task.c:1158
#3  0x00007f14915f97e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1490b6a51d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f148e139710 (LWP 1427)):
#0  0x00007f14915fd7a9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1491a54d4e in isc_condition_waituntil (c=0x7f1492fc5078, m=0x7f1492fc5028, t=0x7f1492fc506c) at condition.c:59
#2  0x00007f1491a4239d in run (uap=0x7f1492fc5010) at timer.c:822
#3  0x00007f14915f97e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1490b6a51d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f148d738710 (LWP 1428)):
#0  0x00007f1490b6ab13 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f1491a51e44 in watcher (uap=0x7f1492fc7010) at socket.c:3695
#2  0x00007f14915f97e1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f1490b6a51d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f14930007c0 (LWP 1424)):
#0  0x00007f1490abbd64 in sigsuspend () from /lib64/libc.so.6
#1  0x00007f1491a43e34 in isc__app_ctxrun (ctx0=0x7f1491c68b60) at app.c:680
#2  0x00007f149303dd3f in main (argc=<value optimized out>, argv=0x7fff3a62af08) at ./main.c:1031
----------------------------------
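In the backtrace above, Threads 4 and 5 are idle worker threads parked in `pthread_cond_wait()` inside `dispatch()` (task.c), and Thread 1 is the main thread waiting in `sigsuspend()`. In a worker pool of this shape, shutdown only completes if the "exiting" state is set under the lock and every waiting worker is woken; if that wake-up is missed, the workers never return and the process cannot exit cleanly. The Python sketch below illustrates that general pattern with `threading.Condition`. It is a generic illustration, not BIND's task.c code and not a description of what the attached patch changes; the names `WorkerPool`, `submit` and `shutdown` are hypothetical.

```python
import threading
from collections import deque


class WorkerPool:
    """Hypothetical worker pool (illustration only, not ISC's task.c).

    Idle workers block on a condition variable until a job is queued or
    shutdown is requested.
    """

    def __init__(self, nworkers):
        self._cond = threading.Condition()  # one lock shared with the condition
        self._queue = deque()
        self._exiting = False
        self.done = []  # job results, kept for demonstration
        self._threads = [threading.Thread(target=self._run) for _ in range(nworkers)]
        for t in self._threads:
            t.start()

    def _run(self):
        while True:
            with self._cond:
                # Idle state: comparable to the pthread_cond_wait() frames
                # of Threads 4 and 5 in the backtrace.
                while not self._queue and not self._exiting:
                    self._cond.wait()
                if not self._queue:
                    return  # exiting and nothing left to do
                job = self._queue.popleft()
            self.done.append(job())  # run the job outside the lock

    def submit(self, job):
        with self._cond:
            self._queue.append(job)
            self._cond.notify()  # wake one idle worker

    def shutdown(self, timeout=5.0):
        """Request exit, wake all workers, and report whether all exited."""
        with self._cond:
            self._exiting = True
            # notify_all() matters: with notify() only one worker would wake,
            # the rest would stay blocked in wait(), and join() would time out.
            self._cond.notify_all()
        for t in self._threads:
            t.join(timeout)
        return not any(t.is_alive() for t in self._threads)


if __name__ == "__main__":
    pool = WorkerPool(2)
    pool.submit(lambda: 6 * 7)
    print(pool.shutdown())  # True
    print(pool.done)  # [42]
```

Workers drain any queued jobs before returning, so `shutdown()` only reports success once every thread has actually left its wait loop.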
Comment 2 RHEL Product and Program Management 2010-08-11 08:18:30 EDT
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Comment 3 Levente Farkas 2010-08-17 05:42:55 EDT
OK, now bind is simply unusable! We can't stop it, and it stops working randomly. IMHO this is a serious bug, since it is a basic service and without it nothing works!
Comment 4 Levente Farkas 2010-08-17 05:56:42 EDT
It turns out this happens when I set forwarders in the config file.
Comment 5 Adam Tkac 2010-08-17 07:43:32 EDT
If I understand correctly, everything is OK when you don't set forwarders? Could you check whether there is anything unusual in the system log, please? Also, please try disabling DNSSEC validation (via `rndc validation off`, for example) and report whether it helps.
Comment 6 Levente Farkas 2010-08-17 08:47:06 EDT
Here is our named.conf. If we uncomment the forward lines, it happens on our firewall (two interfaces: an external one using DHCP and an internal static bridged network, 10.30.0.1; plus two tun devices for OpenVPN connections, 10.20.0.1 and 10.10.0.2).
-------------------------
acl internal {
        10.30.0.0/24;
        192.168.0.0;
        10.20.0.1/24;
        10.10.0.2/24;
};

acl dns {
        127.0.0.1;
        10.30.0.1;
        192.168.208.1;
};

options {
        listen-on port 53 {
                127.0.0.1;
                10.30.0.1;
                10.20.0.1;
                10.10.0.2;
        };
//      listen-on-v6 port 53 { ::1; };
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
//      forward only;
//      forwarders     { 8.8.8.8; 8.8.4.4; };
//      allow-query    { internal; dns; };
//      allow-transfer { dns; };
        recursion yes;

        dnssec-enable yes;
        dnssec-validation yes;
        dnssec-lookaside auto;

        /* Path to ISC DLV key */
        bindkeys-file "/etc/named.iscdlv.key";
};

logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};

zone "." IN {
        type hint;
        file "named.ca";
};


include "/etc/named.rfc1912.zones";
-------------------------
After this come a few local zones, nothing else.
To be honest, I don't really like experimenting with this server, as 10 people work at that office behind this firewall...
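Independent of the forwarders problem, two kinds of entries in the `acl internal` block above are worth a second look: a bare `192.168.0.0` matches only that one address, and `10.20.0.1/24` and `10.10.0.2/24` have host bits set (as networks they are 10.20.0.0/24 and 10.10.0.0/24). A minimal Python sketch using the standard `ipaddress` module can flag such entries; the helper name `check_acl_entry` is hypothetical, and the check covers CIDR arithmetic only, not how a given BIND version parses the ACL.

```python
import ipaddress


def check_acl_entry(entry):
    """Return a warning for a suspicious address-match entry, or None.

    Hypothetical helper: flags bare addresses (which match a single host)
    and address/prefix entries whose host bits are set.
    """
    if "/" not in entry:
        ipaddress.ip_address(entry)  # raises ValueError if not an IP address
        return f"{entry}: bare address, matches this single host only"
    # ip_interface() accepts host bits, whereas ip_network() rejects them by default.
    iface = ipaddress.ip_interface(entry)
    if iface.ip != iface.network.network_address:
        return f"{entry}: host bits set, network is {iface.network}"
    return None


if __name__ == "__main__":
    # Entries copied from the `acl internal` block above.
    for entry in ["10.30.0.0/24", "192.168.0.0", "10.20.0.1/24", "10.10.0.2/24"]:
        warning = check_acl_entry(entry)
        if warning:
            print(warning)
```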
Comment 7 Adam Tkac 2010-08-20 06:56:38 EDT
This is weird; I'm not able to reproduce this issue. Are you able to stop named via the `rndc stop` command or via `kill -TERM <named_pid>`? Or is the only way to kill named via SIGKILL?
Comment 8 Levente Farkas 2010-08-27 09:25:52 EDT
The only way was SIGKILL.
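Comment 7 separates a graceful stop (`rndc stop`, or SIGTERM via `kill -TERM`) from SIGKILL (`kill -9`), which cannot be caught or ignored but also skips named's own cleanup. A common scripting pattern is to send SIGTERM, wait a bounded time, and fall back to SIGKILL only on timeout. The Python sketch below assumes a POSIX system and a process started by the script itself via `subprocess.Popen` (a daemon such as named, started by the init script, would need its PID polled instead); the helper name `stop_process` and the 10-second default grace period are assumptions, not taken from the named init script.

```python
import signal
import subprocess


def stop_process(proc, grace=10.0):
    """Stop a child process gracefully, escalating only if necessary.

    Hypothetical helper: sends SIGTERM (like `kill -TERM <pid>`), waits up to
    `grace` seconds, then sends SIGKILL (like `kill -9 <pid>`).
    Returns "TERM" if the process exited within the grace period, else "KILL".
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
        return "TERM"
    except subprocess.TimeoutExpired:
        # Still running: SIGTERM was ignored or its shutdown path hung.
        # SIGKILL cannot be caught or ignored by the process.
        proc.kill()
        proc.wait()
        return "KILL"


if __name__ == "__main__":
    child = subprocess.Popen(["sleep", "30"])
    print(stop_process(child, grace=5))  # TERM
```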
Comment 9 Morten Stevens 2010-09-14 09:47:14 EDT
We have exactly the same problem.

If the named service has been running for more than a few hours, it's not possible to stop it.
Comment 10 Eddie Lania 2010-10-05 04:46:56 EDT
Exactly the same problem here. I also see that resource records (RRs) are not updated, so some services on the LAN that depend on them fail.

This is a serious issue indeed!
Comment 11 Adam Tkac 2010-10-14 11:56:08 EDT
I've finally reproduced this issue. The solution is to replace the patch called "bind97-rh576906.patch" with an improved version called "bind97-rh623122.patch". I will attach the improved patch.
Comment 12 Adam Tkac 2010-10-14 11:57:31 EDT
Created attachment 453503
Proposed patch

Replace bind97-rh576906.patch with this patch.
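Comment 11's fix amounts to swapping one patch file for another in the bind package's spec file. As a rough sketch of that edit, the Python function below rewrites the file name on the matching `PatchN:` line. The spec excerpt, the patch number 106 and the helper name `swap_patch` are hypothetical and not taken from the actual bind.spec.

```python
import re


def swap_patch(spec_text, old, new):
    """Replace the file name on the single `PatchN:` line that names `old`.

    Hypothetical helper. Only the PatchN: tag line changes; a matching
    `%patchN` line in %prep refers to the patch by its number, so it can
    stay as it is. Raises ValueError unless exactly one line matches.
    """
    pattern = re.compile(
        r"^(Patch\d*:[ \t]*)" + re.escape(old) + r"[ \t]*$", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + new, spec_text)
    if count != 1:
        raise ValueError(f"expected one Patch line for {old!r}, found {count}")
    return new_text


if __name__ == "__main__":
    # Hypothetical spec excerpt; the number 106 is made up for illustration.
    spec = "Name: bind\nPatch106: bind97-rh576906.patch\n%patch106 -p1\n"
    print(swap_patch(spec, "bind97-rh576906.patch", "bind97-rh623122.patch"))
```

Matching the whole `PatchN:` line, rather than doing a plain text replace, avoids touching unrelated mentions of the old file name elsewhere in the spec.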
Comment 17 Eddie Lania 2010-12-10 10:24:27 EST
This seems to be solved now; I have not encountered this issue again on any of my systems.
Comment 22 Martin Cermak 2011-03-18 06:01:24 EDT
According to https://beaker.engineering.redhat.com/jobs/63067 - VERIFIED.
Comment 23 Ryan Lerch 2011-04-12 19:16:33 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
it was impossible to stop named due deadlock in certain cases. Now it is possible.
Comment 24 Ryan Lerch 2011-05-03 18:32:39 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-it was impossible to stop named due deadlock in certain cases. Now it is possible.
+under certain circumstances, "named" was entering a deadlock. Consequently, "named" could not be stopped using the "/etc/init.d/named stop". In this updated package, the deadlock no longer occurs, resolving this issue.
Comment 25 errata-xmlrpc 2011-05-19 08:58:13 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0541.html
