Bug 623122

Summary: can't stop bind
Product: Red Hat Enterprise Linux 6 Reporter: Levente Farkas <lfarkas>
Component: bind Assignee: Adam Tkac <atkac>
Status: CLOSED ERRATA QA Contact: qe-baseos-daemons
Severity: high Docs Contact:
Priority: high    
Version: 6.0 CC: azelinka, mishu, mstevens, ovasik
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Under certain circumstances, "named" was entering a deadlock. Consequently, "named" could not be stopped using the "/etc/init.d/named stop" command. In this updated package, the deadlock no longer occurs, resolving this issue.
Story Points: ---
Clone Of:
: 643102 Environment:
Last Closed: 2011-05-19 12:58:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 672514    
Bug Blocks:    
Attachments:
Description Flags
Proposed patch none

Description Levente Farkas 2010-08-11 11:57:25 UTC
referring to #622785:
currently it's not possible to stop named, i.e.
i only get:
----------------------------------
# /etc/init.d/named stop
Stopping named: .............                              [FAILED]
# ps axuf|grep named
root     13140  0.0  0.0 103148   808 pts/0    S+   11:03   0:00          \_ grep named
named    12493  0.0  0.5 240372 19900 ?        Ssl  10:52   0:00 /usr/sbin/named -u named
----------------------------------
the only way is kill -9.
and the gdb output is:
----------------------------------
Program received signal SIGTERM, Terminated.
0x00007f1490abbd64 in sigsuspend () from /lib64/libc.so.6
(gdb) t a a bt

Thread 5 (Thread 0x7f148f53b710 (LWP 1425)):
#0  0x00007f14915fd43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1491a3fcf1 in dispatch (uap=0x7f1492fc4010) at task.c:961
#2  run (uap=0x7f1492fc4010) at task.c:1158
#3  0x00007f14915f97e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1490b6a51d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f148eb3a710 (LWP 1426)):
#0  0x00007f14915fd43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1491a3fcf1 in dispatch (uap=0x7f1492fc4010) at task.c:961
#2  run (uap=0x7f1492fc4010) at task.c:1158
#3  0x00007f14915f97e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1490b6a51d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f148e139710 (LWP 1427)):
#0  0x00007f14915fd7a9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1491a54d4e in isc_condition_waituntil (c=0x7f1492fc5078, m=0x7f1492fc5028, t=0x7f1492fc506c) at condition.c:59
#2  0x00007f1491a4239d in run (uap=0x7f1492fc5010) at timer.c:822
#3  0x00007f14915f97e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1490b6a51d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f148d738710 (LWP 1428)):
#0  0x00007f1490b6ab13 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f1491a51e44 in watcher (uap=0x7f1492fc7010) at socket.c:3695
#2  0x00007f14915f97e1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f1490b6a51d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f14930007c0 (LWP 1424)):
#0  0x00007f1490abbd64 in sigsuspend () from /lib64/libc.so.6
#1  0x00007f1491a43e34 in isc__app_ctxrun (ctx0=0x7f1491c68b60) at app.c:680
#2  0x00007f149303dd3f in main (argc=<value optimized out>, argv=0x7fff3a62af08) at ./main.c:1031
----------------------------------
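For reference, the `t a a bt` typed at the gdb prompt above is shorthand for `thread apply all bt`. A minimal sketch of collecting the same per-thread backtraces non-interactively follows. `dump_all_threads` and the `GDB` override variable are hypothetical names introduced here, and the sketch assumes gdb is installed and the caller is allowed to attach to the process.

```shell
# Hypothetical helper, not part of the bind package.
# Prints a backtrace of every thread of the process with the given PID.
# "thread apply all bt" is the long form of "t a a bt".
# GDB can be overridden (an assumption made here) so the wrapper can be
# exercised on a machine without gdb.
dump_all_threads() {
    "${GDB:-gdb}" -batch -p "$1" -ex "thread apply all bt"
}

# Example (needs gdb and permission to attach):
#   dump_all_threads "$(pidof named)" > named-threads.txt
```

With `-batch`, gdb runs the given command and exits instead of staying at the prompt, so the output can be redirected to a file and attached to a report.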

Comment 2 RHEL Program Management 2010-08-11 12:18:30 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Levente Farkas 2010-08-17 09:42:55 UTC
ok, now bind is simply unusable! we can't stop it and it stops working randomly. imho it's a serious bug since it's a basic service; without it nothing works!

Comment 4 Levente Farkas 2010-08-17 09:56:42 UTC
it turns out this happens when i set forwarders in the config file.

Comment 5 Adam Tkac 2010-08-17 11:43:32 UTC
If I understand correctly, everything is OK when you don't set forwarders? Would it be possible to check whether there is anything strange in the system log, please? Also try to disable DNSSEC validation (via `rndc validation off`, for example) and report whether it helps, please.

Comment 6 Levente Farkas 2010-08-17 12:47:06 UTC
here is our named.conf. if you uncomment the forward lines then it happens on our firewall (with two interfaces: external dhcp and an internal static bridged network 10.30.0.1, plus 2 tun devices for openvpn connections 10.20.0.1, 10.10.0.2)
-------------------------
acl internal {
        10.30.0.0/24;
        192.168.0.0;
        10.20.0.1/24;
        10.10.0.2/24;
};

acl dns {
        127.0.0.1;
        10.30.0.1;
        192.168.208.1;
};

options {
        listen-on port 53 {
                127.0.0.1;
                10.30.0.1;
                10.20.0.1;
                10.10.0.2;
        };
//      listen-on-v6 port 53 { ::1; };
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
//      forward only;
//      forwarders     { 8.8.8.8; 8.8.4.4; };
//      allow-query    { internal; dns; };
//      allow-transfer { dns; };
        recursion yes;

        dnssec-enable yes;
        dnssec-validation yes;
        dnssec-lookaside auto;

        /* Path to ISC DLV key */
        bindkeys-file "/etc/named.iscdlv.key";
};

logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};

zone "." IN {
        type hint;
        file "named.ca";
};


include "/etc/named.rfc1912.zones";
-------------------------
here come a few local zones, nothing else.
the truth is that i don't really like to play with this server, as 10 people work at the office behind this firewall...
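To reproduce with the configuration above, the two commented-out forward lines have to be enabled. A minimal sketch, assuming the exact comment layout shown in this named.conf: `enable_forwarders` is a hypothetical helper that only prints the modified configuration and leaves the file itself untouched.

```shell
# Hypothetical helper: print a copy of a named.conf with the
# "forward only;" and "forwarders { ... };" lines uncommented.
# Other commented-out options (allow-query, allow-transfer) stay as they are.
enable_forwarders() {
    sed -e 's|^//\([[:space:]]*forward only;\)|  \1|' \
        -e 's|^//\([[:space:]]*forwarders[[:space:]]\)|  \1|' \
        "$1"
}

# Example (paths are assumptions):
#   enable_forwarders /etc/named.conf > /tmp/named-fwd.conf
#   named-checkconf /tmp/named-fwd.conf
```

Writing to a separate file and checking it with `named-checkconf` first avoids touching a production server, which matters here given the concern about the office depending on this firewall.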

Comment 7 Adam Tkac 2010-08-20 10:56:38 UTC
This is weird, I'm not able to reproduce this issue. Are you able to stop named via the `rndc stop` command or via `kill -TERM <named_pid>`? Or is the only way to kill named via SIGKILL?

Comment 8 Levente Farkas 2010-08-27 13:25:52 UTC
the only way was a sigkill.
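The sequence discussed in the last two comments (try SIGTERM first, fall back to SIGKILL only if the process does not exit) can be sketched as below. `stop_with_fallback` is a hypothetical helper, not part of the named init script, and the 10-second default grace period is an arbitrary assumption.

```shell
# Hypothetical helper, not part of /etc/init.d/named.
# Usage: stop_with_fallback PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM (or could not be signalled,
# e.g. because it was already gone), 1 if SIGKILL had to be used.
stop_with_fallback() {
    pid=$1
    remaining=${2:-10}                        # assumed default grace period
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$remaining" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited cleanly
        sleep 1
        remaining=$((remaining - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0    # exited during the last sleep
    kill -KILL "$pid" 2>/dev/null             # last resort
    return 1
}

# Example:
#   stop_with_fallback "$(pidof named)" 15 || echo "named had to be killed"
```

The check with `kill -0` also succeeds for a child that has exited but has not been reaped yet, so the sketch is meant for daemons such as named that are not children of the calling shell.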

Comment 9 Morten Stevens 2010-09-14 13:47:14 UTC
We have exactly the same problem.

If the named service has been running for more than a few hours, it's not possible to stop it.

Comment 10 Eddie Lania 2010-10-05 08:46:56 UTC
Exactly the same problem here, and I also see that RRs are not updated, so some services in the LAN that depend on it fail.

This is a serious issue indeed!

Comment 11 Adam Tkac 2010-10-14 15:56:08 UTC
I've finally reproduced this issue. The solution is to replace the patch called "bind97-rh576906.patch" with an improved version called "bind97-rh623122.patch". I will attach the improved patch.

Comment 12 Adam Tkac 2010-10-14 15:57:31 UTC
Created attachment 453503
Proposed patch

Replace bind97-rh576906.patch with this patch.

Comment 17 Eddie Lania 2010-12-10 15:24:27 UTC
This seems to be solved now; I've not encountered this issue anymore on any of my systems.

Comment 22 Martin Cermak 2011-03-18 10:01:24 UTC
According to https://beaker.engineering.redhat.com/jobs/63067 - VERIFIED.

Comment 23 Ryan Lerch 2011-04-12 23:16:33 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
it was impossible to stop named due deadlock in certain cases. Now it is possible.

Comment 24 Ryan Lerch 2011-05-03 22:32:39 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-it was impossible to stop named due deadlock in certain cases. Now it is possible.
+under certain circumstances, "named" was entering a deadlock. Consequently, "named" could not be stopped using the "/etc/init.d/named stop". In this updated package, the deadlock no longer occurs, resolving this issue.

Comment 25 errata-xmlrpc 2011-05-19 12:58:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0541.html