Bug 1219568 - recurring named crashes on IPA Server and Replica
Summary: recurring named crashes on IPA Server and Replica
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: bind-dyndb-ldap
Version: 6.8
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Petr Spacek
QA Contact: Namita Soman
URL:
Whiteboard: GSSProposed
Keywords:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-05-07 15:59 UTC by jstephen
Modified: 2015-07-22 05:39 UTC (History)
8 users (show)

(edit)
Previously, the bind-dyndb-ldap plug-in incorrectly handled timeouts which occurred during LDAP operations. As a consequence, under very specific circumstances, the BIND daemon could terminate unexpectedly. With this update, bind-dyndb-ldap has been fixed to correctly handle timeouts during LDAP operations and the BIND daemon no longer crashes in this scenario.
Clone Of:
(edit)
Last Closed: 2015-07-22 05:39:21 UTC


Attachments (Terms of Use)
proposed patch (557 bytes, patch)
2015-06-04 13:27 UTC, Petr Spacek
no flags Details | Diff
bind-dyndb-ldap-2.3-7.hotfix1.el6.x86_64.rpm (168.04 KB, application/octet-stream)
2015-06-05 07:11 UTC, Petr Spacek
no flags Details
bind-dyndb-ldap-2.3-7.hotfix1.el6.x86_64.rpm (70.74 KB, application/octet-stream)
2015-06-05 07:12 UTC, Petr Spacek
no flags Details
bind-dyndb-ldap-2.3-8.el6.x86_64.rpm (70.70 KB, application/octet-stream)
2015-06-05 13:14 UTC, Petr Spacek
no flags Details
bind-dyndb-ldap-debuginfo-2.3-8.el6.x86_64.rpm (168.01 KB, application/octet-stream)
2015-06-05 13:15 UTC, Petr Spacek
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:1259 normal SHIPPED_LIVE bind-dyndb-ldap bug fix update 2015-07-20 17:49:53 UTC

Description jstephen 2015-05-07 15:59:38 UTC
Created attachment 1023211 [details]
abrt-crash-1.tar.gz

Description of problem:
Named is crashing consistently on IPA master server and replica, every couple days. Attaching abrt directory of latest crashes 


$ egrep 'core dump|back trace' var/log/messages*
var/log/messages:May  4 11:05:20 oma3lrhidm named[6564]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
var/log/messages:May  4 11:05:20 oma3lrhidm abrt[17312]: Saved core dump of pid 6564 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-04-11:05:20-6564 (222756864 bytes)
var/log/messages-20150503:Apr 28 05:50:06 oma3lrhidm named[32571]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
var/log/messages-20150503:Apr 28 05:50:07 oma3lrhidm abrt[11402]: Saved core dump of pid 32571 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-04-28-05:50:06-32571 (178335744 bytes)
var/log/messages-20150503:Apr 28 21:41:18 oma3lrhidm named[15118]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
var/log/messages-20150503:Apr 28 21:41:18 oma3lrhidm abrt[24689]: Saved core dump of pid 15118 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-04-28-21:41:18-15118 (178704384 bytes)
var/log/messages-20150503:Apr 30 21:12:38 oma3lrhidm named[24802]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
var/log/messages-20150503:Apr 30 21:12:38 oma3lrhidm abrt[29001]: Saved core dump of pid 24802 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-04-30-21:12:38-24802 (217362432 bytes)
var/log/messages-20150503:May  1 17:42:44 oma3lrhidm named[21142]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
var/log/messages-20150503:May  1 17:42:44 oma3lrhidm abrt[28805]: Saved core dump of pid 21142 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-01-17:42:44-21142 (212107264 bytes)


$ egrep 'core dump|back trace' var/log/messages*
var/log/messages:May  4 11:35:06 nrc3lrhidm named[6261]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
var/log/messages:May  4 11:35:07 nrc3lrhidm abrt[25199]: Saved core dump of pid 6261 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-04-11:35:06-6261 (220442624 bytes)
var/log/messages:May  4 18:01:22 nrc3lrhidm named[19014]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
var/log/messages:May  4 18:01:23 nrc3lrhidm abrt[21405]: Saved core dump of pid 19014 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-04-18:01:22-19014 (190038016 bytes)
var/log/messages:May  5 14:13:36 nrc3lrhidm named[5600]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
var/log/messages:May  5 14:13:37 nrc3lrhidm abrt[25180]: Saved core dump of pid 5600 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-05-14:13:36-5600 (209588224 bytes)


Version-Release number of selected component (if applicable):
bind-9.8.2-0.30.rc1.el6_6.2.x86_64
bind-dyndb-ldap-2.3-6.el6_6.x86_64
bind-libs-9.8.2-0.30.rc1.el6_6.2.x86_64
bind-utils-9.8.2-0.30.rc1.el6_6.2.x86_64
ipa-admintools-3.0.0-42.el6.x86_64
ipa-client-3.0.0-42.el6.x86_64
ipa-pki-ca-theme-9.0.3-7.el6.noarch
ipa-pki-common-theme-9.0.3-7.el6.noarch
ipa-python-3.0.0-42.el6.x86_64
ipa-server-3.0.0-42.el6.x86_64
ipa-server-selinux-3.0.0-42.el6.x86_64


How reproducible:
Crash occurs often on both IPA master and replica, but no specific pattern noticed about how to reproduce

Steps to Reproduce:
1. Unsure at this point, normal operation of IPA/named
2.
3.

Actual results:
Crash

Expected results:
No crash

Additional info:
Attaching abrt directory of latest crashes 

---> abrt-crash-1.tar.gz

        May  4 18:01:22 nrc3lrhidm named[19014]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
        May  4 18:01:22 nrc3lrhidm named[19014]: #0 0x7f8c269aafbf in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #1 0x7f8c253109fa in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #2 0x7f8c26222055 in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #3 0x7f8c269b2ab8 in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #4 0x7f8c269b861f in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #5 0x7f8c269bcb4a in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #6 0x7f8c269a1bd6 in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #7 0x7f8c2532f978 in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #8 0x7f8c24ce49d1 in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: #9 0x7f8c242448fd in ??
        May  4 18:01:22 nrc3lrhidm named[19014]: exiting (due to assertion failure)


        Core was generated by `/usr/sbin/named -u named -4'.
        Program terminated with signal 6, Aborted.
        #0  0x00007f8c2418e625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
        (gdb) where
        #0  0x00007f8c2418e625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        #1  0x00007f8c2418fe05 in abort () at abort.c:92
        #2  0x00007f8c253109ff in isc_assertion_failed (file=<value optimized out>, line=<value optimized out>, type=<value optimized out>, cond=<value optimized out>) at assertions.c:58
        #3  0x00007f8c26222055 in dns_name_split (name=0x7f8c217dea30, suffixlabels=0, prefix=<value optimized out>, suffix=0x7f8c217dea30) at name.c:2104
        #4  0x00007f8c269b2ab8 in query_addwildcardproof (client=0x7f8c08034a50, db=0x7f8c26946e50, version=0x7f8c1ecd1458, name=0x7f8c06dff150, ispositive=isc_boolean_false, nodata=isc_boolean_false) at query.c:3339
        #5  0x00007f8c269b861f in query_find (client=0x7f8c08034a50, event=0x0, qtype=<value optimized out>) at query.c:6254
        #6  0x00007f8c269bcb4a in ns_query_start (client=0x7f8c08034a50) at query.c:7349
        #7  0x00007f8c269a1bd6 in client_request (task=<value optimized out>, event=<value optimized out>) at client.c:1961
        #8  0x00007f8c2532f978 in dispatch (uap=0x7f8c2692c010) at task.c:1012
        #9  run (uap=0x7f8c2692c010) at task.c:1157
        #10 0x00007f8c24ce49d1 in start_thread (arg=0x7f8c217e1700) at pthread_create.c:301
        #11 0x00007f8c242448fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
        (gdb) frame 3
        #3  0x00007f8c26222055 in dns_name_split (name=0x7f8c217dea30, suffixlabels=0, prefix=<value optimized out>, suffix=0x7f8c217dea30) at name.c:2104
        2104		REQUIRE(suffixlabels > 0);
        (gdb) print name
        $1 = (dns_name_t *) 0x7f8c217dea30
        (gdb) frame 4
        #4  0x00007f8c269b2ab8 in query_addwildcardproof (client=0x7f8c08034a50, db=0x7f8c26946e50, version=0x7f8c1ecd1458, name=0x7f8c06dff150, ispositive=isc_boolean_false, nodata=isc_boolean_false) at query.c:3339
        3339				dns_name_split(cname, labels, NULL, cname);
        
        
---> abrt-crash-2.tar.gz

        May  6 07:30:19 oma3lrhidm named[30342]: LDAP query timed out. Try to adjust "timeout" parameter
        May  6 07:30:19 oma3lrhidm named[30342]: LDAP query timed out. Try to adjust "timeout" parameter
        May  6 07:30:20 oma3lrhidm named[30342]: LDAP query timed out. Try to adjust "timeout" parameter
        May  6 07:30:20 oma3lrhidm named[30342]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace
        May  6 07:30:20 oma3lrhidm named[30342]: #0 0x7ff0958b6fbf in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #1 0x7ff09421c9fa in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #2 0x7ff09512e055 in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #3 0x7ff0958beab8 in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #4 0x7ff0958c461f in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #5 0x7ff0958c8b4a in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #6 0x7ff0958adbd6 in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #7 0x7ff09423b978 in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #8 0x7ff093bf09d1 in ??
        May  6 07:30:20 oma3lrhidm named[30342]: #9 0x7ff0931508fd in ??
        May  6 07:30:20 oma3lrhidm named[30342]: exiting (due to assertion failure)


        Core was generated by `/usr/sbin/named -u named -4'.
        Program terminated with signal 6, Aborted.
        #0  0x00007ff09309a625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
        (gdb) where
        #0  0x00007ff09309a625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        #1  0x00007ff09309be05 in abort () at abort.c:92
        #2  0x00007ff0958b71a4 in assertion_failed (file=<value optimized out>, line=<value optimized out>, type=<value optimized out>, cond=<value optimized out>) at ./main.c:219
        #3  0x00007ff09421c9fa in isc_assertion_failed (file=<value optimized out>, line=<value optimized out>, type=<value optimized out>, cond=<value optimized out>) at assertions.c:57
        #4  0x00007ff09512e055 in dns_name_split (name=0x7ff091aeca30, suffixlabels=0, prefix=<value optimized out>, suffix=0x7ff091aeca30) at name.c:2104
        #5  0x00007ff0958beab8 in query_addwildcardproof (client=0x7ff07c1d3170, db=0x7ff095852e50, version=0x7ff08dbdd458, name=0x7ff07410a010, ispositive=isc_boolean_false, nodata=isc_boolean_false) at query.c:3339
        #6  0x00007ff0958c461f in query_find (client=0x7ff07c1d3170, event=0x0, qtype=<value optimized out>) at query.c:6254
        #7  0x00007ff0958c8b4a in ns_query_start (client=0x7ff07c1d3170) at query.c:7349
        #8  0x00007ff0958adbd6 in client_request (task=<value optimized out>, event=<value optimized out>) at client.c:1961
        #9  0x00007ff09423b978 in dispatch (uap=0x7ff095838010) at task.c:1012
        #10 run (uap=0x7ff095838010) at task.c:1157
        #11 0x00007ff093bf09d1 in start_thread (arg=0x7ff091aef700) at pthread_create.c:301
        #12 0x00007ff0931508fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 11 Tomáš Hozza 🤓 2015-06-03 14:07:21 UTC
I tried to reproduce the issue by having BIND+IPA deployment. I used dig to resolve non-existent domain and requesting DNSSEC data. I was unable to make BIND assert.

However I tested BIND with plain RBTDB and on missing RRSET (e.g. NSEC) it does not return NXDOMAIN (the find() method of the database implementation), but rather DNS_R_NXRRSET, which means, that the domain name exists, but the RRSET don't.

This conforms to RFC 2136 section 3.2.1 and RFC 5395 section 2.3.. Therefore I think that the bind-dyndb-ldap plugin should return NXRRSET instead of NXDOMAIN, which should solve the assert problem.

Also note that this has potentially security implications. If the issue is reproducible, that would mean, that any client in an environment with proper BIND+IPA setup would be able to DoS the DNS server by making query for non-existing domain name and requesting also DNSSEC data. (However I was so far unable to setup environment in which it would be possible to reproduce the issue).

Comment 15 Tomáš Hozza 🤓 2015-06-04 12:02:56 UTC
I think we have the possible root cause. The problem seems to be that LDAP query is timeouting.

The situation is as follows:
- client tries to resolve non-existing domain name from zone that is in IPA, with DNSSEC data
- the code in BIND gets into the query_addwildcardproof() function to find NSEC record
- Once BIND figures out, there is no NSEC, it tries to generate NSEC3
- this boils down to the call dns_db_find() in query.c:3340
- under the hood, bind-dyndb-ldap plugin is called.
- in find() (ldap_driver.c:509) -> ldapdb_rdatalist_get()
- in ldapdb_rdatalist_get() (ldap_helper.c:1926) -> ldap_query()
- ldap_query() times out and returns ISC_R_TIMEDOUT
- in ldapdb_rdatalist_get() the CHECK jumps to cleanup label and returns ISC_R_TIMEDOUT
- in find() (line 514) if the result is not ISC_R_SUCCESS, then DNS_R_NXDOMAIN is used.
- after all labels are skipped due to timeout, the result contains DNS_R_NXDOMAIN, FOR cycle ends and DNS_R_NXDOMAIN is returned on ldap_driver.c:551
- dns_db_find() in query.c:3340 returns DNS_R_NXDOMAIN
- labels in the FOR loop are stripped out and once we get to the root zone -> BOOOM! assert

The basic assumption is that LDAP connection needs to be timing out.

The bottom line is that in case of TIMEOUT the find() should not return NXDOMAIN. Petr is working on reproducer.

Comment 16 Petr Spacek 2015-06-04 13:27:43 UTC
Created attachment 1034720 [details]
proposed patch

Comment 23 Petr Spacek 2015-06-05 07:11:56 UTC
Created attachment 1035044 [details]
bind-dyndb-ldap-2.3-7.hotfix1.el6.x86_64.rpm

Comment 24 Petr Spacek 2015-06-05 07:12:50 UTC
Created attachment 1035045 [details]
bind-dyndb-ldap-2.3-7.hotfix1.el6.x86_64.rpm

I agree. Test builds are attached. We definitelly should fix the bug in bind-dyndb-ldap because we do not know what other unforeseen issues it might cause in future.

Comment 27 Petr Spacek 2015-06-05 13:14:30 UTC
Created attachment 1035265 [details]
bind-dyndb-ldap-2.3-8.el6.x86_64.rpm

Final build for RHEL 6.7.

Comment 28 Petr Spacek 2015-06-05 13:15:00 UTC
Created attachment 1035267 [details]
bind-dyndb-ldap-debuginfo-2.3-8.el6.x86_64.rpm

Comment 30 Namita Soman 2015-06-24 17:23:09 UTC
No regressions seen when running DNS tests.

Comment 32 errata-xmlrpc 2015-07-22 05:39:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1259.html


Note You need to log in before you can comment on or make changes to this bug.