Bug 1219568
| Summary: | recurring named crashes on IPA Server and Replica | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | jstephen | ||||||||||||
| Component: | bind-dyndb-ldap | Assignee: | Petr Spacek <pspacek> | ||||||||||||
| Status: | CLOSED ERRATA | QA Contact: | Namita Soman <nsoman> | ||||||||||||
| Severity: | high | Docs Contact: | |||||||||||||
| Priority: | high | ||||||||||||||
| Version: | 6.8 | CC: | gparente, jstephen, mkosek, pspacek, pvoborni, rcritten, thozza, tlavigne | ||||||||||||
| Target Milestone: | rc | ||||||||||||||
| Target Release: | --- | ||||||||||||||
| Hardware: | All | ||||||||||||||
| OS: | Linux | ||||||||||||||
| Whiteboard: | GSSProposed | ||||||||||||||
| Fixed In Version: | bind-dyndb-ldap-2.3-8.el6 | Doc Type: | Bug Fix | ||||||||||||
| Doc Text: |
Previously, the bind-dyndb-ldap plug-in incorrectly handled timeouts which occurred during LDAP operations. As a consequence, under very specific circumstances, the BIND daemon could terminate unexpectedly. With this update, bind-dyndb-ldap has been fixed to correctly handle timeouts during LDAP operations and the BIND daemon no longer crashes in this scenario.
|
Story Points: | --- | ||||||||||||
| Clone Of: | Environment: | ||||||||||||||
| Last Closed: | 2015-07-22 05:39:21 UTC | Type: | Bug | ||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||
| Documentation: | --- | CRM: | |||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
| Embargoed: | |||||||||||||||
| Attachments: |
|
||||||||||||||
I tried to reproduce the issue by having BIND+IPA deployment. I used dig to resolve non-existent domain and requesting DNSSEC data. I was unable to make BIND assert. However I tested BIND with plain RBTDB and on missing RRSET (e.g. NSEC) it does not return NXDOMAIN (the find() method of the database implementation), but rather DNS_R_NXRRSET, which means, that the domain name exists, but the RRSET don't. This conforms to RFC 2136 section 3.2.1 and RFC 5395 section 2.3.. Therefore I think that the bind-dyndb-ldap plugin should return NXRRSET instead of NXDOMAIN, which should solve the assert problem. Also note that this has potentially security implications. If the issue is reproducible, that would mean, that any client in an environment with proper BIND+IPA setup would be able to DoS the DNS server by making query for non-existing domain name and requesting also DNSSEC data. (However I was so far unable to setup environment in which it would be possible to reproduce the issue). I think we have the possible root cause. The problem seems to be that LDAP query is timeouting. The situation is as follows: - client tries to resolve non-existing domain name from zone that is in IPA, with DNSSEC data - the code in BIND gets into the query_addwildcardproof() function to find NSEC record - Once BIND figures out, there is no NSEC, it tries to generate NSEC3 - this boils down to the call dns_db_find() in query.c:3340 - under the hood, bind-dyndb-ldap plugin is called. - in find() (ldap_driver.c:509) -> ldapdb_rdatalist_get() - in ldapdb_rdatalist_get() (ldap_helper.c:1926) -> ldap_query() - ldap_query() times out and returns ISC_R_TIMEDOUT - in ldapdb_rdatalist_get() the CHECK jumps to cleanup label and returns ISC_R_TIMEDOUT - in find() (line 514) if the result is not ISC_R_SUCCESS, then DNS_R_NXDOMAIN is used. - after all labels are skipped due to timeout, the result contains DNS_R_NXDOMAIN, FOR cycle ends and DNS_R_NXDOMAIN is returned on ldap_driver.c:551 - dns_db_find() in query.c:3340 returns DNS_R_NXDOMAIN - labels in the FOR loop are stripped out and once we get to the root zone -> BOOOM! assert The basic assumption is that LDAP connection needs to be timing out. The bottom line is that in case of TIMEOUT the find() should not return NXDOMAIN. Petr is working on reproducer. Created attachment 1034720 [details]
proposed patch
Created attachment 1035044 [details]
bind-dyndb-ldap-2.3-7.hotfix1.el6.x86_64.rpm
Created attachment 1035045 [details]
bind-dyndb-ldap-2.3-7.hotfix1.el6.x86_64.rpm
I agree. Test builds are attached. We definitelly should fix the bug in bind-dyndb-ldap because we do not know what other unforeseen issues it might cause in future.
Created attachment 1035265 [details]
bind-dyndb-ldap-2.3-8.el6.x86_64.rpm
Final build for RHEL 6.7.
Created attachment 1035267 [details]
bind-dyndb-ldap-debuginfo-2.3-8.el6.x86_64.rpm
No regressions seen when running DNS tests. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1259.html |
Created attachment 1023211 [details] abrt-crash-1.tar.gz Description of problem: Named is crashing consistently on IPA master server and replica, every couple days. Attaching abrt directory of latest crashes $ egrep 'core dump|back trace' var/log/messages* var/log/messages:May 4 11:05:20 oma3lrhidm named[6564]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace var/log/messages:May 4 11:05:20 oma3lrhidm abrt[17312]: Saved core dump of pid 6564 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-04-11:05:20-6564 (222756864 bytes) var/log/messages-20150503:Apr 28 05:50:06 oma3lrhidm named[32571]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace var/log/messages-20150503:Apr 28 05:50:07 oma3lrhidm abrt[11402]: Saved core dump of pid 32571 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-04-28-05:50:06-32571 (178335744 bytes) var/log/messages-20150503:Apr 28 21:41:18 oma3lrhidm named[15118]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace var/log/messages-20150503:Apr 28 21:41:18 oma3lrhidm abrt[24689]: Saved core dump of pid 15118 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-04-28-21:41:18-15118 (178704384 bytes) var/log/messages-20150503:Apr 30 21:12:38 oma3lrhidm named[24802]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace var/log/messages-20150503:Apr 30 21:12:38 oma3lrhidm abrt[29001]: Saved core dump of pid 24802 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-04-30-21:12:38-24802 (217362432 bytes) var/log/messages-20150503:May 1 17:42:44 oma3lrhidm named[21142]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace var/log/messages-20150503:May 1 17:42:44 oma3lrhidm abrt[28805]: Saved core dump of pid 21142 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-01-17:42:44-21142 (212107264 bytes) $ egrep 'core dump|back trace' var/log/messages* var/log/messages:May 4 11:35:06 nrc3lrhidm named[6261]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace var/log/messages:May 4 11:35:07 nrc3lrhidm abrt[25199]: Saved core dump of pid 6261 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-04-11:35:06-6261 (220442624 bytes) var/log/messages:May 4 18:01:22 nrc3lrhidm named[19014]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace var/log/messages:May 4 18:01:23 nrc3lrhidm abrt[21405]: Saved core dump of pid 19014 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-04-18:01:22-19014 (190038016 bytes) var/log/messages:May 5 14:13:36 nrc3lrhidm named[5600]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace var/log/messages:May 5 14:13:37 nrc3lrhidm abrt[25180]: Saved core dump of pid 5600 (/usr/sbin/named) to /var/spool/abrt/ccpp-2015-05-05-14:13:36-5600 (209588224 bytes) Version-Release number of selected component (if applicable): bind-9.8.2-0.30.rc1.el6_6.2.x86_64 bind-dyndb-ldap-2.3-6.el6_6.x86_64 bind-libs-9.8.2-0.30.rc1.el6_6.2.x86_64 bind-utils-9.8.2-0.30.rc1.el6_6.2.x86_64 ipa-admintools-3.0.0-42.el6.x86_64 ipa-client-3.0.0-42.el6.x86_64 ipa-pki-ca-theme-9.0.3-7.el6.noarch ipa-pki-common-theme-9.0.3-7.el6.noarch ipa-python-3.0.0-42.el6.x86_64 ipa-server-3.0.0-42.el6.x86_64 ipa-server-selinux-3.0.0-42.el6.x86_64 How reproducible: Crash occurs often on both IPA master and replica, but no specific pattern noticed about how to reproduce Steps to Reproduce: 1. Unsure at this point, normal operation of IPA/named 2. 3. Actual results: Crash Expected results: No crash Additional info: Attaching abrt directory of latest crashes ---> abrt-crash-1.tar.gz May 4 18:01:22 nrc3lrhidm named[19014]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace May 4 18:01:22 nrc3lrhidm named[19014]: #0 0x7f8c269aafbf in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #1 0x7f8c253109fa in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #2 0x7f8c26222055 in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #3 0x7f8c269b2ab8 in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #4 0x7f8c269b861f in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #5 0x7f8c269bcb4a in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #6 0x7f8c269a1bd6 in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #7 0x7f8c2532f978 in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #8 0x7f8c24ce49d1 in ?? May 4 18:01:22 nrc3lrhidm named[19014]: #9 0x7f8c242448fd in ?? May 4 18:01:22 nrc3lrhidm named[19014]: exiting (due to assertion failure) Core was generated by `/usr/sbin/named -u named -4'. Program terminated with signal 6, Aborted. #0 0x00007f8c2418e625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) where #0 0x00007f8c2418e625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007f8c2418fe05 in abort () at abort.c:92 #2 0x00007f8c253109ff in isc_assertion_failed (file=<value optimized out>, line=<value optimized out>, type=<value optimized out>, cond=<value optimized out>) at assertions.c:58 #3 0x00007f8c26222055 in dns_name_split (name=0x7f8c217dea30, suffixlabels=0, prefix=<value optimized out>, suffix=0x7f8c217dea30) at name.c:2104 #4 0x00007f8c269b2ab8 in query_addwildcardproof (client=0x7f8c08034a50, db=0x7f8c26946e50, version=0x7f8c1ecd1458, name=0x7f8c06dff150, ispositive=isc_boolean_false, nodata=isc_boolean_false) at query.c:3339 #5 0x00007f8c269b861f in query_find (client=0x7f8c08034a50, event=0x0, qtype=<value optimized out>) at query.c:6254 #6 0x00007f8c269bcb4a in ns_query_start (client=0x7f8c08034a50) at query.c:7349 #7 0x00007f8c269a1bd6 in client_request (task=<value optimized out>, event=<value optimized out>) at client.c:1961 #8 0x00007f8c2532f978 in dispatch (uap=0x7f8c2692c010) at task.c:1012 #9 run (uap=0x7f8c2692c010) at task.c:1157 #10 0x00007f8c24ce49d1 in start_thread (arg=0x7f8c217e1700) at pthread_create.c:301 #11 0x00007f8c242448fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 (gdb) frame 3 #3 0x00007f8c26222055 in dns_name_split (name=0x7f8c217dea30, suffixlabels=0, prefix=<value optimized out>, suffix=0x7f8c217dea30) at name.c:2104 2104 REQUIRE(suffixlabels > 0); (gdb) print name $1 = (dns_name_t *) 0x7f8c217dea30 (gdb) frame 4 #4 0x00007f8c269b2ab8 in query_addwildcardproof (client=0x7f8c08034a50, db=0x7f8c26946e50, version=0x7f8c1ecd1458, name=0x7f8c06dff150, ispositive=isc_boolean_false, nodata=isc_boolean_false) at query.c:3339 3339 dns_name_split(cname, labels, NULL, cname); ---> abrt-crash-2.tar.gz May 6 07:30:19 oma3lrhidm named[30342]: LDAP query timed out. Try to adjust "timeout" parameter May 6 07:30:19 oma3lrhidm named[30342]: LDAP query timed out. Try to adjust "timeout" parameter May 6 07:30:20 oma3lrhidm named[30342]: LDAP query timed out. Try to adjust "timeout" parameter May 6 07:30:20 oma3lrhidm named[30342]: name.c:2104: REQUIRE(suffixlabels > 0) failed, back trace May 6 07:30:20 oma3lrhidm named[30342]: #0 0x7ff0958b6fbf in ?? May 6 07:30:20 oma3lrhidm named[30342]: #1 0x7ff09421c9fa in ?? May 6 07:30:20 oma3lrhidm named[30342]: #2 0x7ff09512e055 in ?? May 6 07:30:20 oma3lrhidm named[30342]: #3 0x7ff0958beab8 in ?? May 6 07:30:20 oma3lrhidm named[30342]: #4 0x7ff0958c461f in ?? May 6 07:30:20 oma3lrhidm named[30342]: #5 0x7ff0958c8b4a in ?? May 6 07:30:20 oma3lrhidm named[30342]: #6 0x7ff0958adbd6 in ?? May 6 07:30:20 oma3lrhidm named[30342]: #7 0x7ff09423b978 in ?? May 6 07:30:20 oma3lrhidm named[30342]: #8 0x7ff093bf09d1 in ?? May 6 07:30:20 oma3lrhidm named[30342]: #9 0x7ff0931508fd in ?? May 6 07:30:20 oma3lrhidm named[30342]: exiting (due to assertion failure) Core was generated by `/usr/sbin/named -u named -4'. Program terminated with signal 6, Aborted. #0 0x00007ff09309a625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) where #0 0x00007ff09309a625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007ff09309be05 in abort () at abort.c:92 #2 0x00007ff0958b71a4 in assertion_failed (file=<value optimized out>, line=<value optimized out>, type=<value optimized out>, cond=<value optimized out>) at ./main.c:219 #3 0x00007ff09421c9fa in isc_assertion_failed (file=<value optimized out>, line=<value optimized out>, type=<value optimized out>, cond=<value optimized out>) at assertions.c:57 #4 0x00007ff09512e055 in dns_name_split (name=0x7ff091aeca30, suffixlabels=0, prefix=<value optimized out>, suffix=0x7ff091aeca30) at name.c:2104 #5 0x00007ff0958beab8 in query_addwildcardproof (client=0x7ff07c1d3170, db=0x7ff095852e50, version=0x7ff08dbdd458, name=0x7ff07410a010, ispositive=isc_boolean_false, nodata=isc_boolean_false) at query.c:3339 #6 0x00007ff0958c461f in query_find (client=0x7ff07c1d3170, event=0x0, qtype=<value optimized out>) at query.c:6254 #7 0x00007ff0958c8b4a in ns_query_start (client=0x7ff07c1d3170) at query.c:7349 #8 0x00007ff0958adbd6 in client_request (task=<value optimized out>, event=<value optimized out>) at client.c:1961 #9 0x00007ff09423b978 in dispatch (uap=0x7ff095838010) at task.c:1012 #10 run (uap=0x7ff095838010) at task.c:1157 #11 0x00007ff093bf09d1 in start_thread (arg=0x7ff091aef700) at pthread_create.c:301 #12 0x00007ff0931508fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115