Bug 856269 - Race condition leads to crash during BIND reload
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: bind-dyndb-ldap
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 6.4
Assigned To: Adam Tkac
QA Contact: Namita Soman
Depends On:
Blocks:
Reported: 2012-09-11 11:19 EDT by Chris Hudson
Modified: 2013-02-24 10:23 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A race condition could cause the plugin to crash the named process when it received a reload request. Consequence: The DNS service became unavailable. Fix: The plugin was patched. Result: The race condition during reload no longer occurs.
Last Closed: 2013-02-21 03:58:26 EST
Type: Bug



External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2013:0359
Priority: normal
Status: SHIPPED_LIVE
Summary: bind-dyndb-ldap bug fix and enhancement update
Last Updated: 2013-02-20 15:53:11 EST

Comment 2 Adam Tkac 2012-09-12 10:20:30 EDT
(In reply to comment #0)
> Let me know what else is needed.

Can you please send me /var/log/messages before named crashed? Thank you in advance.
Comment 3 Petr Spacek 2012-09-13 03:24:16 EDT
I have identified the root cause and am working on the fix now.
Comment 5 Petr Spacek 2012-09-14 03:16:17 EDT
The same bug was observed in Fedora:

https://bugzilla.redhat.com/show_bug.cgi?id=855939

and

https://bugzilla.redhat.com/show_bug.cgi?id=855938
Comment 9 Jenny Galipeau 2012-09-20 08:11:18 EDT
Please add steps to reproduce the bind crash.
Comment 11 Petr Spacek 2012-09-20 09:19:25 EDT
I added some steps to reproduce to the upstream ticket. None of them is 100% reliable in this case.
Comment 12 Petr Spacek 2012-09-21 05:02:50 EDT
Package version:
bind-dyndb-ldap-1.1.0-0.9.b1.el6_3.1.x86_64
Comment 14 Jenny Galipeau 2012-09-25 10:09:06 EDT
Without steps to reproduce, QE will verify sanity only.
Comment 15 Petr Spacek 2012-12-19 03:40:15 EST
Upstream ticket:
https://fedorahosted.org/bind-dyndb-ldap/ticket/101
Comment 16 Michael Gregg 2013-01-15 15:51:02 EST
This bug is somewhat difficult to verify because the steps to reproduce it aren't very solid. I tried the following things to cause the failure and encountered no problems.

[root@zippyvm4 ~]# kinit admin
Password for admin@TESTRELM.COM:

[root@zippyvm4 ~]# /etc/init.d/named restart
Stopping named: .[  OK  ]
Starting named: [  OK  ]

[root@zippyvm4 ~]# tail /var/log/messages
Jan 15 15:44:44 zippyvm4 named[22213]: zone testrelm.com/IN: sending notifies (serial 1358282684)
Jan 15 15:44:44 zippyvm4 named[22213]: zone 5.14.10.in-addr.arpa/IN: sending notifies (serial 1358282684)
Jan 15 15:44:49 zippyvm4 named[22213]: zone testrelm.com/IN: sending notifies (serial 1358282684)
Jan 15 15:44:49 zippyvm4 named[22213]: zone 5.14.10.in-addr.arpa/IN: sending notifies (serial 1358282684)

/etc/init.d/named restart &
/etc/init.d/nscd restart &

[root@zippyvm4 ~]# tail /var/log/messages
Jan 15 15:50:17 zippyvm4 named[24939]: zone testrelm.com/IN: sending notifies (serial 1358283017)
Jan 15 15:50:17 zippyvm4 named[24939]: zone 5.14.10.in-addr.arpa/IN: sending notifies (serial 1358283017)
Jan 15 15:50:17 zippyvm4 logger: 2013-01-15 15:50:17 /usr/bin/rhts-test-runner.sh 1212081 63720 hearbeat...
Jan 15 15:50:22 zippyvm4 named[24939]: zone testrelm.com/IN: sending notifies (serial 1358283017)
Jan 15 15:50:22 zippyvm4 named[24939]: zone 5.14.10.in-addr.arpa/IN: sending notifies (serial 1358283017)


Verified against ipa-server-3.0.0-9.el6.x86_64
Comment 19 errata-xmlrpc 2013-02-21 03:58:26 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0359.html
Comment 20 Petr Spacek 2013-02-24 10:23:53 EST
The public version of the bug description follows:

Description of problem:
bind-9.8.2-0.10.rc1.el6_3.2.x86_64 segfault causes IPA replication to fail

Version-Release number of selected component (if applicable):
bind-9.8.2-0.10.rc1.el6_3.2.x86_64

How reproducible:
Random

Steps to Reproduce:
1. Install master and replica IPA environment
2. Watch bind segfault
  
Actual results:
bind segfaults

Expected results:
bind should not segfault

Additional info:
Core analysis result:

Issue might be coming out of thread 16:

---
(gdb) bt
#0  0x00007fcc5c994b02 in ldap_cache_enabled (cache=0xdededededededede) at cache.c:215
#1  0x00007fcc5c994efe in ldap_cache_getrdatalist (mctx=0x7fcc6ec222d0, cache=0xdededededededede, name=0x7fcc662c6d10, 
    rdatalist=0x7fcc662c6cf0) at cache.c:177
#2  0x00007fcc5c99b496 in ldapdb_rdatalist_get (mctx=0x7fcc6ec222d0, ldap_inst=0x7fcc5342ef10, name=0x7fcc662c6d10, origin=0x7fcc5d489320, 
    rdatalist=0x7fcc662c6cf0) at ldap_helper.c:1457
#3  0x00007fcc5c997b41 in find (db=0x7fcc5d489308, name=0x7fcc535f41a0, version=<value optimized out>, type=1, options=0, 
    now=<value optimized out>, nodep=0x7fcc662c73d8, foundname=0x7fcc535f41f0, rdataset=0x7fcc535f7240, sigrdataset=0x0)
    at ldap_driver.c:495
#4  0x00007fcc6d03c57f in query_find (client=<value optimized out>, event=0x0, qtype=1) at query.c:5474
#5  0x00007fcc6d042f6a in ns_query_start (client=0x7fcc4000be60) at query.c:7214
#6  0x00007fcc6d028846 in client_request (task=<value optimized out>, event=<value optimized out>) at client.c:1913
#7  0x00007fcc6ba072f8 in dispatch (uap=0x7fcc6cfb0010) at task.c:1012
#8  run (uap=0x7fcc6cfb0010) at task.c:1157
#9  0x00007fcc6b3bc851 in start_thread (arg=0x7fcc662c9700) at pthread_create.c:301
#10 0x00007fcc6a91f6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
---
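
A note on reading the backtrace above: the cache argument is 0xdededededededede in frames #0 and #1, and the thread listing shows the same value in threads 1 and 5. A pointer made entirely of the byte 0xde is consistent with memory that was filled with a "dead" marker when it was freed, which would point to a use-after-free of the structure holding the cache pointer. The sketch below is only a minimal illustration of how to test a pointer value against such a fill byte. The function names are hypothetical and are not part of bind-dyndb-ldap or BIND, and the choice of 0xde as the fill byte is an assumption taken from the value seen in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when every byte of the pointer
 * value equals `fill`, i.e. the pointer was most likely read from
 * memory that had been overwritten with a fill pattern.
 */
static bool pointer_is_fill_pattern(uintptr_t value, unsigned char fill)
{
    for (size_t i = 0; i < sizeof(value); i++) {
        if (((value >> (8 * i)) & 0xffu) != fill)
            return false;
    }
    return true;
}

/* Self-check using two pointer values printed in the backtrace. */
static void check_backtrace_values(void)
{
    /* cache=0xdededededededede: every byte is the assumed fill byte. */
    assert(pointer_is_fill_pattern((uintptr_t)0xdedededededededeULL, 0xde));
    /* ldap_inst=0x7fcc5342ef10: looks like an ordinary heap address. */
    assert(!pointer_is_fill_pattern((uintptr_t)0x7fcc5342ef10ULL, 0xde));
}
```

Calling check_backtrace_values() passes silently. The check only classifies the value; it does not say which thread freed the memory.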

---
(gdb) info threads
  19 Thread 0x7fcc5febf700 (LWP 29612)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  18 Thread 0x7fcc66cca700 (LWP 29601)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  17 Thread 0x7fcc658c8700 (LWP 29603)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
* 16 Thread 0x7fcc6cfec7c0 (LWP 29596)  0x00007fcc6a86ac54 in do_sigsuspend (set=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
  15 Thread 0x7fcc5f4be700 (LWP 29613)  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  14 Thread 0x7fcc680cc700 (LWP 29599)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  13 Thread 0x7fcc694ce700 (LWP 29597)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  12 Thread 0x7fcc5eabd700 (LWP 29614)  0x00007fcc6a91fcd3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
  11 Thread 0x7fcc64ec7700 (LWP 29604)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  10 Thread 0x7fcc68acd700 (LWP 29598)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  9 Thread 0x7fcc626c3700 (LWP 29608)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  8 Thread 0x7fcc61cc2700 (LWP 29609)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  7 Thread 0x7fcc612c1700 (LWP 29610)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  6 Thread 0x7fcc644c6700 (LWP 29605)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  5 Thread 0x7fcc676cb700 (LWP 29600)  0x00007fcc5c994b02 in ldap_cache_enabled (cache=0xdededededededede) at cache.c:215
  4 Thread 0x7fcc608c0700 (LWP 29611)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 0x7fcc630c4700 (LWP 29607)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2 Thread 0x7fcc63ac5700 (LWP 29606)  0x00007fcc6b3c3a2d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
  1 Thread 0x7fcc662c9700 (LWP 29602)  0x00007fcc5c994b02 in ldap_cache_enabled (cache=0xdededededededede) at cache.c:215
---

---
(gdb) thread 16
[Switching to thread 16 (Thread 0x7fcc6cfec7c0 (LWP 29596))]#0  0x00007fcc6a86ac54 in do_sigsuspend (set=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
63	  return INLINE_SYSCALL (rt_sigsuspend, 2, CHECK_SIGSET (set), _NSIG / 8);
---

---
(gdb) bt
#0  0x00007fcc6a86ac54 in do_sigsuspend (set=<value optimized out>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
#1  __sigsuspend (set=<value optimized out>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:78
#2  0x00007fcc6ba0b5a4 in isc__app_ctxrun (ctx0=0x7fcc6bc30880) at app.c:680
#3  0x00007fcc6d033025 in main (argc=<value optimized out>, argv=0x7fff165d4b68) at ./main.c:1085
---
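
This report does not include the patch itself. As a general illustration only, and not the fix that was shipped, one common way to avoid this class of crash is reference counting: a query path takes a reference on the instance before using it, and a reload only drops the owner's reference, so the memory is released when the last holder lets go. All names below (instance_t, instance_create, instance_attach, instance_detach, live_instances) are hypothetical and do not come from bind-dyndb-ldap. The sketch also assumes that the reference is taken while the instance is still guaranteed to be valid, for example under a lock that the reload path also holds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a plugin instance that owns a cache. */
typedef struct instance {
    atomic_int refs;     /* number of holders, including the owner */
    bool cache_enabled;  /* stands in for the real cache state */
} instance_t;

/* Demo-only counter of instances that have not been freed yet. */
static atomic_int live_instances;

/* Allocate an instance holding one reference (the owner's). */
static instance_t *instance_create(void)
{
    instance_t *inst = malloc(sizeof(*inst));
    if (inst == NULL)
        return NULL;
    atomic_init(&inst->refs, 1);
    inst->cache_enabled = true;
    atomic_fetch_add(&live_instances, 1);
    return inst;
}

/* Take an extra reference; the caller must already hold a valid pointer. */
static void instance_attach(instance_t *inst)
{
    int prev = atomic_fetch_add(&inst->refs, 1);
    assert(prev > 0);
    (void)prev;
}

/* Drop one reference and clear the caller's pointer; the last drop frees. */
static void instance_detach(instance_t **instp)
{
    instance_t *inst = *instp;
    *instp = NULL;
    if (atomic_fetch_sub(&inst->refs, 1) == 1) {
        free(inst);
        atomic_fetch_sub(&live_instances, 1);
    }
}
```

In this scheme a reload would call instance_detach() on the owner's pointer while an in-flight query still holds its own reference, so the query can finish its cache lookup before the instance is freed.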
