Bug 727856

Summary: bind-dyndb-ldap: race condition in semaphore_wait() function
Product: Red Hat Enterprise Linux 6
Reporter: Adam Tkac <atkac>
Component: bind-dyndb-ldap
Assignee: Adam Tkac <atkac>
Status: CLOSED ERRATA
QA Contact: Chandrasekar Kannan <ckannan>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.1
CC: benl, linux, martin_foster, mgregg, ovasik, rvokal
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: bind-dyndb-ldap-0.2.0-3.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 734003
Environment:
Last Closed: 2011-12-06 17:57:09 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 734003
Attachments:
  Description             Flags
  Proposed patch          none
  pstack of hung named    none

Description Adam Tkac 2011-08-03 12:38:09 UTC
Description of problem:
The current implementation of the semaphore_wait() function is not fully thread-safe.

Version-Release number of selected component (if applicable):
bind-dyndb-ldap-0.2.0-1.el6

How reproducible:
Sometimes, when the server is under heavy load.

Steps to Reproduce:
1. Send many queries for RRs whose authoritative zones are served via the bind-dyndb-ldap plugin.
2. Wait for the server to lock up.
  
Actual results:
All of the server's worker threads are blocked in semaphore_wait().

Expected results:
thread-safe semaphore_wait()
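
The report does not show the plugin's code or the proposed patch, so the following is only a minimal sketch of what a thread-safe counting-semaphore wait commonly looks like with POSIX threads. All names in it (sketch_sem_t, sketch_sem_wait, sketch_sem_signal, run_demo, WORKERS, ROUNDS, SLOTS) are hypothetical and are not taken from bind-dyndb-ldap. The point it illustrates is that the counter is only read and changed while the mutex is held, and that the waiter re-checks the counter in a loop after every wakeup.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical counting semaphore; NOT the plugin's actual type. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             value;   /* number of free slots */
} sketch_sem_t;

static void sketch_sem_init(sketch_sem_t *sem, int value) {
    pthread_mutex_init(&sem->mutex, NULL);
    pthread_cond_init(&sem->cond, NULL);
    sem->value = value;
}

static void sketch_sem_destroy(sketch_sem_t *sem) {
    pthread_cond_destroy(&sem->cond);
    pthread_mutex_destroy(&sem->mutex);
}

static void sketch_sem_wait(sketch_sem_t *sem) {
    pthread_mutex_lock(&sem->mutex);
    /* Loop, not a single check: a wakeup may be spurious, or another
     * thread may have taken the freed slot before this one ran. */
    while (sem->value <= 0)
        pthread_cond_wait(&sem->cond, &sem->mutex);
    sem->value--;
    pthread_mutex_unlock(&sem->mutex);
}

static void sketch_sem_signal(sketch_sem_t *sem) {
    pthread_mutex_lock(&sem->mutex);
    sem->value++;
    pthread_cond_signal(&sem->cond);
    pthread_mutex_unlock(&sem->mutex);
}

/* Small stress harness (also hypothetical). */
enum { WORKERS = 8, ROUNDS = 10000, SLOTS = 2 };

static sketch_sem_t pool;
static pthread_mutex_t stat_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_use;       /* threads currently holding a slot */
static int max_in_use;   /* highest value in_use ever reached */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        sketch_sem_wait(&pool);

        pthread_mutex_lock(&stat_lock);
        in_use++;
        if (in_use > max_in_use)
            max_in_use = in_use;
        pthread_mutex_unlock(&stat_lock);

        pthread_mutex_lock(&stat_lock);
        in_use--;
        pthread_mutex_unlock(&stat_lock);

        sketch_sem_signal(&pool);
    }
    return NULL;
}

/* Runs the workers to completion; returns the peak number of
 * simultaneous slot holders, or -1 if a thread could not be started. */
static int run_demo(void) {
    pthread_t threads[WORKERS];
    int started = 0;

    in_use = 0;
    max_in_use = 0;
    sketch_sem_init(&pool, SLOTS);

    for (; started < WORKERS; started++)
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    /* Every wait was paired with a signal, so all slots are free again. */
    assert(pool.value == SLOTS);
    sketch_sem_destroy(&pool);
    return started == WORKERS ? max_in_use : -1;
}
```

When called from a main() and built with the compiler's pthread option (for example -pthread), run_demo() should return a value between 1 and SLOTS. That it returns at all is the interesting part: no worker stays blocked in sketch_sem_wait(). Whether the proposed patch takes this approach is not stated in the report.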

Comment 1 Adam Tkac 2011-08-03 12:49:28 UTC
Created attachment 516506 [details]
Proposed patch

Comment 2 Phil Anderson 2011-08-07 00:56:58 UTC
Created attachment 517022 [details]
pstack of hung named

I recently upgraded my DNS server from an older dual-core CPU to a quad-core Xeon E3, and now named locks up after the first few queries.  Stack trace attached.

I was able to work around the problem by reducing the number of worker threads named starts by adding the following line to /etc/sysconfig/named:
OPTIONS="-n 1"

Comment 3 Adam Tkac 2011-08-08 08:37:28 UTC
(In reply to comment #2)
> Created attachment 517022 [details]
> pstack of hung named
> 
> I recently upgraded my DNS server from an older dual-core CPU to a quad-core
> Xeon E3, and now named locks up after the first few queries.  Stack trace
> attached.
> 
> I was able to work around the problem by reducing the number of worker threads
> named starts by adding the following line to /etc/sysconfig/named:
> OPTIONS="-n 1"

Which versions of bind, bind-libs, and bind-dyndb-ldap are you using, please?

Comment 6 Martin Foster 2011-08-22 06:25:03 UTC
I was experiencing the same semaphore error as described on the freeipa-users list.  Besides serving records for IPA, my bind install is also an authoritative DNS server.

I rebuilt bind + bind-dyndb-ldap from Adam's proposed patches:
bind-dyndb-ldap-0.2.0-1.el6.1.src.rpm; and
bind-9.7.3-2.el6_1.P3.2.5.rh725577.src.rpm

The resolver has now been running for 6+ hours, whereas previously it would hang on the semaphore issue within an hour.

Comment 7 Michael Gregg 2011-11-08 18:25:04 UTC
Given that all of the servers we have in QA have not been locking up, even under load, and that this patch was submitted quite a while ago, I am going to mark this bug as verified.

Verified against:
bind-dyndb-ldap-0.2.0-7.el6.x86_64
ipa-server-2.1.3-8.el6.x86_64

Comment 8 errata-xmlrpc 2011-12-06 17:57:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1715.html