Bug 726136

Summary: Directory server hangs during unit tests
Product: Red Hat Enterprise Linux 6 Reporter: Nathan Kinder <nkinder>
Component: 389-ds-baseAssignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Chandrasekar Kannan <ckannan>
Severity: high Docs Contact:
Priority: urgent    
Version: 6.2CC: amsharma, benl, dpal, edewata, jgalipea, jwest, mkosek, nhosoi, nkinder, rmeggins, shaines
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.2.8.7-1.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 725743 Environment:
Last Closed: 2011-12-06 17:55:45 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 725743    
Bug Blocks: 434915, 639035, 708096, 726606    

Description Nathan Kinder 2011-07-27 16:19:45 UTC
+++ This bug was initially created as a clone of Bug #725743 +++

Created attachment 515272 [details]
Directory server logs

Description of problem:

Directory server hangs when FreeIPA unit tests are being run. It neither respond to any further LDAP operation nor cannot be shutdown via "service dirsrv stop".

Version-Release number of selected component (if applicable):

389-ds-base-1.2.9.0-1.fc15.x86_64

The problem was not reproduced on 32 bit version. The Directory Server does not hang with the previous version.

How reproducible:
The most reliable way to reproduce was to run the FreeIPA unit tests which run a lot of LDAP operations..

Steps to Reproduce:
1. Download FreeIPA sources on 64b Fedora 15: git clone git://git.fedorahosted.org/git/freeipa.git
2. Prepare current upstream RPMS: make rpms
3. Install freeipa-server in ./dist/rpms
4. Configure freeipa: ipa-server-install
5. Prepare unit tests configuration (replace host with your own setting)
# cat ~/.ipa/default.conf 
[global]
basedn = dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
realm = IDM.LAB.BOS.REDHAT.COM
domain = idm.lab.bos.redhat.com
server = vm-023.idm.lab.bos.redhat.com
xmlrpc_uri = https://vm-023.idm.lab.bos.redhat.com/ipa/xml
enable_ra = True
wait_for_attr = True
6. Kinit to installed FreeIPA: kinit admin
7. Prepare test certs: ./make-testcert
8. Run unit tests: ./make-test
  
Actual results:
During the test process, the Directory server stops responding.

Expected results:
Directory server does not hang.

Additional info:
Attaching logs with highest log level. I can provide already configured VM with reproducer if required.

--- Additional comment from rmeggins on 2011-07-26 11:25:29 EDT ---

If you use the standard log levels, can you reproduce the problem?

Yes, I would like to be able to attach to the ns-slapd process with a debugger or strace or something like that so I can see what it's doing.

--- Additional comment from rmeggins on 2011-07-26 13:10:44 EDT ---

Created attachment 515325 [details]
thread apply all bt full

--- Additional comment from nkinder on 2011-07-26 13:23:32 EDT ---

Thread 30 looks to be the one that is deadlocked.  It looks like a modify operation comes in, which triggers an internal memberOf modify operation.  This internal modify operation then triggers the managed entry plug-in, which performs another internal modify operation.  This internal modify that is originated from the managed entry plug-in then triggers the memberOf plug-in.

The memberOf plug-in uses an operation lock that must be held by a thread when it is performing a memberOf operation.  We are able to detect when memberOf calls itself since you can see what plug-in initiated an internal operation.  The problem here is that the managed entry plug-in is initiating the internal operation, but we have no way of knowing that this thread already holds the memberOf operation lock.

--- Additional comment from dpal on 2011-07-26 13:57:36 EDT ---

So what is the solution?

--- Additional comment from nkinder on 2011-07-26 14:12:11 EDT ---

(In reply to comment #4)
> So what is the solution?

I'm testing using PRMonitor for locking instead of a simple mutex.  This will allow re-entrant locking.  If it passes my memberOf tests, I'll attach a patch here so we can test it with the FreeIPA unit tests.

--- Additional comment from nkinder on 2011-07-26 15:39:50 EDT ---

Created attachment 515354 [details]
Patch

This patch changed the memebrOf plug-in to use a PRMonitor instead of a mutex.  This has passed all of my memberOf tests, but I would like it tested out against the FreeIPA unit tests that triggered the deadlock.

--- Additional comment from rmeggins on 2011-07-26 16:02:45 EDT ---

Comment on attachment 515354 [details]
Patch

Does this compile cleanly?  Do you need an #include <prmon.h>?

--- Additional comment from nkinder on 2011-07-26 16:45:37 EDT ---

(In reply to comment #7)
> Comment on attachment 515354 [details]
> Patch
> 
> Does this compile cleanly?  Do you need an #include <prmon.h>?

Yes, it complies cleanly.  We include nspr.h in memberof.c, which includes prmon.h.

--- Additional comment from rmeggins on 2011-07-26 18:05:21 EDT ---

Martin - I completed the ipa make-test with the patch - the server did not hang, but the test reported many errors.  Please review the test results - if you approve, we can push the patch and get a new build pushed to Fedora 15.

--- Additional comment from mkosek on 2011-07-27 03:57:31 EDT ---

Rich, thanks for successful investigation and also thanks to Nathan for the fix. The DS no longer hangs in FreeIPA unit test. These referred test errors are known and will be fixed in the future.

So ACK from me for the proposed patch.

--- Additional comment from nkinder on 2011-07-27 11:34:26 EDT ---

Pushed to master.  Thanks to Rich and Martin for their reviews and testing!

Counting objects: 13, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 915 bytes, done.
Total 7 (delta 4), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   077544c..238b74d  master -> master

Comment 1 Jenny Severance 2011-07-27 16:28:52 UTC
Can you please adds steps to reproduce this issue in pure DS environment?  Thanks

Comment 5 Amita Sharma 2011-09-14 06:29:54 UTC
Followed steps in README file in the tarball given in comment#17 :

Step 1:
[root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f config.ldif
modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config"

adding new entry "cn=users,dc=example,dc=com"

adding new entry "cn=groups,dc=example,dc=com"

adding new entry "cn=inversegroups,dc=example,dc=com"

adding new entry "cn=Inverse Groups Template,dc=example,dc=com"

adding new entry "cn=Inverse Groups,cn=Managed Entries,cn=plugins,cn=config"

Step 2:
[root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f setup-data.ldif
adding new entry "uid=tuser1,cn=users,dc=example,dc=com"

adding new entry "uid=tuser2,cn=users,dc=example,dc=com"

adding new entry "uid=tuser3,cn=users,dc=example,dc=com"

adding new entry "uid=tuser4,cn=users,dc=example,dc=com"

adding new entry "uid=tuser5,cn=users,dc=example,dc=com"

adding new entry "uid=tuser6,cn=users,dc=example,dc=com"
.
..
..........

Steldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f hang.ldif 
[root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f hang.ldif
adding new entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

Server did not hang and operation performed successfully.
Hence marking as VERIFIED.

Comment 6 errata-xmlrpc 2011-12-06 17:55:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1711.html