Bug 725743

Summary: Directory server hangs during unit tests
Product: [Retired] 389 Reporter: Martin Kosek <mkosek>
Component: Server - memberOf Plug-inAssignee: Nathan Kinder <nkinder>
Status: CLOSED CURRENTRELEASE QA Contact: Viktor Ashirov <vashirov>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 1.2.9CC: amsharma, dpal, edewata, jgalipea, nhosoi, nkinder, rmeggins
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 726136 (view as bug list) Environment:
Last Closed: 2015-12-07 16:48:54 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 434915, 639035, 708096, 726136    
Attachments:
Description Flags
Directory server logs
none
thread apply all bt full
none
Patch rmeggins: review+

Description Martin Kosek 2011-07-26 12:43:25 UTC
Created attachment 515272 [details]
Directory server logs

Description of problem:

Directory server hangs when FreeIPA unit tests are being run. It neither respond to any further LDAP operation nor cannot be shutdown via "service dirsrv stop".

Version-Release number of selected component (if applicable):

389-ds-base-1.2.9.0-1.fc15.x86_64

The problem was not reproduced on 32 bit version. The Directory Server does not hang with the previous version.

How reproducible:
The most reliable way to reproduce was to run the FreeIPA unit tests which run a lot of LDAP operations..

Steps to Reproduce:
1. Download FreeIPA sources on 64b Fedora 15: git clone git://git.fedorahosted.org/git/freeipa.git
2. Prepare current upstream RPMS: make rpms
3. Install freeipa-server in ./dist/rpms
4. Configure freeipa: ipa-server-install
5. Prepare unit tests configuration (replace host with your own setting)
# cat ~/.ipa/default.conf 
[global]
basedn = dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
realm = IDM.LAB.BOS.REDHAT.COM
domain = idm.lab.bos.redhat.com
server = vm-023.idm.lab.bos.redhat.com
xmlrpc_uri = https://vm-023.idm.lab.bos.redhat.com/ipa/xml
enable_ra = True
wait_for_attr = True
6. Kinit to installed FreeIPA: kinit admin
7. Prepare test certs: ./make-testcert
8. Run unit tests: ./make-test
  
Actual results:
During the test process, the Directory server stops responding.

Expected results:
Directory server does not hang.

Additional info:
Attaching logs with highest log level. I can provide already configured VM with reproducer if required.

Comment 1 Rich Megginson 2011-07-26 15:25:29 UTC
If you use the standard log levels, can you reproduce the problem?

Yes, I would like to be able to attach to the ns-slapd process with a debugger or strace or something like that so I can see what it's doing.

Comment 2 Rich Megginson 2011-07-26 17:10:44 UTC
Created attachment 515325 [details]
thread apply all bt full

Comment 3 Nathan Kinder 2011-07-26 17:23:32 UTC
Thread 30 looks to be the one that is deadlocked.  It looks like a modify operation comes in, which triggers an internal memberOf modify operation.  This internal modify operation then triggers the managed entry plug-in, which performs another internal modify operation.  This internal modify that is originated from the managed entry plug-in then triggers the memberOf plug-in.

The memberOf plug-in uses an operation lock that must be held by a thread when it is performing a memberOf operation.  We are able to detect when memberOf calls itself since you can see what plug-in initiated an internal operation.  The problem here is that the managed entry plug-in is initiating the internal operation, but we have no way of knowing that this thread already holds the memberOf operation lock.

Comment 4 Dmitri Pal 2011-07-26 17:57:36 UTC
So what is the solution?

Comment 5 Nathan Kinder 2011-07-26 18:12:11 UTC
(In reply to comment #4)
> So what is the solution?

I'm testing using PRMonitor for locking instead of a simple mutex.  This will allow re-entrant locking.  If it passes my memberOf tests, I'll attach a patch here so we can test it with the FreeIPA unit tests.

Comment 6 Nathan Kinder 2011-07-26 19:39:50 UTC
Created attachment 515354 [details]
Patch

This patch changed the memebrOf plug-in to use a PRMonitor instead of a mutex.  This has passed all of my memberOf tests, but I would like it tested out against the FreeIPA unit tests that triggered the deadlock.

Comment 7 Rich Megginson 2011-07-26 20:02:45 UTC
Comment on attachment 515354 [details]
Patch

Does this compile cleanly?  Do you need an #include <prmon.h>?

Comment 8 Nathan Kinder 2011-07-26 20:45:37 UTC
(In reply to comment #7)
> Comment on attachment 515354 [details]
> Patch
> 
> Does this compile cleanly?  Do you need an #include <prmon.h>?

Yes, it complies cleanly.  We include nspr.h in memberof.c, which includes prmon.h.

Comment 9 Rich Megginson 2011-07-26 22:05:21 UTC
Martin - I completed the ipa make-test with the patch - the server did not hang, but the test reported many errors.  Please review the test results - if you approve, we can push the patch and get a new build pushed to Fedora 15.

Comment 10 Martin Kosek 2011-07-27 07:57:31 UTC
Rich, thanks for successful investigation and also thanks to Nathan for the fix. The DS no longer hangs in FreeIPA unit test. These referred test errors are known and will be fixed in the future.

So ACK from me for the proposed patch.

Comment 11 Nathan Kinder 2011-07-27 15:34:26 UTC
Pushed to master.  Thanks to Rich and Martin for their reviews and testing!

Counting objects: 13, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 915 bytes, done.
Total 7 (delta 4), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   077544c..238b74d  master -> master

Comment 18 Amita Sharma 2011-09-14 06:30:00 UTC
Followed steps in README file in the tarball given in comment#17 :

Step 1:
[root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f config.ldif
modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config"

adding new entry "cn=users,dc=example,dc=com"

adding new entry "cn=groups,dc=example,dc=com"

adding new entry "cn=inversegroups,dc=example,dc=com"

adding new entry "cn=Inverse Groups Template,dc=example,dc=com"

adding new entry "cn=Inverse Groups,cn=Managed Entries,cn=plugins,cn=config"

Step 2:
[root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f setup-data.ldif
adding new entry "uid=tuser1,cn=users,dc=example,dc=com"

adding new entry "uid=tuser2,cn=users,dc=example,dc=com"

adding new entry "uid=tuser3,cn=users,dc=example,dc=com"

adding new entry "uid=tuser4,cn=users,dc=example,dc=com"

adding new entry "uid=tuser5,cn=users,dc=example,dc=com"

adding new entry "uid=tuser6,cn=users,dc=example,dc=com"
.
..
..........

Steldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f hang.ldif 
[root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f hang.ldif
adding new entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

modifying entry "cn=mastergroup,dc=example,dc=com"

Server did not hang and operation performed successfully.
Hence marking as VERIFIED.