Created attachment 515272 [details] Directory server logs Description of problem: Directory server hangs when FreeIPA unit tests are being run. It neither respond to any further LDAP operation nor cannot be shutdown via "service dirsrv stop". Version-Release number of selected component (if applicable): 389-ds-base-1.2.9.0-1.fc15.x86_64 The problem was not reproduced on 32 bit version. The Directory Server does not hang with the previous version. How reproducible: The most reliable way to reproduce was to run the FreeIPA unit tests which run a lot of LDAP operations.. Steps to Reproduce: 1. Download FreeIPA sources on 64b Fedora 15: git clone git://git.fedorahosted.org/git/freeipa.git 2. Prepare current upstream RPMS: make rpms 3. Install freeipa-server in ./dist/rpms 4. Configure freeipa: ipa-server-install 5. Prepare unit tests configuration (replace host with your own setting) # cat ~/.ipa/default.conf [global] basedn = dc=idm,dc=lab,dc=bos,dc=redhat,dc=com realm = IDM.LAB.BOS.REDHAT.COM domain = idm.lab.bos.redhat.com server = vm-023.idm.lab.bos.redhat.com xmlrpc_uri = https://vm-023.idm.lab.bos.redhat.com/ipa/xml enable_ra = True wait_for_attr = True 6. Kinit to installed FreeIPA: kinit admin 7. Prepare test certs: ./make-testcert 8. Run unit tests: ./make-test Actual results: During the test process, the Directory server stops responding. Expected results: Directory server does not hang. Additional info: Attaching logs with highest log level. I can provide already configured VM with reproducer if required.
If you use the standard log levels, can you reproduce the problem? Yes, I would like to be able to attach to the ns-slapd process with a debugger or strace or something like that so I can see what it's doing.
Created attachment 515325 [details] thread apply all bt full
Thread 30 looks to be the one that is deadlocked. It looks like a modify operation comes in, which triggers an internal memberOf modify operation. This internal modify operation then triggers the managed entry plug-in, which performs another internal modify operation. This internal modify that is originated from the managed entry plug-in then triggers the memberOf plug-in. The memberOf plug-in uses an operation lock that must be held by a thread when it is performing a memberOf operation. We are able to detect when memberOf calls itself since you can see what plug-in initiated an internal operation. The problem here is that the managed entry plug-in is initiating the internal operation, but we have no way of knowing that this thread already holds the memberOf operation lock.
So what is the solution?
(In reply to comment #4) > So what is the solution? I'm testing using PRMonitor for locking instead of a simple mutex. This will allow re-entrant locking. If it passes my memberOf tests, I'll attach a patch here so we can test it with the FreeIPA unit tests.
Created attachment 515354 [details] Patch This patch changed the memebrOf plug-in to use a PRMonitor instead of a mutex. This has passed all of my memberOf tests, but I would like it tested out against the FreeIPA unit tests that triggered the deadlock.
Comment on attachment 515354 [details] Patch Does this compile cleanly? Do you need an #include <prmon.h>?
(In reply to comment #7) > Comment on attachment 515354 [details] > Patch > > Does this compile cleanly? Do you need an #include <prmon.h>? Yes, it complies cleanly. We include nspr.h in memberof.c, which includes prmon.h.
Martin - I completed the ipa make-test with the patch - the server did not hang, but the test reported many errors. Please review the test results - if you approve, we can push the patch and get a new build pushed to Fedora 15.
Rich, thanks for successful investigation and also thanks to Nathan for the fix. The DS no longer hangs in FreeIPA unit test. These referred test errors are known and will be fixed in the future. So ACK from me for the proposed patch.
Pushed to master. Thanks to Rich and Martin for their reviews and testing! Counting objects: 13, done. Delta compression using up to 2 threads. Compressing objects: 100% (7/7), done. Writing objects: 100% (7/7), 915 bytes, done. Total 7 (delta 4), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 077544c..238b74d master -> master
Followed steps in README file in the tarball given in comment#17 : Step 1: [root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f config.ldif modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config" adding new entry "cn=users,dc=example,dc=com" adding new entry "cn=groups,dc=example,dc=com" adding new entry "cn=inversegroups,dc=example,dc=com" adding new entry "cn=Inverse Groups Template,dc=example,dc=com" adding new entry "cn=Inverse Groups,cn=Managed Entries,cn=plugins,cn=config" Step 2: [root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f setup-data.ldif adding new entry "uid=tuser1,cn=users,dc=example,dc=com" adding new entry "uid=tuser2,cn=users,dc=example,dc=com" adding new entry "uid=tuser3,cn=users,dc=example,dc=com" adding new entry "uid=tuser4,cn=users,dc=example,dc=com" adding new entry "uid=tuser5,cn=users,dc=example,dc=com" adding new entry "uid=tuser6,cn=users,dc=example,dc=com" . .. .......... Steldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f hang.ldif [root@snmaptest 725743]# ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -f hang.ldif adding new entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" modifying entry "cn=mastergroup,dc=example,dc=com" Server did not hang and operation performed successfully. Hence marking as VERIFIED.