The NSPR RW lock implementation does not safely allow re-entrant use of reader locks. If a writer is waiting for the lock, all new reader lock requests are blocked. This includes requests from threads that already hold a reader lock and are trying to obtain another one. Such a thread blocks behind the writer, while the writer waits for that same thread to release its first reader lock, so neither can proceed and the result is a deadlock.

This issue has been reported to the NSPR developers, but they are hesitant to fix it. NSPR does not currently keep track of the threads that own locks, so there is no way for it to differentiate a thread asking for its first reader lock from one that already holds a reader lock. The NSPR developers are hesitant to add this tracking, as they feel it would degrade performance.

POSIX RW locks safely allow re-entrant reader locks to be used. We should use the POSIX implementation to avoid deadlocks in ns-slapd, as we do have areas where we use reader locks in a re-entrant fashion.

To switch the RW lock implementation, we need to refactor the 389-ds-base code to use a new slapi_rwlock_* API anywhere we use RW locks; we currently call PR_RWLock_*() functions from many places within the code. The slapi_rwlock_* API should be able to switch between NSPR RW locks and POSIX RW locks based on defines. We can then add a configure test to use POSIX RW locks if available, with an override switch to use NSPR locks if that is needed on certain platforms.
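The deadlock described above needs three ingredients: a thread holding a reader lock, a writer queued on the same lock, and the first thread requesting a second reader lock. The sketch below reproduces that sequence with POSIX RW locks. POSIX allows a thread to hold multiple concurrent read locks, but whether a nested request is granted while a writer waits depends on the implementation's reader/writer preference; this sketch assumes glibc with default attributes, which prefer readers. It uses pthread_rwlock_timedrdlock() with a 2-second deadline so that a writer-preferring implementation reports ETIMEDOUT instead of hanging. The helper name nested_read_with_waiting_writer is hypothetical and is not taken from 389-ds-base or from the attached rwlock-test.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>   /* ETIMEDOUT, for callers checking the result */
#include <pthread.h>
#include <time.h>

/* Statically initialised lock with default attributes. */
static pthread_rwlock_t demo_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writer thread: queues for the write lock and releases it once granted. */
static void *writer(void *arg)
{
    (void)arg;
    if (pthread_rwlock_wrlock(&demo_lock) == 0)
        pthread_rwlock_unlock(&demo_lock);
    return NULL;
}

/*
 * Hypothetical helper. Returns 0 if the nested read lock was granted while
 * a writer was queued, ETIMEDOUT if the waiting writer blocked it (the
 * NSPR-style behaviour, which deadlocks when no timeout is used), or -1 if
 * setup failed.
 */
int nested_read_with_waiting_writer(void)
{
    pthread_t tid;
    struct timespec delay = {0, 100L * 1000L * 1000L}; /* 100 ms */
    struct timespec deadline;
    int rc;

    if (pthread_rwlock_rdlock(&demo_lock) != 0)      /* first read lock */
        return -1;
    if (pthread_create(&tid, NULL, writer, NULL) != 0) {
        pthread_rwlock_unlock(&demo_lock);
        return -1;
    }
    nanosleep(&delay, NULL);     /* give the writer time to start waiting */

    /* Nested read lock from the same thread, bounded by a 2 s deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2;
    rc = pthread_rwlock_timedrdlock(&demo_lock, &deadline);
    if (rc == 0)
        pthread_rwlock_unlock(&demo_lock);           /* drop nested lock */

    pthread_rwlock_unlock(&demo_lock);               /* drop first lock */
    pthread_join(tid, NULL);     /* writer can now acquire and release */
    return rc;
}
```

Calling nested_read_with_waiting_writer() from a small main() should return 0 on a reader-preferring implementation such as glibc's default; a result of ETIMEDOUT would indicate the writer-preferring behaviour that leads to the deadlock described in this bug.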
Created attachment 518096 rwlock-test
Created attachment 518097 stack trace from ipa update
Created attachment 518567 Patch
Created attachment 518711 Revised Patch
This corrects some search-and-replace errors in the previous patch.
Created attachment 518717 Revised Patch
Pushed to master. Thanks to Noriko for her review!
Counting objects: 151, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (76/76), done.
Writing objects: 100% (76/76), 53.22 KiB, done.
Total 76 (delta 70), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   a150e8e..f9b199e  master -> master
*** Bug 528567 has been marked as a duplicate of this bug. ***
Upstream ticket: https://fedorahosted.org/389/ticket/247
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Previously, 389 Directory Server used the Netscape Portable Runtime (NSPR) implementation of the read/write locking mechanism. This implementation allowed deadlocks to occur if 389 Directory Server was under a heavy load, which caused the server to become unresponsive. With this update, 389 Directory Server now uses the POSIX implementation of the locking mechanism, and deadlocks no longer occur under a heavy load.
No Regressions, Marking as VERIFIED.