This bug was initially created as a copy of Bug #2234613

Description of problem:
Issue to investigate a RHDS-11 long etime situation with error "Retry count exceeded" on BIND/ADD/DEL/MOD from revert_cache (entry cache).

Version-Release number of selected component (if applicable):
RHDS-11.7 on RHEL-8.8
389-ds-base-1.4.3.34-1.module+el8dsrv+18528+22f7779f.x86_64
redhat-release-8.8-0.8.el8.x86_64

How reproducible:
N/A, high traffic, and other unknowns in environment.

Steps to Reproduce:
1. N/A
2.
3.

Actual results:
Pattern event in errors log:
ERR - find_entry_internal_dn - Retry count exceeded (uid=

Thread signature:
Contention on backend lock while reverting TXN failure.
Many (update) threads are stuck waiting for the backend lock => can contribute to worker starvation.

Thread 36 (Thread 0x7f16839fe700 (LWP 1672443)):
#0 0x00007f191177ee92 in flush_hash () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
#1 0x00007f191177f103 in revert_cache () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
#2 0x00007f19117aea7c in ldbm_back_modify () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
#3 0x00007f19204604d0 in op_shared_modify () at target:/usr/lib64/dirsrv/libslapd.so.0
#4 0x00007f192046112b in modify_internal_pb () at target:/usr/lib64/dirsrv/libslapd.so.0
#5 0x00007f1920486479 in pw_apply_mods () at target:/usr/lib64/dirsrv/libslapd.so.0
#6 0x00007f1920486686 in set_retry_cnt_and_time.constprop () at target:/usr/lib64/dirsrv/libslapd.so.0
#7 0x00007f19204867fb in update_pw_retry () at target:/usr/lib64/dirsrv/libslapd.so.0
#8 0x00007f192048c2cb in send_ldap_result_ext () at target:/usr/lib64/dirsrv/libslapd.so.0
#9 0x00007f192048c54f in send_ldap_result () at target:/usr/lib64/dirsrv/libslapd.so.0
#10 0x00007f1920473267 in slapi_send_ldap_result () at target:/usr/lib64/dirsrv/libslapd.so.0
#11 0x00007f191179c01b in ldbm_back_bind () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
#12 0x0000562dfa8c4ed2 in pw_verify_be_dn ()
#13 0x0000562dfa8b1449 in do_bind ()
#14 0x0000562dfa8b64b5 in connection_threadmain ()
#15 0x00007f191ce97968 in _pt_root () at target:/lib64/libnspr4.so
#16 0x00007f191c8321cf in start_thread () at target:/lib64/libpthread.so.0
#17 0x00007f191eae5dd3 in clone () at target:/lib64/libc.so.6

Expected results:
yes

Additional info:
RHDS-11.6 related fix:
bz 2051476 - high contention in find_entry_internal_dn on mixed load
https://bugzilla.redhat.com/2051476
https://access.redhat.com/errata/RHBA-2023:0186

"
Cause: Cache c_mutex type was changed from PR_Monitor to pthread recursive mutex implementation. It brought a minor performance boost but also proved to be a less stable solution in its current way. Additionally, another issue happens when updating the parent entry of a deleted entry (numsubordinates), if it fails to lock the parent it does not return the parent entry.

Consequence: "find_entry_internal_dn - Retry count exceeded" error appears in the error log with high concurrent mixed operations load on a flat tree. And when the other issue happens, refcnt becomes invalid. Which may lead to other cache locking issues.

Fix: Change cache c_mutex type to PR_Monitor. In the case of the failure to lock the parent entry, the entry should be returned.

Result: "find_entry_internal_dn - Retry count exceeded" error doesn't appear. And the cache structure exists in the correct state with the correct refcnt.
"

So "ERR - find_entry_internal_dn - Retry count exceeded" will happen again, and there have been more reports:
https://bugzilla.redhat.com/show_bug.cgi?id=2051476#c45
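
For readers unfamiliar with the refcnt issue mentioned in the quoted bz 2051476 text, here is a minimal standalone sketch of that kind of leak. It is not 389-ds-base code: struct entry, cache_get, cache_return, try_lock, unlock, update_parent_buggy and update_parent_fixed are all hypothetical names, and it assumes that "return the parent entry" means giving the cache reference back.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a cached entry; not the real 389-ds struct. */
struct entry {
    int refcnt;    /* references currently held by callers */
    bool lockable; /* simulates whether the entry lock can be taken */
};

/* Take a reference, as a cache lookup would (hypothetical helper). */
struct entry *cache_get(struct entry *e) { e->refcnt++; return e; }

/* Give the reference back to the cache (hypothetical helper). */
void cache_return(struct entry *e) { e->refcnt--; }

/* Simulated lock attempt: true on success. */
bool try_lock(struct entry *e) { return e->lockable; }
void unlock(struct entry *e) { (void)e; }

/* Buggy shape: the early exit on lock failure skips cache_return(),
 * so the parent keeps a reference that nobody owns any more. */
int update_parent_buggy(struct entry *parent)
{
    struct entry *p = cache_get(parent);
    if (!try_lock(p)) {
        return -1; /* reference leaked here */
    }
    /* ... a numsubordinates update would go here ... */
    unlock(p);
    cache_return(p);
    return 0;
}

/* Fixed shape: the reference is given back on every path. */
int update_parent_fixed(struct entry *parent)
{
    struct entry *p = cache_get(parent);
    int rc = -1;
    if (try_lock(p)) {
        /* ... a numsubordinates update would go here ... */
        unlock(p);
        rc = 0;
    }
    cache_return(p);
    return rc;
}
```

With lockable set to false, the buggy variant leaves refcnt one higher than before the call, while the fixed variant leaves it unchanged. Both report the failure to the caller.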
*** Bug 2268186 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: redhat-ds:11 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1372