Bug 2268136 - Long etime and error "Retry count exceeded" on BIND/ADD/DEL/MOD from revert_cache [11.7.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Directory Server
Classification: Red Hat
Component: 389-ds-base
Version: 11.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: DS11.7
Target Release: dirsrv-11.7
Assignee: LDAP Maintainers
QA Contact: LDAP QA Team
Docs Contact: Evgenia Martynyuk
URL:
Whiteboard:
Duplicates: 2268186
Depends On:
Blocks: 2268183
Reported: 2024-03-06 11:30 UTC by Viktor Ashirov
Modified: 2024-03-20 13:00 UTC

Fixed In Version: 389-ds-base-1.4.3.34-3.module+el8dsrv+21391+b62d2223
Doc Type: Bug Fix
Doc Text:
.Directory Server now flushes the entry cache less frequently
Previously, Directory Server flushed its entry cache even when it was not necessary. As a result, in certain situations, Directory Server was unresponsive and performed poorly. With this update, Directory Server flushes the entry cache only when it is necessary.
Clone Of:
Environment:
Last Closed: 2024-03-19 11:26:50 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | 389ds 389-ds-base issues 5939 | 0 | None | closed | During an update, if the target entry is reverted in the entry cache, the server should not retry to lock it. | 2024-03-19 15:16:20 UTC
Github | 389ds 389-ds-base issues 5944 | 0 | None | closed | Reversion of the entry cache should be limited to BETXN plugin failures | 2024-03-19 15:16:20 UTC
Red Hat Issue Tracker | IDMDS-4214 | 0 | None | None | None | 2024-03-06 14:44:38 UTC
Red Hat Issue Tracker | IDMDS-4225 | 0 | None | None | None | 2024-03-11 15:01:19 UTC
Red Hat Product Errata | RHSA-2024:1372 | 0 | None | None | None | 2024-03-19 11:26:54 UTC

Description Viktor Ashirov 2024-03-06 11:30:19 UTC
This bug was initially created as a copy of Bug #2234613

Description of problem:

Investigate a long etime situation on RHDS 11, with the error "Retry count exceeded" on BIND/ADD/DEL/MOD operations, coming from revert_cache (entry cache).
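
To locate the long-etime operations, one option is to scan the access log for RESULT lines whose `etime=` value exceeds a threshold. The following is only a minimal sketch: the helper name `slow_results`, the threshold, and the two sample lines are hypothetical and were not taken from the affected system.

```python
import re

# Matches the connection id, operation id, and etime of a RESULT line.
ETIME_RE = re.compile(r"conn=(\d+) op=(-?\d+) RESULT .*?\betime=([0-9.]+)")

def slow_results(lines, threshold_seconds):
    """Return (conn, op, etime) for RESULT lines at or above the threshold."""
    slow = []
    for line in lines:
        m = ETIME_RE.search(line)
        if m and float(m.group(3)) >= threshold_seconds:
            slow.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return slow

# Hypothetical sample lines, invented for illustration only.
sample = [
    "[06/Mar/2024:11:30:19.000000000 +0000] conn=12 op=3 RESULT err=0 tag=97 nentries=0 etime=12.504331870",
    "[06/Mar/2024:11:30:20.000000000 +0000] conn=13 op=1 RESULT err=0 tag=101 nentries=1 etime=0.000412000",
]
slow = slow_results(sample, 5.0)
for conn, op, etime in slow:
    print(f"conn={conn} op={op} etime={etime}")
```

The (conn, op) pairs can then be matched against the operation lines in the same access log to see which BIND/ADD/DEL/MOD requests were affected.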


Version-Release number of selected component (if applicable):

RHDS-11.7 on RHEL-8.8
389-ds-base-1.4.3.34-1.module+el8dsrv+18528+22f7779f.x86_64
redhat-release-8.8-0.8.el8.x86_64


How reproducible:
N/A, high traffic, and other unknowns in environment.

Steps to Reproduce:
N/A

Actual results:

Recurring pattern in the errors log:

ERR - find_entry_internal_dn - Retry count exceeded (uid=
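
To gauge how often this pattern occurs and which entries it concerns, the matching lines in the errors log can be counted. This is a minimal sketch: only the message text is taken from this report, while the helper name `count_retry_exceeded`, the assumption that the text after "(" is a DN, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Message text as quoted in this report; the text after "(" is assumed to be a DN.
PATTERN = re.compile(
    r"find_entry_internal_dn - Retry count exceeded \((?P<dn>[^)]*)"
)

def count_retry_exceeded(lines):
    """Count matching error lines, grouped by the DN text that follows "("."""
    hits = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            hits[m.group("dn")] += 1
    return hits

# Hypothetical sample lines; the DN values are invented for illustration.
sample = [
    "ERR - find_entry_internal_dn - Retry count exceeded (uid=user1,ou=people,dc=example,dc=com)",
    "ERR - find_entry_internal_dn - Retry count exceeded (uid=user1,ou=people,dc=example,dc=com)",
    "ERR - find_entry_internal_dn - Retry count exceeded (uid=user2,ou=people,dc=example,dc=com)",
    "INFO - some unrelated line",
]
counts = count_retry_exceeded(sample)
for dn, n in counts.most_common():
    print(n, dn)
```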

Thread signature:
Contention on the backend lock while reverting after a TXN failure.
Many update threads are stuck waiting for the backend lock, which can contribute to worker starvation.

        Thread 36 (Thread 0x7f16839fe700 (LWP 1672443)):
        #0  0x00007f191177ee92 in flush_hash () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
        #1  0x00007f191177f103 in revert_cache () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
        #2  0x00007f19117aea7c in ldbm_back_modify () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
        #3  0x00007f19204604d0 in op_shared_modify () at target:/usr/lib64/dirsrv/libslapd.so.0
        #4  0x00007f192046112b in modify_internal_pb () at target:/usr/lib64/dirsrv/libslapd.so.0
        #5  0x00007f1920486479 in pw_apply_mods () at target:/usr/lib64/dirsrv/libslapd.so.0
        #6  0x00007f1920486686 in set_retry_cnt_and_time.constprop () at target:/usr/lib64/dirsrv/libslapd.so.0
        #7  0x00007f19204867fb in update_pw_retry () at target:/usr/lib64/dirsrv/libslapd.so.0
        #8  0x00007f192048c2cb in send_ldap_result_ext () at target:/usr/lib64/dirsrv/libslapd.so.0
        #9  0x00007f192048c54f in send_ldap_result () at target:/usr/lib64/dirsrv/libslapd.so.0
        #10 0x00007f1920473267 in slapi_send_ldap_result () at target:/usr/lib64/dirsrv/libslapd.so.0
        #11 0x00007f191179c01b in ldbm_back_bind () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
        #12 0x0000562dfa8c4ed2 in pw_verify_be_dn ()
        #13 0x0000562dfa8b1449 in do_bind ()
        #14 0x0000562dfa8b64b5 in connection_threadmain ()
        #15 0x00007f191ce97968 in _pt_root () at target:/lib64/libnspr4.so
        #16 0x00007f191c8321cf in start_thread () at target:/lib64/libpthread.so.0
        #17 0x00007f191eae5dd3 in clone () at target:/lib64/libc.so.6
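
To check how many threads share this signature in a full stack dump, the backtraces can be grouped by thread and searched for a function name. This is a minimal sketch that assumes the dump uses the same gdb layout as the trace above. The helper name `threads_calling` is hypothetical, and "Thread 35" in the sample is invented for contrast.

```python
import re

THREAD_RE = re.compile(r"^\s*Thread (\d+) \(")
# Frame lines look like "#1  0x... in revert_cache () at ...".
FRAME_RE = re.compile(r"^\s*#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \(")

def threads_calling(text, function):
    """Return the ids of threads whose backtrace contains `function`."""
    found = []
    current = None
    for line in text.splitlines():
        t = THREAD_RE.match(line)
        if t:
            current = int(t.group(1))
            continue
        f = FRAME_RE.match(line)
        if f and current is not None and f.group(1) == function:
            if current not in found:
                found.append(current)
    return found

# Thread 36 is abridged from this report; Thread 35 is a hypothetical idle thread.
sample = """\
Thread 36 (Thread 0x7f16839fe700 (LWP 1672443)):
#0  0x00007f191177ee92 in flush_hash () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
#1  0x00007f191177f103 in revert_cache () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
#2  0x00007f19117aea7c in ldbm_back_modify () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
Thread 35 (Thread 0x7f16841ff700 (LWP 1672442)):
#0  0x00007f191c8321cf in start_thread () at target:/lib64/libpthread.so.0
"""
reverting = threads_calling(sample, "revert_cache")
print(reverting)
```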


Expected results:
yes


Additional info:

RHDS-11.6 related fix: bz 2051476 - high contention in find_entry_internal_dn on mixed load
https://bugzilla.redhat.com/2051476
https://access.redhat.com/errata/RHBA-2023:0186
"
Cause: The cache c_mutex type was changed from PR_Monitor to a pthread recursive mutex implementation. It brought a minor performance boost but also proved to be less stable in its current form.
Additionally, another issue occurs when updating the parent entry of a deleted entry (numsubordinates): if the server fails to lock the parent entry, it does not return the parent entry.

Consequence: The "find_entry_internal_dn - Retry count exceeded" error appears in the error log under a high concurrent load of mixed operations on a flat tree.
When the other issue occurs, refcnt becomes invalid, which may lead to other cache locking issues.

Fix: Change the cache c_mutex type back to PR_Monitor.
If locking the parent entry fails, the entry should be returned.

Result: The "find_entry_internal_dn - Retry count exceeded" error no longer appears, and the cache structure stays in the correct state with the correct refcnt.
"
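
The second part of the quoted fix (returning the entry when locking the parent fails) can be illustrated with a toy reference-counting model. This is not the 389-ds-base code: `ToyCache`, `ToyEntry`, `update_parent`, and the flag `return_on_lock_failure` are hypothetical names used only to show why a missing "return" leaves refcnt too high.

```python
class ToyEntry:
    def __init__(self):
        self.refcnt = 0
        self.locked = False

class ToyCache:
    def __init__(self):
        self.entries = {}

    def get(self, dn):
        """Look up (or create) an entry and take a reference on it."""
        entry = self.entries.setdefault(dn, ToyEntry())
        entry.refcnt += 1
        return entry

    def put(self, entry):
        """Return the entry to the cache, dropping one reference."""
        entry.refcnt -= 1

    def try_lock(self, entry):
        if entry.locked:
            return False
        entry.locked = True
        return True

def update_parent(cache, dn, return_on_lock_failure):
    entry = cache.get(dn)
    if not cache.try_lock(entry):
        if return_on_lock_failure:
            cache.put(entry)  # modelled fix: give the reference back
        return False          # without the put, the reference is leaked
    entry.locked = False      # (the parent update itself is omitted)
    cache.put(entry)
    return True

def refcnt_after_failed_update(return_on_lock_failure):
    cache = ToyCache()
    holder = cache.get("ou=people")  # another operation holds the entry
    cache.try_lock(holder)           # and keeps it locked
    update_parent(cache, "ou=people", return_on_lock_failure)
    return holder.refcnt

print(refcnt_after_failed_update(False))  # leaked reference
print(refcnt_after_failed_update(True))   # only the holder's reference
```

In this model the entry ends with one reference too many when the failure path skips the return, which corresponds to the invalid refcnt described in the quoted consequence.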

So the error
ERR - find_entry_internal_dn - Retry count exceeded
will happen again, and there have been more reports:
https://bugzilla.redhat.com/show_bug.cgi?id=2051476#c45

Comment 1 thierry bordaz 2024-03-06 14:53:47 UTC
*** Bug 2268186 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2024-03-19 11:26:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: redhat-ds:11 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1372

